## Abstract

Control parameter studies assist practitioners to select optimization algorithm parameter values that are appropriate for the problem at hand. Parameter values are well suited to a problem if they result in a search that is effective given that problem’s objective function(s), constraints, and termination criteria. Given these considerations a many-objective tuning algorithm named MOTA is presented. MOTA is specialized for tuning a stochastic optimization algorithm according to multiple performance measures, each over a range of objective function evaluation budgets. MOTA’s specialization consists of four aspects: (1) a tuning problem formulation that consists of both a speed objective and a speed decision variable; (2) a control parameter tuple assessment procedure that utilizes information from a single assessment run’s history to gauge that tuple’s performance at multiple evaluation budgets; (3) a preemptively terminating resampling strategy for handling the noise present when tuning stochastic algorithms; and (4) the use of bi-objective decomposition to assist in many-objective optimization. MOTA combines these aspects together with differential evolution operators to search for effective control parameter values. Numerical experiments consisting of tuning NSGA-II and MOEA/D demonstrate that MOTA is effective at many-objective tuning.

## 1 Introduction

The performance of an optimization algorithm differs depending on the control parameter values (CPVs) specified by the practitioner. For this reason control parameter studies are often conducted on well-understood testing problems, to better understand the effects of different CPVs on an algorithm’s search behavior. Two important considerations when performing control parameter studies are sensitivity to characteristics of the optimization problem, and sensitivity to the termination criteria used. The characteristics of the optimization problem, such as the objective function(s), the constraints imposed, and the initialization conditions, need to be taken into consideration when selecting CPVs, since different CPV tuples are better suited to certain characteristics than others. In particular, the search mechanics of an algorithm, which are controlled by the CPVs, can be beneficial or detrimental depending on the problem at hand (Wolpert and Macready, 1997). Similarly, CPV tuples well suited to certain termination criteria perform poorly for others; take, for example, the sensitivity of CPV performance to objective function evaluation (OFE) budgets (Dymond et al., 2013).

To aid control parameter studies, a new evolutionary algorithm named MOTA (many-objective tuning algorithm) is proposed. MOTA aims to efficiently tune an optimization algorithm according to multiple performance measures over a range of OFE budgets. Even though no such algorithm has been proposed before, tuning an optimization algorithm to multiple performance measures for multiple OFE budgets could be achieved by using existing tuning algorithms. Specifically, existing tuning algorithms can be used to solve multiple subproblems, where each subproblem focuses on a different performance measure preference articulation. However, segregating a multiobjective problem in this manner is wasteful, since no information is shared between the subproblems. Consider two subproblems, each focused on tuning an algorithm to the same problem but at a different OFE budget, and suppose common CPV trends exist between these subproblems, such as a larger optimal population size as the OFE budget increases. For such scenarios, information flow between subproblems should benefit the tuning process. MOTA overcomes these segregation limitations through the use of multiobjective optimization.

The design of MOTA is motivated by the in-depth control parameter studies a many-objective tuning algorithm would allow (Deb and Srinivasan, 2006). Consider studies investigating robust or generalist CPV tuples that perform well over numerous problems (Smit and Eiben, 2010b; Eiben and Smit, 2011). Multiobjective tuning can efficiently search for these robust CPV tuples by solving a tuning problem with an objective corresponding to each problem that the robust CPVs are required to perform well on. After tuning is completed, the generalist CPV tuples can be found by examining the CPV tuples found during the multiobjective optimization, each of which is optimal for a different trade-off among the tuning objectives. Furthermore, common practice when assessing a multiobjective algorithm’s performance is to make use of a series of unary performance indicators (Zitzler et al., 2003), each of which measures a different aspect of the solution quality. As such, tuning according to multiple performance indicators would allow multiobjective algorithms to be tuned more holistically compared to tuning them according to one performance metric only. Moreover, even if an optimization algorithm is to be tuned to multiple problems separately to determine CPVs well suited to individual problems only, tuning an algorithm to all problems congruently may result in a higher efficiency compared to handling each problem in isolation, since common CPV trends may be present.

The outline of this article is as follows. Related work and MOTA’s contribution are discussed in Section 2. The MOTA algorithm is described in Section 3. Thereafter, Section 4 presents the numerical setup used to assess MOTA’s performance, and Section 5 gives the results from those numerical experiments.

## 2 Related Work

The proposed tuning algorithm is related to the fields of control parameter tuning and many-objective optimization.

### 2.1 Control Parameter Tuning

Control parameter tuning entails adjusting an algorithm’s CPVs in order to improve performance (Smit and Eiben, 2009). Accordingly, the control parameters adjusted could refer to real-valued control parameters such as crossover probability, or option-based control parameters such as the crossover method used, or both. The MOTA algorithm is designed for real-valued control parameter tuning. Tuning is done according to a utility metric, which measures the performance of the algorithm being tuned as a function of the CPV tuple being assessed. Traditional examples of utility measures are the objective function value achieved on a given problem for a specified OFE budget, and the number of OFEs required to reach a specified solution accuracy on a given problem. Tuning an algorithm entails solving an optimization problem, where the decision variables are the CPVs, and the objective function consists of the utility metric.

Control parameter tuning differs from parameter control (Eiben et al., 1999). For parameter control, an algorithm’s CPVs are varied throughout the optimization run according to a predefined strategy, whereas control parameter tuning aims to determine CPVs that remain constant throughout the course of an optimization run. Adaptive algorithms are built on the principle of parameter control, adjusting their CPVs throughout the course of an optimization run in order to tune themselves to the problem being optimized. Superficially, adaptive algorithms therefore eliminate the need for control parameter tuning, since CPVs are tuned online by using feedback from the optimization process itself. However, in reality, the practitioner’s task simply changes from selecting appropriate CPVs to selecting appropriate parameter control strategies for the problem at hand (Pedersen, 2010). Moreover, since parameter control strategies can be expressed parametrically, the task of selecting appropriate control strategies can be expressed as a control parameter tuning problem itself.

Numerous applications of control parameter tuning have been conducted. Initially, Grefenstette (1986) used a genetic algorithm to tune another genetic algorithm to five testing problems. François and Lavergne (2001) proposed the use of statistical methods to model the relation between an algorithm’s CPVs and the resulting performance, with the aim of helping practitioners select CPVs for genetic algorithms. Bartz-Beielstein et al. (2005) presented the sequential parameter optimization (SPO) tuning framework, which has been used to tune numerous algorithms such as particle swarm optimization (PSO) algorithms (Eberhart and Kennedy, 1995). Similarly, Nannen and Eiben (2007) proposed the relevance estimation and value calibration (REVAC) tuning algorithm, and demonstrated its effectiveness by tuning the mutation and crossover rates of genetic algorithms. Hutter et al. (2009) presented the ParamILS framework and applied it to tune the CPLEX mixed-integer programming solver. Smit and Eiben (2010a) improved the performance of the winning algorithm from the CEC 2005 competition (Suganthan et al., 2005) using control parameter tuning. There have also been applications of control parameter tuning to investigate the speed versus accuracy trade-off present in many evolutionary algorithms using multiobjective optimization (Dréo, 2008, 2009; Ugolotti and Cagnoni, 2014).

Substantial work has gone into tuning stochastic algorithms. When tuning stochastic algorithms, the utility resulting from a CPV tuple forms a probabilistic distribution. Consequently, stochastic algorithms are typically tuned to the mean of the utility distribution. However, since analytical expressions for the utility distribution mean are normally unavailable or too difficult to calculate, numerical techniques are normally used to determine which CPV tuple results in the best mean utility. The resampling strategy (Beyer, 2000) is the easiest of these numerical techniques to implement, and entails running the algorithm being tuned multiple times for each CPV tuple assessed, to approximate the mean of the CPV tuple’s utility distribution. Although effective, the resampling strategy is prohibitively expensive, since the computational cost of assessing each CPV tuple is multiplied by the resampling sample size, a sample size that needs to be large to accurately approximate the mean of a CPV tuple’s utility distribution. Pedersen (2010) showed that the computational cost of resampling can be drastically reduced through the use of preemptive termination, whereby the sample-gathering process for a CPV tuple is interrupted if it is already clear that the CPV tuple being assessed is worse than CPV tuples previously evaluated. The F-race algorithm proposed by Balaprakash et al. (2007) makes use of such a preemptively terminating resampling technique. Starting with a large number of candidate CPV tuples, F-race generates one additional sample run for each CPV tuple still in the race. After each iteration of sample generation is completed, a Friedman statistical test is conducted on the sampled utility values. CPV tuples found to be inferior according to the Friedman test at a specified significance level are then eliminated from the race. This process of generating additional samples and eliminating tuples continues until only one CPV tuple remains in the race. Since many CPV tuples are eliminated early in the race, a large reduction in computational expense is achieved.
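The racing idea can be sketched in a few lines of Python. This is a simplified illustration rather than Balaprakash et al.'s actual F-race: a hypothetical mean-rank elimination rule stands in for the Friedman test, and the `race`, `utility_fn`, and `gap` names are invented for the sketch.

```python
import random

def race(cpv_tuples, utility_fn, max_rounds=50, min_samples=5, gap=0.6):
    """Race CPV tuples: each round, every survivor gets one more sample
    run; tuples whose mean utility rank falls clearly behind the leader
    are eliminated (a hypothetical mean-rank rule standing in for
    F-race's Friedman test). Lower utility is better."""
    alive = list(cpv_tuples)
    history = {c: [] for c in alive}
    for _ in range(max_rounds):
        for c in alive:
            history[c].append(utility_fn(c))   # one more sample run each
        if len(history[alive[0]]) >= min_samples and len(alive) > 1:
            # mean rank of each survivor across all sampled rounds
            mean_rank = {c: 0.0 for c in alive}
            n = len(history[alive[0]])
            for i in range(n):
                ordered = sorted(alive, key=lambda c: history[c][i])
                for r, c in enumerate(ordered):
                    mean_rank[c] += r / n
            best = min(mean_rank.values())
            alive = [c for c in alive
                     if mean_rank[c] - best < gap * len(alive) / 2]
        if len(alive) == 1:
            break
    return alive
```

Because poor tuples are dropped after only a handful of samples, most of the sampling budget is spent on the front-runners, which is the source of F-race's efficiency.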

Control parameter tuning is not intended to be applied directly to application problems, which typically consist of objective or constraint functions that are computationally expensive to evaluate. Rather, control parameter tuning is intended to be applied to computationally cheap testing problems in order to perform control parameter studies. The extent to which the results of such studies transfer to an unseen application problem depends upon various factors, among them the similarity of the tuning problem’s objectives and constraints to those of the application problem, as well as the similarity of the termination criteria used. Given these considerations, the applicability of tuning results generated using a single-objective utility is limited. Tuning according to multiple criteria via multiobjective optimization is therefore preferred.

Current multiobjective tuning algorithms can be split into two groups: those designed for tuning to multiple problems, each under a single termination criterion, and those designed for tuning to a single problem under multiple termination criteria. Smit et al. (2010) proposed the M-FETA algorithm, which is designed for tuning an optimization algorithm to multiple problems, each using one termination criterion. Branke and Elomari (2012) developed the Flexible Budget Method for tuning to a single problem under multiple OFE budgets. For most tuning setups, approximating the performance of a CPV tuple at a specified OFE budget requires running the algorithm being tuned from initialization up to that budget. The Flexible Budget Method uses this history information and other heuristics to tune under multiple OFE budgets. In particular, each CPV tuple being assessed is run up to the maximum OFE budget of interest, and the CPV tuple’s run history is then used to gauge performance at each OFE budget being tuned under. Dymond et al. (2015) proposed the tMOPSO algorithm for tuning to a single problem under multiple OFE budgets. tMOPSO tunes an optimization algorithm according to a bi-objective utility measure consisting of the best objective function value found, and the number of OFEs used to generate that objective function value. Solving a tuning problem formulated using this bi-objective utility measure allows tMOPSO to determine multiple CPV tuples, each of which is optimal for a different OFE budget. Like the Flexible Budget Method, tMOPSO uses the history information from the CPV assessment calculations to enhance efficiency. Additionally, tMOPSO uses Mann-Whitney U tests (MWUTs; Conover, 1999), specialized for its bi-objective formulation, to efficiently perform preemptively terminating resampling.

Here, we propose an algorithm named MOTA, for tuning an algorithm to multiple performance measures under multiple OFE budgets. Such a tuning problem has a utility measure consisting of at least three objectives. When the utility measure consists of four or more objectives, then MOTA needs to solve a many-objective optimization problem.

### 2.2 Many-Objective Optimization

A multiobjective optimization problem consists of multiple objective functions, each of which maps a point in the searched decision vector space to a real value, $f_i : \mathbb{R}^{n_x} \rightarrow \mathbb{R}$, where $n_x$ is the dimensionality of the search space. For the case when the multiobjective optimization problem has constraints, the valid search space is typically expressed through $n_g$ inequality functions and $n_h$ equality functions. Formally, a constrained real-valued multiobjective minimization problem (Engelbrecht, 2007) is defined as

$$\min_{\mathbf{x}} \; \mathbf{f}(\mathbf{x}) = [\, f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_{n_f}(\mathbf{x}) \,]$$
$$\text{subject to} \quad g_i(\mathbf{x}) \leq 0, \quad i = 1, \ldots, n_g$$
$$\qquad\qquad\quad h_j(\mathbf{x}) = 0, \quad j = 1, \ldots, n_h$$

where $\mathbf{f}$ is the multiobjective function, $n_f$ is the number of objectives comprising $\mathbf{f}$, the $g_i$ are the inequality constraints, and the $h_j$ are the equality constraints.

The set of all nondominated decision vectors for a multiobjective optimization problem is referred to as the Pareto-optimal set (PS), where the PS is often of infinite size. The set of objective function values corresponding to the PS is referred to as the Pareto-optimal front (PF). Two special points in the objective space that are commonly used by multiobjective optimization algorithms are the utopia point $\mathbf{z}^{u}$ and the nadir point $\mathbf{z}^{nad}$, where for minimization problems $z^{u}_i = \min_{\mathbf{x}} f_i(\mathbf{x})$ and $z^{nad}_i = \max_{\mathbf{x} \in PS} f_i(\mathbf{x})$. When the optimization problem consists of two or three objectives, multiobjective evolutionary algorithms typically aim to determine a finite, evenly spaced set of nondominated decision vectors that accurately approximates the entire PF. However, approximating the entire PF for many-objective optimization is intractable.
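For readers implementing these definitions, the following is a minimal Python sketch of Pareto nondomination filtering and of the utopia and nadir points of a finite front approximation (function names are our own):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Return the nondominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def utopia_nadir(front):
    """Component-wise best (utopia) and worst (nadir) points over a
    finite front approximation."""
    m = len(front[0])
    utopia = tuple(min(p[i] for p in front) for i in range(m))
    nadir = tuple(max(p[i] for p in front) for i in range(m))
    return utopia, nadir
```

For example, `nondominated([(1, 2), (2, 1), (2, 2), (3, 3)])` keeps only `(1, 2)` and `(2, 1)`, since the other two points are dominated.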

Approximating the entire PF for many-objective optimization problems is problematic for two reasons. First, the computational overhead of maintaining the PF approximation (Mostaghim and Teich, 2005) grows linearly as the size of the approximation increases. Consequently, the computational overhead is too high to approximate the entire PF of a many-objective problem, since the size of the set required to represent the entire PF grows exponentially with the number of objectives. Second, even if a huge Pareto-optimal front approximation (PFA) could be maintained efficiently, the limiting factor would be the OFE budget of the multiobjective optimization algorithm. Suppose that an ultimate multiobjective optimization algorithm existed, one that found a new nondominated decision vector with every decision vector evaluation. Even this ultimate algorithm’s PFA would be limited to the size of the OFE budget assigned to it.

Many-objective optimization algorithms therefore do not try to approximate the entire PF but rather make use of other criteria in addition to Pareto dominance to guide the optimization process. A commonly used approach to differentiate between Pareto nondominated decision vectors during the course of an optimization run is to make use of performance indicators (Zitzler and Künzli, 2004). An example is the SMS-EMOA algorithm proposed by Beume et al. (2007), which uses the hypervolume (HV) performance indicator in conjunction with Pareto dominance in order to optimize a multiobjective problem. Alternatively, Di Pierro et al. (2007) proposed a preference-ordering strategy that considers a decision vector’s dominance status according to various subsets of objectives, thereby allowing for further differentiation. Then there are decomposition-based approaches (Zhang and Li, 2007), for which the multiobjective problem is divided into subproblems that are all solved simultaneously. All these many-objective approaches ultimately require some a priori input from the practitioner as to which sections of the PF or preference articulations are more important than others. The indicator-based approaches favor decision vectors aligned with the indicators chosen, preference-ordering strategies favor decision vectors close to the center or the edges of the PF depending on the parameters specified, and decomposition-based approaches focus on the areas of the PF specified in the subproblem construction. This a priori input is undesirable, as it breaks from the clean, a-priori-free approach followed when three or fewer objectives are optimized. However, because of the intractability of approximating the entire PF for many-objective optimization problems, some a priori input is required.

Objective reduction approaches can in certain scenarios assist with many-objective optimization. For certain applications it may occur that not all the objectives are in conflict, in which case some objectives can be disregarded without changing the PS. For such scenarios, objective reduction approaches are able to identify and remove redundant objectives to make the optimization problem easier to solve. Brockhoff and Zitzler (2009) covered the theoretical aspects of objective reduction and presented objective reduction algorithms based on that theory. Saxena et al. (2013) presented a framework based on principal component analysis and maximum variance unfolding for objective reduction. An aspect of objective reduction approaches that can prove useful even when applied to nonredundant problems is their ability to identify objectives that are only slightly in conflict. These slightly conflicting objectives can be disregarded without major changes to the PS. Objective reduction is not incorporated into the proposed MOTA algorithm, as MOTA is designed for tuning problems where all the objectives are in conflict.

## 3 MOTA Algorithm

The many-objective tuning algorithm (MOTA) is specialized for tuning stochastic algorithms to multiple criteria under multiple OFE budgets. MOTA’s specialization consists of four parts:

A tuning problem formulation that has both a speed objective and a speed decision variable (Section 3.1)

A CPV tuple assessment procedure that uses the history information from a CPV tuple assessment run in order to gauge performance simultaneously at multiple OFE budgets (Section 3.2)

A bi-objective decomposition approach to assist in the many-objective optimization (Section 3.3)

A preemptively terminating resampling strategy to handle the noise resulting from tuning stochastic algorithms (Section 3.4)

The discussion of these four aspects of MOTA’s specialization is followed by an algorithm overview in Section 3.5.

### 3.1 Tuning Problem Formulation

MOTA formulates tuning as a multiobjective problem whose decision variables consist of the CPV tuple together with a speed decision variable (the assessment OFE budget), and whose objectives consist of a speed objective together with the chosen utility indicators. The choice of utility indicators depends on the control parameter study being performed. When tuning a single-objective algorithm to multiple problems, a sensible choice for the utility indicators would be the lowest solution error achieved on each of those problems. Alternatively, when tuning a multiobjective optimization algorithm, a utility indicator for each unary performance indicator of interest could be used. Tuning practitioners should account for the computational cost of evaluating the utility indicators when specifying how extensively the history information from a CPV tuple assessment run is to be used.
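As a rough illustration of such a formulation, the following Python sketch evaluates one tuning decision vector, in which the speed decision variable (the assessment OFE budget) doubles as the speed objective. All names (`assess`, `algo_run`) are hypothetical, and the utility indicators here are simply one value per problem:

```python
def assess(cpv, budget, problems, algo_run):
    """Evaluate one tuning decision vector (budget, cpv).

    The speed decision variable `budget` reappears as the first (speed)
    objective, followed by one utility indicator per problem, i.e. the
    objective vector is (budget, u_1, ..., u_n). `algo_run(cpv, budget,
    problem)` is a stand-in for running the algorithm being tuned."""
    utilities = [algo_run(cpv, budget, p) for p in problems]
    return (budget, *utilities)
```

A solver minimizing this vector then trades off speed (small budgets) against solution quality (low utility values) across all problems at once.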

### 3.2 Utilizing the History Information from a CPV Tuple Assessment Run

MOTA’s CPV assessment procedure approximates multiple utility indicator ($u_i$) values using a single sampling run. Consider assessing a CPV tuple at an OFE budget of $\beta$ for an evolutionary algorithm with a population size of $N$. If the sampling run is set up to record the best solution found after each iteration (i.e., every $N$ OFEs), utility values can be determined for each OFE usage all the way up to the assessed OFE budget, $\beta$. Expressed symbolically, one CPV tuple sampling run can be used to generate $u_i(j)$ for $j = N, 2N, \ldots, \beta$, where $u_i(j)$ is the utility indicator value after $j$ OFEs. Another factor to consider when performing a sampling run to assess a CPV tuple’s utility is that utility values at OFE budgets higher than $\beta$ can be determined at a reduced cost compared to calculating them from scratch, since the recorded run can simply be continued past $\beta$.

It is unnecessary, however, to calculate utility indicator values at *every* OFE budget less than the maximum OFE budget of interest, $\beta_{\max}$. Normally, a tuning practitioner is only interested in a subset of OFE budgets $B$, such as OFE budgets spaced logarithmically between the minimum OFE budget of interest and $\beta_{\max}$. Given this consideration, MOTA calculates utility indicator values only at the OFE budgets in $B$ that do not exceed the overshoot budget $\beta_{os}$, where $B$ contains the target OFE budgets selected by the tuning practitioner, and the overshoot budget $\beta_{os}$ specifies the maximum OFE budget for which utility indicator values are to be calculated from a sampling run. For MOTA, $\beta_{os}$ is calculated according to a user-specified function of the assessment budget $\beta$. The optimal function for determining $\beta_{os}$ is expected to be problem-specific, but is also not expected to drastically alter performance, because of the noise-handling strategy used by MOTA (see Section 3.4).

By making use of the additional utility indicator values from a CPV tuple assessment sampling run, MOTA breaks from traditional multiobjective optimization, where one decision vector evaluation results in one objective vector. Instead, one decision vector evaluation by MOTA results in multiple objective vectors. In particular, each CPV tuple assessment results in a multidimensional line of objective function values, with the OFE budget objective acting as the independent variable. Given this one-to-many relation, MOTA uses a bi-objective decomposition strategy.
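The single-run bookkeeping described above can be sketched as follows, using a toy random-search minimizer in place of the algorithm being tuned (the minimizer and all names are illustrative assumptions, not MOTA internals):

```python
import random

def sample_run(cpv, budgets, pop_size=10, seed=None):
    """One sampling run of a toy random-search minimizer that records the
    best objective value at each target OFE budget. A single run thus
    yields utility values at every budget up to max(budgets), instead of
    one run per budget. `cpv` here is a single control parameter: the
    sampling spread of the toy minimizer."""
    rng = random.Random(seed)
    best = float("inf")
    out = {}
    targets = sorted(budgets)
    ofes = 0
    while targets:
        for _ in range(pop_size):        # one generation = pop_size OFEs
            best = min(best, abs(rng.gauss(0.0, cpv)))
        ofes += pop_size
        while targets and ofes >= targets[0]:
            out[targets[0]] = best       # utility recorded at this budget
            targets.pop(0)
    return out
```

Since `best` only improves as the run proceeds, the recorded utilities are nonincreasing in the budget, which is exactly the run-history structure MOTA exploits.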

### 3.3 Bi-objective Decomposition

Solving a problem via decomposition entails expressing the original problem as a series of subproblems that, when solved, give the solution to the original problem. In the context of multiobjective optimization, a variety of approaches exist for decomposing a multiobjective problem into single-objective subproblems (Zhang and Li, 2007). Decomposition, however, need not be limited to breaking a problem down into single-objective subproblems. Zhang and Li (2007) argued that decomposition into single-objective subproblems is favorable because it allows the significant body of work on single-objective evolutionary optimization to be utilized. By the same argument, it is viable to decompose a problem into bi- or tri-objective subproblems, because substantial work and success has been achieved in optimizing problems with only two or three objectives (Deb et al., 2002; Zitzler et al., 2001).

MOTA decomposes the many-objective tuning problem into a series of bi-objective subproblems, each of which scalarizes the utility indicator values using a user-selected scalarization function whose $z_i$ values correspond to a chosen reference point. In order to make the process of selecting this reference point easier for tuning practitioners, MOTA by default normalizes the objective values passed to the selected scalarization function. Specifically, the $u_i$ values are normalized between the utopia and nadir points of MOTA’s current PFA. When the objective values are normalized, zero values are used for all the $z_i$ reference values in Eq. (10). For the $j$’th subproblem, the corresponding bi-objective function that needs to be minimized is then defined as

$$\mathbf{f}^{(j)} = [\,\beta,\; s_j(\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_{n_u})\,]$$

where $\beta$ is the OFE budget, $s_j$ is the $j$’th subproblem’s scalarization function, and the $\bar{u}_i$ are the normalized utility indicator values.
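A minimal sketch of the normalization and scalarization step follows, assuming a Tchebycheff-style scalarization with a zero reference point, consistent with the normalization described above (function names are our own, and MOTA permits other scalarization functions):

```python
def normalize(u, utopia, nadir):
    """Normalize utility values between the PFA's utopia and nadir points,
    so a zero reference point can be used in the scalarization."""
    return [(ui - zi) / max(ni - zi, 1e-12)
            for ui, zi, ni in zip(u, utopia, nadir)]

def subproblem_objectives(budget, u, weights, utopia, nadir):
    """Bi-objective value of one subproblem: (OFE budget, scalarized
    utility). The scalarization here is weighted Tchebycheff with a zero
    reference point; `weights` is the subproblem's weight vector."""
    ubar = normalize(u, utopia, nadir)
    return (budget, max(w * x for w, x in zip(weights, ubar)))
```

Each weight vector defines one subproblem, so a set of evenly spread weight vectors yields the family of bi-objective subproblems that MOTA optimizes simultaneously.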

Another key aspect to decomposition-based approaches is neighborhoods. Neighborhoods are vital, since they are used to share information between subproblems, thereby differentiating decomposition-based approaches from approaches where subproblems are optimized in isolation from each other. Here, neighborhoods are split into two categories:

Candidate generation neighborhoods. When generating a candidate decision vector for a target subproblem, this neighborhood specifies the additional subproblems from which information is used for operations such as crossover and mutation.

Update neighborhoods. After evaluating a decision vector generated for a target subproblem, the resulting objective function values are also used to update the solutions of the subproblems in this neighborhood.

MOTA allows for the use of different neighborhoods for the purposes of generating candidate decision vectors and updating subproblem solutions. Having different neighborhoods for these two operations allows for a flexibility particularly well suited for tuning optimization algorithms. Consider tuning a single-objective optimization algorithm to multiple problems over multiple OFE budgets, with the focus on determining specialist CPVs well suited to each problem on its own. A sensible neighborhood configuration for this tuning problem would be to have a large neighborhood for candidate generation, together with zero-sized update neighborhoods. The large candidate generation neighborhood should be beneficial, since it allows for the CPV candidate generation process to exploit trends observed from other subproblems, while the zero-sized update neighborhoods save computational resources, since only one of *n _{u}* objectives needs to be evaluated.
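As an illustration of the two neighborhood types, the following sketch builds per-subproblem index neighborhoods of independent sizes. The ring topology and all names are assumptions made for illustration, not MOTA's prescribed structure; the point is only that the candidate-generation neighborhood can be large while the update neighborhood is empty:

```python
def make_neighborhoods(n_sub, gen_size, upd_size):
    """Build ring-topology index neighborhoods for n_sub subproblems: a
    candidate-generation neighborhood of radius gen_size and an update
    neighborhood of radius upd_size (radius 0 means an empty, i.e.
    zero-sized, neighborhood)."""
    def ring(i, size):
        return [(i + d) % n_sub for d in range(-size, size + 1) if d != 0]
    gen = {i: ring(i, gen_size) for i in range(n_sub)}
    upd = {i: ring(i, upd_size) for i in range(n_sub)}
    return gen, upd
```

With `gen_size=2` and `upd_size=0`, candidate generation for each subproblem may borrow decision vectors from four neighbors, while evaluations update only the target subproblem's own PFA.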

### 3.4 Handling the Noise Resulting from Tuning Stochastic Algorithms

When tuning optimization algorithms with stochastic elements, standard utility indicator values become probabilistic distributions. A noise-handling strategy aims to approximate a probabilistic distribution when an analytical expression is unavailable, which is typically the case. Noise handling in the context of parameter tuning under multiple OFE budgets is complicated by the noise distribution characteristics varying depending on the location in the objective space (Dymond et al., 2015). This variability rules out many specialized noise (Bui et al., 2009) or uncertainty (Xi et al., 2012) handling strategies, which assume a uniform uncertainty distribution throughout the objective space.

Given the varying uncertainty distributions throughout the objective space, MOTA uses a resampling approach (Beyer, 2000). Resampling-based approaches entail tuning algorithms according to the approximated mean of the utility indicator’s probabilistic distribution. Specifically, the resampling technique consists of performing multiple independent runs for the CPV tuple being assessed, and then using the resulting sample to approximate the mean utility indicator value. Based on tMOPSO (Dymond et al., 2015), MOTA uses a preemptively terminating resampling strategy whereby the sample-gathering process is interrupted if Mann-Whitney U tests (MWUTs; Conover, 1999) indicate that the decision vector being assessed is unlikely to result in an improvement on the current approximation of the PF.
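The interruption test can be illustrated as follows. This is a sketch of the general idea rather than tMOPSO/MOTA's exact bi-objective procedure: a one-sided Mann-Whitney U test, implemented here with the standard normal approximation, decides whether a candidate's utility sample is significantly worse than an incumbent's (all function names are our own):

```python
import math

def mwu_p_greater(sample, reference):
    """One-sided Mann-Whitney U test via the normal approximation:
    approximate p-value for the hypothesis that `sample` tends to be
    larger (i.e. worse, when minimizing) than `reference`."""
    n1, n2 = len(sample), len(reference)
    # U statistic: pairwise wins for `sample`, ties counted as half
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in sample for b in reference)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # survival function of N(0,1)

def interrupt(sample, incumbent, alpha=0.05):
    """Preemptively terminate resampling if the candidate's utility
    sample is significantly worse than the incumbent's at level alpha."""
    return mwu_p_greater(sample, incumbent) < alpha
```

In practice the test is applied after each small batch of sample runs, so clearly inferior CPV tuples are abandoned long before the full resampling size is reached.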

Each MOTA iteration generates a set of candidate decision vectors $X$, where each $x \in X$ has an associated CPV tuple, assessment OFE budget, and target subproblem. Initially, a small number of samples are generated for each $x \in X$ at the OFE budgets $B_{eval}$. As outlined in Section 3.2, $B_{eval}$ is controlled by the user-specified target OFE budgets $B$ and the overshoot budget $\beta_{os}$, i.e., $B_{eval} = \{\, b \in B : b \leq \beta_{os} \,\}$. Resampling interruption checks are then conducted against $x$’s target subproblem and the subproblems in $x$’s update neighborhood. Specifically, if $j$ is the target subproblem’s index, and $T$ is the set of indexes of the subproblems in $x$’s update neighborhood together with $j$ itself, then $x$’s approximated bi-objective decomposition values are discarded if they are likely to be dominated by $P_k$ for every $k \in T$, where “likely to be dominated” is judged at the significance level $\alpha$, and $P_k$ is the $k$’th subproblem’s PFA.

Depending on the computational cost of calculating the utility indicators relative to that of performing a CPV sampling run, two different options are available for conducting the Pareto nondominance likelihood checks:

Removing the largest OFE budget in $B_{eval}$ until an OFE budget is reached whose decomposition is not likely to be dominated by $P_k$ for all $k \in T$

Checking all OFE budgets in $B_{eval}$, and eliminating those OFE budgets whose decompositions are likely to be dominated by $P_k$ for all $k \in T$

For the case where a computationally cheap utility indicator is used, such as the solution error achieved by a single-objective algorithm, the option of reducing the maximum OFE budget is sensible. Alternatively, when an expensive utility indicator such as HV is used, the option of checking all the OFE budgets is more appropriate.

Resampling interruption checks continue until either $B_{eval}$ is empty or the desired resampling size $n_s$ is reached. If $n_s$ is reached, the approximated utility values are used to update the PFAs of the subproblems in $T$. MOTA users control the aggressiveness of the resampling interruption through two control parameters, namely, the number of sample increments between resampling interruption checks and the interruption significance level $\alpha$. The bi-objective nature of the subproblems is exploited in order to efficiently perform PFA dominance and dominance likelihood checks, as in Dymond et al. (2015). A flow chart of the CPV tuple assessment procedure for the *check all* noise-handling approach is shown in Figure 2.

### 3.5 Algorithm Overview

MOTA generates candidate decision vectors using operators based on the differential evolution (DE) strategies *rand/1/bin* and *best/1/bin*. The DE *rand/1/bin* candidate generation process for the $i$’th member of the population begins by generating a mutant vector as follows:

$$\mathbf{v} = \mathbf{x}_{r_1} + F \, (\mathbf{x}_{r_2} - \mathbf{x}_{r_3})$$

where $\mathbf{x}_k$ is the decision vector of the $k$’th member of the population, and $F$ is the user-specified scaling factor. The population indexes $r_1$, $r_2$, and $r_3$ are randomly selected, with each member of the population having an equal likelihood of selection, subject to all the indexes being different and none being equal to $i$. After mutation, crossover takes place to generate the candidate decision vector for the $i$’th member of the population as follows:

$$u_k = \begin{cases} v_k & \text{if } r(0,1) < C_r \text{ or } k = k_f \\ x_{i,k} & \text{otherwise} \end{cases}$$

where $k$ is the dimension index, $r(0,1)$ is a function that returns a number randomly generated between 0 and 1 with a uniform probability density, $C_r$ is the user-specified crossover rate, $k_f$ is the dimension that is forced to cross over, and $\mathbf{x}_i$ is the $i$’th population member’s current decision vector. Further information about DE and its strategies can be found in Das and Suganthan (2010).
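The rand/1/bin operations above can be expressed compactly in Python (a generic DE sketch, not MOTA's internal code):

```python
import random

def de_rand_1_bin(pop, i, F=0.5, Cr=0.9, rng=random):
    """DE rand/1/bin candidate generation: mutate three distinct members
    (all different and none equal to i), then binomially cross the
    mutant with pop[i], with one dimension kf forced to cross over."""
    n = len(pop[i])
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    mutant = [pop[r1][k] + F * (pop[r2][k] - pop[r3][k]) for k in range(n)]
    kf = rng.randrange(n)                 # dimension forced to cross over
    return [mutant[k] if (rng.random() < Cr or k == kf) else pop[i][k]
            for k in range(n)]
```

Forcing dimension `kf` guarantees the candidate differs from `pop[i]` in at least one gene even when `Cr` is small.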

MOTA’s candidate generation for the $i$’th subproblem begins by randomly selecting three subproblem indexes $s_1$, $s_2$, and $s_3$ for the purposes of mutation. The pool from which $s_1$, $s_2$, and $s_3$ are randomly selected with equal likelihood consists of the indexes of the subproblems in the $i$’th subproblem’s candidate generation neighborhood, together with the $i$’th subproblem’s own index, $i$. Contrary to the *rand/1/bin* strategy, the constraints that $s_1$, $s_2$, and $s_3$ all be unique and not equal to $i$ are omitted. These constraints are omitted because the Pareto set approximation of a subproblem consists of multiple decision vectors, any of which can be used for the generation of a mutant vector. Three decision vectors, $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$, are then selected from the PFAs of the $s_1$, $s_2$, and $s_3$ subproblems, respectively, according to a target improvement OFE budget, $\beta_t$. $\beta_t$ is selected randomly from the target OFE budgets $B$, with each element in $B$ having an equal likelihood of selection. $\mathbf{x}^{(1)}$ is selected as the decision vector from the PFA of the $s_1$ subproblem that performs best for an OFE budget of $\beta_1$, where $\beta_1$ is randomly perturbed about $\beta_t$ using a Gaussian distribution whose spread is governed by the user-specified perturbation factor, with $r_g(\mu, \sigma)$ denoting a function that returns a value randomly generated using a Gaussian distribution with a mean of $\mu$ and a standard deviation of $\sigma$, and with the perturbation scaled relative to the minimum and maximum OFE budgets in $B$. $\mathbf{x}^{(2)}$ and $\mathbf{x}^{(3)}$ are selected in the same manner, using the same $\beta_t$ but different perturbed budget values, so as to exploit any CPV versus OFE budget trends that may be present. Based upon the DE *best/1/bin* strategy, MOTA’s mutant vector is then generated as

$$\mathbf{v} = \mathbf{x}^{(1)} + F \, (\mathbf{x}^{(2)} - \mathbf{x}^{(3)})$$

to which a term is added to promote search diversity (Salehinejad et al., 2014). DE crossover is then conducted between the resulting mutant vector and the decision vector from the $i$’th subproblem that is optimal for an OFE budget of $\beta_t$, as in Eq. (17). Last, the assessment OFE budget of the generated candidate decision vector is set to $\beta_t$.

Constraint handling is achieved by regenerating candidate vectors until all constraints are satisfied. Although this approach is not viable for applications with computationally expensive constraints, it is acceptable for tuning applications, since tuning constraints are usually computationally cheap. MOTA does not enforce or have a specialized strategy for handling bound constraints, since for many tuning problems sensible CPV bounds are difficult to determine a priori. Additionally, MOTA has an internal constraint that requires that the candidate decision vector be different from the vector it is trying to improve upon. This internal constraint is necessary during the beginning stages of MOTA's tuning optimization, when the subproblems' PFAs consist of a low number of unique decision vectors. If, for a given subproblem, candidate generation using the DE operations in Eqs. (19) and (17) fails to satisfy the constraints a set number of times in a row during a single iteration, then a randomly generated valid decision vector is used instead, as outlined in Eq. (15). Once the candidate decision vectors are generated for all the subproblems, they are assessed to update the subproblem PFAs, as outlined in Section 3.4.
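The regeneration-based constraint handling described above can be sketched as follows. The function names, the retry threshold, and the example constraints are illustrative assumptions, not MOTA's actual interface.

```python
import random

def generate_valid_candidate(generate_de, generate_random, constraints,
                             incumbent, max_retries=20):
    """Regenerate candidates until all (cheap) tuning constraints hold.

    generate_de / generate_random: zero-argument functions returning a
    candidate (as a list); constraints: list of predicates.
    """
    def valid(x):
        # internal constraint: candidate must differ from the incumbent
        return x != incumbent and all(c(x) for c in constraints)

    for _ in range(max_retries):
        x = generate_de()          # DE-based generation, retried on failure
        if valid(x):
            return x
    while True:                    # fall back to random valid generation
        x = generate_random()
        if valid(x):
            return x

random.seed(1)
constraints = [lambda x: x[0] > 0, lambda x: x[0] * x[1] < 50]
cand = generate_valid_candidate(
    generate_de=lambda: [random.uniform(-1, 10), random.uniform(1, 10)],
    generate_random=lambda: [random.uniform(0, 5), random.uniform(1, 5)],
    constraints=constraints,
    incumbent=[1.0, 1.0],
)
```

Because tuning constraints are cheap to evaluate, the cost of repeated regeneration is negligible relative to a single CPV tuple assessment.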

Generation of the candidate decision vectors continues until all the subproblems become inactive. A subproblem becomes inactive when one of its termination criteria is satisfied, signaling that no more candidate vectors should be generated for that subproblem. If a subproblem is inactive but is in the update neighborhood of an active subproblem, the inactive subproblem's PFA is still updated when the active subproblem's candidate vector is assessed. Per-subproblem termination or inactivity criteria are appealing, since they allow for greater control compared to making all subproblems inactive at the same time. Take, for instance, allocating differing amounts of computational resources to each subproblem, or adding stagnation termination criteria whereby a subproblem becomes inactive if no substantial improvement has been made recently.

Our implementation of MOTA is available in the optTune Python package,^{1} and the pseudocode is given as Algorithm 1.

## 4 Numerical Setup

Numerical experiments are conducted to gauge the effectiveness of MOTA. Experiments are chosen based upon the potential benefits of many-objective tuning, benefits that motivated MOTA's development. In the introduction it was proposed that a many-objective tuning algorithm could

1. be more efficient at tuning an optimization algorithm to each problem of a test suite, compared to tuning the algorithm to each problem instance in isolation;
2. be better suited to determining generalist CPV tuples that perform well over an entire problem test suite; and
3. tune MOEAs more holistically, by tuning them according to multiple unary performance metrics simultaneously.

MOTA is benchmarked with regard to the first two statements. For brevity, the tuning of multiobjective algorithms to multiple unary performance metrics is left for future work.

### 4.1 Tuning Problems Used

The tuning problems used are built around the commonly used ZDT (Zitzler et al., 2000), DTLZ (Deb et al., 2005), and WFG (Huband et al., 2006) multiobjective test problem suites. These test problem suites are designed to gauge an algorithm’s performance given a range of objective function characteristics. Furthermore, the ZDT problems are scalable in the number of decision variables, while the DTLZ and WFG problems are scalable in both the number of decision variables and the number of objectives. Since the numerical experiments entail tuning algorithms to these problem suites, lower-than-normal dimensional versions of the WFG and DTLZ problems are used, to make computational costs of the experiments more manageable. For the bi-objective ZDT problems, algorithms are tuned to ZDT problems 1, 2, 3, 4, and 6, where the standard setup of decision variables is used for problems 1, 2, and 3, while decision variables are used for problems 4 and 6. The fifth ZDT problem is omitted because it has binary decision variables. Regarding the WFG problems, two position decision variables, ten distance decision variables, and two objectives are used for all the problems. For the DTLZ problems the number of decision variables is kept at the commonly used values of 7, , , , , , and for problems 1 through 7, respectively, while the number of objectives is reduced from the standard three to only two.

where *IGD*_{WFG*i*} is the IGD achieved on the *i*'th WFG problem, given the CPV tuple being assessed and the OFE budget used. Regarding the OFE budgets that algorithms are to be tuned under, logarithmically spaced OFE budgets up to 10,000 are used.
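Logarithmically spaced budgets of this kind can be generated as below. The lower bound (100) and the number of budgets (10) are illustrative assumptions; the paper's exact values are not reproduced in this excerpt.

```python
import numpy as np

# Logarithmically spaced OFE budgets up to the maximum budget of 10,000,
# rounded to integers and de-duplicated.
B = np.unique(
    np.round(np.logspace(np.log10(100), np.log10(10_000), 10)).astype(int)
)
print(B)
```

Log spacing concentrates budgets at the low end, where algorithm performance typically changes fastest with additional evaluations.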

Separate tuning problems are constructed for each algorithm tuned, in order to have a tuning problem that entails determining specialist CPVs only, and a tuning problem for determining generalist and specialist CPVs together. For each specialist tuning problem, the bi-objective PF sections of interest correspond to each problem in the test suite in isolation, together with the OFE budget objective. For example, the WFG specialist tuning problems have nine subproblems, whose utility weight vectors each target a single WFG problem. For the generalist tuning problems, additional preference articulations are added to determine CPV tuples that perform well for all utility objectives simultaneously, as well as for all leave-one-out combinations of the utility objectives. All the generalist subproblems make use of weighted-sum scalarization in the normalized objective space for their bi-objective decomposition. The leave-one-out preference articulations are added to enable scrutiny of the results of the overall articulation, to determine if one objective has a disproportionate effect even though scalarization uses a normalized objective space.
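The set of preference articulations described above (one specialist weight vector per utility objective, one leave-one-out vector per objective, and one overall generalist vector) can be sketched as follows. Normalizing each weight vector to sum to one is an assumption, since the paper's exact weight values are not reproduced here.

```python
import numpy as np

def preference_articulations(n_utility):
    """Utility weight vectors for the subproblems of a generalist
    tuning problem with n_utility utility objectives."""
    W = [np.eye(n_utility)[i] for i in range(n_utility)]   # specialists
    for i in range(n_utility):                             # leave-one-out
        w = np.ones(n_utility)
        w[i] = 0.0
        W.append(w / w.sum())
    W.append(np.ones(n_utility) / n_utility)               # overall generalist
    return W

W = preference_articulations(9)   # e.g., the nine WFG utility objectives
```

For nine utility objectives this yields 9 + 9 + 1 = 19 weight vectors, matching the subproblem counts implied by the text.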

### 4.2 Algorithms Tuned

The jMetal implementations of NSGA-II and MOEA/D^{2} are used in these experiments. Selected real and integer CPVs are tuned for NSGA-II and MOEA/D, while option-based control parameters are left at their jMetal defaults. For NSGA-II the tuned control parameters are the population size *N*, the crossover probability *c*_{p}, and the mutation probability *m*_{p}. The selection, crossover, and mutation operators are fixed to binary tournament selection, simulated binary crossover (Deb and Agrawal, 1994), and polynomial mutation, respectively. Tuning initialization bounds and tuning constraints are imposed on *N*, *c*_{p}, and *m*_{p}. Concerning MOEA/D, the control parameters tuned are the number of subproblems *N*_{s}, the neighborhood size fraction *T*_{f}, the DE crossover probability *C*_{r}, and the DE scaling factor *F*. The neighborhood size fraction controls the size of the MOEA/D subproblem neighborhoods, with the neighborhood size being equal to *N*_{s} multiplied by *T*_{f}. Tuning initialization bounds and tuning constraints are likewise imposed on MOEA/D's control parameters. For pragmatic reasons both NSGA-II's *N* and MOEA/D's *N*_{s} are restricted to a maximum value, since the computational overheads of NSGA-II and MOEA/D increase proportionately with *N* and *N*_{s}, respectively.

The computational budget allocated to the tuning problems corresponds to the upper limit of what is deemed to be the standard use-case scenario. This upper limit is chosen as the computational work produced by a high-end desktop or laptop computer left to tune overnight. Specifically, the allowable computational budget is a fixed number of hours of fully utilizing a fourth-generation Intel Core i7-4700MQ processor. For the tuning problems described, the WFG generalist tuning problems are the most computationally expensive. Given this limiting factor, a computational budget is assigned to each subproblem equivalent to assessing 1,000 CPV tuples up to the maximum OFE budget of 10,000 without resampling on a single problem of the relevant test suite. Since evaluating a CPV tuple on a generalist subproblem entails assessing the CPV tuple on all problems in the test suite being tuned to, the specified tuning budget allows for fewer CPV tuples to be evaluated compared to a specialist subproblem.

### 4.3 Tuning Algorithms Compared

The baseline RAND tuning algorithm generates a candidate decision vector for the *i*'th subproblem using a uniform random distribution, based upon a target improvement OFE budget selected randomly from *B* (with each OFE budget having the same likelihood of being selected), the decision vector from the *i*'th subproblem's PFA that performs best at that budget, and the tuning problem's initialization bounds. M-FETA (Smit et al., 2010) is not compared against MOTA because the M-FETA algorithm is not designed to tune under multiple OFE budgets.
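A minimal sketch of uniform random candidate generation inside initialization bounds is given below. This is a simplification of RAND's full rule, whose exact formula (involving the subproblem's best PFA vector) is not reproduced in this excerpt; all names and bound values are illustrative.

```python
import numpy as np

def rand_candidate(lb, ub, rng):
    """Uniform random sample inside the initialization bounds [lb, ub]."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    return lb + rng.random(lb.size) * (ub - lb)

rng = np.random.default_rng(0)
lb, ub = [5.0, 0.0, 0.0], [100.0, 1.0, 0.5]   # illustrative CPV bounds
x = rand_candidate(lb, ub, rng)
```

Such a baseline isolates the benefit of MOTA's DE-based candidate generation from the benefit of its shared assessment machinery.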

In order to apply the bi-objective tMOPSO tuning algorithm to the many-objective tuning problems, each tuning problem is broken up into uncoupled bi-objective subproblems among which no information is shared. The tMOPSO runs to determine the generalist CPV tuples occur last, since those runs require information on the nadir and utopia points, which are taken from the tMOPSO runs targeted at specialist CPVs. Regarding the generalist tuning problems, MOTA and RAND have an advantage over tMOPSO in that these algorithms share information between the solution approximations of the different subproblems; tMOPSO, as a bi-objective tuning algorithm, has no mechanics for sharing this many-objective information. MOTA and RAND use the same neighborhood topology for the tuning problems. In particular, the specialist subproblems and the overall generalist subproblem have a zero-sized update neighborhood, while each of the leave-one-out generalist subproblems has an update neighborhood of size 3, consisting of two other leave-one-out generalist subproblems as well as the overall generalist subproblem. Admittedly, the choice of the update neighborhoods for the leave-one-out generalist subproblems is rather arbitrary, being based only on what we deemed sensible given our experience with the problems. For all cases, MOTA and RAND use a candidate generation neighborhood consisting of all the subproblems.

The same resampling interruption procedure is used by all the compared tuning algorithms. This commonality should ensure that any performance difference observed is not influenced by the use of different noise-handling procedures. For MOTA, tMOPSO, and RAND the same total sampling size is used, with resampling interruption checks occurring after fixed sampling increments using Mann-Whitney U tests (MWUTs) at a fixed interruption significance level. Since the IGD calculations are of moderate computational cost, resampling interruption checks are conducted for each OFE budget constraint under which a CPV tuple is being assessed. A common constraint-handling approach is also used by the compared tuning algorithms. In particular, the candidate generation process repeats until a valid decision vector is found, subject to a threshold number of attempts, after which random values are generated inside the initialization bounds until a valid candidate is generated. Finally, all algorithms use the same OFE budget overshoot function.
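An MWUT-based resampling interruption check of the kind described above can be sketched with the standard library alone. The one-sided formulation, the normal approximation, and the significance level are assumptions for illustration; the paper's exact test implementation may differ.

```python
import math

def mwut_p_greater(a, b):
    """One-sided Mann-Whitney U test (normal approximation): p-value for
    the alternative that values in a tend to be greater than values in b."""
    n1, n2 = len(a), len(b)
    # U statistic: number of (a, b) pairs with a > b, counting ties as 0.5
    U = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    # upper-tail p-value of the standard normal approximation
    return 0.5 * math.erfc((U - mu) / sigma / math.sqrt(2.0))

def interrupt_resampling(candidate, incumbent, alpha=0.05):
    # Interrupt further sampling if the candidate's objective samples are
    # significantly greater (i.e., worse, when minimizing) than the
    # incumbent's samples.
    return mwut_p_greater(candidate, incumbent) < alpha

worse = [2.1, 2.3, 2.2, 2.4, 2.5, 2.35, 2.45, 2.25]
better = [1.0, 1.1, 0.9, 1.05, 1.0, 0.95, 1.15, 1.02]
```

Interrupting clearly inferior CPV tuples early frees the sampling budget for tuples whose ranking is still uncertain.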

tPSO tunes the CPVs of MOTA and tMOPSO according to a single objective, *u*_{tPSO}. *u*_{tPSO} is a weighted-sum performance aggregation over all the respective subproblems of the tuning problem. For the NSGA-II DTLZ specialist problem, which has seven subproblems or preference articulations, tPSO minimizes an aggregate measure based on the bi-objective HV of each subproblem's PF, normalized between a commonly used nadir and utopia point. The same setup is used for tuning to the NSGA-II ZDT specialist problem. To account for stochastic effects, resampling over independent runs is conducted. Based on Dymond et al. (2015), fixed values are used for the tPSO swarm size and inertia factor. The tPSO tuning budget is a fixed number of CPV tuple evaluations. tPSO performs preemptive resampling termination checks after every resampling iteration using MWUTs. Regarding the handling of tuning constraints, if an invalid CPV tuple is generated, tPSO continues to regenerate that tuple until all constraints are satisfied.

The MOTA CPVs that are tuned by tPSO are the DE scaling factor *F*, the DE crossover rate *C*_{r}, and the OFE perturbation factor; initialization bounds and subsequent search constraints are imposed on each of these. For tMOPSO, the swarm size *N*, the inertia factor, the personal acceleration constant *c*_{p}, the global acceleration constant *c*_{g}, and tMOPSO's OFE perturbation factor are tuned by tPSO, again subject to initialization bounds and, after initialization, to tPSO search constraints.

After the tPSO tuning is completed and effective CPVs for MOTA and tMOPSO are determined, MOTA, tMOPSO, and RAND are applied to the ZDT, DTLZ, and WFG tuning problems. To account for stochastic effects, comparison is conducted using a sample of independent runs per tuning problem.

## 5 Numerical Results

The results from the numerical experiments are presented in three parts. First the results are given from the tPSO tuning to determine effective CPVs for MOTA and tMOPSO. Thereafter, the numerical results from the specialist tuning problems are presented, followed by those of the generalist tuning problems. For brevity, extensive results are not included in this paper but are made available as supplementary material. Included in the supplementary material are the utopia and nadir points used in the result analysis.

When analyzing the results from the tuning problems, the objective function values of respective PFAs are normalized using a common objective normalization function. Use of a common objective normalization function is required, since some variance is expected in the utopia and nadir points approximations made by MOTA, tMOPSO, and RAND. Therefore the objective value results are renormalized between a common nadir and utopia point, for purposes of fair comparison. The comparison utopia and nadir points were calculated after MOTA, tMOPSO, and RAND were applied to the specialist tuning problems, by combining the tuning results. Specifically, a PFA was constructed for each tuning problem by combining all the results for that problem. Thereafter, the utopia and nadir points used to compare the tuning algorithms were taken from these constructed PFAs. This approach could not, however, be followed for the tPSO tuning of MOTA and tMOPSO, since tPSO needs to compare performances during the course of the tuning run, and as such cannot postpone the calculation of the normalization utopia and nadir points. Therefore tMOPSO was applied to the relevant specialist tuning problem to determine normalization utopia and nadir points for use in the tPSO tuning runs. In particular, the results from 10 independent runs of tMOPSO using the CPV settings from Dymond et al. (2015) were combined into one PFA to determine the tPSO normalization utopia and nadir points.
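The common-point renormalization and bi-objective HV computation described above can be sketched as follows. Minimization of both objectives, the nadir as the HV reference point, and the clipping behavior are assumptions for illustration.

```python
def normalized_hypervolume(pfa, utopia, nadir):
    """Bi-objective hypervolume of a PFA after normalizing both (minimized)
    objectives between common utopia and nadir points, with the normalized
    nadir (1, 1) as the reference point."""
    norm = []
    for f1, f2 in pfa:
        g1 = (f1 - utopia[0]) / (nadir[0] - utopia[0])
        g2 = (f2 - utopia[1]) / (nadir[1] - utopia[1])
        if g1 < 1.0 and g2 < 1.0:                 # inside the reference box
            norm.append((max(g1, 0.0), max(g2, 0.0)))
    # sweep the points sorted by the first objective, accumulating the
    # rectangle each non-dominated point adds against the reference point
    norm.sort()
    hv, prev_g2 = 0.0, 1.0
    for g1, g2 in norm:
        if g2 < prev_g2:                          # non-dominated point
            hv += (1.0 - g1) * (prev_g2 - g2)
            prev_g2 = g2
    return hv

hv = normalized_hypervolume([(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)],
                            utopia=(0.0, 0.0), nadir=(1.0, 1.0))
```

Because every tuning algorithm's PFA is renormalized against the same utopia and nadir points, HV values are directly comparable across MOTA, tMOPSO, and RAND.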

### 5.1 Selecting the CPVs for the Compared Tuning Algorithms

Three independent runs were conducted for each tPSO tuning of MOTA and tMOPSO, respectively, to check consistency among the tPSO results. Furthermore, the CPV recommendations from the three runs for tuning to the NSGA-II ZDT specialist problem are compared to those of the three runs for tuning to the NSGA-II DTLZ specialist problem. The aim of these consistency checks is to ensure that tPSO does not return an outlier CPV tuple that is unlikely to be reproduced by an independent third party conducting the same experiments.

Table 1 shows the CPV tuples that tPSO found to be effective for MOTA on the NSGA-II ZDT and DTLZ specialist problems. An acceptable level of consistency is observed among the tPSO CPV recommendations for MOTA, with similar values for *F*, *C _{r}*, and the OFE perturbation factor being returned. Regarding tMOPSO, Table 2 shows tPSO's CPV recommendations. An acceptable level of consistency in terms of exploitation versus exploration is again observed, if the combined effects of the inertia and acceleration parameters (Clerc and Kennedy, 2002) and the tMOPSO OFE perturbation factor are considered. For example, the tPSO recommendation with a far higher personal acceleration constant than the other recommendations is accompanied by a lower inertia value relative to the other tPSO runs. As expected, different CPV recommendations were made for the NSGA-II ZDT specialist problem compared to the NSGA-II DTLZ specialist problem.

| Specialist problem | Run | *F* | *C*_{r} | OFE perturbation factor |
|---|---|---|---|---|
| NSGA-II ZDT | best | 2.40 | 0.55 | 0.37 |
| | second best | 2.55 | 0.83 | 0.44 |
| | worst | 3.02 | 0.80 | 0.40 |
| NSGA-II DTLZ | best | 1.94 | 0.73 | 0.17 |
| | second best | 1.99 | 0.85 | 0.24 |
| | worst | 1.54 | 0.79 | 0.18 |


| Specialist problem | Run | *N* | Inertia factor | *c*_{p} | *c*_{g} | OFE perturbation factor |
|---|---|---|---|---|---|---|
| NSGA-II ZDT | best | 4 | 0.44 | 0.73 | 2.76 | 0.02 |
| | second best | 7 | 0.31 | 1.62 | 2.10 | 0.23 |
| | worst | 6 | 0.30 | 1.82 | 1.83 | 0.29 |
| NSGA-II DTLZ | best | 5 | 0.28 | 1.35 | 2.34 | 0.02 |
| | second best | 7 | 0.73 | 0.89 | 1.59 | 0.15 |
| | worst | 6 | 0.26 | 3.10 | 1.31 | 0.30 |


Given that no outliers were observed, the CPVs used in the remainder of the experiments for MOTA and tMOPSO are taken from the best tPSO run for tuning to the NSGA-II DTLZ specialist problem. The tPSO results from the NSGA-II DTLZ specialist problem are chosen in place of the results from the NSGA-II ZDT specialist problem, since the DTLZ specialist problem is considered more representative in terms of the number of tuning objectives. Specifically, the DTLZ problems have eight tuning objectives if the speed objective is included, while the ZDT problems have six tuning objectives and the WFG problems have ten tuning objectives.

### 5.2 Specialist Tuning Results

MOTA, tMOPSO, and RAND were applied to the specialist and generalist tuning problems to generate a sample of independent runs for each problem. Analysis of the results from these tuning problems is conducted according to a hypervolume-based performance metric, which measures the HV achieved for a given bi-objective decomposition in the normalized objective space, as outlined in Eq. (30).

Comparison of performance on the NSGA-II specialist tuning problems is conducted quantitatively according to MWUTs, as summarized in Table 3. Two sample means are considered statistically similar if a MWUT indicates that the difference of the sample means is not statistically significant given a confidence level of 95%. On the NSGA-II ZDT specialist problem, tMOPSO outperforms MOTA on some subproblems, is outperformed by MOTA on others, and performs statistically similarly on the remaining subproblems. For the NSGA-II DTLZ specialist problem, tMOPSO beats MOTA on several subproblems, is outperformed by MOTA on one subproblem, and is statistically similar on the other subproblems. Regarding the NSGA-II WFG specialist problem, tMOPSO outperforms MOTA on some subproblems, is outperformed by MOTA on others, and is statistically similar to MOTA on the remaining subproblems. RAND is outperformed by tMOPSO and MOTA on all the subproblems of the NSGA-II specialist tuning problems, with the exception of one DTLZ subproblem, where the difference in sample means is not statistically significant.

| Suite | Problem | tMOPSO | MOTA | RAND |
|---|---|---|---|---|
| ZDT | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 6 | | | |
| DTLZ | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 5 | | | |
| | 6 | | | |
| | 7 | | | |
| WFG | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 5 | | | |
| | 6 | | | |
| | 7 | | | |
| | 8 | | | |
| | 9 | | | |


Notes: Friedman test statistic: 24.100. Boldface entries indicate the best value in each row. Italic entries indicate samples whose difference in mean relative to the sample with the best mean is not statistically significant according to a Mann-Whitney U test at a 95% confidence level.

For the MOEA/D specialist tuning problems, comparison is conducted in the same manner, with Table 4 summarizing the MWUT comparisons. On the MOEA/D ZDT specialist problem, MOTA and tMOPSO produce statistically similar performances on all five subproblems. For the MOEA/D DTLZ specialist problem, tMOPSO outperforms MOTA on some subproblems, while MOTA performs better on the remaining subproblems. Regarding the MOEA/D WFG specialist problem, tMOPSO outperforms MOTA on some subproblems, while for the other subproblems MOTA and tMOPSO offer statistically similar performances. RAND is outperformed by MOTA and tMOPSO on all the subproblems of all the specialist tuning problems, with the exception of one WFG subproblem, for which statistically similar performance was recorded.

| Suite | Problem | tMOPSO | MOTA | RAND |
|---|---|---|---|---|
| ZDT | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 6 | | | |
| DTLZ | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 5 | | | |
| | 6 | | | |
| | 7 | | | |
| WFG | 1 | | | |
| | 2 | | | |
| | 3 | | | |
| | 4 | | | |
| | 5 | | | |
| | 6 | | | |
| | 7 | | | |
| | 8 | | | |
| | 9 | | | |


Notes: Friedman test statistic: 25.200. Boldface entries indicate the best value in each row. Italic entries indicate samples whose difference in mean relative to the sample with the best mean is not statistically significant according to a Mann-Whitney U test at a 95% confidence level.

The MWUT comparisons are supported by visual inspection of box plot comparisons, by visual inspection of the PFAs determined by the compared tuning algorithms, and by Friedman tests. The box plot comparisons show a clear difference between two samples wherever a MWUT has shown the means to be different at the 95% confidence level. Comparing the bi-objective PFAs of the subproblems shows that the performance metric is adequate for comparing tuning performance in these experiments. The box plot comparisons and PFAs found by the tuning algorithms are available in the supplementary material of this paper. Friedman tests, whose statistics are reported in Tables 3 and 4, show that the null hypothesis of there being no difference between tMOPSO, MOTA, and RAND can be safely rejected.

Taking the specialist results as a whole, it is concluded that MOTA and tMOPSO offer similar performance, while both algorithms outperform the baseline RAND. It was postulated in the introduction that MOTA's ability to share information among subproblems may be of benefit even when an algorithm is to be tuned under multiple OFE budgets to each instance of a problem suite on an individual basis only. In particular, information sharing could aid tuning by exploiting common trends among the tuning solutions to these problem instances. In these experiments, MOTA's information-sharing strategy, via candidate generation neighborhoods and DE operators, was not able to outperform tMOPSO, even though CPV trends are present, as shown in the supplementary material. Whether these results are due to MOTA's search mechanics, or reflect on the validity of sharing information to aid in determining specialist CPVs over multiple OFE budgets as a whole, is left as a question for future research.

### 5.3 Generalist Tuning Problems

On the generalist tuning problems, MOTA's CPV tuple assessment procedure, which utilizes update neighborhoods, had a significant effect. On the NSGA-II generalist problems MOTA outperformed tMOPSO on all the ZDT, DTLZ, and WFG subproblems, as shown in Table 5. RAND is more competitive against MOTA on the NSGA-II generalist problems than on the specialist problems; specifically, on several subproblems the difference in sample means between RAND and MOTA is not statistically significant according to MWUTs at a 95% confidence level. MOTA outperformed RAND on the remaining NSGA-II generalist subproblems. Regarding the MOEA/D generalist problems, MOTA outperformed tMOPSO on all but a few subproblems, as shown in Table 6. RAND outperformed MOTA on a few subproblems, while performing worse than MOTA on the remaining subproblems. Additional result verification, similar to that done for the specialist tuning problems, shows that MOTA's superior performance according to the performance metric is supported by box plot comparisons, plots of the PFAs found by the tuning algorithms, and Friedman tests. The box plot comparisons and PFAs found for the generalist tuning problems are available in the supplementary material for this paper.