## Abstract

In this article, we attempt to understand and to contrast the impact of problem features on the performance of randomized search heuristics for black-box multiobjective combinatorial optimization problems. At first, we measure the performance of two conventional dominance-based approaches with unbounded archive on a benchmark of enumerable binary optimization problems with tunable ruggedness, objective space dimension, and objective correlation (MNK-landscapes). Precisely, we investigate the expected runtime required by a global evolutionary optimization algorithm with an ergodic variation operator (GSEMO) and by a neighborhood-based local search heuristic (PLS) to identify an ε-approximation of the Pareto set. Then, we define a number of problem features characterizing the fitness landscape, and we study their intercorrelation and their association with algorithm runtime on the benchmark instances. At last, with a mixed-effects multilinear regression we assess the individual and joint effect of problem features on the performance of both algorithms, within and across the instance classes defined by benchmark parameters. Our analysis reveals further insights into the importance of ruggedness and multimodality to characterize instance hardness for this family of multiobjective optimization problems and algorithms.

## 1 Introduction

### 1.1 Motivation

Many optimization problems arising in real-world applications are characterized by a discrete solution space, and by multiple objective functions, such as cost, profit, or risk, that are ill-defined, computationally expensive, or for which an analytical form is not available (Coello and Lamont, 2004). A difficult task is then to identify or approximate a set of solutions, known as the Pareto set, representing the possible optimal trade-offs among the objectives. But these black-box multiobjective combinatorial optimization problems severely limit the number of evaluations an optimizer can perform. Despite the increasing number of available general-purpose heuristics for evolutionary multiobjective optimization (EMO; see for example, Deb (2001); Coello et al. (2007)), we argue that their design and configuration is still mainly based on intuition, and that the understanding of their performance and design principles is still in its infancy and comparatively less advanced than in the single-objective case. A challenging issue is to identify a number of general-purpose features characterizing problem hardness, and to understand which problem features have an influence on the performance of EMO algorithms, as well as the differences and the similarities across different algorithm and problem classes.

### 1.2 Related Work

In single-objective combinatorial optimization, research on fitness landscape analysis aims at characterizing the fitness landscape associated with the optimization problem where the algorithm operates (Merz, 2004; Richter and Engelbrecht, 2014). Contrary to a complexity-theoretical perspective of convergence properties and runtime analysis, a fitness landscape analysis instead relies on a mathematical model that helps to understand the relation between the geometry of an optimization problem and the dynamics of a randomized search heuristic. Tools from graph theory, feature-based analysis, or correlation and regression analysis, are means to investigate the difficulties that an algorithm faces when solving a particular problem. This paradigm is particularly relevant for black-box optimization, for which problem-specific expert knowledge is usually difficult to obtain. More recently, benchmark parameters as well as problem and fitness landscape features have been used as input variables in statistical regression analysis in order to estimate, and then understand, their relationship with the performance of randomized search heuristics for single-objective optimization problems of continuous and combinatorial nature; see for example, Mersmann et al. (2011); Bischl et al. (2012); Daolio et al. (2012); Mersmann et al. (2012). In particular, although they only consider benchmark parameters in their analysis, it is worth noticing that Chiarandini and Goegebeur (2010) investigate mixed models in order to separate the effects of algorithm components and problem characteristics when analyzing local search algorithms for the 2-edge-connectivity augmentation problem.

In multiobjective combinatorial optimization, few attempts have been made at designing and analyzing problem and fitness landscape features, whereas there is a strong evidence that problem-related properties are known to largely affect the properties of the Pareto set (Mote et al., 1991) and the behavior of multiobjective optimization algorithms (Paquete and Stützle, 2006). One of the first studies on the distribution of local optima for the multiobjective traveling salesman problem is due to Borges and Hansen (1998), which shows the existence of a global convexity under a common neighborhood structure while covering the whole Pareto front by varying the scalarizing function parameters. Similarly, Paquete and Stützle (2009) have shown that nondominated solutions are strongly clustered with respect to the same neighborhood for the same problem class, while the degree of clustering highly depends on the instance structure for the multiobjective quadratic assignment problem. Knowles and Corne (2003) have proposed and investigated multiple fitness landscape features in order to distinguish the degree of difficulty of multiobjective quadratic assignment problems. As well, tools from fitness landscape analysis have been reviewed and adapted to multiobjective optimization by Garrett and Dasgupta (2007) in terms of scalarizing local optima. In another study, Garrett and Dasgupta (2009) have proposed to visualize a multiobjective fitness landscape as a neutral landscape divided into different fronts containing solutions within the same dominance rank. Aguirre and Tanaka (2007) have defined several problem features such as the number of fronts, the number of solutions within each front, the probability to pass from one front to another, and the hypervolume of the (exact) Pareto set, and related them with the design of EMO algorithms on enumerable multiobjective NK-landscapes. 
At last, the impact of the problem dimension, the degree of nonlinearity, the number of objectives, and the objective correlation of multiobjective NK-landscapes has been related to the structure of the Pareto set and to the number of Pareto local optima by Verel et al. (2013).

Overall, previous work on multiobjective fitness landscapes has often investigated one characteristic at a time, and rarely related problem and fitness landscape features to algorithm performance. Furthermore, we are not aware of any research on feature-based statistical or machine learning modeling aiming at estimating and analyzing the performance of EMO algorithms.

### 1.3 Contributions

This article attempts to bridge the gap between a fully theoretical work on runtime analysis and a more practical work on the performance analysis of multiobjective randomized search heuristics. By introducing new features, by explicitly defining features from the literature on multiobjective fitness landscape analysis, and by considering multiple problem and fitness landscape features simultaneously, a fundamental general-purpose statistical framework is proposed to better understand the difficulties that EMO algorithms might have to face. To the best of our knowledge, such a feature-based performance analysis is novel in the context of multiobjective optimization, in particular in that it uses a multilevel mixed-effects linear regression to model the performance of EMO algorithms. It is our hope that a systematic and thorough empirical study could bring valuable metaknowledge to the practitioner and to the algorithm designer. The research questions motivating and guiding the article are as follows:

**Question #1:** *What features might characterize multiobjective combinatorial landscapes?*

**Question #2:** *How do features relate to benchmark parameters? How do they relate to one another? Are they linearly dependent?*

**Question #3:** *Which features are ordinally associated with algorithm performance?*

**Question #4:** *How much of algorithm performance variance can features explain?*

**Question #5:** *What is the conditional impact of each feature on algorithm performance? What are the significant common trends across instance groups?*

**Question #6:** *Which features are relevant predictors of algorithm performance?*

**Question #7:** *Does the impact of features on algorithm performance change with landscape ruggedness? Can ruggedness be used to explain changes across instance groups?*

In order to address these issues, we first identify a substantial number of existing and original problem properties and fitness landscape features for black-box multiobjective combinatorial optimization. They include benchmark instance parameters, such as variable correlation, objective correlation, and objective space dimension, as well as problem features from the Pareto set, the Pareto graph, and the ruggedness and multimodality of the fitness landscape. We report all these measures, together with a correlation analysis between them, on a large number of enumerable multiobjective NK-landscapes with objective correlation, that is, MNK-landscapes (Verel et al., 2013). As in the single-objective case, the model of NK-landscapes allows one to describe and generalize a large family of unconstrained multimodal 0–1 optimization problems (Heckendorn and Whitley, 1997).

Next, we propose a general methodology to investigate the impact of problem features on the performance of multiobjective randomized search heuristics. More particularly, we analyze how strongly those proposed features are associated with algorithm performance, how the algorithm performance changes when varying each problem feature, and what is the relative importance of features in explaining the performance variance. To this end, we conduct an experimental analysis on the performance of two prototypical dominance-based EMO algorithms, namely the global simple EMO optimizer (GSEMO) (Laumanns et al., 2004) and the Pareto local search (PLS) algorithm (Paquete et al., 2004), of which we measure the estimated runtime to find an ε-approximation of the Pareto set. Overall, the runtime of both approaches is impacted by each of the identified multiobjective problem features, and particularly by the ruggedness and the multimodality of the fitness landscape, and by the hypervolume value of the optimal Pareto set. Our study shows the relative influence of problem features on algorithm efficiency as well as the differences and similarities between both algorithms. As such, the emphasis of the present article is more on making inferences, see for example, Chiarandini and Goegebeur (2010), than on making predictions; see for example, Hutter et al. (2014). That is, we are concerned with modeling the empirical data to test hypotheses in the context of an appropriate statistical model. Hence, we value model interpretability over predictive power as long as model assumptions are acceptable. On these lines, we attempt to provide both insightful understandings and methodological suggestions about the performance analysis of multiobjective optimization algorithms.

### 1.4 Outline

The remainder of the article is organized as follows. In Section 2, we detail the background information about fitness landscape analysis, multiobjective optimization, MNK-landscapes, EMO algorithms, and their rating of performance. In Section 3, we analyze the empirical impact of MNK-landscape benchmark parameters on the performance of GSEMO and PLS. In Section 4, we identify relevant problem features, and report quantitative results and a correlation analysis for MNK-landscapes (Questions #1–2). In Section 5, we conduct an association and regression analysis in order to point out how problem features influence the performance of EMO algorithms (Questions #3–7). In Section 6, we conclude and suggest further research into feature-based performance analysis in EMO.

## 2 Preliminaries

In this section, we give a brief methodological context and the relevant definitions about the multiobjective combinatorial optimization problem under study, the EMO algorithms applied to it, and our performance measure of choice.

### 2.1 Fitness Landscape Analysis

In single-objective optimization, fitness landscape analysis allows one to study the topology of a combinatorial optimization problem by gathering important information such as ruggedness or multimodality (Weinberger, 1990; Merz, 2004). A fitness landscape is defined by a triplet (X, N, f), where X is a set of admissible solutions (the solution space), N : X → 2^X is a neighborhood relation, and f : X → ℝ is a (scalar) black-box fitness function, here assumed to be maximized. A *walk* over the fitness landscape is an ordered sequence ⟨x_0, x_1, …, x_ℓ⟩ of solutions from the solution space such that x_0 ∈ X, and x_{t+1} ∈ N(x_t) for all t ∈ {0, …, ℓ − 1}.

An *adaptive walk* is a walk such that f(x_{t+1}) > f(x_t) for all t, as performed by a conventional hill-climbing algorithm. The number of iterations, or steps, of the hill-climbing algorithm defines the length of the adaptive walk. This length is an estimator of the diameter of the local optima's basins of attraction, characterizing the multimodality of a problem instance. Roughly speaking, and assuming isotropy in the search space, the longer the adaptive walks, the larger the basins, and the lower the number of local optima. This allows us to estimate the number of local optima when the whole solution space cannot be enumerated exhaustively.
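For illustration, the adaptive walk length can be estimated with a minimal first-improvement hill-climber over the 1-bit-flip neighborhood. The sketch below is not the authors' implementation: the `fitness` callable is an assumption standing in for any black-box scalar function to be maximized.

```python
import random

def adaptive_walk_length(fitness, n, rng=random):
    """Perform an adaptive walk (first-improvement hill-climbing over the
    1-bit-flip neighborhood) from a random solution; return the number of
    improving steps and the local optimum reached."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    steps = 0
    while True:
        # Scan the neighborhood in random order; accept the first strict improvement.
        for i in rng.sample(range(n), n):
            x[i] ^= 1  # flip bit i
            fy = fitness(x)
            if fy > fx:
                fx = fy
                steps += 1
                break
            x[i] ^= 1  # undo the flip
        else:
            return steps, x  # no improving neighbor: local optimum reached
```

On a smooth landscape such as onemax (`fitness=sum`), the walk always ends at the all-ones global optimum, after at most `n` steps.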

A *random walk* over the solution space is a walk in which each step moves to a neighbor selected uniformly at random. The autocorrelation function and the correlation length of such a random walk allow one to measure the ruggedness of a fitness landscape (Weinberger, 1990). The random walk autocorrelation function of a (scalar) fitness function f is defined as follows:

ρ(k) = ( E[f(x_t) · f(x_{t+k})] − E[f(x_t)] · E[f(x_{t+k})] ) / Var(f(x_t))

where E[f(x_t)] and Var(f(x_t)) are the expected value and the variance of f(x_t), respectively. The autocorrelation coefficients can be estimated within a finite random walk (x_1, x_2, …, x_ℓ) of length ℓ:

r(k) = Σ_{t=1}^{ℓ−k} (f(x_t) − f̄)(f(x_{t+k}) − f̄) / Σ_{t=1}^{ℓ} (f(x_t) − f̄)²

where f̄ = (1/ℓ) Σ_{t=1}^{ℓ} f(x_t), and k ≪ ℓ. The longer the length ℓ of the random walk, the better the estimation. The correlation length τ measures how the autocorrelation function decreases. This characterizes the ruggedness of the landscape: the larger the correlation length, the smoother the landscape. Following Weinberger (1990), we define the correlation length by τ = −1 / ln(r(1)). This is based on the assumption that the autocorrelation function decreases exponentially.
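The estimator above can be sketched in a few lines of Python. This is an illustrative toy (the `random_walk` helper and the bit-flip neighborhood are assumptions), not the experimental code used in the article.

```python
import math
import random

def random_walk(x0, length, rng=random):
    """Random walk under the 1-bit-flip neighborhood: each step flips one
    uniformly chosen bit of the previous solution."""
    walk = [list(x0)]
    for _ in range(length - 1):
        x = list(walk[-1])
        x[rng.randrange(len(x))] ^= 1
        walk.append(x)
    return walk

def autocorrelation(values, k):
    """Empirical autocorrelation coefficient r(k) of a fitness series."""
    m = sum(values) / len(values)
    num = sum((values[t] - m) * (values[t + k] - m) for t in range(len(values) - k))
    den = sum((v - m) ** 2 for v in values)
    return num / den

def correlation_length(r1):
    """tau = -1 / ln(r(1)), under the exponential-decay assumption."""
    return -1.0 / math.log(abs(r1))
```

On a smooth toy landscape such as onemax, r(1) is close to 1 and the correlation length is large; on rugged landscapes r(1) drops quickly.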

### 2.2 Multiobjective Optimization

We are interested in maximizing a black-box objective function vector f : X → Z, which maps any solution from the solution space X to a vector in the objective space Z ⊆ ℝ^M, with M ≥ 2, such that f = (f_1, f_2, …, f_M). We assume that the solution space is a discrete set X = {0, 1}^N, where N is the problem size, that is, the number of binary (zero-one) variables. An objective vector z ∈ Z is dominated by an objective vector z′ ∈ Z, denoted by z ≺ z′, iff z_i ≤ z′_i for all i ∈ {1, …, M}, and there is a j ∈ {1, …, M} such that z_j < z′_j. Similarly, a solution x ∈ X is dominated by a solution x′ ∈ X iff f(x) ≺ f(x′). An objective vector z ∈ Z is nondominated if there does not exist any objective vector z′ ∈ Z such that z ≺ z′. A solution x ∈ X is nondominated, or Pareto-optimal, if f(x) is nondominated. The set of Pareto-optimal solutions is the Pareto set (PS); its mapping in the objective space is the Pareto front (PF). The goal of multiobjective optimization is to identify the Pareto set/front, or a good approximation of it for large-size and difficult problems.
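The dominance relation above (for maximization) translates directly into code. A minimal sketch, with hypothetical helper names:

```python
def dominates(z1, z2):
    """True iff z2 is dominated by z1 (maximization): z1 is componentwise
    at least z2 and strictly better in at least one objective."""
    return all(a >= b for a, b in zip(z1, z2)) and any(a > b for a, b in zip(z1, z2))

def pareto_front(vectors):
    """Nondominated subset of a list of objective vectors."""
    return [z for z in vectors if not any(dominates(w, z) for w in vectors)]
```

Note that `dominates(z, z)` is always `False`, since a vector cannot be strictly better than itself in any objective.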

### 2.3 MNK-Landscapes

The single-objective NK-landscapes of Kauffman (1993) define a family of pseudo-boolean functions in which the contribution c_i of each binary variable x_i depends on its own value and on the values of K other variables x_{i_1}, …, x_{i_K}, known as *epistatic interactions*. These c_i-values are uniformly distributed in the range [0, 1]. Thus, the individual contribution of a variable x_i depends on its value and on the values of the K variables x_{i_1}, …, x_{i_K}. The problem can be formalized as follows:

f(x) = (1/N) · Σ_{i=1}^{N} c_i(x_i, x_{i_1}, …, x_{i_K})

In this work, the epistatic interactions, that is, the variables that influence the contribution of x_i, are set uniformly at random among the (N − 1) variables other than x_i, following the random neighborhood model from Kauffman (1993). By increasing the number of epistatic interactions K from 0 to (N − 1), problem instances can be gradually tuned from smooth to rugged. In MNK-landscapes, the c_i-values additionally follow a multivariate uniform distribution of dimension M, defined by an M × M positive-definite symmetric covariance matrix C such that c_{mm} = 1 and c_{mn} = ρ for all m, n ∈ {1, …, M} with m ≠ n, where ρ defines the correlation among the objectives; see Verel et al. (2013) for details. A positive (respectively, negative) objective correlation decreases (respectively, increases) the degree of conflict between the different objective function values. The correlation coefficient is the same between all pairs of objectives, and the same epistatic degree and epistatic interactions are set for all the objectives.
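The construction can be sketched as follows for the uncorrelated case (ρ = 0), where the M objectives are simply independent NK-landscapes; correlated contributions would require sampling from a multivariate distribution as in Verel et al. (2013), which this toy deliberately omits. All function names are illustrative assumptions.

```python
import itertools
import random

def make_nk(n, k, rng=random):
    """One NK objective: variable i interacts with K partners chosen
    uniformly at random (random neighborhood model), and each contribution
    c_i is a lookup table over the (K+1) relevant bits."""
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def f(x):
        # Average of the N contributions c_i(x_i, x_{i_1}, ..., x_{i_K}).
        return sum(
            tables[i][(x[i],) + tuple(x[j] for j in links[i])] for i in range(n)
        ) / n

    return f

def make_mnk(n, m, k, rng=random):
    """M independent NK objectives, i.e., an MNK-landscape with rho = 0."""
    return [make_nk(n, k, rng) for _ in range(m)]
```

Each objective value is an average of uniform contributions, hence lies in [0, 1], and repeated evaluations of the same solution are deterministic.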

### 2.4 Multiobjective Randomized Search Heuristics

In this article, we consider two randomized search heuristics: (i) Global SEMO (GSEMO) proposed by Laumanns et al. (2004), a simple elitist steady-state *global* EMO algorithm (see Algorithm 1); and (ii) Pareto local search (PLS) proposed by Paquete et al. (2004), a multiobjective *local* search (Algorithm 2). These algorithms extend to the multiobjective case two conventional search heuristics, namely the (1+1)-evolutionary algorithm and the hill-climbing local search algorithm, which are often investigated in the theoretical literature on single-objective optimization. Though their design is guided by simple heuristic rules, these algorithms are components of many state-of-the-art multiobjective combinatorial optimization approaches (Andersen et al., 1996; Paquete et al., 2004; Paquete and Stützle, 2006; Liefooghe, Paquete, and Figueira, 2013), and their search dynamics typically reveal the complex behavior that we aim to further understand in this work. Both algorithms maintain an *unbounded* archive A of mutually nondominated solutions. This archive is initialized with one random solution from the solution space. Then, at each iteration, one solution x is selected at random from the archive A. In GSEMO, each binary variable from x is independently flipped with rate 1/N in order to produce an offspring solution x′. The archive is then updated by keeping the nondominated solutions from A ∪ {x′}. In PLS, the solutions located in the neighborhood of x are evaluated. Let N(x) denote the set of solutions located at a Hamming distance 1 from x. The nondominated solutions from A ∪ N(x) are stored in the archive, and the current solution x is then tagged as *visited* in order to avoid a useless re-evaluation of its neighborhood. This process is iterated until a stopping condition is satisfied. While for GSEMO there does not exist any explicit stopping rule (Laumanns et al., 2004), PLS has a natural stopping condition, which is satisfied when all the solutions in the archive are tagged as *visited*.

In other words, while PLS is based on the exploration of the whole 1-bit-flip neighborhood of x, GSEMO rather uses an ergodic operator, that is, an independent bit-flip mutation. This means that there is a nonzero probability of reaching any solution from the solution space at every GSEMO iteration. This makes GSEMO a *global* optimizer, as opposed to a *local* optimizer such as PLS. In this article, we are interested in the runtime, in terms of the number of function evaluations, until an ε-approximation of the Pareto set is identified and is contained in the internal memory of the algorithm, subject to a maximum budget of function evaluations.

### 2.5 Estimated Runtime (ERT)

Let ε ∈ ℝ be a constant value such that ε ≥ 1. The (multiplicative) ε-dominance relation (⪯_ε) can be defined as follows (Laumanns et al., 2002). For z, z′ ∈ Z, z is ε-dominated by z′ (z ⪯_ε z′) iff ε · z′_i ≥ z_i for all i ∈ {1, …, M}. The ε-value then stands for a relative tolerance that we allow within objective values. A set A ⊆ X is an ε-approximation of the Pareto set if, for any solution x ∈ X, there is one solution x′ ∈ A such that f(x) ⪯_ε f(x′). This is equivalent to finding an approximation set whose multiplicative epsilon quality indicator value with respect to the (exact) Pareto set is lower than ε (see e.g., Zitzler et al., 2003). Interestingly, under some general assumptions, there always exists an ε-approximation, for any given ε > 1, whose cardinality is both polynomial in the problem size and in the inverse of the tolerance (Papadimitriou and Yannakakis, 2000).
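The success test behind the runtime measure can be sketched as follows, assuming maximization, nonnegative objective values (as in MNK-landscapes), and the multiplicative ε-dominance convention with ε ≥ 1; the function names are illustrative.

```python
def eps_dominates(z1, z2, eps):
    """Multiplicative epsilon-dominance (maximization, eps >= 1):
    z2 is eps-dominated by z1 iff eps * z1_i >= z2_i for every objective i."""
    return all(eps * a >= b for a, b in zip(z1, z2))

def is_eps_approximation(approx, pareto_front, eps):
    """approx is an eps-approximation of the Pareto set if every
    Pareto-optimal vector is eps-dominated by some member of approx."""
    return all(any(eps_dominates(z, p, eps) for z in approx) for p in pareto_front)
```

For instance, a single vector close to both extremes of a two-point Pareto front may suffice once the tolerance is large enough.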

When a run fails to identify an ε-approximation within the allotted budget, we *restart* the algorithm at random. Thus we obtain a "*simulated runtime*" (Auger and Hansen, 2005) from a set of independent trials on each instance. Such a performance measure allows us to take into account both the success rate and the convergence speed of the algorithm with restarts. Precisely, after (t − 1) failures, each one requiring T_us evaluations, and the final successful run of T_s evaluations, the total runtime is T = (t − 1) · T_us + T_s. By taking the expectation and by considering independent trials as a Bernoulli process stopping at the first success, we have:

E[T] = ((1 − p_s) / p_s) · E[T_us] + E[T_s]

In our case, the success rate p_s is estimated with the ratio of successful runs over the total number of executions, the expected runtime for unsuccessful runs E[T_us] is set as a constant limit T_max on the number of function evaluation calls, and the expected runtime for successful runs E[T_s] is estimated with the average number of function evaluations performed by successful runs:

ERT = ((1 − p̂_s) / p̂_s) · T_max + (1 / t_s) · Σ_{i=1}^{t_s} T_i

where t_s is the number of successful runs, and T_i is the number of evaluations for successful run i. For more details, we refer to Auger and Hansen (2005).
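The estimator combining success rate and budget can be written in a few lines. A minimal sketch, with hypothetical argument names:

```python
def estimated_runtime(run_evals, successes, budget):
    """ERT = ((1 - p_s) / p_s) * T_max + mean evaluations of successful runs,
    where p_s is the empirical success rate over all runs and T_max (budget)
    is the cost charged to each unsuccessful run."""
    succ = [t for t, ok in zip(run_evals, successes) if ok]
    if not succ:
        return float("inf")  # no successful run observed
    p_s = len(succ) / len(successes)
    return (1.0 - p_s) / p_s * budget + sum(succ) / len(succ)
```

For example, with two successful runs (100 and 300 evaluations) and one failure against a budget of 500, the ERT is 0.5 · 500 + 200 = 450 evaluations.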

## 3 Experimental Analysis

In this section, we examine the performance of GSEMO and PLS depending on the epistasis, the objective space dimension, and the objective correlation of MNK-landscapes.

### 3.1 Experimental Setup

As problem instances, we consider MNK-landscapes with an epistatic degree K, an objective space dimension M, and an objective correlation ρ such that ρ > −1/(M − 1). This restriction on ρ-values comes from the fact that the contributions of objective components (see Section 2.3) are sampled from a multivariate distribution whose covariance matrix has to be symmetric and positive-definite (Verel et al., 2013). The problem size N is set small enough to enumerate the solution space exhaustively; the solution space size is then 2^N. A set of 30 different landscapes are independently generated at random for each parameter combination of K, M, and ρ. They are made available at http://mocobench.sf.net.

The target tolerance is set to ε = 1.1; that is, a run is successful once a 1.1-approximation of the Pareto set has been identified. Unsuccessful runs are stopped after a maximum budget T_max of function evaluations without identifying such an approximation. Each algorithm is executed 100 times *per* instance. From these 100 repetitions, the success rate and the expected number of evaluations for successful runs, and hence the estimated runtime on the given instance, are computed. For the comparative analysis, we consider only pairwise-complete cases, that is, instances that have been solved by both algorithms; this defines the set of available observations *per* algorithm.

### 3.2 Exploratory Analysis

The estimated runtime (ERT) distribution across the experimental blocks that are defined by each combination of benchmark parameters is presented in Figure 1. For both algorithms, the ERT clearly increases with the nonlinearity (K) and the number of objectives (M), whereas the trend w.r.t. the objective correlation (ρ) is a bit more complex. Indeed, for a small M and a large K, the ERT decreases when ρ increases. On the contrary, for a large M, problem instances seem to get harder when the objectives are independent (ρ = 0) rather than anticorrelated (ρ < 0). This shows in the inverted u-shape observed on the right side of the figure, which is particularly pronounced for PLS. This is surprising because the cardinality of the Pareto set increases when objectives are conflicting (Verel et al., 2013). However, we observed that the ε-value of random approximation sets follows a similar u-shaped trend w.r.t. objective correlation. Also, this ε-value tends to increase with the number of objectives. This holds for approximation sets containing a constant number of randomly generated solutions. Moreover, we know from Verel et al. (2013) that the number of Pareto local optimal solutions increases with K and decreases with ρ. This could explain the relative advantage of PLS on problem instances with positively correlated objectives. Notice also that the opposite is true for the connectedness of the Pareto set: the smaller M and the larger ρ, the more clustered Pareto optimal solutions are in the solution space.

Figure 2 displays the runtime aggregated over all the instance parameters (ρ, M, K) but one. We clearly see that PLS significantly outperforms GSEMO overall, and in particular for positively correlated objectives. In fact, the runtime of PLS is shorter than that of GSEMO on a large majority of the instances, and compared against PLS, GSEMO requires substantially more function evaluations on average to identify a 1.1-approximation of the Pareto set. The performance difference between the two algorithms seems to be constant, except for large values of K and w.r.t. ρ. Notably, the ruggedness of the underlying single-objective functions appears to have the highest impact on the search performance. In particular, the ruggedness seems to have more impact on the performance of GSEMO than of PLS. In general, finding an ε-approximation becomes harder as the number of objectives grows, and much harder for highly rugged instances, whereas the trend w.r.t. objective correlation is less clear and more algorithm-dependent.

In the following, we list the problem features that *intuitively* impact the performance of randomized search heuristics for the class of MNK-landscapes, and we explicitly assess their separate and joint effect on the runtime of PLS and GSEMO.

## 4 Features Characterizing Problem Difficulty

**Question #1:***What features might characterize multiobjective combinatorial landscapes?*

In this section, we identify a number of general-purpose features, either directly extracted from the problem instance, or computed from the fitness landscape. Then, we conduct a correlation analysis of feature pairs, showing how features relate to benchmark parameters and anticipating the interplay of those features in capturing the difficulties of a problem instance.

### 4.1 Benchmark Parameters

First, we consider the following parameters related to the definition of MNK-landscapes. Recall that in this analysis the problem size N is kept constant.

- **Number of variable interactions (K):** This gives the degree of variable interaction (epistasis) in the construction of MNK-landscapes. As will be detailed later, although the value of K cannot be retrieved directly from a black-box instance, it can be precisely estimated by some of the problem features described below.
- **Number of objective functions (M):** This parameter represents the dimension of the objective space in the definition of MNK-landscapes.
- **Correlation between the objective function values (ρ):** This parameter allows us to tune the correlation between the objective function values in MNK-landscapes. In our analysis, the objective correlation is the same between all pairs of objectives.

### 4.2 Problem Features from the Pareto Set

The fitness landscape features considered in our analysis are described below. We start with some general features related to the Pareto set.

- **Number of Pareto optimal solutions:** The number of Pareto optimal solutions enumerated in the instance under consideration simply corresponds to the cardinality of the (exact) Pareto set. The approximation set manipulated by any EMO algorithm is directly related to the cardinality of the Pareto set. For MNK-landscapes, the number of Pareto optimal solutions typically grows exponentially with the problem size, the number of objectives, and with the degree of conflict between the objectives (Verel et al., 2013).
- **Hypervolume of the Pareto set:** The hypervolume value gives the portion of the objective space that is dominated by the Pareto set (Zitzler et al., 2003). We take the origin as a reference point.
- **Average distance between Pareto optimal solutions:** This metric corresponds to the average distance, in terms of Hamming distance, between any pair of Pareto optimal solutions.
- **Maximum distance between Pareto optimal solutions:** This metric is the maximum distance between two Pareto optimal solutions in terms of Hamming distance. This feature is denoted as the diameter of the Pareto set by Knowles and Corne (2003).
- **Proportion of supported solutions:** Supported solutions are Pareto optimal solutions whose corresponding objective vectors are located on the convex hull of the Pareto front. Notably, nonsupported solutions are *not* optimal with respect to a weighted-sum aggregation of the objective functions, whatever the setting of the (positive) weighting coefficient vector. As a consequence, the proportion of supported solutions on the Pareto set has a direct impact on the ability of scalar approaches to find a proper Pareto set approximation. However, this feature is expected to have a low impact on the performance of dominance-based EMO approaches such as GSEMO and PLS.
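Given an enumerated Pareto set of binary solutions, the two distance-based features can be computed over all pairs. A minimal sketch with hypothetical function names:

```python
from itertools import combinations

def hamming(x, y):
    """Hamming distance between two binary solutions."""
    return sum(a != b for a, b in zip(x, y))

def pareto_set_distance_features(pareto_set):
    """Average and maximum Hamming distance over all pairs of Pareto-optimal
    solutions; the maximum is the diameter of the Pareto set."""
    dists = [hamming(x, y) for x, y in combinations(pareto_set, 2)]
    if not dists:
        return 0.0, 0  # degenerate case: at most one Pareto-optimal solution
    return sum(dists) / len(dists), max(dists)
```

This exhaustive pairwise computation is quadratic in the Pareto set size, which is affordable here because the benchmark instances are fully enumerable.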

### 4.3 Problem Features from the Pareto Graph

In the following, we describe some problem features related to the *connectedness* of the Pareto set (Ehrgott and Klamroth, 1997; Gorski et al., 2011). If all Pareto optimal solutions are connected with respect to a given neighborhood structure, the Pareto set is said to be *connected*, and local search algorithms would be able to identify all Pareto optimal solutions by starting with at least one of them; see for example, Andersen et al. (1996); Paquete et al. (2008); Paquete and Stützle (2009); Liefooghe, Paquete, and Figueira (2013). We follow the definition of *k-Pareto graph* from Paquete et al. (2008). The k-Pareto graph is defined as a graph G_k = (V, E), where the set of vertices V contains all Pareto optimal solutions, and there is an edge between two nodes x and x′ if and only if the shortest distance between solutions x and x′ is below a bound k, that is, dist(x, x′) ≤ k. The distance dist is taken as the Hamming distance for MNK-landscapes. This corresponds to the *bit-flip* neighborhood operator. The connectedness-related problem features under investigation are given below.

- **Relative number of connected components:** This metric is the number of connected components in the 1-Pareto graph G_1, normalized by the number of Pareto optimal solutions.
- **Proportional size of the largest connected component:** This corresponds to the proportion of Pareto optimal solutions that belong to the largest connected component in the 1-Pareto graph G_1.
- **Minimum distance to connect the Pareto graph:** This measure corresponds to the smallest distance k such that the k-Pareto graph G_k is connected; that is, for all pairs of vertices (x, x′) in V, there exists a path between x and x′.
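The first two connectedness features can be computed from the k-Pareto graph with a small union-find pass over all solution pairs. A sketch under the same enumerability assumption, with illustrative names:

```python
from itertools import combinations

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def pareto_graph_features(pareto_set, k=1):
    """Build the k-Pareto graph (edges join pairs at Hamming distance <= k)
    and return (relative number of connected components,
    proportional size of the largest component)."""
    n = len(pareto_set)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if hamming(pareto_set[i], pareto_set[j]) <= k:
            parent[find(i)] = find(j)  # union the two components

    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return len(sizes) / n, max(sizes.values()) / n
```

Raising `k` merges components, which is exactly how the minimum distance to connect the Pareto graph would be found: increase `k` until a single component remains.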

### 4.4 Problem Features from Ruggedness and Multimodality

At last, we consider problem features related to the number of local optima, the length of adaptive walks, and the autocorrelation functions.

**Number of Pareto local optima:** A solution is a *Pareto local optimum* with respect to a neighborhood structure if none of its neighboring solutions dominates it (see e.g., Paquete et al., 2007). For MNK-landscapes, the neighborhood structure is taken as the *1-bit-flip*, which is directly related to a Hamming distance of 1. This metric reports the number of Pareto local optima enumerated on the MNK-landscape under consideration.

**Length of a Pareto-based adaptive walk:** We compute the length of adaptive walks by means of a very basic single solution-based *Pareto-based Hill-Climbing* (PHC) algorithm. The PHC algorithm is initialized with a random solution. At each iteration, the current solution is replaced by a random dominating neighboring solution. As a consequence, PHC stops on a Pareto local optimum. The number of iterations, or steps, of the PHC algorithm is the length of the Pareto-based adaptive walk. As in the single-objective case, the number of Pareto local optima is expected to increase exponentially as the adaptive walk length decreases for MNK-landscapes (Verel et al., 2013).

**First autocorrelation coefficient of solution hypervolume:** Ruggedness is measured here in terms of the autocorrelation of the hypervolume along a random walk. As explained in Section 2.1, the correlation length measures how the autocorrelation function, estimated with a random walk, decreases. The autocorrelation coefficients are computed with a scalar fitness function given by the hypervolume of the current solution, the reference point being set to the origin. The neighborhood operator for the random walk is the *1-bit-flip*.

**First autocorrelation coefficient of local hypervolume:** This metric is similar to the previous one, except that the fitness function is based on a local hypervolume measure. The local hypervolume of a solution is the portion of the objective space covered by its non-dominated neighboring solutions. The random walk length and the neighborhood operator are the same as for the previous metric.
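The Pareto-based adaptive walk described above can be sketched as follows. This is a minimal illustration on a toy bi-objective landscape; the `evaluate` function is hypothetical and not part of the benchmark.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def neighbors(sol):
    """1-bit-flip neighborhood of a binary solution."""
    for i in range(len(sol)):
        n = list(sol)
        n[i] = 1 - n[i]
        yield n

def phc_walk_length(evaluate, n_bits, rng):
    """Length of a Pareto-based adaptive walk: start from a random
    solution and repeatedly jump to a random dominating neighbor
    until a Pareto local optimum is reached."""
    sol = [rng.randint(0, 1) for _ in range(n_bits)]
    steps = 0
    while True:
        dominating = [n for n in neighbors(sol)
                      if dominates(evaluate(n), evaluate(sol))]
        if not dominating:
            return steps  # reached a Pareto local optimum
        sol = rng.choice(dominating)
        steps += 1

# toy conflicting objectives (hypothetical): flipping any bit trades one
# objective for the other, so every solution is a Pareto local optimum
# and the walk always has length 0
def evaluate(sol):
    ones = sum(sol)
    return (ones, len(sol) - ones)

length = phc_walk_length(evaluate, 10, random.Random(42))
```

Averaging `phc_walk_length` over many restarts yields the adaptive walk length feature; on rugged landscapes the walks stop earlier.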

The benchmark parameters defining MNK-landscapes and a total of twelve general-purpose problem features are summarized in Table 1. Notice that most of those features require the solution space to be enumerated exhaustively; the exceptions are the benchmark parameters themselves, as well as the length of a Pareto-based adaptive walk and the two hypervolume-based autocorrelation coefficients, which can be estimated by sampling. For this reason, most features are not practical for performance prediction purposes. However, we decided to include them because our aim is to examine their impact on algorithm performance.

Table 1: Benchmark parameters and problem features considered in the analysis.

| Description | Reference |
|---|---|
| **Benchmark parameters (3)** | |
| Correlation between the objective function values | |
| Number of objective functions | |
| Number of variable interactions (epistasis) | |
| **Problem features (12)** | |
| Number of Pareto optimal solutions | Knowles and Corne (2003) |
| Hypervolume (Zitzler et al., 2003) of the Pareto set | Aguirre and Tanaka (2007) |
| Average distance between Pareto optimal solutions | Liefooghe, Verel, Aguirre, and Tanaka (2013) |
| Maximum distance between Pareto optimal solutions | Knowles and Corne (2003) |
| Proportion of supported solutions in the Pareto set | Knowles and Corne (2003) |
| Number of Pareto local optima | Paquete et al. (2007) |
| Length of a Pareto-based adaptive walk | Verel et al. (2013) |
| Relative number of connected components | Paquete and Stützle (2009) |
| Proportional size of the largest connected component | Verel et al. (2011) |
| Minimal distance to connect the Pareto graph | Paquete and Stützle (2009) |
| First autocorrelation coefficient of solution hypervolume | Liefooghe, Verel, Aguirre, and Tanaka (2013) |
| First autocorrelation coefficient of local hypervolume | Liefooghe, Verel, Aguirre, and Tanaka (2013) |


### 4.5 Correlation between Problem Features

**Question #2:** *How do features relate to benchmark parameters? How do they relate to one another? Are they linearly dependent?*

The correlation matrix between each pair of features is reported in Figure 3. In view of fitting linear models in the next stage of our analysis, we use here the Pearson linear correlation coefficient. First, the number of objectives is only moderately correlated with the cardinality of the Pareto set, and so is the objective correlation (the absolute correlation coefficient is around 0.5 in both cases). Surprisingly, neither of these features alone can explain the number of nondominated solutions. This means that the objective space dimension does not justify by itself the large number of nondominated solutions found in many-objective optimization problems (Wagner et al., 2007). As pointed out by Verel et al. (2013), the degree of conflict between the objective function values also has to be taken into account. In fact, a multilinear regression predicting the Pareto set cardinality from both the number of objectives and the objective correlation explains a substantial share of the variance, with a high correlation coefficient (0.84) between measured and fitted values. The regression coefficients show that the number of Pareto optimal solutions increases with the number of objectives and decreases with the objective correlation. As the combination of these two benchmark parameters allows one to predict a large part of the (log-transformed) Pareto set cardinality variance, we believe that, more generally, the impact of many-objective fitness landscapes on the search process cannot be analyzed properly without taking the objective correlation into account. Furthermore, the number of objectives is also highly correlated with the hypervolume of the Pareto set (the absolute correlation coefficient is 0.89), whereas the features having the highest absolute correlation with the number of variable interactions are the fraction of supported solutions and the number of Pareto local optima (the correlation coefficient for the former is 0.86).

Problem features from the Pareto graph aim to characterize the topology of Pareto optimal solutions. The relative number of connected components is positively correlated with the minimal distance to connect all the components (the correlation coefficient is 0.65), whereas the relative size of the largest component is negatively correlated with the number of components. The minimal distance is also highly correlated with the maximal Hamming distance between Pareto optimal solutions (over 0.9). Connectedness metrics appear to be more strongly correlated with the benchmark parameters than with other features; for instance, the correlation coefficient between the relative number of components and the objective correlation is 0.77. The number of Pareto optimal solutions and the hypervolume of the Pareto front are not clearly correlated with the Pareto graph features. Indeed, the logarithm of the Pareto set size is negatively correlated with the relative number of connected components, but it is not correlated with the size of the largest component.
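As an illustration of how connectedness features of this kind can be computed, the following sketch (assuming Pareto optimal solutions encoded as bit vectors; the helper names are ours) extracts the connected components of the Pareto graph, joining solutions whose Hamming distance is at most a threshold:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))

def pareto_graph_components(pareto_set, dmax=1):
    """Connected components of the Pareto graph: nodes are Pareto optimal
    solutions, edges join solutions within Hamming distance dmax.
    Uses union-find with path halving."""
    parent = list(range(len(pareto_set)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pareto_set)):
        for j in range(i + 1, len(pareto_set)):
            if hamming(pareto_set[i], pareto_set[j]) <= dmax:
                parent[find(i)] = find(j)  # union the two components

    comps = {}
    for i in range(len(pareto_set)):
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())

# toy Pareto set: three mutually reachable solutions plus one isolated
ps = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 0)]
components = pareto_graph_components(ps, dmax=1)
```

From the output one can derive the relative number of components and the proportional size of the largest one.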

Several features relate to, and can characterize, the ruggedness and the multimodality of the fitness landscape. For instance, the numbers of Pareto optimal solutions and of Pareto local optima are highly correlated (the correlation coefficient of their log-transformed values is 0.79). Not surprisingly, the number of local optima increases with the number of non-dominated solutions. Unfortunately, the number of Pareto local optima cannot be computed without a full enumeration of the set of feasible solutions. Nevertheless, its log-transformed value is highly linearly correlated with the length of a Pareto-based adaptive walk (the absolute correlation coefficient is 0.99). This potentially allows one to estimate the number of Pareto local optima for large-size problem instances; see Verel et al. (2013). On the contrary, the correlation between the number of variable interactions (epistasis) and the number of Pareto global or local optima is low. Although those important features certainly depend on this benchmark parameter, the relation is not directly linear: other features are required to fully explain the number of optima. On these benchmark instances, the number of epistatic interactions can be estimated by the hypervolume-based autocorrelation measures obtained from random walks (the absolute correlation coefficients are close to 1.0). Indeed, the autocorrelation coefficient is higher when the number of epistatic interactions is low, as in the single-objective case (Weinberger, 1991).
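The random-walk autocorrelation estimation discussed above can be sketched as follows. This is a minimal illustration in which, for simplicity, the scalar fitness is a smooth additive surrogate rather than an actual hypervolume computation; the function names are assumptions.

```python
import random

def walk_autocorr(fitness, n_bits, length, rng, lag=1):
    """Estimate the lag-1 autocorrelation of a scalar fitness
    along a 1-bit-flip random walk."""
    sol = [rng.randint(0, 1) for _ in range(n_bits)]
    series = []
    for _ in range(length):
        series.append(fitness(sol))
        i = rng.randrange(n_bits)
        sol[i] = 1 - sol[i]  # random 1-bit-flip move
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return cov / var

# an epistasis-free (additive) fitness yields a high first
# autocorrelation coefficient, close to 1 - 2/n_bits here
rng = random.Random(3)
r1 = walk_autocorr(lambda s: float(sum(s)), 20, 2000, rng)
```

With increasing epistasis the fitness series becomes more erratic and the coefficient drops, which is exactly what the hypervolume-based measures capture on MNK-landscapes.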

This correlation matrix gives a “big picture” of some of the features that can describe the search space structure of a multiobjective combinatorial optimization problem. In the next section, we relate the value of those features for enumerable MNK-landscapes to the performance of both GSEMO and PLS on the same landscapes.

## 5 Feature-Based Analysis

In this section, we link problem features to the estimated runtime of EMO algorithms. First, we measure how strongly each individual feature is associated with performance via a correlation analysis. Then, aiming to disentangle feature contributions and their importance in explaining search performance, we assess their conditional impact on algorithm runtime by means of a multilevel, multilinear regression model.

### 5.1 Correlation between Problem Features and Algorithm Performance

**Question #3:** *Which features are ordinally associated with algorithm performance?*

A first assessment of the dependency of search performance on instance features can be done through visual inspection of scatter plots, supported by a correlation analysis. Naturally, *correlation* does not imply *causation*, and we do not draw any *direct* link between each considered feature and the algorithm runtime, even if in our case an eventual link could only go in one direction. Instead, we restrict ourselves to measuring the association of each feature with the performance metric. We quantify the strength of this dependency via Kendall's τ statistic (McLeod, 2011), since we want to assess the accordance between the variation in algorithm performance and the variation in problem features. This non-linear rank-correlation measure is based on the ordering of all possible pairs, and its value is proportional to the difference between the number of concordant and discordant pairs among all possible pairwise comparisons. As such, when the null hypothesis of mutual independence (τ = 0) is rejected, τ can be directly interpreted in terms of the probability of observing agreement (τ > 0) or disagreement (τ < 0) between the ranks of paired values.
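For concreteness, Kendall's τ (in its simplest, tie-free τ-a form) can be computed directly from concordant and discordant pairs; a minimal sketch:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Assumes no ties, for illustration only."""
    pairs = list(combinations(range(len(x)), 2))
    c = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    d = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (c - d) / len(pairs)

# perfectly concordant feature/runtime rankings give tau = 1
tau = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
```

In practice one would use a library routine that handles ties and provides the significance test against τ = 0.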

The scatter plots and the regressions (with local polynomial fitting) of the average runtime of both algorithms as a function of the instance features are provided in Figure 4. In addition, Kendall's τ coefficients are given by the red points in Figure 5. For both algorithms, the average distance between Pareto optimal solutions is highly positively correlated with runtime: the larger this distance, the longer the runtime. On the contrary, both ruggedness-related features based on measures of hypervolume autocorrelation, and one feature related to connectedness, namely the size of the largest cluster in the Pareto set, are highly negatively correlated. As expected, ruggedness and connectedness play a major role for both algorithms: the runtime decreases as the autocorrelation coefficients increase, and when a large number of Pareto optimal solutions are connected in the solution space.

Some features have a different impact on the two algorithms, possibly highlighting their respective strengths and weaknesses. In particular, the runtime of PLS increases with the number of Pareto optimal and locally optimal solutions. By contrast, the scatter plots show that a high number of Pareto local optima has less impact on GSEMO than on PLS. Moreover, the runtime of GSEMO is correlated with three other features related to the distance and the connectedness of Pareto optimal solutions. Indeed, topological relationships between Pareto optimal solutions have a large effect on the runtime of GSEMO, especially when the distance between those solutions is large. Surprisingly, the runtime of PLS does not increase when nondominated solutions are disconnected.

However, we have to be careful when drawing conclusions by aggregating data from different areas of the instance parameter space, since feature values and their ranges depend, in turn, on the levels of the benchmark parameters. This can be visually appreciated in Figure 4: the two hypervolume autocorrelation measures, for example, are clearly clustered around five levels that actually correspond to the different values of the epistasis parameter. Similarly, we are able to distinguish three clusters in the hypervolume metric, which actually follow the objective space dimension.

Therefore, we deepen the analysis by evaluating the correlation *within* the instance groups defined by each possible combination of the benchmark parameter values under study. Black points and lines in Figure 5 display the average τ-value within groups, together with the confidence interval associated with the mean. By comparing them with the red points, we can notice that data aggregation slightly inflates the correlation statistic for some features, although it leads to the same inference nonetheless. On the contrary, although hypervolume is very weakly associated with runtime overall, and its impact appears contradictory between GSEMO and PLS, group-level results are more consistent, showing a strong positive association between hypervolume and runtime for both algorithms. Unfortunately, as for features related to the connectedness of the Pareto set, our confidence in the average correlation within groups is too low to make further comments, mainly because, in many cases, we could not reject the null hypothesis of mutual independence at the group level. Nevertheless, the previous observations on instance groups and their possible effect motivate the remainder of our analysis.

### 5.2 Linear Mixed Model

**Question #4:** *How much of algorithm performance variance can features explain?*

In this section, we aim to quantify the impact of instance features, and possibly disentangle their individual contributions to the performance variance, by taking into account the dependency among measurements that is induced by the experimental plan. Our goal is precisely to generalize from it as much as possible, in order to make inferences about the effect and relative importance of features. Let $y_i$ be the log-transformed estimated runtime on the $i$-th problem instance. We treat $y_i$ as an observation from a random variable with expectation $\mu_i$ given by a linear combination of the instance features, that is, $y_i = \mu_i + \epsilon_i$, where the errors $\epsilon_i$ are taken to be independent, identically distributed, and zero-mean.

To respect the grouping of instances induced by the benchmark parameters, we then extend this baseline into a mixed-effects model, $y_i = \beta_0 + \sum_j \beta_j x_{ij} + \alpha_{g(i)} + \epsilon_i$, with *fixed* slopes $\beta_j$ for the instance features $x_{ij}$ and *random* intercepts $\alpha_{g(i)}$, one per group $g(i)$ of instances sharing the same combination of benchmark parameters. Notice that estimates and residuals will likely not be the same as in the previous model. Notice also that performance observations are now i.i.d. *conditionally* on the grouping factor, whereas the unconditional model carries a dependency between measurements of the same group. Hereby we also obtain the aforementioned variance decomposition, which shows which part of the observed performance variance can be ascribed to the grouping of problem instances (Chiarandini and Goegebeur, 2010).

The usual approach to estimating the parameters of the unconditional model is the restricted maximum likelihood method; see Faraway (2005) and Bates et al. (2014) for theoretical and implementation details. In the following, we present the results of this estimation on the full model comprising all instance features, for both algorithms. In particular, each estimated fixed-effect coefficient is tested against the null hypothesis that it equals zero, whereas for the group effect we only need to check whether the random-intercept variance is zero. In the multilinear framework, the regression coefficients that are statistically significant allow us to assess the runtime effect of each feature conditionally on all the others and, given the random-effect formulation, across experimental blocks.

Finally, in order to assess the accuracy of a regression model, the conventional $R^2$ (ratio of variance explained by the regression) can be extended by taking into account the variance decomposition that is specific to mixed models (Nakagawa and Schielzeth, 2013). We obtain a *marginal* $R^2$ yielding the proportion of variance explained by the instance features, and a *conditional* $R^2$ which, despite its name, gives the proportion of variance explained by the entire model, that is, including the random effect of benchmark parameter combinations. Marginal and conditional $R^2$ are, respectively, 0.617 and 0.919 for the regression modeling GSEMO's runtime, and 0.482 and 0.911 for PLS.
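For reference, with a single random intercept the marginal and conditional coefficients of determination of Nakagawa and Schielzeth (2013) take the following form (our notation):

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
```

where $\sigma^2_f$ is the variance of the fixed-effect predictions, $\sigma^2_\alpha$ the random-intercept variance, and $\sigma^2_\varepsilon$ the residual variance.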

**A note on multicollinearity.** Multilinear regression modeling rests on a few key assumptions, namely the usual normality of residuals and homogeneity of variance, but also the (linear) independence of predictors. In fact, linear correlation between two or more predictors (collinearity) may produce unstable parameter estimates with high standard errors. That is all the more problematic when the analysis goal is to determine the individual contribution of each predictor in explaining the response variable. The astute reader might have spotted collinearity in Section 4.5 and will be skeptical of the regression results hereafter. Hence, we need to address this issue before going further.

In order to assess the degree of multicollinearity, we calculate the widely used Variance Inflation Factor (VIF) (Fox and Monette, 1992). Classically, the VIF of a predictor is computed from the $R^2$ of the multilinear model predicting it from the remaining covariates, thereby measuring its redundancy w.r.t. all other predictors. However, in the presence of clustered data, this redundancy has to be assessed conditionally on the (random) group effects. Therefore, following Davis et al. (1986) and Harrell Jr (2016), we compute the VIFs from the variance-covariance matrix of the regression coefficient estimates, which in the case of a mixed model does take random effects into account. Results are reported in Figure 6. It appears that considering group effects mitigates the consequences of collinearity to a degree that is reasonably acceptable, especially as compared to what would result from fitting a traditional multilinear model (O'brien, 2007). A possible explanation is that, since some values of certain predictors might only be observed in certain groups, what appears as collinearity in multilinear models might not pose problems in mixed models. Notably, this is likely the case for the two most problematic predictors, the number of Pareto local optima and the length of adaptive walks.
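As a minimal numpy sketch of the classical (non-mixed) VIF computation described above, each predictor is regressed on the others and its VIF is $1/(1 - R^2)$; the data here are synthetic, for illustration only:

```python
import numpy as np

def vif(X):
    """Classical VIFs: for each column j of the design matrix X,
    regress it on the remaining columns (with intercept) and
    return 1 / (1 - R^2_j)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r2 = 1.0 - ((y - Z @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
# third predictor nearly collinear with the first -> inflated VIF
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])
vifs = vif(X)
```

The mixed-model variant used in the article instead derives the VIFs from the variance-covariance matrix of the coefficient estimates.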

### 5.3 Predictor Effect Size

**Question #5:** *What is the conditional impact of each feature on algorithm performance? What are the significant common trends across instance groups?*

We report in Figure 7 (Top) the values of the regression coefficients estimated from the mixed model. In a multilinear regression, each coefficient $\beta_j$ predicts the change in the conditional mean of the response variable after a unit change of the corresponding covariate $x_j$. Since predictors take values in different ranges, in order to compare their impact we need to standardize the regression estimates, which are reported in Figure 7 (Bottom). Standardized coefficients predict by how many standard deviations the conditional mean of the response variable will change if predictor $x_j$ shifts by one standard deviation. As such, higher values correspond to steeper slopes of the partial regression lines, where the given predictor is the only covariate and all other predictors are held constant at their respective median values. This can be appreciated in Figure 8. Partial residuals are also added to the plots for a visual assessment of the model fit (Larsen and McCleary, 1972). For a given covariate $x_j$, the corresponding partial residuals are obtained by adding the vector $\hat{\beta}_j x_j$ to the vector of residuals from the complete regression, such that the slope of a simple regression of the partial residuals on $x_j$ equals $\hat{\beta}_j$. This simple interpretation motivates the choice of a multilinear model.
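The partial-residual construction above can be illustrated as follows; this is a minimal OLS sketch (not the mixed model itself) showing that the slope of a simple regression of the partial residuals on the covariate recovers the fitted coefficient:

```python
import numpy as np

def partial_residuals(X, y, j):
    """OLS fit of y on X (plus intercept); returns the partial residuals
    for column j, i.e. beta_j * x_j + residuals, and beta_j itself."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return beta[j + 1] * X[:, j] + resid, beta[j + 1]

# synthetic data with a known coefficient of 2 on the first covariate
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=100)

pr, bj = partial_residuals(X, y, 0)
# slope of the simple regression of the partial residuals on x_0
slope = np.cov(pr, X[:, 0])[0, 1] / np.var(X[:, 0], ddof=1)
```

Since OLS residuals are orthogonal to every covariate, the simple-regression slope equals the multiple-regression coefficient exactly.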

For both algorithms, the features having the highest individual impact on the runtime are the hypervolume and its two autocorrelation measures, in order of importance. For GSEMO, the number of Pareto optimal solutions has a significant effect on the estimated runtime, the larger the Pareto set the shorter the runtime, whereas we cannot reject the hypothesis that the number of Pareto local optima has no effect on GSEMO's runtime. Still, the effect of the length of an adaptive walk, which is a good estimator of the number of Pareto local optima (see Section 4.5), is significant for GSEMO. Conversely, the opposite is true for PLS: its runtime is more impacted by problem multimodality than GSEMO's. Indeed, the number of Pareto local optima has one of the largest effect sizes on PLS runtime (cf. Figure 7, Bottom). This suggests that GSEMO could be more appropriate than PLS when tackling highly multimodal instances. This also shows that, by taking the group effect into account (i.e., conditionally on the problem instance class), mixed models are able to distinguish between the effects of the number of Pareto local optima and the adaptive walk length, which could be taken as surrogates for one another at the aggregate level.

Furthermore, the relative size of the largest connected component of the Pareto set impacts the performance of both algorithms. However, when controlling for all other features (i.e., conditionally on all other predictors), we find that an increase in Pareto set connectedness yields a small increase in the estimated runtime. Surprisingly also, the fraction of supported solutions is a significant predictor for both algorithms, even if neither of them explicitly exploits this feature during the search process, and the impact on runtime is indeed very small. Finally, despite being highly correlated with runtime, the average distance between non-dominated solutions has only a moderate impact on GSEMO and no significant impact on PLS.

### 5.4 Relative Importance of Predictors

**Question #6:** *Which features are relevant predictors of algorithm performance?*

Variable importance is commonly assessed via feature selection. However, stepwise selection can be misleading: intercorrelated predictors have a confounding effect on each other, but what the multilinear regression tries to measure is precisely the effect of one variable *controlling* for the others (i.e., fixing all the others), which leads to poor estimation in the presence of collinearity. In an information-theoretic framework, the Akaike Information Criterion (AIC) (Akaike, 1973) measures the relative quality of a model on a given dataset, not in terms of accuracy, but in terms of likelihood, that is, the relative distance to the unknown true mechanism generating the observed data. The difference in AIC-values between two models can then be used to estimate the strength of evidence for one model against the other. On a set of candidate models, AIC differences can be transformed into so-called Akaike weights, which can be directly interpreted as the probability of each model being the “best” one, conditionally on the considered set of models (Burnham and Anderson, 2002). In this context, instead of performing feature selection, variable importance can be better assessed by making inferences from all candidate models (Burnham and Anderson, 2004). We perform an exhaustive search in the space of all models that can be built with our predictors. The sum of the Akaike weights of the models that contain a given predictor then estimates the relative importance of that particular variable. Admittedly, intercorrelated predictors can still be confounded depending on the regression model under consideration, but this methodology avoids the biases of stepwise feature selection.
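The Akaike-weight computation described above can be sketched as follows; the toy model set and the predictor names (`"hv"`, `"nplo"`) are hypothetical:

```python
import math

def akaike_weights(aics):
    """Akaike weights: w_i proportional to exp(-Delta_i / 2),
    with Delta_i = AIC_i - min(AIC); the weights sum to one."""
    m = min(aics)
    raw = [math.exp(-(a - m) / 2.0) for a in aics]
    s = sum(raw)
    return [r / s for r in raw]

def variable_importance(models, aics):
    """Relative importance of each predictor: the sum of Akaike weights
    over all candidate models that contain it. `models` is a list of
    predictor-name sets."""
    w = akaike_weights(aics)
    preds = sorted(set().union(*models))
    return {p: sum(wi for mi, wi in zip(models, w) if p in mi)
            for p in preds}

# hypothetical toy example with two predictors and three candidate models
models = [{"hv"}, {"nplo"}, {"hv", "nplo"}]
aics = [100.0, 110.0, 101.0]
importance = variable_importance(models, aics)
```

In the article, the model set is the exhaustive collection of feature subsets, so a predictor appearing in many well-supported models accumulates a high importance score.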

Results are reported in Figure 9. The hypervolume of the Pareto front and the first autocorrelation coefficients of solution and local hypervolume are strong explanatory features for both algorithms, albeit in a different order of relative importance. Indeed, notice that the two autocorrelation measures of ruggedness are both considered important in the prediction. These two features are highly correlated with each other (see Figure 3), and we can observe that their respective coefficients in the linear models have nearly the same value (see Figure 7). It is interesting to highlight that, as in single-objective optimization, the ruggedness of the multiobjective landscape is an important performance predictor for both algorithms.

More features are relatively important predictors for GSEMO than for PLS. Features reflecting the connectedness of the Pareto set are important for GSEMO, which navigates the search space with an ergodic variation operator. On the contrary, the same features carry little information about the runtime of PLS, which is constrained by the exploration of a finite neighborhood. Moreover, the number of Pareto optimal solutions, the adaptive walk length, and the relative size of the largest component can also be considered important features for GSEMO, whereas the number of Pareto local optima is the second most important predictor for PLS. Remember that the log-transformed number of Pareto local optima and the length of an adaptive walk are highly correlated (see Figure 3). In this case, only one of these two features is preferred by the regression models: the number of Pareto local optima is more important for PLS, and the length of the adaptive walk is more important for GSEMO. In any case, our results show that, in multiobjective optimization as well, problem multimodality gives a good indication of problem difficulty.

### 5.5 Hierarchical Linear Models

**Question #7:** *Does the impact of features on algorithm performance change with landscape ruggedness? Can ruggedness be used to explain changes across instance groups?*

So far, we considered a multilinear model to study the conditional effects of landscape features on algorithm runtime, assuming these effects to be *fixed* across instance classes or groups, but with each group having a different *random* baseline. Random intercepts allow for variability in baseline algorithm performance across instance classes, that is, at the level of instance groups, whereas fixed regression slopes estimate the impact of features on performance at the instance level. The next logical step is to also allow for variability in the effect of landscape features across instance groups, and to try to explain these variations using class characteristics, such as the degree of epistasis. To this purpose, we reformulate our model as a multilevel model.

### 5.6 Multilevel Analysis

More concretely, let us focus on problem nonlinearity as the instance class characteristic, which we treat as a categorical variable over the epistasis levels under study. As a result of this choice, the level-2 predictor is encoded as a series of dummy variables, one for each epistasis level beyond the reference one. Model fitting then yields a pair of coefficients for each dummy variable, which we can interpret in the following way: the first represents the difference in average intercept, and the second the difference in average slope, between problem instances at the corresponding level of epistasis and problem instances at the reference level. These average slopes and average intercepts, that is, the partial regression lines for a given value of problem nonlinearity, can be visualized through so-called cross-sectional plots, as in any regression that includes predictor interactions. Such plots allow us to see how the relationship between runtime and the feature of interest changes depending on the degree of epistasis of the problem.
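A two-level formulation consistent with the description above can be sketched as follows (the symbols are ours, since the article's original notation is not reproduced here):

```latex
\begin{align}
  y_{ij} &= \beta_{0j} + \beta_{1j}\, x_{ij} + \varepsilon_{ij}, \\
  \beta_{0j} &= \gamma_{00} + \textstyle\sum_{k} \gamma_{0k}\, d_{kj} + u_{0j}, \\
  \beta_{1j} &= \gamma_{10} + \textstyle\sum_{k} \gamma_{1k}\, d_{kj} + u_{1j},
\end{align}
```

where $y_{ij}$ is the log-runtime on instance $i$ of group $j$, $x_{ij}$ the feature of interest, $d_{kj}$ the dummy variable indicating whether group $j$ has the $k$-th epistasis level, and $u_{0j}$, $u_{1j}$ random group deviations. The pair $(\gamma_{0k}, \gamma_{1k})$ then gives the intercept and slope differences between epistasis level $k$ and the reference level.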

In Sections 5.3 and 5.4, we identified the hypervolume and its autocorrelation measures as having the highest impact on the estimated runtime of both GSEMO and PLS. We also showed how both algorithms are impacted differently by the multimodality of a problem instance, as measured by the number of Pareto local optima or by the length of a Pareto-based adaptive walk. In the following, we examine if, and how, the effect of those features changes depending on the degree of epistasis. Indeed, the ruggedness of the objective functions is often overlooked in the multiobjective optimization literature, but here we are able to exploit the tunable nature of MNK-landscapes.

First, Figure 10 displays the effect of hypervolume on runtime depending on the degree of epistasis, as captured by a multilevel model: a simple regression of runtime on hypervolume, with a hypervolume-epistasis interaction and random deviations in slope and intercept for each instance group. The general trend is the same as in the multilinear model of Section 5.3: the larger the hypervolume of the (exact) Pareto set to be approximated, the longer the runtime required to find an ε-approximation of it. However, we now clearly see that this effect is milder on smoother landscapes, while it becomes more pronounced as the degree of epistasis increases, that is, as the problem instance gets more rugged. This trend is similar for both GSEMO and PLS.

Next, Figure 11 displays the effect of the length of a Pareto-based adaptive walk on algorithm runtime, depending on the degree of epistasis. In this case, the two EMO algorithms show a divergent behavior in the way the ruggedness of the objective functions interacts with the multimodality of the multiobjective optimization problem. For both GSEMO and PLS, the common trend is as follows: the shorter the adaptive walk, that is, the larger the expected number of Pareto local optima, the longer the estimated runtime to find a good approximation of the exact Pareto set. But while the impact of multimodality on GSEMO is rather small and does not seem to depend on the degree of epistasis, PLS is comparatively much more affected by the number of Pareto local optima, as the impact of multimodality on runtime increases with the ruggedness parameter. For both algorithms, the larger the degree of epistasis, the longer the runtime. However, on more rugged landscapes, the local EMO algorithm (PLS) appears to be increasingly more effective than its global counterpart (GSEMO) when the problem under consideration has few Pareto local optima.

At last, let us notice that the data does not support a multilevel analysis considering the hypervolume-based autocorrelation measures. Indeed, the values of those predictors are too tightly tied to the ruggedness parameter and thus, once we control for epistasis, the residual variance is too low to allow for any meaningful analysis. For this reason, the related figures are not shown in the article. However, the observations obtained from our multilevel analysis complement those from Section 5.3, where the focus was on instance features alone, and where model-fitting convergence issues did not allow us to have random slopes and a cross-level interaction term for each feature.

## 6 Conclusions

In this article, we proposed a general-purpose methodology to understand the impact of problem characteristics and fitness landscape features on the performance of EMO algorithms. To the best of our knowledge, this is novel in the multiobjective optimization literature. Our statistical investigation, based on correlation and regression analysis, does stem from the *intuitions* we may have about problem features and how they might relate to algorithm performance. But our conclusions are quantitatively supported by *empirical data*, coming from a large set of experiments (albeit on a single testbed of enumerable instances), and covering a wide range of the structural properties that EMO algorithms might encounter. In addition, the use of mixed models allows us to respect the clustered structure of the particular dataset (different instance groups, defined by the combinations of benchmark parameters), while still aiming to produce generalizable inferences: instance groups are modeled as *random* effects, whereas the interest is on the *fixed* effects of problem features on algorithm performance at the instance level. Through these mixed models, we assess and contrast the impact of landscape characteristics on the runtime of two prototypical EMO algorithms.

In particular, our analysis on MNK-landscapes emphasizes the importance of *ruggedness* and *multimodality* as the most impactful characteristics for both local and global dominance-based EMO algorithms. Although the significance of those features is certainly recognized in single-objective optimization, it is in our opinion often overlooked in the multiobjective optimization literature. Indeed, those features affect the runtime of the considered EMO algorithms differently, with the Pareto local search algorithm showing a competitive advantage when the landscape is rugged but Pareto local optima are few. In addition, the *hypervolume* covered by the optimal Pareto set is also reported as a key feature to explain the runtime of both EMO algorithms under consideration. We could attribute this to the chosen stopping criterion, a quality threshold on the approximation set measured in terms of epsilon distance to the optimal Pareto front.

As for the choice of problem instances, we regard the family of MNK-landscapes as a synthetic benchmark that can generalize other multiobjective combinatorial optimization problems, as NK-landscapes do in the single-objective case (Heckendorn and Whitley, 1997). However, we must acknowledge that the obvious next step is to consider additional (large-size) problem and algorithm classes, and more importantly additional (even multiple) stopping conditions and quality assessment indicators, in order to further generalize the current findings. For instance, considering an ε-approximation of the Pareto set as a target is just one way of measuring performance. In any case, it is our hope that the proposed methodology will be helpful not only to practitioners who want to gain insights about their problem classes, but also to algorithm designers. In fact, understanding the performance of EMO algorithms is a necessary step before improving them, for example by comparing the impact on algorithm performance of different operators (i.e., different landscapes) or of different selection mechanisms, depending on the problem characteristics. This might not only help us better understand the respective strengths and weaknesses of multiobjective optimization algorithms, but it might also lead us to predict their expected performance on a black-box problem instance.

## Acknowledgments

The authors are grateful to the anonymous reviewers for their valuable comments and suggestions. The main part of this work was funded by the JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation entitled “Global Research on the Framework of Evolutionary Solution Search to Accelerate Innovation” (2013–2016), and partly by the JSPS-Inria project “Threefold Scalability in Any-objective Black-Box Optimization” (2015–2017).