## Abstract

The set of available multi-objective optimisation algorithms continues to grow. This fact can be partially attributed to their widespread use and applicability. However, this increase also suggests several issues remain to be addressed satisfactorily. One such issue is the diversity and the number of solutions available to the decision maker (DM). Even for algorithms very well suited to a particular problem, it is difficult—mainly due to the computational cost—to use a population large enough to ensure a high likelihood of obtaining a solution close to the DM's preferences. In this paper we present a novel methodology that produces additional Pareto optimal solutions from a Pareto optimal set obtained at the end of a run of any multi-objective optimisation algorithm for two-objective and three-objective problem instances.

## 1 Introduction

Much research effort has been dedicated to finding the Pareto optimal set (PS) of multi-objective optimisation problems (MOPs). Many algorithms have been developed to solve MOPs. Broadly speaking, these can be categorised into two groups based on their approach to fitness assignment: (1) algorithms based on Pareto dominance relations; and (2) decomposition-based approaches. Most algorithms developed during the 1990s and early 2000s were Pareto-based (Fonseca and Fleming, 1993; Srinivas and Deb, 1994; Zitzler et al., 2001; Deb et al., 2002a), and to this day the majority of the methodologies rely on some variant of this type of fitness assignment. Decomposition-based optimisation techniques draw upon the fact that Pareto optimal solutions can be obtained by aggregating the objective functions into a set of scalar optimisation problems (Miettinen, 1999). This set of problems can, in principle, be solved using some single-objective optimisation algorithm.

In an a posteriori preference articulation^{1} scenario (Miettinen, 1999, p. 77), the main focus of multi-objective optimisation algorithms is to approximate the Pareto optimal set^{2} as quickly as possible, and distribute the Pareto optimal solutions evenly. However, this objective can easily become unmanageable for problems with more than three objectives. This is due to the fact that, as the number of dimensions in objective space increases, the number of solutions required to cover the entire front increases exponentially (Ishibuchi et al., 2008; Schütze et al., 2012) and the sweet spot of the evolutionary algorithm parameters that produce good solutions is reduced in size (Purshouse and Fleming, 2007). A potential solution to this problem could be to restrict the search space by introducing constraints in objective space (see, e.g., Fonseca and Fleming, 1993). In the present work we focus our attention on two- and three-objective problems to avoid such complications and better illustrate the presented concepts.

An issue with the aforementioned paradigm arises when the decision maker (DM) is not completely satisfied with the obtained PS. Their options in such a case are either to use a different algorithm or to restart the preferred algorithm with different parameters in the hope that the new PS will more closely satisfy the requirements. Progressive-preference articulation algorithms (Fonseca and Fleming, 1993; Miettinen, 1999; Wagner and Trautmann, 2010) offer an alternative approach; however, the drawback is that the DM must be *in-the-loop* for the algorithm execution (Miettinen, 1999) and this can be rather demanding. Miettinen et al. (2008) provide an overview of interactive multi-objective optimisation methods.

The proposed method in this paper alleviates these difficulties for continuous MOPs by producing more, and usually better distributed, Pareto optimal solutions along the entire Pareto front (PF). Additionally, an important feature that may be helpful to both the analyst and the DM is that, if there is a specific region of interest on the PF, the generation of solutions can be focused on that region. This can be helpful in situations where no solution is found in a PF region of interest to the DM. The method we propose achieves this result by estimating the mapping of a convex set to the decision vectors corresponding to the Pareto optimal solutions obtained by a multi-objective optimisation algorithm. This convex set is used in lieu of the objective vectors for reasons that are clarified in Section 3 and Section 4. This mapping, identified here using a radial basis function neural network (RBFNN), is then used to generate estimates of decision vectors that would lead to Pareto optimal solutions in the neighbourhood of the original solutions. It should be noted that an RBFNN is chosen mainly because it is computationally efficient to train and the produced results are reasonable for the selected test problems. However, this choice is not restrictive and does not characterise the presented methodology, as any modelling or metamodelling method can be used instead, should the situation demand it. Additionally, the fact that a convex set is used does not impose any restrictions on the PF geometry.

The idea that supports the proposed method is that, for continuous MOPs, it can be deduced from the Karush-Kuhn-Tucker optimality conditions that the PS is piecewise continuous in the decision variables, as previously noted by Y. Jin and Sendhoff (2003) and Zhang et al. (2008). This fact, combined with a reasonable approximation of the PS, can be used constructively to infer the mapping of the above-mentioned convex set to decision variables that produce Pareto optimal solutions.

The main contributions of this work can be summarised as follows:

A method, which we call Pareto estimation (PE), is presented. PE can be used to increase the number of Pareto optimal solutions, for two- and three-objective problems, from an approximation of the Pareto set. This can be useful in a situation where the evolutionary algorithm has not produced a solution close enough to the desired location on the PF. Furthermore, the Pareto estimation method does not necessitate any alteration to the optimisation algorithm that is used to produce the Pareto set and is dependent on it only insofar as the quality of the PS is concerned. Quality in this context depends on two factors: (1) the proximity of the PS approximation to the Pareto optimal set; and (2) the availability of solutions across the entire PF.

The effectiveness of PE is validated using a set of test problems, commonly used in the MOEA community, for two and three objectives. It is shown that PE can produce more Pareto optimal solutions across the entire PF with a much lower cost compared to the alternative of restarting the optimisation using the same or an alternative algorithm to solve the MOP. Although this is not the main purpose of the PE method, it is a good test, since if it can produce more solutions on the entire PF, then it should be able to increase the number of solutions in specific regions as well. We believe that this is the main utility of PE.

The remainder of this paper is organised as follows. In Section 2, a general formulation of a multi-objective optimisation problem is given, followed by some fundamental definitions. Related concepts and motivating ideas for the proposed method are discussed in Section 3. In Section 4, the PE method is described for Pareto and decomposition-based algorithms. The method is tested against a set of multi-objective optimisation problems and these tests are reported in Section 5. Lastly, in Section 6 we discuss problems, potential solutions, and ideas related to the PE method, and in Section 7, this paper is summarised and concluded.

## 2 Problem Setting and Definitions

A multi-objective optimisation problem with $k$ *objective functions* is defined as

$$\min_{\mathbf{x}} \; \mathbf{F}(\mathbf{x}) = \left(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})\right), \quad \text{subject to } \mathbf{x} \in S, \qquad (1)$$

where $k$ is the number of objective functions, *S* is the *feasible region* for the *decision* vectors $\mathbf{x}$, and $f_i$ is a scalar objective function, with $i \in \{1, \ldots, k\}$. Additionally, in this work it is assumed that $S \subseteq \mathbb{R}^n$, where *n* is the number of decision variables in the decision vector, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. Also, depending on the definition of *S*, Equation (1) can be a constrained MOP. For simplicity, only minimisation problems are considered; however, this does not limit the generality of the produced results since, due to the duality principle, maximisation of a scalar objective function $f_i$ is the same as minimisation of $-f_i$. An implicit assumption is that the scalar objective functions in Equation (1) are mutually competing and possibly non-commensurate.

Although a complete ordering can be defined for multi-objective problems, for instance a lexicographic order, only a partial ordering can be defined without prior knowledge of the preferences of a DM. Namely, when comparing two decision vectors $\mathbf{x}_1, \mathbf{x}_2 \in S$, it can so happen that their corresponding objective vectors are incomparable. In practice, this situation is resolved when a DM selects one solution over all others, thus inducing a form of complete ordering. However, this ordering is mostly subjective, even in the case that utility functions (Vira and Haimes, 1983) are used to ease the work of the DM. In the absence of a DM, a usual assumption is that the relative importance of the objectives, $f_i$, is unknown; hence, it is reasonable to obtain several non-comparable solutions. The problem of inducing partial ordering in Euclidean spaces was initially studied by Edgeworth (1881) and later further expanded by Pareto (1896). The relations introduced by Pareto are defined as follows for a minimisation problem.

A decision vector $\mathbf{x}_1$ is said to **weakly dominate** a decision vector $\mathbf{x}_2$: iff $\mathbf{F}(\mathbf{x}_1) \leq \mathbf{F}(\mathbf{x}_2)$ and $\mathbf{F}(\mathbf{x}_1) \neq \mathbf{F}(\mathbf{x}_2)$, then $\mathbf{x}_1 \preceq \mathbf{x}_2$.

A decision vector $\mathbf{x}_1$ is said to **dominate** a decision vector $\mathbf{x}_2$: iff $\mathbf{F}(\mathbf{x}_1) < \mathbf{F}(\mathbf{x}_2)$, then $\mathbf{x}_1 \prec \mathbf{x}_2$.

A decision vector $\mathbf{x}^{\star}$ is said to be **Pareto optimal** if there is no other decision vector $\mathbf{x} \in S$ such that $\mathbf{F}(\mathbf{x}) \leq \mathbf{F}(\mathbf{x}^{\star})$ and $\mathbf{F}(\mathbf{x}) \neq \mathbf{F}(\mathbf{x}^{\star})$.

Let $\mathbf{z} = \mathbf{F}(\mathbf{x})$, with $\mathbf{z} \in \mathbb{R}^k$ and $\mathbf{x} \in S$. If *S* is the feasible region, then the set *Z*, $Z = \mathbf{F}(S)$, is the feasible region in objective space. Given a set $A \subseteq Z$, the **non-dominated set**^{3} is defined as $ND(A) = \{\mathbf{a} \in A \mid \nexists\, \mathbf{b} \in A : \mathbf{b} \preceq \mathbf{a}\}$. If $A$ is the entire feasible region in the objective space, *Z*, then the set $ND(Z)$ is called the **Pareto optimal set** or **Pareto front**. Any element $\mathbf{z} \in Z$ is referred to as an **objective vector**.
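These relations are straightforward to state operationally. The sketch below fixes one common convention purely for illustration (weak dominance: no worse in every objective and strictly better in at least one; dominance: strictly better in every objective):

```python
import numpy as np

def weakly_dominates(z1, z2):
    """z1 is no worse than z2 in every objective and strictly better in at least one."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return bool(np.all(z1 <= z2) and np.any(z1 < z2))

def dominates(z1, z2):
    """z1 is strictly better than z2 in every objective (minimisation)."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return bool(np.all(z1 < z2))

def non_dominated(Z):
    """Return the non-dominated subset of a finite set of objective vectors (rows)."""
    Z = np.asarray(Z, dtype=float)
    keep = [i for i in range(len(Z))
            if not any(weakly_dominates(Z[j], Z[i]) for j in range(len(Z)) if j != i)]
    return Z[keep]
```

For instance, in a set containing (1, 2), (2, 1), (2, 2), and (3, 3), only the first two vectors survive the `non_dominated` filter.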

Also, the following definitions are used in this work in various contexts.

The **ideal objective vector**, $\mathbf{z}^{\star}$, is the vector with elements $z_i^{\star} = \min_{\mathbf{x} \in S} f_i(\mathbf{x})$ (Miettinen, 1999, p. 16).

The **nadir objective vector**, $\mathbf{z}^{nad}$, is the vector with elements $z_i^{nad} = \max f_i(\mathbf{x})$, subject to the $f_i(\mathbf{x})$ being elements of objective vectors in the Pareto optimal set (Miettinen, 1999, p. 16).

The **convex hull** of a set *C*, denoted as $\operatorname{conv}(C)$, is the set of all convex combinations of the elements in the set *C* (Boyd and Vandenberghe, 2004, p. 24). For convenience, we define *CH _{I}* to be the convex hull of the set $I = \{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$, where $\mathbf{e}_i$ is a vector of zeros with 1 in the *i*th position, that is, $CH_I = \operatorname{conv}(I)$.

The **extended convex hull** (*EH _{I}*) of the set *C* is the union of *CH _{I}* and the points in the affine space of the set *C* produced by the projection of a Pareto optimal front, with ideal vector $\mathbf{z}^{\star}$ and nadir vector $\mathbf{z}^{nad}$, onto the hypersurface of *C*.

A multi-objective evolutionary algorithm (MOEA) will, in general, attempt to obtain a *representative* approximation of the Pareto optimal set. However, what is considered to be a representative approximation of the PS is not context independent; nevertheless, metrics have been devised to measure the *quality* of the PS. For an excellent review of this topic, the reader is referred to Zitzler et al. (2003).

## 3 Related Work

Multi-objective evolutionary algorithms have had tremendous success in solving real-world problems. They have been applied in control systems (Fonseca and Fleming, 1998a, 1998b; Fleming and Purshouse, 2002), in economics and finance (Shoaf and Foster, 1998; Armananzas and Lozano, 2005; Subbu et al., 2005; Tapia and Coello, 2007) and aerospace (Ong et al., 2006; Goel et al., 2007). This can be attributed to the fact that evolutionary algorithms (EAs) perform well for a wide range of problems for which classical methods, such as convex optimisation (Boyd and Vandenberghe, 2004), are inapplicable. However, the robustness of EAs does not come for free. For example, in contrast to convex optimisation, there is no guarantee of global optimality for solutions produced by evolutionary algorithms. However, in practice there is strong evidence that very good approximations of Pareto optimal solutions are generated.

### 3.1 Metamodelling Methods in Multi-Objective Optimisation

A single-objective function can potentially be computationally expensive, requiring hours or days even to evaluate, which constitutes a severe limitation. This issue is widely acknowledged in the MOEA community (Jones et al., 1998; Liang et al., 2000; Y. Jin et al., 2002; Knowles, 2006). A prevalent methodology employed by researchers to tackle this issue is the use of metamodelling methods in optimisation. The insight is that if a surrogate model of the actual objective function can be created with relatively few samples, then this surrogate model can be used instead of the objective function in the optimisation process.

Since the purpose of the surrogate model is to relieve the EA from evaluating an expensive objective function as much as possible, the primary selection criteria for a surrogate model are adapted accordingly. Namely, the suitability of a modelling method is judged according to: (1) the ease with which the model parameters can be identified; and (2) the cost of one evaluation of the surrogate model, which must be much smaller than that of the actual objective function. Therefore, for a metamodelling method that satisfies these criteria, a large number of objective function evaluations can be substituted with calls to the surrogate model, hence reducing the total cost of the optimisation. Another criterion that is critical to the success of the aforementioned procedure is model precision. If the surrogate model cannot capture important features of the objective function, the search will be grossly misled, although caution should also be exercised not to overcomplicate the surrogate model to the point that its cost becomes comparable to that of the original objective function. In a way, a surrogate model can be viewed as a low-pass filter, hopefully separating the noise from the important features of the objective function, that is, its minima (or maxima). This is why such methods have been employed in noisy optimisation problems as well (Y. Jin and Branke, 2005).

The general approach, when substituting the real objective function with a surrogate model for use in an EA, has the following structure:

1. Sample the real objective function.

2. Use the obtained samples to create a surrogate model.

3. Use the surrogate model in the optimisation.

4. If the convergence criteria are met, stop; if not, go to **Step 1**.
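As a rough, self-contained illustration of this loop (not the authors' implementation; the quadratic surrogate and grid search below are our stand-ins for an arbitrary metamodel and optimiser):

```python
import numpy as np

def expensive_f(x):
    """Stand-in for a costly objective function (hypothetical)."""
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 5)                  # Step 1: sample the real objective
for _ in range(10):
    Y = expensive_f(X)                        # evaluate the real objective at the samples
    coeffs = np.polyfit(X, Y, 2)              # Step 2: build a quadratic (RSM-style) surrogate
    grid = np.linspace(0.0, 1.0, 201)
    x_best = grid[np.argmin(np.polyval(coeffs, grid))]  # Step 3: optimise the surrogate
    if np.min(np.abs(X - x_best)) < 1e-3:     # Step 4: converged? stop ...
        break
    X = np.append(X, x_best)                  # ... otherwise go to Step 1 with a new sample
```

Only a handful of calls to `expensive_f` are made at the sampled points; all other evaluations during the search hit the cheap surrogate.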

An illustration of this iterative procedure can be seen in Figure 1, where an ever more accurate mapping of the decision space, *S*, to the objective space, *Z*, is created at every iteration. This approach was initially limited to serial implementations (Farina, 2002; Emmerich et al., 2002); however, later advances in metamodelling-based EAs employed local models (Ong et al., 2003), thus reinstating a key strength of EAs: their potential to be executed in parallel.

The idea to employ surrogate models in lieu of the true model of a process can be traced back to Box and Wilson (1951) where they employed polynomial basis functions to create a model from data. This approach is commonly referred to in the literature as the response surface method (RSM). Other examples of modelling methods used in combination with an evolutionary algorithm are neural networks (Poloni et al., 2000; Y. Jin et al., 2002; Adra et al., 2009), Kriging or Gaussian processes, and generalised response surface methods (Y. Jin, 2005), as well as Bayesian regression (Ong et al., 2004).

Another recently introduced class of methods that employ metamodelling is presented by Hartikainen et al. (2012) and Haanpää (2012). Specifically, Haanpää (2012) proposed the interactive hyperbox exploration method (IHBE). This method requires a significantly greater degree of commitment on the part of the DM, as the aforementioned hyperbox is progressively refined. Nevertheless, this methodology does have the potential to reduce the computational cost if the corresponding hyperbox is relatively small when compared with the extent of the entire Pareto front.

### 3.2 Pareto Estimation Method: Motivation

In this work we bring forward and resolve, to some extent, a question that appears to have been overlooked in the literature.^{4} Namely, given an approximation of the Pareto front produced by any MOEA, is there a way to obtain solutions in specific parts of the PF that are not present in the given set and, if the answer is positive, how can this be achieved? This question stems from the fact that, if there were a way to obtain a Pareto optimal solution that adheres exactly to the DM's preferences, then there would be no need to evaluate any other solutions, as is usually the case for multi-objective optimisation algorithms. However, such algorithms inherit this strategy because there is no clear path to incorporating all preferences, since it is unknown whether they are in fact reasonable, that is to say, whether such a solution exists at all. Additionally, there is no clear way to obtain a specific solution with accuracy. The remainder of this paper considers this question and presents an answer.

Consider a function, *G*, that returns the corresponding Pareto optimal decision vector, $\mathbf{x}$, if a Pareto optimal solution, $\mathbf{z}$, is used as its argument, and 0 otherwise,

$$G(\mathbf{z}) = \begin{cases} \mathbf{x}, & \text{if } \mathbf{z} \in \mathcal{PF} \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$

Obviously, such a function would be of limited use if the analyst had no information about the shape of the Pareto front as well as its location. Thus, the function, *G*, is a special indicator function whose domain of definition is the Pareto optimal set and whose range is the set of Pareto optimal decision vectors. Therefore, given such a function and the information about the exact location of the Pareto front, it would simply be a matter of evaluating Equation (2) in order to obtain the decision vector that would result in a Pareto optimal solution. Such a description of the Pareto front geometry can be given by a parametric or non-parametric model if the problem has already been successfully solved. A potential issue with such an approach is that a different description of the PF will be required for different problems. Although this seems troubling, there is nothing to preclude the existence of a function with a *convenient* domain of definition, $D$, that would map to the Pareto front of any given problem. Naturally, such a function must depend on, and adapt to, the Pareto optimal set or some approximation of it, and hopefully a procedure can be found to map the former to the latter. Strictly speaking, such a function would perform the following task,

$$U : D \to \mathcal{PF}. \qquad (3)$$

Additionally, it would be even more convenient if the mapping, $U$, were predictable in the sense that, for a given $\mathbf{d} \in D$, the resulting $\mathbf{z} = U(\mathbf{d})$ is not very hard to predict, as this would ease the complexity of using the function in Equation (3). A natural candidate for such a task would be an affine function.

The final piece of this puzzle lies in the domain of definition of the function described in Equation (3). The requirements on such a domain would be: (1) that points within the domain of definition of the function should be easy to obtain; and (2) that any convex combination of points in the set must still be in the set, that is to say, the set must be convex. By adhering to these requirements, and if relations similar to Equations (2) and (3) can be identified, then a Pareto optimal solution can be obtained at any desired location on the PF by the following procedure.

1. Choose a vector in the convex domain that will produce the desired objective vector, as verified through Equation (3). If the resulting objective vector is not the intended one, it is sufficient to change the chosen vector a little. In this step we exploit the *predictability* of the mapping.

2. Use the obtained vector in Equation (2) to obtain the corresponding decision vector.

3. Evaluate the actual objective function, *F*, using the obtained decision vector, to verify that the intended objective vector is produced. Strictly speaking, if the mappings described in Equations (2) and (3) are exact, this step is redundant.

If such a procedure were available in practice, there would be a way to obtain the decision vector that satisfies the requirements of the DM exactly instead of repeatedly solving a multi-objective optimisation problem in the hope of obtaining a solution that closely satisfies the aforementioned requirements.

## 4 Pareto Estimation Method

### 4.1 Overview

The question posed in Section 3.2 is interesting because, depending on how well it can be answered, the information in the analyst's possession increases dramatically, thus allowing the analyst to cater to more specific requests from the DM. This is because, given complete knowledge of the aforementioned functions, Equations (2) and (3), a multi-objective problem is virtually solved, as any solution on the Pareto front can, theoretically, be obtained at very small additional expense with high precision. Although obtaining the entire Pareto optimal set may be infeasible in practice, this is the predominant definition of what it means to solve a multi-objective optimisation problem (Miettinen, 1999, p. 61).

The premise of the proposed method is that the mapping of Pareto optimal solutions^{5} to their corresponding decision variables can be identified using a modelling method (Y. Jin, 2005). A theoretical argument based on the Karush-Kuhn-Tucker (KKT) optimality conditions, which further fosters the idea that the mapping in Equation (4) should be identifiable, was proposed by Zhang et al. (2008); this is further supported by the observations of Schütze et al. (2003) and Y. Jin and Sendhoff (2003). Zhang et al. (2008) state that, for continuous multi-objective problems, the Pareto optimal set is piecewise continuous in decision space. This point is revisited in Section 6. In the present work, a radial basis function neural network is used for this purpose since it is both robust and accurate for a wide set of problems (R. Jin et al., 2001). The structure and further details regarding the way that this type of neural network is employed in this work are discussed in Section 4.2.

However, there is a difficulty in directly identifying the mapping, *G*. For example, assume that we have the exact mapping for a multi-objective problem, with the only restriction being that we must provide the exact coordinates of Pareto optimal points. In order to provide this information, we are required to know the shape of the PF exactly, meaning that a mathematical description of the PF hypersurface must be available for all potential problems. If such information is available for the given problem, then all decision variables corresponding to the PF could be obtained using *G*. This point becomes clearer if we view the mapping as the inverse of the objective function $\mathbf{F}$, which leads to

$$G(\mathbf{z}) = \mathbf{F}^{-1}(\mathbf{z}), \quad \mathbf{z} \in \mathcal{PF}. \qquad (5)$$

Even if the function $\mathbf{F}$ is not one-to-one, a mapping can still be obtained, but it can no longer be called the inverse image of $\mathbf{F}$; however, for practical purposes its function would be the same. Therefore it is relatively safe to ignore, for the moment, the fact that the objective function may be many-to-one; this issue is further discussed in Section 6.

To obtain a convenient domain of definition, the Pareto optimal objective vectors are projected onto *EH _{I}*; for problems with two objectives, this is the same as *CH _{I}*. Subsequently, the mapping from *EH _{I}* to the decision variables is identified using an RBFNN, as shown in Figure 1. This model, in essence, subsumes the composition of the mappings in Equation (6).

The projection effectively takes a set of vectors in objective space and creates its corresponding set in *EH _{I}*. For two dimensions, the projected vectors will be part of the convex set *CH _{I}*, and this set will be identical to *EH _{I}*, see Figure 2. For more than two dimensions, both *EH _{I}* and *CH _{I}* are still convex sets, but a more elaborate procedure will be required to obtain points on *EH _{I}* due to its geometry, see Figure 3.

For example, consider a concave Pareto front such as the one shown in Figure 3. This front is the first octant of a sphere centred at the origin with radius 1. If we apply the projection to this Pareto optimal set, the resulting set will be on the union of the striped areas in Figure 3, that is, *EH _{I}*. The part of the projected set in *CH _{I}* is the set within the triangle with vertices $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. The remaining points are part of $EH_I \setminus CH_I$,^{6} and, since the edges of the *EH _{I}* set are curved, it is no longer straightforward to generate evenly distributed points within this set. Therefore, the desired property of the function discussed in Section 3.2, that is, the ability to easily generate points within its domain, would be restricted. A partial solution is simply to bound the domain of definition of the mapping to *CH _{I}* artificially. This would maintain the aforementioned desirable property, but such a restriction would limit the method to producing solutions whose projection is within *CH _{I}*. The solutions outside *CH _{I}* correspond to extreme Pareto optimal points which are, potentially, of low interest (Das and Dennis, 1996). However, if this assumption does not hold and the DM requires solutions within these regions, the method described in Giagkiozis and Fleming (2012) could be employed to obtain estimates from the PE method. This is possible because the entire projected set is used in the model creation process (see Section 4.2).

Finally, to generate the estimated Pareto optimal solutions, a set of evenly spaced convex combinations is created. Subsequently, this set can be used as input to a model of the mapping. The resulting decision vectors may then be used in the objective function to verify that they correspond to Pareto optimal objective vectors. An alternative is to create the set for a specific region of interest on the PF, for example, using only points that lie within that region.
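Evenly spaced convex combinations over the unit simplex can be generated with a standard simplex-lattice construction; the paper does not prescribe a particular scheme, so the following is one possibility:

```python
from itertools import combinations

import numpy as np

def simplex_lattice(k, H):
    """All k-dimensional vectors with components in {0, 1/H, ..., 1} summing to 1.

    Stars-and-bars enumeration: each choice of k-1 'bar' positions among
    H + k - 1 slots encodes one composition of H into k non-negative parts.
    """
    points = []
    for bars in combinations(range(H + k - 1), k - 1):
        prev = -1
        counts = []
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(H + k - 2 - prev)
        points.append([c / H for c in counts])
    return np.array(points)
```

For example, `simplex_lattice(3, 12)` yields 91 evenly spaced weighting vectors for a three-objective problem; increasing `H` refines the spacing.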

### 4.2 Radial Basis Function Neural Networks

Before we delve into a detailed description of how the proposed method can be applied to decomposition and Pareto-based multi-objective optimisation algorithms, we first explore the technique used to model the mapping.

Neural networks, or more precisely artificial neural networks,^{7} are widely used in an array of different disciplines (Maier and Dandy, 2000; Atiya, 2001; Wong and Selvi, 1998). They are well known for their universal approximator property (Hornik et al., 1989). Furthermore, a subclass of NNs, namely radial basis function neural networks (RBFNNs), have been shown to be robust and accurate predictors when compared to Kriging, multivariate adaptive splines, and polynomial regression methods (R. Jin et al., 2001). RBFNNs have a single hidden layer and an output layer. Their output layer is often composed of linear functions since this guarantees a unique solution to their weights, *w* (Bishop, 1995), without the need to resort to the renowned back-propagation algorithm (Rumelhart et al., 1986).

With this choice of parameters, all basis functions are *active*. Therefore, Equation (9) simplifies accordingly. Arguably, this is the simplest way to choose the parameters of the basis functions, and it was used here to retain focus on the proposed methodology. For a more elaborate and comprehensive methodology for selecting the parameters of RBFNNs, the reader is referred to Billings et al. (2007).
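A minimal RBFNN of this form, with Gaussian bases centred at the training samples and output weights obtained by a single linear solve, might look as follows (the width `sigma` is an arbitrary illustrative choice, not a value from the paper):

```python
import numpy as np

def fit_rbf(X, Y, sigma=0.2):
    """Fit output-layer weights w of y ~ Phi(X) w by linear least squares.

    Gaussian basis functions are centred at the training points themselves
    (an assumption made here for brevity).
    """
    X = np.atleast_2d(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    Phi = np.exp(-d2 / (2 * sigma ** 2))                 # hidden-layer activations
    w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return w

def predict_rbf(Xq, X, w, sigma=0.2):
    """Evaluate the fitted RBFNN at query points Xq (linear output layer)."""
    d2 = ((np.atleast_2d(Xq)[:, None, :] - np.atleast_2d(X)[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ w
```

Because the output layer is linear, no iterative training such as back-propagation is needed; the weights follow from one least-squares problem.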

To validate the created neural network, leave-one-out cross-validation was used, as suggested in Jones et al. (1998). Namely, for a Pareto set of size *N*, *N* NNs were created using $N-1$ samples for the training set, and the remaining sample was used to estimate the generalisation error. This procedure is repeated until every solution in the Pareto set has been used as a test sample, and then the mean square error is calculated. After estimating the expected generalisation error of the NN using cross-validation, the final NN is generated using the entire Pareto set.
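The leave-one-out procedure itself is model-agnostic; a sketch with a toy linear least-squares model standing in for the RBFNN:

```python
import numpy as np

def loo_mse(X, Y, fit, predict):
    """Leave-one-out CV: train N models on N-1 samples each, average squared errors."""
    N = len(X)
    errors = []
    for i in range(N):
        mask = np.arange(N) != i
        model = fit(X[mask], Y[mask])            # train without sample i
        pred = predict(model, X[i:i + 1])[0]     # predict the held-out sample
        errors.append((pred - Y[i]) ** 2)
    return float(np.mean(errors))

# Toy stand-in model: ordinary least squares on a line (not the paper's RBFNN)
fit = lambda X, Y: np.linalg.lstsq(np.c_[X, np.ones(len(X))], Y, rcond=None)[0]
predict = lambda m, Xq: np.c_[Xq, np.ones(len(Xq))] @ m
```

For data that the model family can represent exactly, the leave-one-out error is essentially zero; otherwise it gives an unbiased picture of how the final model will generalise.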

### 4.3 Pareto Dominance-Based Algorithms

The method described in Section 4.1 introduced the general procedure of the proposed technique; however, certain details were abstracted. Optimisation algorithms based on Pareto dominance for fitness assignment have several control parameters. One of these parameters is the size of the population to be used in the optimisation process. This parameter effectively provides an upper bound on the resulting number of Pareto optimal solutions in the final set. One requirement, for the methodology to function correctly for the entire PF, is that there be a sufficient number of non-dominated solutions in the final population. An additional requirement, which is evident from experiments, is that the non-dominated set produced by the algorithm be well spread across the PF; that is, the solutions are diverse and the mean distance from their neighbours has small variance. This simply means that the performance of the proposed method depends on the performance of the algorithm used to solve the MOP.

First, the obtained Pareto set approximation is normalised according to Equation (7), where $f_i$ is the *i*th objective function and its corresponding values for the different solutions are found in the *i*th column of the set. Note that, since the produced Pareto set approximation has finite size, the $\inf$ and $\sup$ operators are replaced by the $\min$ and $\max$ operators, which return the minimum and maximum element of a set, respectively. Next, the normalised set is projected onto the hyperplane *E* defined by the points $\mathbf{e}_i$, where $\mathbf{e}_i$ is a vector of zeros with a one in the *i*th position. This is achieved by initially projecting onto the subspace^{8} parallel to *E* and then shifting the result onto *E*. To obtain the projection matrix, $k-1$ linearly independent vectors in the *E* plane are required. These vectors are obtained in the following way:

$$\mathbf{h}_i = \mathbf{e}_{i+1} - \mathbf{e}_1, \quad i = 1, \ldots, k-1, \qquad (14)$$

where *H* is the $k \times (k-1)$ matrix whose columns are the $\mathbf{h}_i$. Subsequently, the projection matrix, *P _{E}*, is obtained by

$$P_E = H \left( H^{\mathsf{T}} H \right)^{-1} H^{\mathsf{T}}, \qquad (15)$$

where *P _{E}* is a matrix with rank $k-1$. Each normalised objective vector, $\tilde{\mathbf{z}}$, is then transformed as

$$\bar{\mathbf{z}} = P_E \left( \tilde{\mathbf{z}} - \mathbf{e}_1 \right) + \mathbf{e}_1. \qquad (16)$$

Finally, the neural network, used to identify the mapping, is created as described in Section 4.2, using the transformed objective vectors and the corresponding decision vectors as the training inputs and outputs, respectively.
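A sketch of the normalisation and projection steps, as we read Equations (14)-(16) (the min-max normalisation assumes no objective is constant across the set):

```python
import numpy as np

def project_to_simplex_plane(Z):
    """Normalise objective vectors, then project onto the plane through e_1, ..., e_k."""
    Z = np.asarray(Z, dtype=float)
    k = Z.shape[1]
    Zn = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0))  # min-max normalisation
    E = np.eye(k)
    H = (E[1:] - E[0]).T                       # columns h_i = e_{i+1} - e_1, cf. Eq. (14)
    P = H @ np.linalg.inv(H.T @ H) @ H.T       # projector onto span(H), cf. Eq. (15)
    return (P @ (Zn - E[0]).T).T + E[0]        # project, then shift onto the plane, cf. Eq. (16)
```

Since the columns of `H` span the zero-sum subspace, every projected vector has components summing to one, i.e. it lies on the plane through the unit vectors.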

Once the neural network is trained, it can be used to create additional solutions for a new set of convex combinations. However, this set has to be generated by the DM according to his or her preference for a particular region of the PF; alternatively, a more densely and evenly spaced convex set spanning the entire PF could be created. The first option is likely to be preferred when the cost of evaluating the objective function is considerable or there is a clear preference toward a particular region of the PF.

The described procedure is summarised as follows.

1. Obtain the non-dominated individuals from the final population of a Pareto-based MOEA, and their corresponding decision variables.

2. Normalise the objective vectors according to Equation (7).

3. Project the normalised objective vectors onto the hyperplane going through the points $\mathbf{e}_i$, according to Equations (14), (15), and (16). For two objectives this is the line through $(1, 0)$ and $(0, 1)$, and for three objectives it is the plane through $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$.

4. Identify the mapping by using the projected objective vectors and the corresponding decision vectors as inputs and outputs, respectively, to train a RBFNN as described in Section 4.2.

5. Create a set of input vectors; in this work, this is a set of evenly spaced convex combination vectors.

6. Use this set as input to the RBFNN (which was created in Step 4) to obtain estimates of decision vectors.

7. Verify the produced solutions by evaluating the objective function to establish their optimality.
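Strung together on a toy two-objective problem, the seven steps might look as follows; the problem, the Gaussian-RBF model, and all parameter values are our illustrative choices, not the paper's benchmarks:

```python
import numpy as np

# Toy two-objective problem: F(x) = (x_1, 1 - x_1); on its Pareto set, x_2 = 0.
def F(x):
    return np.array([x[0], 1.0 - x[0]])

# Step 1: a sparse Pareto set approximation, as an MOEA might return it.
Xs = np.array([[t, 0.0] for t in (0.0, 0.3, 0.7, 1.0)])
Zs = np.array([F(x) for x in Xs])

# Steps 2-3: for this front, the projection onto the line through (1,0) and (0,1)
# is parametrised by the first objective alone.
T = Zs[:, :1]

# Step 4: identify the mapping T -> X with a Gaussian RBF model (sigma is arbitrary).
sigma = 0.4
Phi = np.exp(-((T - T.T) ** 2) / (2 * sigma ** 2))
W = np.linalg.solve(Phi, Xs)

# Steps 5-6: query a denser, evenly spaced set of points along the front.
Tq = np.linspace(0.0, 1.0, 21).reshape(-1, 1)
Xq = np.exp(-((Tq - T.T) ** 2) / (2 * sigma ** 2)) @ W

# Step 7: verify the estimates with the true objective function.
Zq = np.array([F(x) for x in Xq])
```

From four original solutions, the model produces 21 estimated decision vectors whose objective values lie on or very near the Pareto front of the toy problem.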

### 4.4 Decomposition-Based Algorithms

Decomposition-based algorithms employ a set of *N* evenly distributed weighting vectors, where *N* is the population size, together with a scalarising objective function. Where the scalarising function involves the $\circ$ operator, this denotes the Hadamard product, which is element-wise multiplication of vectors or matrices. The intuition behind this is that, since the optimum of each scalar subproblem is a continuous function of its weighting vector (Zhang and Li, 2007), *N* evenly distributed weighting vectors are assumed to produce a well distributed set of Pareto optimal solutions.

Consequently, since decomposition-based algorithms already have a set of convex combinations, namely the weighting vectors, and the correspondence of weighting vectors to objective vectors is clear, the input set of the Pareto estimation method can be substituted with the weighting vectors that produce Pareto optimal solutions. This has the potential to greatly simplify the procedure described in Section 4.3. However, although this simplifies the algorithm, the choice of input vectors by the DM is more difficult because of its indirect nature compared to the general method described in Section 4.3, and this problem becomes increasingly difficult as the number of objectives grows.
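For concreteness, one widely used scalarising function in decomposition-based algorithms such as MOEA/D is the weighted Tchebycheff form, in which the weighting vector enters element-wise (a Hadamard-style product) with the deviation from the ideal point; this is shown only as an illustrative example of a decomposition:

```python
import numpy as np

def tchebycheff(z, lam, z_star):
    """Weighted Tchebycheff scalarisation: g(z | lam) = max_i lam_i * |z_i - z*_i|.

    lam is a weighting vector and z_star the ideal objective vector.
    """
    z, lam, z_star = map(np.asarray, (z, lam, z_star))
    return float(np.max(lam * np.abs(z - z_star)))
```

Minimising this function for a spread of weighting vectors traces out different points on the Pareto front, which is what makes the weighting vectors a ready-made convex input set for PE.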

The procedure for decomposition-based algorithms is summarised as follows.

1. Obtain the weighting vectors corresponding to non-dominated solutions.

2. Identify the mapping, using the weighting vectors and the decision vectors as inputs and outputs, respectively, to train an RBFNN.

3. Generate a new set of weighting vectors in the PF region of interest, or using one of the methods discussed so far.

4. Use this set as input to the neural network created in Step 2 to obtain estimates of the decision vectors.

5. Verify that the produced solutions are Pareto optimal.
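As an illustration of how evenly distributed weighting vectors can be generated, the standard simplex-lattice (Das–Dennis) construction is sketched below; the function name is ours, not the paper's. Note that it yields exactly 101 vectors for two objectives (with H = 100) and 276 for three (with H = 22), matching the population sizes listed in Table 1.

```python
import itertools
import numpy as np

def simplex_lattice(m, H):
    """All m-part compositions of H (stars and bars), divided by H:
    C(H + m - 1, m - 1) evenly spaced weighting vectors on the
    unit simplex."""
    W = []
    for cuts in itertools.combinations(range(H + m - 1), m - 1):
        # Gap sizes between consecutive cut positions give the parts.
        parts = np.diff(np.array((-1,) + cuts + (H + m - 1,))) - 1
        W.append(parts / H)
    return np.array(W)
```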

## 5 Experiment Results

To test the merits of the proposed method, the Pareto-based algorithm was chosen to be NSGA-II (Deb et al., 2002a) and the decomposition-based algorithm was chosen to be MOEA/D (Zhang and Li, 2007). The algorithms were run repeatedly, using a different seed for the random number generator on every run, for six MOPs with two and three objectives. The population size used for both algorithms was set to 101 for the two-objective problems and to 276 for the three-objective problems (see Table 1), as such values are commonly employed in benchmarks (Zhang and Li, 2007). We used the Kruskal-Wallis test to verify whether the mean performance of the produced sets differs according to the selected metrics defined by Equations (20) and (22). Additionally, the algorithms were allowed to run for 300 generations for the WFG problems and for 500 generations for the DTLZ problems (Table 1). The DTLZ test problems are DTLZ1 and DTLZ2 for two and three objectives; their definitions are provided in Deb et al. (2002b). Additionally, the test problems WFG2-3 and WFG6-7 from the WFG toolkit (Huband et al., 2006) were used. The parameter settings for these test problems are given in Table 1, in which the parameters *k* and *l* are the position- and distance-related parameters, respectively.

This particular collection of test problems was chosen with several considerations in mind. First, the problem set had to be broadly used and recognised by the MOEA community. Second, the problems should be challenging and diverse. It is our belief that these aims are accomplished by this particular problem set. It is hoped that future research will provide further validation of the proposed methodology through experiments on more test problems as well as real-world problems.
More specifically, DTLZ1 and DTLZ2 (Deb et al., 2002b) have been used in numerous studies (Hadka and Reed, 2012; Purshouse et al., 2011; Zhang and Li, 2007) and this is also true for the WFG toolkit (Hadka and Reed, 2012; Purshouse et al., 2011). Furthermore, each of these problems poses a different challenge. For instance, WFG2 has a discontinuous Pareto front and is non-separable. WFG3 is also non-separable and its Pareto front is linear for two dimensions and degenerate for three or more dimensions. WFG6 has a concave Pareto front and is non-separable and unimodal; and, lastly, WFG7 is separable with a concave Pareto front and has parameter-dependent bias (Huband et al., 2006). The parameter settings for the two algorithms were chosen in a similar fashion.
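The statistical comparison described above can be sketched with `scipy.stats.kruskal`. The per-run samples below are hypothetical stand-ins; the actual samples would be the per-run metric values defined by Equations (20) and (22).

```python
import numpy as np
from scipy import stats

# Hypothetical per-run metric values for two sets being compared
# (placeholders, not results from the paper).
rng = np.random.default_rng(1)
runs_a = rng.normal(1.00, 0.05, size=30)
runs_b = rng.normal(1.10, 0.05, size=30)

# Kruskal-Wallis H-test: a non-parametric test of whether the
# samples come from distributions with the same median.
H, p = stats.kruskal(runs_a, runs_b)
different = p < 0.05  # reject equality of medians at this level
```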

**Table 1:** Parameter settings for the test problems. The first *N*, *n*, *k*, *l* group refers to the two-objective instances and the second to the three-objective instances.

| Problem | Generations | N | n | k | l | N | n | k | l |
|---|---|---|---|---|---|---|---|---|---|
| WFG | 300 | 101 | 24 | 4 | 20 | 276 | 24 | 4 | 20 |
| DTLZ1 | 500 | 101 | 10 | — | — | 276 | 10 | — | — |
| DTLZ2 | 500 | 101 | 10 | — | — | 276 | 10 | — | — |


The hypothesis of this paper is that, by using the Pareto estimation methodology, the number of Pareto optimal solutions available to the DM can be increased significantly. Although on many test instances the estimated Pareto set actually turns out to be superior to the initial set, this is not the intended purpose of the method and should be treated as a positive side effect. Our main focus is the relative quality of the Pareto sets produced by MOEA/D and NSGA-II before and after the application of the proposed method, not the performance of the employed algorithms in absolute terms. For performance assessment, the ratio of the following indices was used.

- Inverted generational distance (IGD), introduced in Van Veldhuizen (1999), where *A* is an approximation of the PF. The IGD metric measures the average distance from each point of the actual PF to the nearest element of the set *A*. The ratio of this metric was used, where *B* is another PF approximation set; in this work, *B* is the PF estimated using the Pareto estimation methodology.
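A direct implementation of the IGD index is given below; the orientation of the ratio (original over estimated, so that values of 1 or more favour the estimated set) is inferred from the discussion in the text rather than quoted from it.

```python
import numpy as np

def igd(A, PF):
    """Inverted generational distance: mean Euclidean distance from
    each reference point of the true front PF to its nearest
    neighbour in the approximation set A (Van Veldhuizen, 1999)."""
    d = np.linalg.norm(PF[:, None, :] - A[None, :, :], axis=2)
    return d.min(axis=1).mean()

def igd_ratio(A, B, PF):
    """IGD(A) / IGD(B), with B the estimated set: values >= 1 mean
    the estimated set matches or improves on the original set A."""
    return igd(A, PF) / igd(B, PF)
```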

### 5.1 Pareto Dominance-Based Algorithms

For every run of NSGA-II, the proposed method was applied using an evenly spaced convex set whose size is a fixed multiple of the initial population used in the optimisation algorithm. Figure 4 shows the experimental results of the Pareto estimation method. The requested convex set was used as the input to the identified mapping, resulting in the estimated decision vectors, which were in turn evaluated with the objective function to generate the corresponding objective vectors. Figure 5 shows the number of valid solutions produced by the RBFNN method.

Specifically, the size of the requested convex set was chosen separately for the two-objective and three-objective problems. The original Pareto optimal solutions used in the estimation process are shown in Figures 6 and 8, and the corresponding estimates in Figures 7 and 9. It should be noted, as is perhaps apparent from the figures, that the entire estimated population is presented, not only a non-dominated subset. The same procedure was performed for all runs of NSGA-II on all test problems with two and three objectives, and the results are summarised in Tables 2–5; their non-parametric counterparts are presented in Figure 4. Furthermore, the number of valid solutions produced by the RBFNN, the number of Pareto optimal solutions, and the RBFNN generalisation error estimated using cross-validation (see Section 4.2) are presented in Figure 5. Note that in Tables 2–5 the leftmost four columns correspond to the results referred to in this section and the remaining four columns to the results obtained using the Pareto set approximation from MOEA/D. Comments on the latter are presented in the next section; we have combined the information in the tables to save space and to ease comparison.

**Table 2:** Ratio of the IGD index and of the mean nearest-neighbour distance for the two-objective problems. The first two Mean/SD pairs refer to NSGA-II and the last two to MOEA/D.

| Problem | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| WFG2 | 1.0370 | 0.0174 | 2.7844 | 0.2177 | 1.0341 | 0.0742 | 6.1070 | 1.6286 |
| WFG3 | 1.0589 | 0.0046 | 7.8959 | 0.2286 | 1.0401 | 0.0216 | 9.9152 | 0.1197 |
| WFG6 | 0.7504 | 0.2730 | 7.0383 | 0.6354 | 1.0288 | 0.0514 | 9.5327 | 0.5143 |
| WFG7 | 2.2765 | 0.5875 | 7.6369 | 0.3695 | 2.3724 | 2.5573 | 8.7620 | 0.3967 |
| DTLZ1 | 4.0822 | 8.6262 | 7.9676 | 0.2790 | 1.0000 | 0.0000 | 9.9932 | 0.0049 |
| DTLZ2 | 12.3377 | 0.3990 | 8.0198 | 0.3061 | 9.8542 | 0.6573 | 9.8454 | 0.0062 |


**Table 3:** C-metric results for the two-objective problems. The first two Mean/SD pairs refer to NSGA-II and the last two to MOEA/D; each pair corresponds to one direction of the metric.

| Problem | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| WFG2 | 0.6789 | 0.0153 | 0.3154 | 0.0756 | 0.6101 | 0.0787 | 0.1174 | 0.1405 |
| WFG3 | 0.0253 | 0.0096 | 0.6306 | 0.0761 | 0.0003 | 0.0009 | 0.7430 | 0.0591 |
| WFG6 | 0.6046 | 0.3884 | 0.1778 | 0.2115 | 0.0323 | 0.1474 | 0.1289 | 0.0745 |
| WFG7 | 0.0726 | 0.0185 | 0.4150 | 0.0551 | 0.0001 | 0.0003 | 0.0362 | 0.0209 |
| DTLZ1 | 0.0192 | 0.0139 | 0.3222 | 0.1814 | 0.0441 | 0.1414 | 0.2875 | 0.3366 |
| DTLZ2 | 0.0122 | 0.0020 | 0.6108 | 0.0494 | 0.0000 | 0.0000 | 0.0196 | 0.0014 |


**Table 4:** Ratio of the IGD index and of the mean nearest-neighbour distance for the three-objective problems. The first two Mean/SD pairs refer to NSGA-II and the last two to MOEA/D.

| Problem | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| WFG2 | 2.8047 | 0.4878 | 3.1172 | 0.2306 | 8.8543 | 0.8840 | 2.3451 | 0.1331 |
| WFG3 | 0.8013 | 0.1292 | 0.7331 | 0.1036 | 0.4957 | 0.0709 | 1.8425 | 0.1406 |
| WFG6 | 0.8790 | 0.1679 | 4.2835 | 0.7461 | 1.4326 | 0.8999 | 4.3185 | 1.2262 |
| WFG7 | 1.1950 | 0.0720 | 3.7008 | 0.1752 | 3.5314 | 0.4975 | 4.9538 | 0.0914 |
| DTLZ1 | 6.6262 | 1.2512 | 3.0811 | 0.1057 | 1.0983 | 0.6850 | 5.4396 | 0.5015 |
| DTLZ2 | 1.2655 | 0.0587 | 3.8767 | 0.1289 | 5.3924 | 0.5535 | 4.3850 | 0.2362 |


**Table 5:** C-metric results for the three-objective problems. The first two Mean/SD pairs refer to NSGA-II and the last two to MOEA/D; each pair corresponds to one direction of the metric.

| Problem | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| WFG2 | 0.5753 | 0.0197 | 0.6342 | 0.0348 | 0.6845 | 0.0237 | 0.0000 | 0.0000 |
| WFG3 | 0.3773 | 0.1153 | 0.1739 | 0.0262 | 0.8238 | 0.0139 | 0.1851 | 0.0159 |
| WFG6 | 0.2137 | 0.1768 | 0.2417 | 0.1887 | 0.4265 | 0.4114 | 0.1303 | 0.1999 |
| WFG7 | 0.0035 | 0.0019 | 0.5360 | 0.0280 | 0.0001 | 0.0001 | 0.0031 | 0.0031 |
| DTLZ1 | 0.0003 | 0.0004 | 0.0463 | 0.0468 | 0.0000 | 0.0000 | 0.0005 | 0.0033 |
| DTLZ2 | 0.0027 | 0.0012 | 0.4952 | 0.0264 | 0.0000 | 0.0000 | 0.2030 | 0.0316 |


Table 2 presents the ratio of the IGD index and the ratio of the mean distance to the nearest neighbour for the problems with two objectives. The IGD index, in principle, attains smaller values the closer the set under testing is to the known PF. Additionally, if the set does not cover certain regions of the PF, the value of the IGD index increases, signifying degraded performance. For this problem set, therefore, the proposed methodology is consistent in producing solutions that are at least the same distance from the actual PF. IGD ratio values greater than 1 mean that the estimated set produces better values for the IGD index than the original set, and values below 1 mean the converse. Regarding the mean nearest-neighbour distance ratio, values greater than 1 mean that the mean distance from a solution to its nearest neighbour is smaller in the estimated set than in the original set, and values below 1 mean the converse. In all cases, the mean distance between neighbouring solutions in the estimated set is much smaller. This fact, combined with the IGD ratio results, strongly indicates that the density of the available Pareto optimal solutions has significantly increased using our proposed method.

In Table 3, the results for the C-metric are given in both directions, between the original and estimated sets, for the two-objective test problems. This metric was employed to further verify the consistency of the method. As can be seen, for all problems except WFG2 and WFG6 the results are favourable. However, it is interesting to explore the potential reasons for the less impressive performance on these two problems. Regarding WFG2, since we did not use only the non-dominated subset of the original set, the identified PF is, as can be seen in Figure 7, an oscillating function; this is exactly the PF directly obtained from the WFG2 problem. Therefore, in a way, the method actually performed rather well in identifying the front. A remedy would be to require that the requested solutions be reasonably close to the transformed set of the original Pareto optimal solutions; more elaborate methods are left for future research. Regarding the test problem WFG6, combined with the similarly moderate results in Table 2, it seems that our methodology has consistent difficulties with this particular problem instance. A potential cause is perhaps the simplicity of the modelling technique.
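The coverage comparison can be made concrete. Assuming the C-metric here is the standard set-coverage measure of Zitzler et al., a sketch for a minimisation problem is:

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation):
    no worse in every objective and strictly better in at least one."""
    return bool(np.all(a <= b) and np.any(a < b))

def c_metric(A, B):
    """Coverage C(A, B): fraction of points in B dominated by at
    least one point in A. C(A, B) close to 0 means few points of B
    are dominated by A."""
    return float(np.mean([any(dominates(a, b) for a in A) for b in B]))
```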

Table 4 presents the IGD ratio and the distance ratio for the three-objective case. Again, the IGD ratio has acceptable values, meaning that there is no significant sign of degradation of the IGD index. The distance ratio shows that the mean neighbour distance is consistently lower for the estimated set. One notable feature is that these values are almost half of their counterparts for the two-objective case, as can be seen in Table 2. This can partly be attributed to the *curse of dimensionality*: to obtain results similar to Table 2, a substantially larger number of solutions would have to be produced than in the two-objective case. This is not the case for the three-objective instances of WFG2 and WFG3, which is mainly due to their PF geometry.

In Table 5, the results for the C-metric are given, in both directions, for the three-objective problems. Again, the results are consistent, with WFG2 performing rather moderately for the same reasons as in the two-objective case. Surprisingly, for the three-objective case, WFG6 performs extremely well.

### 5.2 Decomposition-Based Algorithms

The same experimental procedure as in Section 5.1 was applied to the decomposition-based MOEA. As previously mentioned, for this test case MOEA/D (Zhang and Li, 2007) was used with the same population size as NSGA-II. Instead of a requested convex set, an evenly distributed set of weighting vectors was used, as described in Section 4.4. In all other respects, the experimental setup is identical. As before, the entire estimated population is presented and used for the calculation of the statistical results. The results are summarised in Figures 4 and 5.

The MOEA/D columns of Table 2 present the ratio of the IGD index and of the mean nearest-neighbour distance for the problems with two objectives. A distinctive pattern, compared with the corresponding NSGA-II values in Table 2, is that whenever the IGD ratio is very close to 1, the mean value of the distance ratio is very close to the scaling factor by which we increased the size of the requested set relative to the initial set. One possible reason for this desirable behaviour is that the solutions produced by MOEA/D are very well distributed across the PF. If we view the two-dimensional PF as a function, well-distributed solutions amount to sampling that function at regular intervals, so the distances between neighbours have low variance. This enables the modelling technique to better estimate the mapping, since a uniform value was chosen for all the basis functions, as discussed in Section 4.2. Another interesting observation is that, although the minimum value of the IGD ratio for problem WFG2 is less than 1, the mean value is above 1 and the standard deviation is relatively small. This indicates that, in general, our method produces good results with low deviation for this problem instance.

In Table 3 and Table 5, the results for the C-metric are given, in both directions, for the two-objective and three-objective test problems, respectively. The results are very consistent for all problems except WFG2, which is to be expected given the shape of its PF. The coverage of the estimated set by the original set is very close to 0, signifying that a very small number of the solutions in the estimated set are dominated by solutions in the original Pareto set.

Table 4 presents the IGD ratio and distance ratio for the three-objective case. In line with the results in Table 2, the IGD ratio is satisfactory, although for problems WFG3 and WFG6 it seems somewhat low; this is also reflected, to a certain extent, in the distance ratio. For problem WFG3, this behaviour can be attributed to the fact that the real PF, which for WFG3 is a line in three dimensions, was not successfully identified by the algorithm. This conclusion is further supported by the fact that the corresponding C-metric values in Table 5 are very close to 0, attesting that the estimated Pareto set does in fact model the given set rather well. This behaviour could therefore be remedied by using only the non-dominated solutions of the estimated set. However, for our purposes this option was avoided, since it would tend to mask such deficiencies and disallow further insight into possible improvements of the proposed methodology.

## 6 Discussion

This study has shown that the question posed in Section 3.2 can be answered with relative precision, as strongly indicated by the results for the selected two- and three-objective test problems, shown in Figures 4 and 5. However, the Pareto estimation method is not without difficulties. For instance, since the quality of the produced solutions depends on the employed modelling method, which in turn depends on the quality of the produced Pareto set approximation, better results are to be expected when both these factors are satisfied to a greater degree. This is related to the observation in Jin and Sendhoff (2003) about the connectedness of the Pareto optimal set in decision space for continuous multi-objective problems: if the Pareto set approximation is not *close* to the true Pareto set, this argument need not hold, since such an approximation need not be piecewise continuous in decision space, as the KKT conditions would not hold for it. Furthermore, although Pareto estimation is, in principle, applicable to problems with more than three objectives, the visual component of the mapping is lost, and it is unclear how to create a mapping in a high-dimensional space that is intuitive for the DM. One solution is offered by the variant presented in Section 4.4, in which the DM can express his or her preferences by specifying the relative importance of the objectives. Additionally, in the case that a Pareto-based algorithm is used, generalised decomposition, a method introduced by Giagkiozis et al. (2013), can be employed to calculate the weighting vectors, after which the procedure is identical to the one described in Section 4.4.

As mentioned in Section 3.1, there are many alternative methods for identifying the mappings used in the Pareto estimation method. However, since the cost of more elaborate methods renders them prohibitive for repetitive testing, it is difficult to quantify the benefits of more sophisticated identification methods, and even more difficult to discern whether any improvement is due to the affinity of the modelling method to the particular problem set. When applying the Pareto estimation method to a specific real-world problem, the analyst has several options for identifying the required mappings. In an excellent work, Billings et al. (2007) address these modelling issues and propose a comprehensive neural-network-based approach in which the entire procedure for producing high-quality models is systematised. It should be noted that, based on the results obtained in this work, the radial basis function neural network proposed in Section 4.2 has demonstrated more than acceptable performance given the small amount of data usually available in a Pareto set approximation, and it is therefore an excellent starting point.

Another aspect that has become evident, especially when comparing the results produced using the Pareto sets produced by NSGA-II and MOEA/D, is that the distribution of the Pareto optimal solutions on the Pareto front, disregarding their convergence, seems to be an important factor determining the quality of the model. So, it would appear that if some active learning method, as in Cohn et al. (1994), could be used, the results could potentially be improved. However, the problem of direct control of the distribution of Pareto optimal points in the PS is a very difficult one.

Lastly, the modelling employed in the Pareto estimation method operates under the assumption that the mapping from objective to decision space is one-to-one, which seems limiting if, in fact, the objective function is many-to-one. However, careful consideration of this issue shows that it is not limiting to the Pareto estimation method; on the contrary, it can be rather helpful.

When a many-to-one objective function is mapped from the objective space to the decision space, for every objective vector there are one or more decision vectors to be found. This means that the probability of finding one decision vector for a specific objective vector is increased, which is to the benefit of the modelling method, as there are many alternatives. Also, given the way multi-objective evolutionary algorithms operate, that is, they distribute Pareto optimal solutions across the entire Pareto front, this one-to-many relationship would be impossible to discern as MOEAs are not expressly designed to preserve solutions that result in identical objective vectors. Therefore, it would be highly unlikely for a Pareto set approximation to have such alternatives, as this conflicts with MOEA objectives.

## 7 Conclusion

Multi-objective optimisation problem solvers seek to generate a satisfactory set of Pareto optimal solutions to enable a decision maker to select a suitable solution. Here, a novel methodology that increases the density of available Pareto optimal solutions has been described. Using this method, the number of available solutions along the trade-off surface is increased, thereby greatly enhancing the ability of the DM to identify a suitable solution with accuracy.

This is accomplished by identifying the mapping of a transformed set, derived from an approximation of the Pareto optimal set, to the corresponding decision vectors. This mapping was identified with the aid of a radial basis function neural network, which was subsequently used to infer a number of additional Pareto optimal solutions. The proposed method was presented in two forms. The first is a general formulation that is applicable to any multi-objective optimisation algorithm. This formulation was applied to a Pareto-based algorithm, NSGA-II, exhibiting almost perfect correlation between the number of requested additional solutions and the number of obtained Pareto optimal solutions. The second form applies to decomposition-based algorithms and is motivated by the fact that, by using the weighting vectors in place of a generated convex set, we avoid the operations required to create that set. Both versions of the proposed method were experimentally tested on a set of well-known test problems, and the results strongly indicate that the suggested methodology shows great promise.

Furthermore, the results in Sections 5.1 and 5.2 suggest that the choice of weighting vectors in MOEA/D is not optimal; that is, an even distribution of Pareto optimal points is not produced by the algorithm. By an even distribution, we mean one that minimises the *s*-energy, which has been shown to solve the best-packing problem on the sphere (see Katanforoush and Shahshahani, 2003; Hardin and Saff, 2004, for further detail). This effect carries over to the results of the proposed method when it uses an approximation of the PF produced by MOEA/D. In contrast, the Pareto-based method produced much more uniform results, although there are obvious edge effects, explained by the fact that we generate solutions only within the *CH _{I}* (see Section 4.1 and Figure 3).
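The uniformity criterion invoked above can be computed directly. A sketch of the Riesz *s*-energy of a finite point set follows; lower values indicate a more even spread (Hardin and Saff, 2004).

```python
import numpy as np

def riesz_s_energy(P, s=1.0):
    """Riesz s-energy of a point set P (rows are points): the sum
    over all distinct pairs of 1 / ||p_i - p_j||^s. Point sets that
    minimise this quantity are spread out as evenly as possible."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    iu = np.triu_indices(len(P), k=1)  # distinct pairs only
    return float(np.sum(1.0 / d[iu] ** s))
```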

Finally, although the concepts presented in this work need further development, we believe that they can alter the definition of what we currently consider a well-distributed approximation of the PF. This is primarily because, once an inverse mapping can be identified, the main issue becomes the *optimal* allocation of Pareto optimal solutions on the PF, such that the process of identifying a suitable solution is facilitated. By optimal allocation, we mean an approximation set that provides the most information about the underlying PF; this, still unknown, distribution need not be an even distribution of Pareto optimal solutions. This issue is deferred to future research, along with the exploration of the applicability of the presented method to many-objective optimisation problems. An application of Pareto estimation to a three-objective portfolio optimisation problem is described in Giagkiozis and Fleming (2012). The source code for this work is available.^{9}

## Acknowledgments

The authors would like to thank Ricardo H. C. Takahashi for most interesting discussions and his invaluable perspective with respect to the present work, during his visit to the University of Sheffield, while supported by a Marie Curie International Research Staff Exchange Scheme Fellowship within the 7th European Community Framework Programme.

## References


## Notes

^{1}

The interaction of the analyst and the decision maker takes place after a set of Pareto optimal solutions has been generated.

^{2}

See Definition 4 in Section 2.

^{3}

Or Pareto front approximation.

^{4}

To the authors’ best knowledge.

^{5}

See Section 6 for an explanation of why this mapping is usually *into* and not *onto*.

^{6}

is the complement of the set *CH _{I}*.

^{7}

Artificial neural networks are simply referred to as neural networks or NNs in this work for convenience.

^{8}

The parallel plane to *E* that goes through the vector.