Abstract

The set of available multi-objective optimisation algorithms continues to grow. This fact can be partially attributed to their widespread use and applicability. However, this increase also suggests several issues remain to be addressed satisfactorily. One such issue is the diversity and the number of solutions available to the decision maker (DM). Even for algorithms very well suited for a particular problem, it is difficult—mainly due to the computational cost—to use a population large enough to ensure the likelihood of obtaining a solution close to the DM's preferences. In this paper we present a novel methodology that produces additional Pareto optimal solutions from a Pareto optimal set obtained at the end of a run of any multi-objective optimisation algorithm for two-objective and three-objective problem instances.

1  Introduction

Much research effort has been dedicated to finding the Pareto optimal set (PS) of multi-objective optimisation problems (MOPs). Many algorithms have been developed to solve MOPs. Broadly speaking, these can be categorised into two groups based on their approach to fitness assignment: (1) algorithms based on Pareto dominance relations; and (2) decomposition-based approaches. Most algorithms developed during the 1990s and early 2000s were Pareto-based (Fonseca and Fleming, 1993; Srinivas and Deb, 1994; Zitzler et al., 2001; Deb et al., 2002a), and to this day the majority of the methodologies rely on some variant of this type of fitness assignment. Decomposition-based optimisation techniques draw upon the fact that Pareto optimal solutions can be obtained by aggregating the objective functions into a set of scalar optimisation problems (Miettinen, 1999). This set of problems can, in principle, be solved using some single-objective optimisation algorithm.

In an a posteriori preference articulation scenario (Miettinen, 1999, p. 77), the main focus of multi-objective optimisation algorithms is to approximate the Pareto optimal set as quickly as possible, and to distribute the Pareto optimal solutions evenly. However, this objective can easily become unmanageable for problems with more than three objectives. This is due to the fact that, as the number of dimensions in objective space increases, the number of solutions required to cover the entire front increases exponentially (Ishibuchi et al., 2008; Schütze et al., 2012) and the sweet spot of the evolutionary algorithm parameters that produce good solutions is reduced in size (Purshouse and Fleming, 2007). A potential solution to this problem could be to restrict the search space by introducing constraints in objective space (see, e.g., Fonseca and Fleming, 1993). In the present work we focus our attention on two- and three-objective problems to avoid such complications and better illustrate the presented concepts.

An issue with the aforementioned paradigm arises when the decision maker (DM) is not completely satisfied with the obtained PS. Their options in such a case are either to use a different algorithm or to restart the preferred algorithm with different parameters in the hope that the new PS will more closely satisfy the requirements. Progressive-preference articulation algorithms (Fonseca and Fleming, 1993; Miettinen, 1999; Wagner and Trautmann, 2010) offer an alternative approach; however, the drawback is that the DM must be in-the-loop for the algorithm execution (Miettinen, 1999) and this can be rather demanding. Miettinen et al. (2008) provide an overview of interactive multi-objective optimisation methods.

The proposed method in this paper alleviates these difficulties for continuous MOPs by producing more, and usually better, distributed Pareto optimal solutions along the entire Pareto front (PF). Additionally, an important feature that may be helpful to both the analyst and the DM is that if there is a specific region of interest on the PF, the generation of solutions can be focused on that region. This can be helpful in situations where no solution is found in a PF region of interest to the DM. The method we propose achieves this result by estimating the mapping of a convex set to the decision vectors corresponding to the Pareto optimal solutions obtained by a multi-objective optimisation algorithm. This convex set is used in lieu of the objective vectors for reasons that are clarified in Section 3 and Section 4. This mapping, identified here using a radial basis function neural network (RBFNN), is then used to generate estimates of decision vectors that would lead to Pareto optimal solutions in the neighbourhood of the original solutions. It should be noted that an RBFNN is chosen mainly because it is computationally efficient to train and the produced results are reasonable for the selected test problems. However, this choice is not restrictive and does not characterise the presented methodology, as any modelling or metamodelling method can be used instead, should the situation demand it. Additionally, the fact that a convex set is used does not impose any restrictions on the PF geometry.

The idea that supports the proposed method is that, for continuous MOPs, it can be deduced from the Karush-Kuhn-Tucker optimality conditions that the PS is piecewise continuous in the decision variables, as previously noted by Y. Jin and Sendhoff (2003) and Zhang et al. (2008). This fact, combined with a reasonable approximation of the PS, can be used constructively to infer the mapping of the above-mentioned convex set to decision variables that produce Pareto optimal solutions.

The main contributions of this work can be summarised as follows:

  • A method, which we call Pareto estimation (PE), is presented. PE can be used to increase the number of Pareto optimal solutions, for two- and three-objective problems, from an approximation of the Pareto set. This can be useful in a situation where the evolutionary algorithm has not produced a solution close enough to the desired location on the PF. Furthermore, the Pareto estimation method does not necessitate any alteration to the optimisation algorithm that is used to produce the Pareto set and is dependent on it only insofar as the quality of the PS is concerned. Quality in this context depends on two factors: (1) the proximity of the PS approximation to the Pareto optimal set, and, (2) the availability of solutions across the entire PF.

  • The effectiveness of PE is validated using a set of test problems, commonly used in the MOEA community, for two and three objectives. It is shown that PE can produce more Pareto optimal solutions across the entire PF with a much lower cost compared to the alternative of restarting the optimisation using the same or an alternative algorithm to solve the MOP. Although this is not the main purpose of the PE method, it is a good test, since if it can produce more solutions on the entire PF, then it should be able to increase the number of solutions in specific regions as well. We believe that this is the main utility of PE.

The remainder of this paper is organised as follows. In Section 2, a general formulation of a multi-objective optimisation problem is given, followed by some fundamental definitions. Related concepts and motivating ideas for the proposed method are discussed in Section 3. In Section 4, the PE method is described for Pareto and decomposition-based algorithms. The method is tested against a set of multi-objective optimisation problems and these tests are reported in Section 5. Lastly, in Section 6 we discuss problems, potential solutions, and ideas related to the PE method, and in Section 7, this paper is summarised and concluded.

2  Problem Setting and Definitions

A general definition of a multi-objective problem is
$$\min_{\mathbf{x}} \ \mathbf{F}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})), \quad \text{subject to } \mathbf{x} \in S \tag{1}$$

where the number of objective functions is k, S is the feasible region for the decision vectors $\mathbf{x}$, and $f_i : S \to \mathbb{R}$ is a scalar objective function, with $i \in \{1, \ldots, k\}$. Additionally, in this work it is assumed that $k \geq 2$ and that n is the number of decision variables in the decision vector, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. Also, depending on the definition of S, Equation (1) can be a constrained MOP. For simplicity, only minimisation problems are considered; however, this does not limit the generality of the produced results since, due to the duality principle, maximisation of a scalar objective function $f_i$ is the same as minimisation of $-f_i$. An implicit assumption is that the scalar objective functions in Equation (1) are mutually competing and possibly non-commensurate.
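To make the formulation concrete, here is a minimal sketch in Python of a two-objective instance of Equation (1). The Schaffer problem used here (f1 = x^2, f2 = (x - 2)^2, with S = [-5, 5], n = 1, k = 2) is a standard textbook example chosen for illustration; it is not a problem drawn from this paper.

```python
def F(x):
    """Evaluate the two competing scalar objectives at a decision variable x.

    The objectives conflict: f1 is minimised at x = 0, f2 at x = 2,
    so no single x minimises both simultaneously.
    """
    assert -5.0 <= x <= 5.0, "x must lie in the feasible region S"
    return (x ** 2, (x - 2.0) ** 2)
```

Any x in [0, 2] is Pareto optimal for this instance: improving one objective necessarily worsens the other.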

Although a complete ordering can be defined for multi-objective problems, for instance, a lexicographic order, only a partial ordering can be defined without prior knowledge of the preferences of a DM. Namely, when comparing two decision vectors $\mathbf{x}^1, \mathbf{x}^2 \in S$, it can so happen that their corresponding objective vectors are incomparable. In practice, this situation is resolved when a DM selects one solution over all others, thus inducing a form of complete ordering. However, this ordering is mostly subjective, even in the case that utility functions (Vira and Haimes, 1983) are used to ease the work of the DM. In the absence of a DM, a usual assumption is that the relative importance of the objectives, $f_i$, is unknown; hence, it is reasonable to obtain several non-comparable solutions. The problem of inducing partial ordering in Euclidean spaces was initially studied by Edgeworth (1881) and later further expanded by Pareto (1896). The relations introduced by Pareto are defined as follows for a minimisation problem.

Definition 1: 

A decision vector $\mathbf{x}^1$ is said to weakly dominate a decision vector $\mathbf{x}^2$ iff $f_i(\mathbf{x}^1) \leq f_i(\mathbf{x}^2)$ for all $i \in \{1, \ldots, k\}$ and $f_i(\mathbf{x}^1) < f_i(\mathbf{x}^2)$ for at least one $i \in \{1, \ldots, k\}$; then $\mathbf{x}^1 \preceq \mathbf{x}^2$.

Definition 2: 

A decision vector $\mathbf{x}^1$ is said to dominate a decision vector $\mathbf{x}^2$ iff $f_i(\mathbf{x}^1) < f_i(\mathbf{x}^2)$ for all $i \in \{1, \ldots, k\}$; then $\mathbf{x}^1 \prec \mathbf{x}^2$.

Definition 3: 

A decision vector $\mathbf{x}^\star$ is said to be Pareto optimal if there is no other decision vector $\mathbf{x} \in S$ such that $f_i(\mathbf{x}) \leq f_i(\mathbf{x}^\star)$ for all $i \in \{1, \ldots, k\}$ and $f_j(\mathbf{x}) < f_j(\mathbf{x}^\star)$ for at least one $j \in \{1, \ldots, k\}$.

Definition 4: 

Let $\mathbf{F} : S \to Z$, with $Z \subseteq \mathbb{R}^k$ and $Z = \mathbf{F}(S)$. If S is the feasible region, then the set Z is the feasible region in objective space. Given a set $A \subseteq Z$, the non-dominated set $\tilde{A}$ is defined as $\tilde{A} = \{\bar{\mathbf{z}} \in A : \nexists\, \mathbf{z} \in A,\ \mathbf{z} \prec \bar{\mathbf{z}}\}$. If $A$ is the entire feasible region in the objective space, Z, then the set $\tilde{A}$ is called the Pareto optimal set or Pareto front. Any element $\mathbf{z} \in Z$ is referred to as an objective vector.
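The dominance relations above are straightforward to check on finite sets of objective vectors. The following sketch assumes minimisation and follows the definitions above (strict dominance in every objective for Definition 2, and the non-dominated filter of Definition 4); it is illustrative only, not code from this paper.

```python
def dominates(z1, z2):
    """Definition 2: z1 dominates z2 iff z1 is strictly better in every objective."""
    return all(a < b for a, b in zip(z1, z2))

def weakly_dominates(z1, z2):
    """Definition 1: no worse in all objectives and strictly better in at least one."""
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))

def nondominated(A):
    """Definition 4 on a finite set A: keep the members no other member dominates."""
    return [z for z in A if not any(dominates(w, z) for w in A if w != z)]
```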

Also, the following definitions are used in this work in various contexts.

Definition 5: 

The ideal objective vector, $\mathbf{z}^\star$, is the vector with elements $z_i^\star = \min_{\mathbf{x} \in S} f_i(\mathbf{x})$ (Miettinen, 1999, p. 16).

Definition 6: 

The nadir objective vector, $\mathbf{z}^{nad}$, is the vector with elements $z_i^{nad} = \max f_i(\mathbf{x})$, subject to $f_i(\mathbf{x})$ being elements of objective vectors in the Pareto optimal set (Miettinen, 1999, p. 16).

Definition 7: 

The convex hull of a set C, denoted as $\text{conv}(C)$, is the set of all convex combinations of the elements in the set C (Boyd and Vandenberghe, 2004, p. 24). For convenience, we define CHI to be the convex hull of the set $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$, where $\mathbf{e}_i$ is a vector of zeros with 1 in the ith position, that is,

$$\mathrm{CHI} = \text{conv}(\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}).$$

Definition 8: 

The extended convex hull (EHI) of the set C is the union of CHI and the points in the affine space of the set C produced by the projection of a Pareto optimal front, with ideal vector $\mathbf{z}^\star$ and nadir vector $\mathbf{z}^{nad}$, onto the hypersurface of C.

A multi-objective evolutionary algorithm (MOEA) will, in general, attempt to obtain a representative approximation of the Pareto optimal set. However, what constitutes a representative approximation of the PS is context-dependent; nevertheless, metrics have been devised to measure the quality of the PS. For an excellent review of this topic, the reader is referred to Zitzler et al. (2003).

3  Related Work

Multi-objective evolutionary algorithms have had tremendous success in solving real-world problems. They have been applied in control systems (Fonseca and Fleming, 1998a, 1998b; Fleming and Purshouse, 2002), in economics and finance (Shoaf and Foster, 1998; Armananzas and Lozano, 2005; Subbu et al., 2005; Tapia and Coello, 2007), and in aerospace (Ong et al., 2006; Goel et al., 2007). This can be attributed to the fact that evolutionary algorithms (EAs) perform well for a wide range of problems for which classical methods, such as convex optimisation (Boyd and Vandenberghe, 2004), are inapplicable. However, the robustness of EAs does not come for free. For example, in contrast to convex optimisation, there is no guarantee of global optimality for solutions produced by evolutionary algorithms. Nevertheless, in practice there is strong evidence that very good approximations of Pareto optimal solutions are generated.

3.1  Metamodelling Methods in Multi-Objective Optimisation

A single objective function evaluation can potentially be computationally expensive, requiring hours or even days, which constitutes a severe limitation. This issue is widely acknowledged in the MOEA community (Jones et al., 1998; Liang et al., 2000; Y. Jin et al., 2002; Knowles, 2006). A prevalent methodology employed by researchers to tackle this issue is the use of metamodelling methods in optimisation. The insight is that if a surrogate model of the actual objective function can be created with relatively few samples, then this surrogate model can be used instead of the objective function in the optimisation process.

Since the purpose of the surrogate model is to relieve the EA from evaluating an expensive objective function as much as possible, the primary selection criteria for a surrogate model are adapted accordingly. Namely, the suitability of a modelling method is judged according to: (1) the ease with which the model parameters can be identified; and (2) the cost of one evaluation of the surrogate model which must be much smaller than that of the actual objective function. Therefore, for a metamodelling method that satisfies the above criteria, a large number of objective function evaluations can be substituted with calls to the surrogate model, hence reducing the total cost of the optimisation. Another criterion that is critical in the success of the aforementioned procedure is the model precision. This is the case because, if the surrogate model cannot capture important features of the objective function, the search will be grossly misled, although caution should be exercised not to overcomplicate the surrogate model to a degree that its cost becomes comparable to the original objective function. In a way, a surrogate model can be viewed as a low-pass filter, hopefully separating the noise from the important features of the objective function, that is, its minima (or maxima). This is why such methods have been employed in noisy optimisation problems as well (Y. Jin and Branke, 2005).

The general approach, when substituting the real objective function with a surrogate model for use in an EA, has the following structure:

  1. Sample the real objective function.

  2. Use the obtained samples to create a surrogate model.

  3. Use the surrogate model in the optimisation.

  4. If the convergence criteria are met, stop; if not, go to Step 1.
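The four-step loop above can be sketched as follows. This is an illustrative toy under simplifying assumptions: a single-objective "expensive" quadratic stands in for the costly objective, and a polynomial response surface (fitted with least squares) serves as the surrogate; the multi-objective machinery of this paper is not reproduced here.

```python
import numpy as np

def expensive(x):
    # Stand-in for a costly objective; assume each call is expensive.
    return (x - 1.0) ** 2

xs = list(np.linspace(-5.0, 5.0, 5))                  # Step 1: initial samples
for _ in range(3):
    ys = [expensive(x) for x in xs]
    coeffs = np.polyfit(xs, ys, 2)                    # Step 2: fit quadratic surrogate
    grid = np.linspace(-5.0, 5.0, 1001)
    cand = grid[np.argmin(np.polyval(coeffs, grid))]  # Step 3: optimise the surrogate
    xs.append(float(cand))                            # Step 4: not converged, resample

best = min(xs, key=expensive)                         # best sample found so far
```

Each iteration spends one real evaluation per sample while the (cheap) surrogate absorbs the search effort on the dense grid.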

An illustration of this iterative procedure can be seen in Figure 1, where an ever more accurate mapping of the decision space, S, to the objective space, Z, is created in every iteration. This approach was initially limited to serial implementations (Farina, 2002; Emmerich et al., 2002); however, later advances in metamodelling-based EAs employed local models (Ong et al., 2003), thus reinstating a key strength of EAs: their potential to be executed in parallel.

Figure 1:

Metamodelling methods in EAs gradually refine a surrogate model and then use it to find a better Pareto set approximation. Pareto estimation, the method proposed in this work, proceeds in the reverse direction by mapping a surrogate set of a Pareto front approximation to the decision vector set that maps to it.


The idea to employ surrogate models in lieu of the true model of a process can be traced back to Box and Wilson (1951) where they employed polynomial basis functions to create a model from data. This approach is commonly referred to in the literature as the response surface method (RSM). Other examples of modelling methods used in combination with an evolutionary algorithm are neural networks (Poloni et al., 2000; Y. Jin et al., 2002; Adra et al., 2009), Kriging or Gaussian processes, and generalised response surface methods (Y. Jin, 2005), as well as Bayesian regression (Ong et al., 2004).

Another recently introduced class of methods that employ metamodelling is presented by Hartikainen et al. (2012) and Haanpää (2012). Specifically, Haanpää (2012) proposed the interactive hyperbox exploration method (IHBE). Of course, this method requires a significantly greater degree of commitment on behalf of the DM as the aforementioned hyperbox is progressively refined. Nevertheless, this methodology does have the potential to reduce the computational cost if the corresponding hyperbox is relatively small when compared with the extent of the entire Pareto front.

3.2  Pareto Estimation Method: Motivation

In this work we bring forward and resolve, to some extent, a question that seems to have been ignored by the literature. Namely, given an approximation of the Pareto front by any MOEA, is there a way to obtain solutions in specific parts of the PF that are not present in the given set and, if the answer is positive, how can this be achieved? This question stems from the fact that, if there were a way to obtain a Pareto optimal solution that adheres exactly to the DM's preferences, then there would be no need to evaluate any other solutions, as is usually the case for multi-objective optimisation algorithms. However, such algorithms inherit this strategy because there is no clear path to incorporating all preferences, since it is unknown whether they are in fact reasonable, that is to say, whether such a solution exists at all. Additionally, there is no clear way to obtain a specific solution with accuracy. The remainder of this paper considers this question and an answer is presented.

To appreciate the importance of this question, let us embark on a thought experiment. Assume that we have a function,
$$G(\mathbf{z}) = \begin{cases} \mathbf{x}, & \text{if } \mathbf{z} \in \mathcal{PF} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
Thus, the function, G, returns the corresponding Pareto optimal decision vector if a Pareto optimal solution, $\mathbf{z} \in \mathcal{PF}$, is used and 0 otherwise. Obviously, such a function would be of limited use if the analyst had no information about the shape of the Pareto front as well as its location. Thus, the function, G, is a special indicator function whose domain of definition is the Pareto optimal set, $\mathcal{PF}$, and whose range is the set of Pareto optimal decision vectors. Therefore, given such a function and information about the exact location of the Pareto front, it would simply be a matter of evaluating Equation (2) in order to obtain the decision vector that would result in a Pareto optimal solution. Such a description of the Pareto front geometry can be given by a parametric or non-parametric model if the problem has already been successfully solved. A potential issue with such an approach is that a different description of the PF will be required for different problems. Although this seems troubling, there is nothing to preclude the existence of a function with a convenient domain of definition that would map to the Pareto front of any given problem. Naturally, such a function must depend on, and adapt to, the Pareto optimal set or some approximation of it, and hopefully a procedure can be found to map the former to the latter. Strictly speaking, such a function would perform the following task,
$$\mathcal{E}(\mathbf{v}) = \mathbf{z}, \qquad \mathbf{v} \in \mathcal{C},\ \mathbf{z} \in \mathcal{PF}, \tag{3}$$

where $\mathcal{C}$ is a convex set from which points can easily be drawn.
Additionally, it would be even more convenient if the mapping in Equation (3) were predictable in the sense that, for a given input, the resulting objective vector is not very hard to predict, as this would ease the complexity of using the function in Equation (3). A natural candidate for such a task would be an affine function.

The final piece of this puzzle lies in the domain of definition of the function described in Equation (3). The requirements on such a domain would be: (1) that points within the domain of definition of the function should be easy to obtain; and (2) that any convex combination of points in the set must still be in the set, that is to say, the set must be convex. By adhering to these requirements, and if relations similar to Equations (2) and (3) can be identified, then by the following procedure, a Pareto optimal solution can be obtained at any desired location on the PF.

  • Choose a vector, $\mathbf{v}$, that will produce the desired vector, $\mathbf{z}$. This is verified by Equation (3). If the resulting $\mathbf{z}$ is not the intended one, it will be sufficient to change $\mathbf{v}$ a little. In this step we exploit the predictability of the mapping in Equation (3).

  • Use the obtained vector, $\mathbf{z}$, in Equation (2) to obtain the corresponding decision vector, $\mathbf{x}$.

  • Evaluate the actual objective function, F, using the obtained vector, $\mathbf{x}$, to verify that $\mathbf{F}(\mathbf{x}) = \mathbf{z}$. Strictly speaking, if the mappings described in Equations (2) and (3) are exact, this step is redundant.

If such a procedure were available in practice, there would be a way to obtain the decision vector that satisfies the requirements of the DM exactly instead of repeatedly solving a multi-objective optimisation problem in the hope of obtaining a solution that closely satisfies the aforementioned requirements.

4  Pareto Estimation Method

4.1  Overview

The question posed in Section 3.2 is interesting because, depending on how well it can be answered, the information that is in the analyst's possession increases dramatically, thus allowing the analyst to cater to more specific requests from the DM. This is so because given absolute knowledge of the aforementioned functions, Equations (2) and (3), a multi-objective problem is virtually solved as any solution on the Pareto front can, theoretically, be obtained at very small additional expense with high precision. Although to obtain the entire Pareto optimal set may be infeasible in practice, this is the predominant definition of what it means to solve a multi-objective optimisation problem (Miettinen, 1999, p. 61).

However, such a relationship is usually unknown for real-world problems and sometimes it is unknown even for test problems. Most multi-objective optimisation algorithms strive to generate a PS that possesses two key properties. First, it should produce objective vectors as close as possible to the true PF and, second, these objective vectors should be evenly spread across the PF hypersurface. Under the assumption that the optimisation algorithm of choice has succeeded, to a reasonable degree, in producing a PS that possesses the aforementioned properties, then the mapping, $g$, of Pareto optimal objective vectors, $\mathbf{z}$, into their corresponding decision variables, $\mathbf{x}$,

$$g(\mathbf{z}) = \mathbf{x}, \quad \mathbf{z} \in \mathcal{PF}, \tag{4}$$
can be identified using a modelling method (Y. Jin, 2005). A theoretical argument based on the Karush-Kuhn-Tucker (KKT) optimality conditions, which further fosters the idea that the mapping in Equation (4) should be identifiable, was proposed by Zhang et al. (2008); this is further supported by the observations of Schütze et al. (2003) and Y. Jin and Sendhoff (2003). Zhang et al. (2008) state that, for continuous multi-objective problems, the Pareto optimal set is piecewise continuous in decision space. This point is revisited in Section 6. In the present work, a radial basis function neural network is used for this purpose since it is both robust and accurate for a wide set of problems (R. Jin et al., 2001). The structure and further details regarding the way that this type of neural network is employed in this work are discussed in Section 4.2.
However, even if the mapping, $g$, is explicitly known, it is still unclear how the desired Pareto optimal objective vectors should be generated in order to obtain their corresponding decision variables using $g$. This problem is related to the issue encountered in Section 3.2 with the function G. For example, assume that we have the exact mapping $g$ for a multi-objective problem, with the only restriction being that we provide the exact coordinates of Pareto optimal points. In order to be able to provide this information, we are required to know exactly the shape of the PF, meaning a mathematical description of the PF hypersurface must be available for all potential problems. If such information is available for the given problem, then all decision variables corresponding to the PF could be obtained using $g$. This point becomes clearer if we view the mapping $g$ as the inverse of the objective function $\mathbf{F}$, which leads to

$$g(\mathbf{z}) = \mathbf{F}^{-1}(\mathbf{z}), \quad \mathbf{z} \in \mathcal{PF}. \tag{5}$$
Even if the function $\mathbf{F}$ is not one-to-one, a mapping can still be obtained, although it can no longer be called the inverse image of $\mathbf{F}$; however, for practical purposes its function would be the same. Therefore it is relatively safe to ignore, for the moment, the fact that the objective function may be many-to-one; this issue is further discussed in Section 6.
Now, let us assume that we can transform the set $\mathcal{PF}$ to a set $\mathcal{V}$, with the only difference being that we can very easily obtain and manipulate the elements in $\mathcal{V}$ and that any element in $\mathcal{V}$ is mapped exactly to one element in the Pareto optimal set $\mathcal{PF}$. That is, we require the mapping $T : \mathcal{PF} \to \mathcal{V}$ to be a bijection. In which case we can obtain the inverse transform, $T^{-1}$, and

$$\mathbf{x} = g(T^{-1}(\mathbf{v})), \quad \mathbf{v} \in \mathcal{V}, \tag{6}$$
would enable the DM to generate any required solution. One way to produce such a mapping is to initially normalise the objective vectors in $\mathcal{PF}$ according to

$$\bar{z}_i = \frac{z_i - \hat{z}_i^\star}{\hat{z}_i^{nad} - \hat{z}_i^\star}, \quad i \in \{1, \ldots, k\}, \tag{7}$$

where $\hat{\mathbf{z}}^\star$ and $\hat{\mathbf{z}}^{nad}$ are estimated from the set $\mathcal{PF}$. This normalisation scales the objectives into the range $[0, 1]$. The mapping $T$ is illustrated in Figure 2. After normalisation, the resulting objective vectors are projected onto EHI; for problems with two objectives, this is the same as CHI. Subsequently, the mapping,

$$\hat{g}(\mathbf{v}) = \mathbf{x}, \quad \mathbf{v} \in \mathcal{V}, \tag{8}$$

is identified using a RBFNN as shown in Figure 1. This model, in essence, subsumes the composition of the mappings $g$ and $T^{-1}$ in Equation (6).
Figure 2:

Illustration of the mapping $T$ for a hypothetical Pareto set, $\mathcal{PF}$.


The mapping $T$ effectively takes a set of vectors in $\mathbb{R}^k$, namely $\mathcal{PF}$, and creates its corresponding set in EHI, $\mathcal{V}$. For two dimensions, vectors in $\mathcal{V}$ will be part of the convex set CHI and this set will be identical to EHI; see Figure 2. For more than two dimensions, both EHI and CHI are still convex sets, but a more elaborate procedure will be required to obtain points on the EHI due to its geometry; see Figure 3.

Figure 3:

Illustration of the mapping $T$ for a Pareto set with 3 objectives. The points on the outer grid lie in the domain of the mapping, while the transformed set, $\mathcal{V}$, is within the hashed regions.


For example, consider a concave Pareto front such as the one shown in Figure 3. This front is the first octant of a sphere centred at the origin. If we apply the transform $T$ to this Pareto optimal set, the resulting set will be on the union of the striped areas in Figure 3, that is, EHI. The part of $\mathcal{V}$ in CHI is the set within the triangle with vertices $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. The remaining points in $\mathcal{V}$ are part of EHI but not of CHI, and, since the edges of the EHI set are curved, it is no longer straightforward to generate points within this set that are evenly distributed. Therefore, the desired property of the function discussed in Section 3.2, that is, the ability to easily generate points within its domain, would be restricted. A partial solution to this is simply to bound the domain of definition of the mapping to the CHI artificially. This would maintain the aforementioned desirable property, but such a restriction would limit the method to producing solutions whose projection is within the CHI. The solutions in EHI outside the CHI correspond to extreme Pareto optimal points which are, potentially, of low interest (Das and Dennis, 1996). However, if this assumption is not true and the DM requires solutions within these regions, the method described in Giagkiozis and Fleming (2012) could be employed to obtain estimates from the PE method. This can be achieved as the entire set, $\mathcal{V}$, is used in the model creation process (see Section 4.2).

Finally, to generate the estimated Pareto optimal solutions, a set of evenly spaced convex combinations of the vertices of CHI is created. Subsequently, this set can be used as an input to a model of the mapping in Equation (8). The resulting decision vectors may then be used in the objective function to verify that they correspond to Pareto optimal objective vectors. An alternative is to create this set for a specific region of interest on the PF, for example, using only points that fall within that region.
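Evenly spaced convex combinations of the vertices $\mathbf{e}_1, \ldots, \mathbf{e}_k$ can be enumerated with a simplex-lattice construction (a standard stars-and-bars enumeration, shown here as a generic sketch rather than code from this paper): all non-negative multiples of 1/H that sum to 1, for a chosen number of divisions H per objective.

```python
from itertools import combinations

def simplex_lattice(k, H):
    """All evenly spaced convex combinations of the k unit vectors e_1..e_k.

    Enumerates, via stars and bars, every k-tuple of non-negative integers
    summing to H, then divides by H; yields C(H + k - 1, k - 1) points in CHI.
    """
    points = []
    for bars in combinations(range(H + k - 1), k - 1):
        prev, coords = -1, []
        for b in bars:
            coords.append(b - prev - 1)   # gap before this bar
            prev = b
        coords.append(H + k - 2 - prev)   # gap after the last bar
        points.append(tuple(c / H for c in coords))
    return points
```

Increasing H refines the grid of candidate inputs uniformly; restricting the enumeration to a sub-simplex would focus the generated solutions on a region of interest.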

4.2  Radial Basis Function Neural Networks

Before we delve into a detailed description of how the proposed method can be applied to decomposition and Pareto-based multi-objective optimisation algorithms, we first explore the technique used to model the mapping.

Neural networks, or more precisely artificial neural networks, are widely used in an array of different disciplines (Maier and Dandy, 2000; Atiya, 2001; Wong and Selvi, 1998). They are well known for their universal approximator property (Hornik et al., 1989). Furthermore, a subclass of NNs, namely radial basis function neural networks (RBFNNs), have been shown to be robust and accurate predictors when compared to Kriging, multivariate adaptive splines, and polynomial regression methods (R. Jin et al., 2001). RBFNNs have a single hidden layer and an output layer. Their output layer is often composed of linear functions since this guarantees a unique solution to their weights, w (Bishop, 1995), without the need to resort to the renowned back-propagation algorithm (Rumelhart et al., 1986).

RBFNNs usually employ basis functions that are radially symmetric about their centres, $\mathbf{c}$, for the chosen norm, and decreasing as $\mathbf{x}$ drifts away from $\mathbf{c}$. A commonly used basis function is the Gaussian (Bishop, 1995), given in its general form by

$$\phi(\mathbf{x}) = \exp\left( -\frac{\|\mathbf{x} - \mathbf{c}\|^2}{2\sigma^2} \right), \tag{9}$$

where the norm is often the Euclidean ($\ell_2$) norm. Perhaps at this point a difficulty associated with RBFNNs is evident, namely that, although the output layer is composed of linear functions, the hidden layer is highly nonlinear in the parameters $\mathbf{c}$ and $\sigma$, which can prove a challenge in the selection of their optimal values. Various techniques are suggested in the literature addressing this problem (Bishop, 1995). In this work we choose to use all the training data as centres for the radial basis functions, $\mathbf{c}_i$. Therefore, the number of basis functions is equal to the number of training vectors used. Additionally, a uniform value for the parameter, $\sigma$, is used for all basis functions, and it is set to $\sigma = 2\bar{\delta}$, where $\bar{\delta}$ is the mean distance of solutions in the training set to their nearest neighbour. This value for $\sigma$ was chosen experimentally. Intuitively, this almost guarantees that the basis functions overlap, thus minimising the number of regions in the interior of the set for which no basis function is active. Therefore, Equation (9) becomes

$$\phi_i(\mathbf{x}) = \exp\left( -\frac{\|\mathbf{x} - \mathbf{c}_i\|^2}{2(2\bar{\delta})^2} \right). \tag{10}$$
Arguably, this is the simplest way to choose the parameters of the basis functions and was used to retain focus on the proposed methodology. For a more elaborate and comprehensive methodology on selection of the parameters of RBFNNs, the reader is referred to Billings et al. (2007).
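The basis-function setup just described, with every training vector used as a centre and a uniform width set to twice the mean nearest-neighbour distance, can be sketched as follows; `basis_matrix` is a hypothetical helper name, not from this paper.

```python
import numpy as np

def basis_matrix(V):
    """Gaussian basis matrix of Eq. (10) over training inputs V (N x d).

    Every training vector is a centre, and sigma = 2 * (mean distance of
    each point to its nearest neighbour), as described in the text.
    """
    V = np.asarray(V, dtype=float)
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)             # exclude self when finding neighbours
    sigma = 2.0 * D.min(axis=1).mean()      # 2 * mean nearest-neighbour distance
    np.fill_diagonal(D, 0.0)                # restore zero self-distance
    return np.exp(-D ** 2 / (2.0 * sigma ** 2)), sigma
```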
The output of a RBFNN is a linear combination of the basis functions $\phi_i$,

$$\hat{g}(\mathbf{v}) = \sum_{i=1}^{N} \mathbf{w}_i \phi_i(\mathbf{v}) + \mathbf{b}, \tag{11}$$

where $\mathbf{b}$ is the output layer bias term and $\mathbf{w}_i, \mathbf{b} \in \mathbb{R}^n$, where n is the number of outputs, that is, the number of decision variables.
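Because the output layer of Equation (11) is linear in the weights and bias, it can be fitted in closed form with least squares; the sketch below uses hypothetical names (`fit_output_layer`, `Phi`) and assumes the basis matrix of Equation (10) has already been evaluated at the training inputs.

```python
import numpy as np

def fit_output_layer(Phi, X):
    """Solve the linear output layer of Eq. (11) by least squares.

    Phi is the (N x N) basis matrix evaluated at the training inputs;
    X is the (N x n) matrix of target decision vectors. A column of ones
    is appended so the bias b is estimated jointly with the weights.
    """
    N = Phi.shape[0]
    A = np.hstack([Phi, np.ones((N, 1))])           # augment with bias column
    W, *_ = np.linalg.lstsq(A, X, rcond=None)
    return W[:-1, :], W[-1, :]                      # weights (N x n), bias (n,)
```

This is why no back-propagation is needed: with the centres and width fixed, training reduces to one linear solve.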

To validate the created neural network, leave-one-out cross-validation was used, as suggested in Jones et al. (1998). Namely, for a Pareto set of size N, N NNs were created using $N-1$ samples for the training set, and the remaining sample was used to estimate the generalisation error. This procedure is repeated until all the solutions in the Pareto set have been used as a test sample, and then the mean square error is calculated. After estimating the expected generalisation error of the NN using cross-validation, the final NN is generated using the entire Pareto set.
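The leave-one-out procedure can be sketched generically as below; `fit` and `predict` are hypothetical stand-ins for any model interface (such as the RBFNN above), not functions from this paper.

```python
import numpy as np

def loo_mse(V, X, fit, predict):
    """Leave-one-out estimate of the mean squared generalisation error.

    Each of the N samples serves once as the held-out test point while a
    model is fitted on the remaining N - 1 samples, as described in the text.
    """
    errors = []
    for i in range(len(V)):
        mask = np.arange(len(V)) != i       # train on all but sample i
        model = fit(V[mask], X[mask])
        pred = predict(model, V[i])
        errors.append(float(np.mean((pred - X[i]) ** 2)))
    return float(np.mean(errors))           # estimated generalisation error
```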

4.3  Pareto Dominance-Based Algorithms

The method described in Section 4.1 introduced the general procedure of the proposed technique; however, certain details were abstracted. Optimisation algorithms that use Pareto dominance for fitness assignment have several control parameters. One of these is the size of the population used in the optimisation process, which effectively provides an upper bound on the number of Pareto optimal solutions in the final set. One requirement for the methodology to function correctly across the entire PF is that there be a sufficient number of non-dominated solutions in the final population. An additional requirement, evident from experiments, is that the non-dominated set produced by the algorithm be well spread across the PF; that is, the solutions are diverse and the mean distance from their neighbours has small variance. In other words, the performance of the proposed method depends on the performance of the algorithm used to solve the MOP.

Once the execution of a multi-objective evolutionary algorithm (MOEA) has come to an end, the non-dominated solutions of the resulting set constitute the Pareto front approximation, with corresponding decision vectors. Then each objective is normalised according to Equation (7) to the range [0, 1], and the ideal and nadir vectors are estimated from the set as follows:
\[
\hat{z}_i^{\star} = \min_{j}\, f_i(\mathbf{x}_j),
\tag{12}
\]
\[
\hat{z}_i^{\mathrm{nad}} = \max_{j}\, f_i(\mathbf{x}_j),
\tag{13}
\]
where $f_i$ is the $i$th objective function and its corresponding values for different solutions are found in the $i$th column of the objective matrix. Note that since the produced Pareto set approximation has finite size, the inf and sup operators are replaced by the min and max operators, which return the minimum and maximum element of a set, respectively. Next, the normalised set is projected onto the hyperplane E through the unit vectors $\mathbf{e}_i$, where $\mathbf{e}_i$ is a vector of zeros with a one in the $i$th position. This is achieved by initially projecting onto the subspace8 parallel to E and then shifting the result by $(I - P_E)\mathbf{e}_1$, where $I$ is the unit matrix. To obtain the projection matrix, $k-1$ linearly independent vectors in the E plane are required. These vectors are obtained in the following way:
\[
\mathbf{h}_i = \mathbf{e}_{i+1} - \mathbf{e}_1, \qquad i = 1, \dots, k-1, \qquad H = [\mathbf{h}_1, \dots, \mathbf{h}_{k-1}],
\tag{14}
\]
where H is a $k \times (k-1)$ matrix. Subsequently, the projection matrix, $P_E$, is obtained by
\[
P_E = H \left(H^{\mathsf{T}} H\right)^{-1} H^{\mathsf{T}},
\tag{15}
\]
where $P_E$ is a $k \times k$ matrix with rank $k-1$. The transformed Pareto set is
\[
\bar{\mathbf{z}} = P_E\, \mathbf{z} + (I - P_E)\, \mathbf{e}_1.
\tag{16}
\]
Finally, the neural network, used to identify the mapping from the transformed set to decision space, is created as described in Section 4.2, using the transformed vectors and the corresponding decision vectors as the training inputs and outputs, respectively.

Once the neural network is trained, it can be used to create additional solutions for a new set of convex combinations. This set can be generated by the DM according to his or her preference in a particular region of the PF; alternatively, a more densely and evenly spaced convex set spanning the entire PF could be created. The first option is likely to be preferred when the cost of evaluating the objective function is considerable or there is a clear preference toward a particular region of the PF.
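One common way to generate such an evenly spaced convex set is a simplex-lattice design; the sketch below assumes that construction (the paper does not prescribe a specific generator, and the function name is ours).

```python
import itertools
import numpy as np

def convex_lattice(k, H):
    """All vectors w with non-negative components summing to 1, on a lattice
    with H divisions per axis (a simplex-lattice / systematic construction
    in the spirit of Das and Dennis, 1996). Produces C(H + k - 1, k - 1)
    evenly spaced convex combinations for k objectives."""
    pts = []
    for c in itertools.combinations(range(H + k - 1), k - 1):
        # stars-and-bars: gaps between dividers give the integer parts
        parts = np.diff([-1, *c, H + k - 1]) - 1
        pts.append(parts / H)
    return np.array(pts)
```

For example, `convex_lattice(2, 4)` yields the five vectors (0, 1), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1, 0); increasing `H` densifies the set, which is how a DM could request more solutions across the whole PF.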

The described procedure is summarised as follows.

  1.   Obtain the non-dominated individuals from the final population of a Pareto-based MOEA and their corresponding decision variables.

  2.   Normalise the objective vectors according to Equation (7).

  3.   Project the normalised set onto the hyperplane going through the unit vectors according to Equations (14), (15), and (16). For two objectives this is the line through (1, 0) and (0, 1), and for three objectives it is the plane through (1, 0, 0), (0, 1, 0), and (0, 0, 1).

  4.   Identify the mapping from the transformed set to the decision vectors, used as inputs and outputs, respectively, to train an RBFNN as described in Section 4.2.

  5.   Create the set of requested convex combinations; in this work, this is a set of evenly spaced convex vectors.

  6.   Use this set as inputs to the RBFNN (which was created in Step 4) to obtain estimates of decision vectors.

  7.   Verify the produced solutions using the objective function to establish their optimality.
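The projection step of the procedure above can be sketched as follows, under our reading of Equations (14)-(16): project each normalised objective vector onto the subspace parallel to the plane through the unit vectors, then shift onto the affine plane. This is a sketch of our interpretation, not the authors' code.

```python
import numpy as np

def project_to_simplex_plane(Z):
    """Project normalised objective vectors (rows of Z) onto the hyperplane
    through the unit vectors e_1, ..., e_k, i.e. the set {z : sum(z) = 1}."""
    k = Z.shape[1]
    e = np.eye(k)
    H = e[:, 1:] - e[:, [0]]              # k x (k-1) spanning vectors, Eq. (14)
    P = H @ np.linalg.inv(H.T @ H) @ H.T  # projection matrix of rank k-1, Eq. (15)
    shift = (np.eye(k) - P) @ e[:, 0]     # offset onto the affine plane
    return Z @ P.T + shift                # Eq. (16), applied row-wise
```

Points already on the plane (such as the unit vectors themselves) are left unchanged, and every projected row sums to one, which is the property the convex-combination parametrisation relies on.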

4.4  Decomposition-Based Algorithms

Decomposition-based MOEAs have recently increased in popularity, a trend that was reinforced by the introduction of MOEA/D by Zhang and Li (2007). In MOEA/D, the MOP in Equation (1) is decomposed into a set of scalar subproblems. This is achieved with the help of one of several decomposition techniques. Weighted sum, Chebyshev (Miettinen, 1999), and normal boundary intersection (Das and Dennis, 1996) decompositions are some of the available options. The multi-objective optimisation problem, see Equation (1), is restated in the following way with the aid of the Chebyshev decomposition,
\[
\min_{\mathbf{x}} \; g(\mathbf{x} \mid \mathbf{w}_j, \mathbf{z}^{\star}) = \min_{\mathbf{x}} \; \big\lVert \mathbf{w}_j \odot \left(\mathbf{f}(\mathbf{x}) - \mathbf{z}^{\star}\right) \big\rVert_{\infty}, \qquad j = 1, \dots, N,
\tag{17}
\]
where $\mathbf{w}_j$ are N evenly distributed weighting vectors, N is the population size, and $g$ is the scalar objective function. The $\odot$ operator denotes the Hadamard product, which is element-wise multiplication of vectors or matrices. The intuition behind this is that, since $g$ is a continuous function of $\mathbf{w}$ (Zhang and Li, 2007), N evenly distributed weighting vectors are assumed to produce a well distributed set of Pareto optimal solutions.

Consequently, since decomposition-based algorithms already have a set of convex combinations, namely the weighting vectors, and the correspondence of weighting vectors to objective vectors is clear, the set of requested convex combinations can be substituted with the weighting vectors that produce Pareto optimal solutions. This has the potential to greatly simplify the procedure described in Section 4.3. However, although this simplifies the algorithm, the DM's choice of input vectors becomes more difficult because of its indirect nature compared to the general method described in Section 4.3, and increasingly so as the number of objectives grows.

Therefore, although the method described for Pareto-based algorithms can be applied directly to decomposition-based algorithms, if we choose to use the weighting vectors instead of creating a new set of convex combinations, we can use the mapping
\[
\mathbf{w}_j \mapsto \mathbf{x}_j, \qquad j = 1, \dots, N.
\tag{18}
\]
Thus, a simplification to the proposed method is available when the MOEA used is based on decomposition and is summarised as follows.
  1.   Obtain the weighting vectors corresponding to non-dominated solutions.

  2.   Identify the mapping from weighting vectors to decision vectors, used as inputs and outputs, respectively, to train an RBFNN.

  3.   Generate a new set of weighting vectors, either in the PF region of interest or using one of the methods discussed so far.

  4.   Use this set as inputs to the neural network created in Step 2 to obtain estimates of decision vectors.

  5.   Verify that the produced solutions are Pareto optimal.

5  Experiment Results

To test the merits of the proposed method, the Pareto-based algorithm was chosen to be NSGA-II (Deb et al., 2002a) and the decomposition-based algorithm was chosen to be MOEA/D (Zhang and Li, 2007). The algorithms were run repeatedly, using a different seed for the random number generator on every run, for six MOPs with two and three objectives. The population size used for both algorithms was set to 101 for the two-objective problems and to 276 for the three-objective problems (see Table 1), as these values are commonly employed in benchmarks (Zhang and Li, 2007). We used the Kruskal-Wallis test to verify whether the mean performance of the produced sets is different according to the selected metrics defined by Equations (20) and (22). Additionally, the algorithms were allowed to run for 300 generations for the WFG problems and for 500 generations for the DTLZ problems. The DTLZ test problems are DTLZ1 and DTLZ2 for two and three objectives; a definition of the DTLZ1-2 test problems is provided in Deb et al. (2002b). Additionally, the test problems WFG2-3 and WFG6-7 from the WFG toolkit (Huband et al., 2006) were used. The parameter settings used for these test problems can be seen in Table 1. The parameters k and l in Table 1 are the position and distance related parameters, respectively. This particular collection of test problems was chosen with several considerations in mind. First, the problem set had to be broadly used and recognised by the MOEA community. Second, the problems should be challenging and diverse. It is our belief that these aims are accomplished by this particular problem set. It is hoped that future research will provide further validation of the proposed methodology through experiments on more test problems as well as real-world problems.
More specifically, DTLZ1 and DTLZ2 (Deb et al., 2002b) have been used in numerous studies (Hadka and Reed, 2012; Purshouse et al., 2011; Zhang and Li, 2007) and this is also true for the WFG toolkit (Hadka and Reed, 2012; Purshouse et al., 2011). Furthermore, each of these problems poses a different challenge. For instance, WFG2 has a discontinuous Pareto front and is non-separable. WFG3 is also non-separable and its Pareto front is linear for two dimensions and degenerate for three or more dimensions. WFG6 has a concave Pareto front and is non-separable and unimodal; and, lastly, WFG7 is separable with a concave Pareto front and has parameter-dependent bias (Huband et al., 2006). The parameter settings for the two algorithms were chosen in a similar fashion.

Table 1:
Test problem settings summary. N is the population size; k and l are the position and distance parameters.

| Problem | Generations | N (two-obj.) | n | k | l | N (three-obj.) | n | k | l |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WFG | 300 | 101 | | 24 | 20 | 276 | | 24 | 20 |
| DTLZ1 | 500 | 101 | 10 | — | — | 276 | 10 | — | — |
| DTLZ2 | 500 | 101 | 10 | — | — | 276 | 10 | — | — |

The hypothesis of this paper is that, by using the Pareto estimation methodology, the number of Pareto optimal solutions available to the DM can be increased significantly. Although on many test instances the estimated Pareto set actually turns out to be superior to the initial set, this is not the intended purpose of the method and can be treated as a positive side effect. Our main focus is the relative quality of the Pareto sets produced by MOEA/D and NSGA-II, before and after the application of the proposed method, and not the performance of the employed algorithms in absolute terms. For performance assessment purposes, the ratios of the following indices were used.

  • Inverted generational distance (IGD), introduced in Van Veldhuizen (1999),
    \[
    \mathrm{IGD}(A) = \frac{1}{|A|} \sum_{i=1}^{|A|} \min_{\mathbf{v} \in PF} \lVert \mathbf{a}_i - \mathbf{v} \rVert,
    \tag{19}
    \]
    where |A| is the cardinality of the set and A is an approximation of the PF. The IGD metric measures the distance of the elements in the set A from the nearest point of the actual PF. The ratio of this metric was used as
    \[
    R_{\mathrm{IGD}} = \frac{\mathrm{IGD}(A)}{\mathrm{IGD}(B)},
    \tag{20}
    \]
    where B is another PF approximation set. In this work B is the estimated PF using the Pareto estimation methodology.
  • Mean distance to nearest neighbour,
    \[
    \bar{D}(A) = \frac{1}{|A|} \sum_{i=1}^{|A|} d_i,
    \tag{21}
    \]
    where di is
    \[
    d_i = \min_{\mathbf{a}_j \in A,\; j \neq i} \lVert \mathbf{a}_i - \mathbf{a}_j \rVert.
    \]
    This metric can serve as a measure of the density of solutions. Again, the ratio of this metric is used as
    \[
    R_{\bar{D}} = \frac{\bar{D}(A)}{\bar{D}(B)},
    \tag{22}
    \]
and the coverage metric, described below, was used exactly as defined in Zitzler and Thiele (1999).
  • Coverage metric (C-metric),
    \[
    C(A, B) = \frac{\big\lvert \{ \mathbf{b} \in B : \exists\, \mathbf{a} \in A,\; \mathbf{a} \preceq \mathbf{b} \} \big\rvert}{\lvert B \rvert}.
    \tag{23}
    \]
    C(A, B) = 0 is interpreted as: there is no solution in A that dominates any solution in B. C(A, B) = 1 is interpreted as the exact opposite, that is, all the solutions in B are dominated by at least one solution in A.
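The C-metric and the mean nearest-neighbour distance can be implemented compactly. The sketch below uses strict Pareto dominance for minimisation problems, a slight variation on Zitzler and Thiele's weak-dominance formulation; the function names are ours.

```python
import numpy as np

def dominates(a, b):
    """a Pareto-dominates b (minimisation): no worse in every objective,
    strictly better in at least one."""
    return bool(np.all(a <= b) and np.any(a < b))

def c_metric(A, B):
    """Coverage: fraction of points in B dominated by at least one point in A."""
    covered = sum(any(dominates(a, b) for a in A) for b in B)
    return covered / len(B)

def mean_nn_distance(A):
    """Mean Euclidean distance of each point in A to its nearest neighbour,
    i.e. the density measure of Eq. (21)."""
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    return float(np.sqrt(d2.min(axis=1)).mean())
```

The ratios of Equations (20) and (22) then follow by evaluating each metric on the original and the estimated set and dividing.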

5.1  Pareto Dominance-Based Algorithms

For every run of NSGA-II, the proposed method was applied using an evenly spaced convex set several times larger than the initial population used in the optimisation algorithm. Figure 4 shows the experimental results of the Pareto estimation method. This set was used as the input to the identified mapping, resulting in the estimated decision vectors, which were subsequently evaluated with the objective function to generate the corresponding objective vectors. Figure 5 shows the number of valid solutions produced by the RBFNN method.

Figure 4:

Boxplots of the experiment results of the Pareto estimation method using Pareto set approximations generated by MOEA/D and NSGA-II. The labels have the following format: Problem family:Problem number:Algorithm used, where W refers to the WFG problem set, and D to the DTLZ problem set. Also, the postfix D means that the Pareto set used was produced by MOEA/D, while N means that the Pareto set used was produced by NSGA-II. For example, the label W6N refers to results obtained for the WFG6 test problems using NSGA-II. The horizontal dashed line in the top four plots marks the value 1.

Figure 5:

Top Row: The number of valid solutions produced by the RBFNN in the Pareto estimation method for two- and three-objective problems instances, normalised to the interval. A value of 1 means that all produced solutions are valid, and a value of 0 means that no valid solution was produced. Middle row: Number of Pareto optimal solutions generated by the RBFNN in the PE method. Here, too, the values are normalised to the interval. Bottom row: The mean square error (MSE) of the RBFNN. Note that all outputs of the NN have been normalised to the interval before calculating the MSE. The labels on the horizontal axis have the same interpretation as in Figure 4.

Specifically, the size of the requested set differed between the two-objective and the three-objective test problems. The original Pareto optimal solutions used in the estimation process and the corresponding estimates are shown in Figure 6 (cf. Figure 7) and Figure 8 (cf. Figure 9). It should also be noted, as is perhaps apparent from the figures, that the entire estimated population is presented and not a non-dominated subset. The same procedure was performed for all runs of NSGA-II for all test problems for two and three objectives, and the results are summarised in Tables 2-5; their non-parametric counterparts are presented in Figure 4. Furthermore, the number of valid solutions produced by the RBFNN, the number of Pareto optimal solutions, and the RBFNN estimated generalisation error using cross-validation (see Section 4.2) are presented in Figure 5. Note that in Tables 2-5 the leftmost four columns correspond to the results referred to in this section and the remaining four columns correspond to the results obtained using the Pareto set approximation from MOEA/D. Comments on the latter are presented in the next section; we have combined the information in the tables in order to preserve space and to ease comparison.

Figure 6:

Pareto front solutions found by NSGA-II for the two-objective problem set.

Figure 7:

Estimated solutions, (), from the non-dominated solutions found by NSGA-II for the two-objective problem set. The dominated solutions, for the WFG2 problem, are drawn in grey.

Figure 8:

Pareto front solutions found by NSGA-II for the three-objective problem set.

Figure 9:

Estimated solutions, (), from the non-dominated solutions found by NSGA-II for the three-objective problem set. The non-dominated solutions in the WFG2 test problem are represented by the darker points on the plot.

Table 2:
IGD ratio (Equation (20)) and nearest neighbour distance ratio (Equation (22)) of the solutions found by NSGA-II (left four columns) and MOEA/D (right four columns) against the estimated set, for the two-objective problem set.

| Problem | NSGA-II Eq. (20) Mean | SD | NSGA-II Eq. (22) Mean | SD | MOEA/D Eq. (20) Mean | SD | MOEA/D Eq. (22) Mean | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WFG2 | 1.0370 | 0.0174 | 2.7844 | 0.2177 | 1.0341 | 0.0742 | 6.1070 | 1.6286 |
| WFG3 | 1.0589 | 0.0046 | 7.8959 | 0.2286 | 1.0401 | 0.0216 | 9.9152 | 0.1197 |
| WFG6 | 0.7504 | 0.2730 | 7.0383 | 0.6354 | 1.0288 | 0.0514 | 9.5327 | 0.5143 |
| WFG7 | 2.2765 | 0.5875 | 7.6369 | 0.3695 | 2.3724 | 2.5573 | 8.7620 | 0.3967 |
| DTLZ1 | 4.0822 | 8.6262 | 7.9676 | 0.2790 | 1.0000 | 0.0000 | 9.9932 | 0.0049 |
| DTLZ2 | 12.3377 | 0.3990 | 8.0198 | 0.3061 | 9.8542 | 0.6573 | 9.8454 | 0.0062 |
Table 3:
C-metric values of the solutions found by NSGA-II (left four columns) and MOEA/D (right four columns), P, against the estimated set, P̂, for the two-objective instances of the selected problem set.

| Problem | NSGA-II C(P, P̂) Mean | SD | NSGA-II C(P̂, P) Mean | SD | MOEA/D C(P, P̂) Mean | SD | MOEA/D C(P̂, P) Mean | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WFG2 | 0.6789 | 0.0153 | 0.3154 | 0.0756 | 0.6101 | 0.0787 | 0.1174 | 0.1405 |
| WFG3 | 0.0253 | 0.0096 | 0.6306 | 0.0761 | 0.0003 | 0.0009 | 0.7430 | 0.0591 |
| WFG6 | 0.6046 | 0.3884 | 0.1778 | 0.2115 | 0.0323 | 0.1474 | 0.1289 | 0.0745 |
| WFG7 | 0.0726 | 0.0185 | 0.4150 | 0.0551 | 0.0001 | 0.0003 | 0.0362 | 0.0209 |
| DTLZ1 | 0.0192 | 0.0139 | 0.3222 | 0.1814 | 0.0441 | 0.1414 | 0.2875 | 0.3366 |
| DTLZ2 | 0.0122 | 0.0020 | 0.6108 | 0.0494 | 0.0000 | 0.0000 | 0.0196 | 0.0014 |
Table 4:
IGD ratio (Equation (20)) and nearest neighbour distance ratio (Equation (22)) of the solutions found by NSGA-II (left four columns) and MOEA/D (right four columns) against the estimated set, for the three-objective problem set.

| Problem | NSGA-II Eq. (20) Mean | SD | NSGA-II Eq. (22) Mean | SD | MOEA/D Eq. (20) Mean | SD | MOEA/D Eq. (22) Mean | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WFG2 | 2.8047 | 0.4878 | 3.1172 | 0.2306 | 8.8543 | 0.8840 | 2.3451 | 0.1331 |
| WFG3 | 0.8013 | 0.1292 | 0.7331 | 0.1036 | 0.4957 | 0.0709 | 1.8425 | 0.1406 |
| WFG6 | 0.8790 | 0.1679 | 4.2835 | 0.7461 | 1.4326 | 0.8999 | 4.3185 | 1.2262 |
| WFG7 | 1.1950 | 0.0720 | 3.7008 | 0.1752 | 3.5314 | 0.4975 | 4.9538 | 0.0914 |
| DTLZ1 | 6.6262 | 1.2512 | 3.0811 | 0.1057 | 1.0983 | 0.6850 | 5.4396 | 0.5015 |
| DTLZ2 | 1.2655 | 0.0587 | 3.8767 | 0.1289 | 5.3924 | 0.5535 | 4.3850 | 0.2362 |
Table 5:
C-metric values of the solutions found by NSGA-II (left four columns) and MOEA/D (right four columns), P, against the estimated set, P̂, for the three-objective instances of the selected problem set.

| Problem | NSGA-II C(P, P̂) Mean | SD | NSGA-II C(P̂, P) Mean | SD | MOEA/D C(P, P̂) Mean | SD | MOEA/D C(P̂, P) Mean | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WFG2 | 0.5753 | 0.0197 | 0.6342 | 0.0348 | 0.6845 | 0.0237 | 0.0000 | 0.0000 |
| WFG3 | 0.3773 | 0.1153 | 0.1739 | 0.0262 | 0.8238 | 0.0139 | 0.1851 | 0.0159 |
| WFG6 | 0.2137 | 0.1768 | 0.2417 | 0.1887 | 0.4265 | 0.4114 | 0.1303 | 0.1999 |
| WFG7 | 0.0035 | 0.0019 | 0.5360 | 0.0280 | 0.0001 | 0.0001 | 0.0031 | 0.0031 |
| DTLZ1 | 0.0003 | 0.0004 | 0.0463 | 0.0468 | 0.0000 | 0.0000 | 0.0005 | 0.0033 |
| DTLZ2 | 0.0027 | 0.0012 | 0.4952 | 0.0264 | 0.0000 | 0.0000 | 0.2030 | 0.0316 |

Table 2 presents the ratio of the IGD index and of the mean distance to the nearest neighbour for problems with two objectives. The IGD index, in principle, attains smaller values the closer the set under testing is to the known PF. Additionally, if the set does not cover certain regions of the PF, the value of the IGD index increases, signifying degraded performance. Therefore, for this problem set, the proposed methodology is consistent in producing solutions that are at least of the same distance from the actual PF. IGD ratio values greater than 1 mean that the estimated set produces better values for the IGD index compared to the original set, and for values below 1 the converse is true. Regarding the mean nearest neighbour distance ratio, values greater than 1 mean that the mean distance from a solution to its nearest neighbour is smaller in the estimated set compared to the original set, and for values below 1 the converse is true. In all cases, the mean distance of neighbouring solutions in the estimated set is much smaller. This fact, combined with the results for the IGD ratio, strongly indicates that the density of the available Pareto optimal solutions has significantly increased using our proposed method.

In Table 3, the results for the C-metric are given in both directions for the two-objective test problems. This metric was employed to further verify the consistency of the method. As can be seen, for all problems except WFG2 and WFG6 the results are favourable. However, it is interesting to explore the potential reasons for the less impressive performance on these two problems. Regarding WFG2, since we did not use only the non-dominated subset of the original set, the identified PF is, as can be seen in Figure 7, an oscillating function; this is exactly the PF directly obtained from the WFG2 problem. Therefore, in a way, the method did actually perform rather well in identifying the front. A remedy to avoid such behaviour would be to require that the requested solutions be reasonably close to the transformed set of the original Pareto optimal solutions. More elaborate methods are left for future research. Regarding the test problem WFG6, combined with the similarly moderate results in Table 2, it seems that our methodology has consistent difficulties with this particular problem instance. A potential cause of these difficulties is perhaps the simplicity of the modelling technique.

Table 4 presents the IGD and nearest neighbour distance ratios for the three-objective case. Again, the IGD ratio has acceptable values, meaning that there is no significant sign of degradation of the IGD index. The distance ratio shows that the mean neighbour distance is consistently lower for the estimated set. One notable feature of the distance ratios is that they are almost half of their counterparts for the two-objective case, as can be seen in Table 2. This can partly be attributed to the curse of dimensionality, in the sense that, to obtain results similar to Table 2, approximately an order of magnitude more solutions would have to be produced compared to the two-objective case. This is not the case for the three-objective instances of WFG2 and WFG3, which is mainly due to their PF geometry.

In Table 5, the results for the C-metric are given in both directions for the three-objective problems. Again, the results are consistent, with WFG2 performing rather moderately for the same reasons as in the two-objective case. The surprising fact is that, for the three-objective case, WFG6 performs extremely well.

5.2  Decomposition-Based Algorithms

The same experimental procedure as in Section 5.1 was applied for the decomposition-based MOEA. As previously mentioned, for this test case MOEA/D (Zhang and Li, 2007) was used with the same population size as NSGA-II. Instead of a set of convex combinations, an evenly distributed set of weighting vectors was used, as described in Section 4.4. In all other respects, the experimental setup is identical. As before, the entire estimated population is presented and used for the calculation of the statistical results. The results are summarised in Figures 4 and 5.

Table 2 (right four columns) presents the ratio of the IGD index and of the mean distance to the nearest neighbour for problems with two objectives. A distinctive pattern, when compared with the corresponding NSGA-II values in Table 2, is that when the IGD ratio is very close to 1, the mean value of the distance ratio is very close to 10; this is almost equal to the scaling factor we chose to increase the size of the requested set relative to the initial set. One possible reason for this behaviour, which, no doubt, is desirable, is that the solutions produced by MOEA/D are very well distributed across the PF. If we view the two-dimensional PF as a function, the fact that solutions are well distributed can be seen as sampling the function at regular intervals; hence their mean distance has low variance. This enables the modelling technique we used to better estimate the mapping, since a uniform width value was chosen for all the basis functions, as discussed in Section 4.2. Another interesting fact is that, although the minimum value of the IGD ratio for the problem WFG2 is less than 1, the mean value is above 1 and the standard deviation is relatively small. This indicates that, in general, our method produces good results with low deviations for this problem instance.

In Table 3 and Table 5, the results for the C-metric are given in both directions for the two-objective and three-objective test problems, respectively. The results are very consistent for all problems except WFG2, which is to be expected due to the shape of its PF. The coverage of the estimated set by the original set is very close to 0, signifying that a very small number of the solutions in the estimated set are dominated by solutions in the original Pareto set.

Table 4 (right four columns) presents the IGD and nearest neighbour distance ratios for the three-objective case. In line with the results in Table 2, the IGD ratio is satisfactory, although for problems WFG3 and WFG6 it seems to be somewhat low. This is also reflected, to a certain extent, in the distance ratio. This behaviour, regarding problem WFG3, can be attributed to the fact that the real PF, which for WFG3 is a line in three dimensions, was not successfully identified by the algorithm. This conclusion is further supported by the fact that the corresponding coverage values in Table 5 are very close to 0, attesting to the fact that the produced estimated Pareto set does in fact model the given set rather well. Therefore, this behaviour could be remedied by choosing only the non-dominated solutions in the set. However, for our purposes, this option was avoided, since it would tend to mask such deficiencies, thus disallowing further insight into possible improvements of the proposed methodology.

6  Discussion

This study has shown that the question posed in Section 3.2 can be answered with relative precision, as is strongly indicated by the results for the selected two- and three-objective test problems, shown in Figures 4 and 5. However, the Pareto estimation method is not without difficulties. For instance, since the quality of the produced solutions depends on the employed modelling method, which in turn depends on the quality of the produced Pareto set approximation, it is to be expected that when both these factors are satisfied to a greater degree, better results will follow. This is related to the observation in Jin and Sendhoff (2003) about the connectedness of the Pareto optimal set in decision space for continuous multi-objective problems. Namely, if the Pareto set approximation is not close to the true Pareto set, this argument need not necessarily hold. For instance, such a Pareto set approximation need not be piecewise continuous in decision space, as the KKT conditions would not hold for the aforementioned PS. Furthermore, although Pareto estimation is, in principle, applicable to problems with more than three objectives, the visual component of the mapping is lost, hence its predictability is limited. It is unclear how to create a mapping in a high-dimensional space that is intuitive for the DM. One solution is to use the variant described in Section 4.4, in which the DM can express his or her preferences by specifying the relative importance of the objectives. Additionally, in the case that a Pareto-based algorithm is used, generalised decomposition, a method introduced by Giagkiozis et al. (2013), can be employed to calculate the weighting vectors, after which the procedure is identical to the one described in Section 4.4.

As mentioned in Section 3.1, there are many alternative methods for identifying the mappings used in the Pareto estimation method. However, since the cost of more elaborate methods renders them prohibitive for repetitive testing, it is difficult to quantify the benefits in using more sophisticated identification methods and even more difficult to discern whether the results are due to the affinity of the modelling method to the particular problem set. When applying the Pareto estimation method to a specific real-world problem, the analyst has several options on how to proceed to identify the required mappings used in PE. In an excellent work, Billings et al. (2007) address modelling issues and propose a comprehensive approach based on neural networks, wherein the entire procedure is systematised for producing high-quality models. It should be noted that, based on the results obtained in this work, the radial basis function neural network proposed in Section 4.2 has demonstrated more than acceptable performance, given the small amount of data that are usually available in a Pareto set approximation, and therefore it is an excellent starting point.

Another aspect that has become evident, especially when comparing the results produced using the Pareto sets produced by NSGA-II and MOEA/D, is that the distribution of the Pareto optimal solutions on the Pareto front, disregarding their convergence, seems to be an important factor determining the quality of the model. So, it would appear that if some active learning method, as in Cohn et al. (1994), could be used, the results could potentially be improved. However, the problem of direct control of the distribution of Pareto optimal points in the PS is a very difficult one.

Lastly, the modelling employed in the Pareto estimation method operates under the assumption that the mapping from objective to decision space is one-to-one, which seems limiting if, in fact, the objective function is many-to-one. However, careful consideration of this issue shows that it is not limiting to the Pareto estimation method; on the contrary, it can be rather helpful.

When the objective function is many-to-one, for every objective vector there are one or more corresponding decision vectors. This means that the probability of finding a decision vector for a specific objective vector is increased, which is to the benefit of the modelling method, as there are many alternatives. Also, given the way multi-objective evolutionary algorithms operate, namely distributing Pareto optimal solutions across the entire Pareto front, this one-to-many relationship would be difficult to observe, as MOEAs are not expressly designed to preserve solutions that result in identical objective vectors. Therefore, it would be highly unlikely for a Pareto set approximation to contain such alternatives, as this conflicts with MOEA objectives.

7  Conclusion

Multi-objective optimisation problem solvers seek to generate a satisfactory set of Pareto optimal solutions to enable a decision maker to select a suitable solution. Here, a novel methodology that increases the density of available Pareto optimal solutions has been described. Using this method, the number of available solutions along the trade-off surface is increased, thereby greatly enhancing the ability of the DM to identify a suitable solution with accuracy.

This is accomplished by identifying the mapping of a transformed set, derived from an approximation of the Pareto optimal set, to the corresponding decision vectors. The mapping was identified with the aid of a radial basis function neural network, which was subsequently used to infer additional Pareto optimal solutions. The proposed method was presented in two forms. The first is a general formulation applicable to any multi-objective optimisation algorithm; applied to a Pareto-based algorithm, NSGA-II, it exhibited an almost perfect correspondence between the number of additional solutions requested and the number of Pareto optimal solutions obtained. The second form applies to decomposition-based algorithms and is motivated by the fact that, by using the weighting vectors in place of the transformed set, we avoid the operations required to generate that set. Both versions were tested experimentally on a set of well-known test problems, and the results strongly indicate that the suggested methodology shows great promise.
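The role the weighting vectors play in a decomposition-based algorithm can be illustrated with the Chebyshev scalarising function commonly used in MOEA/D (Zhang and Li, 2007). The ideal point, weighting vector, and candidate objective vectors below are illustrative, not taken from the experiments.

```python
import numpy as np

def tchebycheff(f, w, z):
    """Chebyshev scalarising function g(f | w, z) = max_i w_i * |f_i - z_i|."""
    return np.max(w * np.abs(f - z))

# Each weighting vector defines one scalar subproblem; minimising it over the
# feasible set yields a Pareto optimal point associated with that weight.
z = np.array([0.0, 0.0])                 # ideal point (illustrative)
w = np.array([0.5, 0.5])                 # one weighting vector
candidates = np.array([[0.2, 0.9], [0.5, 0.5], [0.9, 0.2]])
scores = np.array([tchebycheff(f, w, z) for f in candidates])
best = candidates[scores.argmin()]       # the point this subproblem prefers
```

Since each weighting vector already indexes a point on the front, a set of such vectors can stand in for the transformed set, which is what makes the second form of the method cheaper.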

Furthermore, the results in Sections 5.1 and 5.2 suggest that the choice of weighting vectors in MOEA/D is not optimal; that is, the algorithm does not produce an even distribution of Pareto optimal points. By an even distribution we mean one that minimises the s-energy, a criterion shown to solve the best-packing problem on the sphere (see Katanforoush and Shahshahani, 2003; Hardin and Saff, 2004, for further detail). This effect carries over to the results of the proposed method when it uses an approximation of the PF produced by MOEA/D. In contrast, the Pareto-based method produced much more uniform results, although there are obvious edge effects, explained by the fact that we generate solutions only within the CHI (see Section 4.1 and Figure 3).
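The s-energy criterion mentioned above can be computed directly. The following sketch evaluates the Riesz s-energy of a finite point set and compares an evenly spaced set against a clustered one on a line segment; the point sets are illustrative.

```python
import numpy as np

def s_energy(P, s=1.0):
    """Riesz s-energy of a point set P (n x d): the sum of 1/||p_i - p_j||^s
    over all distinct pairs. Lower values indicate a more even spread."""
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(P), k=1)    # count each pair once
    return (1.0 / dist[iu] ** s).sum()

# An evenly spread set scores lower than a clustered set of the same size.
even = np.linspace(0.0, 1.0, 10)[:, None]
clustered = np.concatenate([np.linspace(0.0, 0.2, 9), [1.0]])[:, None]
s_energy(even) < s_energy(clustered)  # True
```

In this sense, a set of Pareto optimal points with smaller s-energy is more evenly distributed over the front it approximates.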

Finally, although the concepts presented in this work need further development, we believe they can alter the definition of what is currently considered a well-distributed approximation of the PF. If an inverse mapping can be identified, the main issue becomes the optimal allocation of Pareto optimal solutions on the PF, such that the process of identifying a suitable solution is facilitated. By optimal allocation we mean an approximation set that provides the most information about the underlying PF; this, as yet unknown, distribution need not be an even distribution of Pareto optimal solutions. This issue is deferred to future research, along with an exploration of the applicability of the presented method to many-objective optimisation problems. An application of Pareto estimation to a three-objective portfolio optimisation problem is described in Giagkiozis and Fleming (2012). The source code for this work is available.9

Acknowledgments

The authors would like to thank Ricardo H. C. Takahashi for many interesting discussions and his invaluable perspective on the present work during his visit to the University of Sheffield, supported by a Marie Curie International Research Staff Exchange Scheme Fellowship within the 7th European Community Framework Programme.

References

Adra, S., Dodd, T., Griffin, I., and Fleming, P. (2009). Convergence acceleration operator for multiobjective optimization. IEEE Transactions on Evolutionary Computation, 13(4):825-847.

Armananzas, R., and Lozano, J. (2005). A multiobjective approach to the portfolio optimization problem. In Proceedings of the IEEE Congress on Evolutionary Computation, Vol. 2, pp. 1388-1395.

Atiya, A. (2001). Bankruptcy prediction for credit risk using neural networks: A survey and new results. IEEE Transactions on Neural Networks, 12(4):929-935.

Billings, S. A., Wei, H.-L., and Balikhin, M. A. (2007). Generalized multiscale radial basis function networks. Neural Networks, 20(10):1081-1094.

Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford, UK: Oxford University Press.

Box, G. E. P., and Wilson, K. B. (1951). On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society. Series B (Methodological), 13(1):1-45.

Boyd, S., and Vandenberghe, L. (2004). Convex optimization. Cambridge, UK: Cambridge University Press.

Cohn, D., Atlas, L., and Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 15(2):201-221.

Das, I., and Dennis, J. (1996). Normal-boundary intersection: An alternate method for generating Pareto optimal points in multicriteria optimization problems. Technical report, DTIC Document. Washington, DC: Defense Technical Information Center.

Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002a). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197.

Deb, K., Thiele, L., Laumanns, M., and Zitzler, E. (2002b). Scalable multi-objective optimization test problems. In Proceedings of the Congress on Evolutionary Computation, Vol. 1, pp. 825-830.

Edgeworth, F. (1881). Mathematical psychics: An essay on the application of mathematics to the moral sciences. Number 10. London: CK Paul.

Emmerich, M., Giotis, A., Özdemir, M., Bäck, T., and Giannakoglou, K. (2002). Metamodel-assisted evolution strategies. In Parallel problem solving from nature. Lecture notes in computer science, Vol. 2439 (pp. 361-370). Berlin: Springer-Verlag.

Farina, M. (2002). A neural network based generalized response surface multiobjective evolutionary algorithm. In Proceedings of the IEEE Congress on Evolutionary Computation, Vol. 1, pp. 956-961.

Fleming, P., and Purshouse, R. (2002). Evolutionary algorithms in control systems engineering: A survey. Control Engineering Practice, 10(11):1223-1241.

Fonseca, C., and Fleming, P. (1993). Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In Proceedings of the Conference on Genetic Algorithms, Vol. 423, pp. 416-423.

Fonseca, C., and Fleming, P. (1998a). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):26-37.

Fonseca, C., and Fleming, P. (1998b). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. II. Application example. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):38-47.

Giagkiozis, I., and Fleming, P. (2012). Increasing the density of available Pareto optimal solutions. Research Report No. 1028. Department of Automatic Control and Systems Engineering, University of Sheffield.

Giagkiozis, I., Purshouse, R. C., and Fleming, P. J. (2013). Generalized decomposition. In Evolutionary Multi-Criterion Optimization. Lecture notes in computer science, Vol. 7811 (pp. 428-442). Berlin: Springer-Verlag.

Goel, T., Vaidyanathan, R., Haftka, R., Shyy, W., Queipo, N., and Tucker, K. (2007). Response surface approximation of Pareto optimal front in multi-objective optimization. Computer Methods in Applied Mechanics and Engineering, 196(4):879-893.

Haanpää, T. (2012). Approximation method for computationally expensive nonconvex multiobjective optimization problems. Ph.D. thesis, University of Jyväskylä, Finland.

Hadka, D., and Reed, P. (2012). Diagnostic assessment of search controls and failure modes in many-objective evolutionary optimization. Evolutionary Computation, 20(3):423-452.

Hardin, D., and Saff, E. (2004). Discretizing manifolds via minimum energy points. Notices of the AMS, 51(10):1186-1194.

Hartikainen, M., Miettinen, K., and Wiecek, M. M. (2012). PAINT: Pareto front interpolation for nonlinear multiobjective optimization. Computational Optimization and Applications, 52(3):845-867.

Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366.

Huband, S., Hingston, P., Barone, L., and While, L. (2006). A review of multiobjective test problems and a scalable test problem toolkit. IEEE Transactions on Evolutionary Computation, 10(5):477-506.

Ishibuchi, H., Tsukamoto, N., and Nojima, Y. (2008). Evolutionary many-objective optimization: A short review. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2419-2426.

Jin, R., Chen, W., and Simpson, T. (2001). Comparative studies of metamodelling techniques under multiple modelling criteria. Structural and Multidisciplinary Optimization, 23(1):1-13.

Jin, Y. (2005). A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 9(1):3-12.

Jin, Y., and Branke, J. (2005). Evolutionary optimization in uncertain environments - A survey. IEEE Transactions on Evolutionary Computation, 9(3):303-317.

Jin, Y., Olhofer, M., and Sendhoff, B. (2002). A framework for evolutionary optimization with approximate fitness functions. IEEE Transactions on Evolutionary Computation, 6(5):481-494.

Jin, Y., and Sendhoff, B. (2003). Connectedness, regularity and the success of local search in evolutionary multi-objective optimization. In Proceedings of the Congress on Evolutionary Computation, Vol. 3, pp. 1910-1917.

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455-492.

Katanforoush, A., and Shahshahani, M. (2003). Distributing points on the sphere, I. Experimental Mathematics, 12(2):199-210.

Knowles, J. (2006). ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, 10(1):50-66.

Liang, K., Yao, X., and Newton, C. (2000). Evolutionary search of approximated n-dimensional landscapes. International Journal of Knowledge Based Intelligent Engineering Systems, 4(3):172-183.

Maier, H., and Dandy, G. (2000). Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environmental Modelling and Software, 15(1):101-124.

Miettinen, K. (1999). Nonlinear multiobjective optimization, Vol. 12. Berlin: Springer-Verlag.

Miettinen, K., Ruiz, F., and Wierzbicki, A. (2008). Introduction to multiobjective optimization: Interactive approaches. In J. Branke, K. Deb, K. Miettinen, and R. Słowiński (Eds.), Multiobjective Optimization. Lecture notes in computer science, Vol. 5252 (pp. 27-57). Berlin: Springer-Verlag.

Ong, Y., Nair, P., and Keane, A. (2003). Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA Journal, 41(4):687-696.

Ong, Y., Nair, P., Keane, A., and Wong, K. (2004). Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Knowledge Incorporation in Evolutionary Computation (pp. 307-332). Berlin: Springer-Verlag.

Ong, Y.-S., Nair, P., and Lum, K. (2006). Max-min surrogate-assisted evolutionary algorithm for robust design. IEEE Transactions on Evolutionary Computation, 10(4):392-404.

Pareto, V. (1896). Cours d'économie politique. Switzerland: University of Lausanne.

Poloni, C., Giurgevich, A., Onesti, L., and Pediroda, V. (2000). Hybridization of a multi-objective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics. Computer Methods in Applied Mechanics and Engineering, 186(2-4):403-420.

Purshouse, R., and Fleming, P. (2007). On the evolutionary optimization of many conflicting objectives. IEEE Transactions on Evolutionary Computation, 11(6):770-784.

Purshouse, R., Jalbă, C., and Fleming, P. (2011). Preference-driven co-evolutionary algorithms show promise for many-objective optimisation. In Evolutionary Multi-Criterion Optimization (pp. 136-150). Berlin: Springer-Verlag.

Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323(6088):533-536.

Schütze, O., Esquivel, X., Lara, A., and Coello, C. (2012). Using the averaged Hausdorff distance as a performance measure in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 16(4):504-522.

Schütze, O., Mostaghim, S., Dellnitz, M., and Teich, J. (2003). Covering Pareto sets by multilevel evolutionary subdivision techniques. In Evolutionary Multi-Criterion Optimization. Lecture notes in computer science, Vol. 2632 (pp. 118-132). Berlin: Springer-Verlag.

Shoaf, J., and Foster, J. (1998). The efficient set GA for stock portfolios. In Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 354-359.

Srinivas, N., and Deb, K. (1994). Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2(3):221-248.

Subbu, R., Bonissone, P., Eklund, N., Bollapragada, S., and Chalermkraivuth, K. (2005). Multiobjective financial portfolio design: A hybrid evolutionary approach. In Proceedings of the IEEE Congress on Evolutionary Computation, Vol. 2, pp. 1722-1729.

Tapia, M., and Coello, C. (2007). Applications of multi-objective evolutionary algorithms in economics and finance: A survey. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 532-539.

Van Veldhuizen, D. (1999). Multiobjective evolutionary algorithms: Classifications, analyses, and new innovations. Ph.D. thesis, Air Force Institute of Technology, Kettering, OH.

Vira, C., and Haimes, Y. (1983). Multiobjective decision making: Theory and methodology. Amsterdam: North-Holland.

Wagner, T., and Trautmann, H. (2010). Integration of preferences in hypervolume-based multiobjective evolutionary algorithms by means of desirability functions. IEEE Transactions on Evolutionary Computation, 14(5):688-701.

Wong, B., and Selvi, Y. (1998). Neural network applications in finance: A review and analysis of literature (1990-1996). Information & Management, 34(3):129-139.

Zhang, Q., and Li, H. (2007). MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712-731.

Zhang, Q., Zhou, A., and Jin, Y. (2008). RM-MEDA: A regularity model-based multiobjective estimation of distribution algorithm. IEEE Transactions on Evolutionary Computation, 12(1):41-63.

Zitzler, E., Laumanns, M., and Thiele, L. (2001). SPEA2: Improving the strength Pareto evolutionary algorithm. In Proceedings of the Evolutionary Methods Design Optimization and Control with Application to Industrial Problems, EUROGEN, pp. 1-2.

Zitzler, E., and Thiele, L. (1999). Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271.

Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C., and da Fonseca, V. (2003). Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7(2):117-132.

Notes

1. The interaction of the analyst and the decision maker takes place after a set of Pareto optimal solutions has been generated.

2. See Definition 4 in Section 2.

3. Or Pareto front approximation.

4. To the authors' best knowledge.

5. See Section 6 for an explanation of why this mapping is usually into and not onto.

6. is the complement of the set CHI.

7. Artificial neural networks are simply referred to as neural networks or NNs in this work for convenience.

8. The parallel plane to E that goes through the vector.