Abstract

Many real-world optimization problems are subject to uncertainties that may be characterized by the presence of noise in the objective functions. The estimation of distribution algorithm (EDA), which models the global distribution of the population for searching tasks, is one of the evolutionary computation techniques that deals with noisy information. This paper studies the potential of EDAs, in particular an EDA based on restricted Boltzmann machines, in handling multi-objective optimization problems in a noisy environment. Noise is introduced to the objective functions in the form of a Gaussian distribution. In order to reduce the detrimental effect of noise, a likelihood correction feature is proposed to tune the marginal probability distribution of each decision variable. The EDA is subsequently hybridized with a particle swarm optimization algorithm in a discrete domain to improve its search ability. The effectiveness of the proposed algorithm is examined via eight benchmark instances with different characteristics and shapes of the Pareto optimal front. The scalability, hybridization, and computational time are rigorously studied. Comparative studies show that the proposed approach outperforms other state of the art algorithms.

1.  Introduction

Evolutionary algorithms (EAs; Tomassini, 1996; Bäck et al., 1997; Zhang et al., 2011), a class of stochastic search techniques, have been successfully applied to many real-world optimization problems (Cebrian et al., 2009; Damas et al., 2011; Yogev et al., 2010). These algorithms draw inspiration from evolution and employ biologically inspired operators such as mutation and crossover to sample, learn, and adapt from a population of candidate solutions. However, it was reported in Larrañaga and Lozano (2001) that the variation operators in standard EAs may disrupt the building blocks of good schemas and make the movement toward optimality extremely difficult to predict. Motivated by the idea of exploiting the linkage information among the decision variables, the estimation of distribution algorithm (EDA) has been regarded as a new computing paradigm in the field of evolutionary computation (Mühlenbein and Paass, 1996; Larrañaga and Lozano, 2001; Lozano et al., 2006; Chen and Chen, 2010). Neither crossover nor mutation is implemented in EDAs. Instead, reproduction is carried out by building a representative probabilistic model, from which new solutions are subsequently generated by sampling. Recently, EDAs have been adapted into a multi-objective optimization framework, commonly known as multi-objective estimation of distribution algorithms (MOEDAs; Pelikan et al., 2006; Shim et al., 2010).

Though multi-objective evolutionary algorithms (MOEAs; Tan et al., 2005; Fonseca and Fleming, 1995; Deb, 2001; Coello Coello et al., 2002; Tan et al., 2009) have achieved satisfactory performance on certain static optimization problems, the implementation of MOEAs in noisy environments still requires further study. In noisy environments, the presence of noise in the cost functions may affect the ability of the algorithms to drive the search process toward optimality. To understand the detrimental effect of noise in evolutionary optimization processes, Beyer (2000) carried out an investigation and found that the presence of noise may reduce the convergence rate, resulting in suboptimal solutions. In another study, Goh and Tan (2007) reported that a low level of noise helps an MOEA to produce better solutions for some problems, but a higher noise level may degenerate the optimization process into a random search. Darwen and Pollack (1999) concluded that resampling can reduce the effect of noise for a small population, but may not be as helpful for a larger population. In order to reduce the effect of noise in evolutionary processes, a number of noise handling approaches such as probability dominance (Hughes, 2001; Teich, 2001), fitness inheritance (Bui et al., 2005), and the Dempster-Shafer framework of dominance (Limbourg, 2005) have been proposed.

While the implementation of standard MOEAs in a noisy environment has gained some attention from researchers, the adoption of MOEDAs in this particular case remains unexplored. As such, this paper attempts to study the performance of MOEDAs under the influence of noise. The advantage MOEDAs have over standard MOEAs in handling noisy information comes from the construction of noise handling features in the probabilistic model (Hong, Ren, and Zeng, 2005). Hong, Ren, Zeng, et al. (2005) showed that when a smoothing filter is incorporated into an EDA, it is more suited to tackle noisy optimization problems than a genetic algorithm (GA) in single objective optimization. More specifically, the authors argued that the univariate marginal distribution algorithm (UMDA), one of the simplest EDAs, is able to converge to global optimality in the One-max problem under the influence of noise.

In this study, we propose the use of an EDA based on the restricted Boltzmann machine (REDA) to tackle multi-objective optimization problems in a noisy environment. However, REDA uses only global information in guiding the search, which may not explore the search space thoroughly. In addition, REDA may be trapped at local optima (Tang et al., 2010). To overcome this limitation, REDA is hybridized with a particle swarm optimization (PSO; Robinson and Rahmat-Samii, 2004; Goh et al., 2010) algorithm. PSO is another stochastic computing algorithm, inspired by the swarm intelligence of animals that form groups and move collectively, such as bird flocks and fish schools. PSO is chosen because of its ease of implementation and reduced sensitivity to parameter settings. In addition, PSO has already been successfully applied to many real-world optimization problems (Iqbal et al., 2008; Azevedo et al., 2010) due to its satisfactory search performance and fast convergence rate. The hybridization is expected to improve the performance since the particles may move out of the region modeled by the probabilistic model, providing extra solutions which are unexplored by the model. On the other hand, the presence of noise may affect the selection operator, resulting in wrong decisions. This may directly affect the plausibility of the probabilistic model and cause premature convergence or trap the algorithm at local optima. Therefore, a likelihood correction feature is proposed to tune the marginal probability of each decision variable. The likelihood correction takes advantage of the probability error, which is calculated from the model suggested by Hughes (2001) to determine the probability of making a wrong decision during the selection process due to the presence of noise.

The rest of the paper is organized as follows. Section 2 briefly describes the background concept of dominance in multi-objective optimization, MOEDAs, and noise handling features. Section 3 presents the framework of the proposed algorithm. Implementation, test instances, and performance indicators are outlined in Section 4. Section 5 examines the performance of the algorithms under the influences of noise, scalability, hybridization, and computational time. The conclusion is presented in Section 6.

2.  Background Information

This section briefly reviews the concept of dominance in multi-objective optimization, MOEDAs, and noise handling features for MOEAs.

2.1.  Concept of Dominance in Multi-Objective Optimization

Without loss of generality, the test problems are considered to have m objective functions and n decision variables. Considering the minimization case, the problem is formulated as follows.

Minimize:
F(x) = (f_1(x), f_2(x), ..., f_m(x)),   x ∈ X    (1)
where X ⊆ R^n is the decision space, x is the decision vector, R^m is the objective space, and F(x) is the objective vector. Let a and b be two decision vectors; then a is said to dominate b if and only if
f_i(a) ≤ f_i(b) for all i ∈ {1, ..., m}  and  f_j(a) < f_j(b) for at least one j ∈ {1, ..., m}    (2)
and a and b are incomparable (a ∥ b) if and only if
f_i(a) < f_i(b) for at least one i  and  f_j(b) < f_j(a) for at least one j    (3)
A decision vector is said to be nondominated, and is a Pareto optimal solution, if and only if no other decision vector dominates it. The set of all Pareto optimal points is called the Pareto-optimal set (PS) and the corresponding objective vectors form the Pareto-optimal front (PF; Deb, 2001).
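These dominance relations translate directly into code. A minimal sketch for the minimization case, with objective vectors represented as plain Python lists:

```python
# Pareto dominance in the minimization case: a dominates b when a is no
# worse in every objective and strictly better in at least one.
def dominates(a, b):
    """True if objective vector a dominates objective vector b."""
    return (all(ai <= bi for ai, bi in zip(a, b))
            and any(ai < bi for ai, bi in zip(a, b)))

def incomparable(a, b):
    """True if a and b are mutually nondominated."""
    return not dominates(a, b) and not dominates(b, a)
```

Note that a vector does not dominate itself: equal vectors fail the strict-improvement condition.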

2.2.  Multi-Objective Estimation of Distribution Algorithms (MOEDAs)

EDAs were introduced as a new computing paradigm in the field of evolutionary computation (Mühlenbein and Paass, 1996). In EDAs, the offspring are produced by sampling the estimated probability distribution of the parent population. EDAs can be divided into different categories according to the level of interaction among the decision variables that their probabilistic models capture (Larrañaga and Lozano, 2001). Detailed information about the interaction among the decision variables can be found in Ting et al. (2010) and Zhu et al. (2010). Recently, several EDAs have been proposed in the context of multi-objective optimization.

Bosman and Thierens (2002) presented a mixture distribution as a probabilistic model where each component in the mixture is a univariate factorization. This algorithm is known as the multi-objective mixture-based iterated density estimation evolutionary algorithm (MIDEA). MIDEA has an advantage in preserving diversity due to its widespread exploration capability toward the Pareto optimal front. This algorithm can serve as a baseline algorithm for MOEDAs for its simplicity, speed, and effectiveness. In another study, Laumanns and Ocenasek (2002) incorporated a Bayesian optimization algorithm based on the binary decision tree as a model building technique, the Bayesian multi-objective optimization algorithm (BMOA), to approximate the set of Pareto optimal solutions. In Costa and Minisci (2003), the authors proposed the multi-objective Parzen-based EDA (MOPED) to approximate the probability distribution of the solutions lying on the Pareto optimal front. Furthermore, a spreading technique which uses a Parzen estimator in the phenotypic space to identify the crowding level of the solutions was also proposed. The algorithm showed promising results in several well-known test problems.

Li et al. (2004) proposed a hybrid EDA (MOHEDA) by integrating a local search that is based on the weighted sum method while its constraint handling is based on a random repair method for solving multi-objective 0/1 knapsack problems. A stochastic clustering method was also introduced to preserve the diversity of the solutions. The results showed that MOHEDA outperforms several state of the art algorithms. Zhang et al. (2008) proposed a regularity model-based multi-objective estimation of distribution algorithm (RM-MEDA) by considering regularity in building the probabilistic model. A local principal component analysis was used to extract the regularity patterns of the candidate solutions from previous searches. This algorithm showed good performance in test instances with a nonlinear variable linkage for a scalable number of variables. Okabe et al. (2004) introduced Voronoi diagrams to construct the probability model in an algorithm called Voronoi-based EDA (VEDA). VEDA is able to adjust the reproduction process based on the problem structure and, at the same time, it takes advantage of the problem structure by estimating the most appropriate probability distribution.

Pelikan et al. (2005) described a scalable algorithm by combining the non-dominated sorting GA II (NSGAII; Deb, Pratap, et al., 2002), the hierarchical Bayesian optimization algorithm (hBOA), and clustering in the phenotypic space to solve decomposable problems. In the proposed algorithm, each cluster is allocated an almost equal number of solutions. The results showed that the algorithm outperforms many standard MOEAs when the number of decision variables is scaled up. Another study was carried out by Zhong and Li (2007), where a decision-tree-based multi-objective estimation of distribution algorithm (DT-MEDA) for continuous-valued optimization problems was developed. The conditional dependencies among the decision variables were extracted by a decision-tree-based probabilistic model.

Recently, several attempts have been made to use neural networks as the models in EDAs within the framework of multi-objective optimization. Marti et al. (2009) implemented a growing neural gas network for modeling purposes. The algorithm showed promising results in high-dimensional problems. In another study, Tang et al. (2010) adapted an unsupervised learning neural network, the restricted Boltzmann machine (RBM), to capture interdependency information among the decision variables. The network learns the probability distribution of the input stimuli in terms of energy equilibrium. The final information is returned to each decision variable by clamping the synaptic weights and biases into a marginal probability distribution. The algorithm performed well in high-dimensional problems.

2.3.  Noise Handling Features

Over the past few years, several noise handling techniques have been proposed for MOEAs to reduce the detrimental effect of noise. Hughes (2001) proposed the multi-objective probabilistic selection evolutionary algorithm (MOPSEA), where the ranking process is reformulated by incorporating the probability dominance among the solutions. The modified ranking process showed promising results in reducing the disturbance of noise. In another independent study, Teich (2001) introduced the concept of probabilistic dominance to deal with uncertain objective values constrained within certain intervals. This concept was implemented in the strength Pareto evolutionary algorithm (SPEA) and called estimate SPEA (ESPEA).

Büche et al. (2002) proposed a noise-tolerant SPEA (NTSPEA). The study focused on improving the robustness of the solutions when the objective functions are subjected to noise and outliers. Three features were incorporated, including domination dependent lifetime, re-evaluation of solutions, and update of the archive population. A lifetime, which is dependent on the fraction of archived solutions it dominates, is assigned to each of the nondominated individuals. The solutions in the archive will be removed once the lifetime has expired.

Goh and Tan (2007) carried out an extensive investigation to understand the effect of noise in evolutionary multi-objective optimization. They found that the decision error ratio for selection, ranking, and archiving is lower in the early stages of evolution; that the number of nondominated solutions found in an archive is reduced when the noise level increases; and that the average of the population distribution remains invariant in the decision space. Based on these observations, three noise handling features were proposed. Experiential learning directed perturbation (ELDP) adapts the mutation paradigm by deciding which gene will undergo mutation, based on after-the-fact knowledge of favorable movements in the search space. The gene adaptation selection strategy (GASS) prevents the search process from premature convergence or degeneration into a random search. Finally, a possibilistic archiving methodology updates the archive by considering the probability that the domination by an individual is wrong.

Recently, Bui et al. (2009) adapted local models in dealing with noisy multi-objective optimization problems. In the proposed algorithm, the search space is divided into several nonoverlapping hyper-spheres in order to limit the search within the spheres. Noise filtering was carried out in each hyper-sphere and the future movement of the sphere was adjusted according to the direction of improvement. This implementation reduces the effect of noise based on the movement of the spheres. Other approaches that have been proposed to reduce the effect of noise are fitness inheritance (Bui et al., 2005), indicator model (Basseur et al., 2006), stochastic dominance and significant dominance (Eskandari and Geiger, 2009), the Dempster-Shafer framework of dominance (Limbourg, 2005), and confidence-based operators (Boonma and Suzuki, 2009; Syberfeldt et al., 2010), among others.

2.4.  Noisy Test Instances

In this paper, noise is introduced into the fitness functions in the objective space as Gaussian noise, N(0, σ²), with zero mean and different noise levels represented by the variance (σ²). Mathematically, the noisy fitness function is modeled as:

F_i(x) = f_i(x) + N(0, σ²),   i = 1, ..., m    (4)
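Under this noise model, a noisy evaluation simply perturbs each objective value with an independent zero-mean Gaussian sample. A minimal sketch; the function name and interface are illustrative, not from the paper:

```python
import random

def noisy_evaluate(f, x, sigma):
    """Evaluate objective function f at decision vector x and add
    zero-mean Gaussian noise of standard deviation sigma to every
    objective value (the noise model of Equation (4))."""
    return [fi + random.gauss(0.0, sigma) for fi in f(x)]
```

With sigma = 0 the evaluation is noise-free, which is a convenient sanity check when testing noise handling code.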

3.  Proposed Algorithm

REDA can surpass standard MOEAs in handling noisy information because a noise handling feature can be constructed within its probabilistic model. To realize this advantage, likelihood correction is proposed to correct the error in the constructed probabilistic model. Furthermore, REDA is hybridized with PSO in order to improve its search ability. In this section, the framework of the proposed algorithm and other relevant features are discussed.

3.1.  Algorithm Framework

The proposed algorithm incorporates PSO and likelihood correction in REDA and is called PLREDA. The pseudocode of PLREDA is presented in Algorithm 1, and the algorithm works as follows. First, at generation g=0, N initial individuals are randomly generated to form an initial population, Pop(g=0). All the solutions in Pop(g) are evaluated to calculate their objective values. Based on the objective values, Pareto ranking and crowding distance assignment (Deb, Pratap, et al., 2002) are performed over the solutions. Next, binary tournament selection (Miller and Goldberg, 1995; M. Chakraborty and U. Chakraborty, 1997) is carried out. In each selection process, the probability error in selecting a particular individual is computed. Based on this information, the population is clustered into several groups. The population is then rendered into the RBM, which is subsequently trained using the contrastive divergence (CD; Hinton, 2002) method to obtain the corresponding weights and biases. The probabilistic model with likelihood correction is then built. From the constructed probabilistic model, N new offspring are sampled, and the solutions are stored in archive A1. The solutions in Pop(g) and A1 are combined to form a pool of 2N solutions. Subsequently, N solutions with the lowest Pareto rank and highest crowding distance are selected from the pool to form the new Pop(g). Next, PSO is performed to update the solutions in Pop(g), and the N newly generated solutions are stored in archive A2. The solutions in Pop(g) and A2 are then combined to form a pool of 2N solutions. N solutions with the lowest Pareto rank and highest crowding distance are selected from the pool to form the next population, Pop(g+1). The same process is continued until the termination criterion is met.
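The rank-and-crowding truncation used twice in this loop (reducing a pool of 2N solutions back to N) can be sketched as follows. This is a minimal illustrative implementation with naive O(N²) nondominated sorting, not the paper's code:

```python
# Truncate a pool of objective vectors to N survivors: lowest Pareto rank
# first, ties on the split front broken by larger crowding distance.
def pareto_ranks(objs):
    """Naive nondominated sorting; rank 0 is the nondominated front."""
    def dom(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    r = 0
    while remaining:
        front = [i for i in remaining
                 if not any(dom(objs[j], objs[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = r
        remaining -= set(front)
        r += 1
    return ranks

def crowding(objs, front):
    """Crowding distance of the members of one front."""
    d = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        s = sorted(front, key=lambda i: objs[i][k])
        d[s[0]] = d[s[-1]] = float('inf')          # keep boundary points
        span = (objs[s[-1]][k] - objs[s[0]][k]) or 1.0
        for left, mid, right in zip(s, s[1:], s[2:]):
            d[mid] += (objs[right][k] - objs[left][k]) / span
    return d

def truncate(objs, N):
    """Return the indices of the N survivors."""
    ranks = pareto_ranks(objs)
    fronts = {}
    for i, r in enumerate(ranks):
        fronts.setdefault(r, []).append(i)
    chosen = []
    for r in sorted(fronts):
        f = fronts[r]
        if len(chosen) + len(f) <= N:
            chosen += f
        else:
            d = crowding(objs, f)
            chosen += sorted(f, key=lambda i: -d[i])[:N - len(chosen)]
            break
    return chosen
```

This mirrors the NSGAII-style environmental selection the paper cites (Deb, Pratap, et al., 2002); production implementations use the fast nondominated sort instead of the quadratic scan shown here.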

3.2.  Particle Swarm Optimization (PSO)

REDA has limitations in exploring the search space and may be trapped at a local optimum (Tang et al., 2010). In order to overcome these limitations, hybridization is carried out (Neri and Mininno, 2010; Ong et al., 2010). In the implementation stage, REDA is hybridized with a PSO algorithm. This hybridization is expected to improve the performance since the particles may now move out of the regions modeled by REDA, and thus provide extra solutions that REDA alone was not able to tap into.

PSO is another stochastic computing algorithm, inspired by the swarm intelligence of animals that exhibit collective behavior, such as bird flocks and fish schools. PSO has gained much attention from research communities in solving real-world optimization problems due to its ease of implementation and strong search performance. PSO is particularly efficient for problems represented in real values, and the conversion of PSO from real number representation to binary number representation is direct (Engelbrecht, 2006). The first discrete PSO was developed by Kennedy and Eberhart (1997). In binary PSO, each particle represents a position in binary space. Cedeno and Agrafiotis (2002) implemented a binary PSO in feature selection by treating the position vectors as continuous-valued vectors, which are then transformed to discrete-valued vectors by setting a threshold to decide between the bit values 0 and 1.

In this implementation, the velocity (vk,d(g)) of each dth dimension of particle k in generation g is updated according to the continuous-valued case as:
v_{k,d}(g+1) = ω v_{k,d}(g) + c_1 r_1 (pbest_{k,d}(g) − x_{k,d}(g)) + c_2 r_2 (gbest_d(g) − x_{k,d}(g))    (5)
where v^min_d ≤ v_{k,d}(g+1) ≤ v^max_d, and v^max_d and v^min_d are the maximum and minimum velocity in the dth dimension, respectively. gbest_d(g) is the location in the dth dimension of the parameter space with the best fitness returned for the entire swarm in generation g. In this implementation, tournament selection is applied to find the best individual at the current iteration; the best individual is the one that is nondominated and less crowded by other solutions. pbest_{k,d}(g) is the location in the dth dimension of the parameter space with the best fitness returned for particle k in generation g. The factors r_1 and r_2 are uniformly distributed random variables in the range [0, 1], while c_1 and c_2 are the weights that regulate the influences of the global and local information. Another factor, ω, known as the inertia weight, aims to balance the global and local search abilities. After updating the velocity, the position of each particle is updated according to the following equations.
y_{k,d}(g+1) = x_{k,d}(g) + v_{k,d}(g+1)    (6)

x_{k,d}(g+1) = 1 if y_{k,d}(g+1) > θ, and x_{k,d}(g+1) = 0 otherwise    (7)
where yk,d is the updated position in the dth dimension of particle k in a continuous domain while xk,d is the corresponding position value in a discrete domain after transformation by a reference threshold. The threshold is evolutionary and is treated as a variable to be optimized.
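One velocity-and-position update for a binary particle can be sketched as follows. The parameter values (ω, c1, c2, vmax) are illustrative defaults, not the paper's settings, and the threshold theta is passed in as a fixed argument rather than co-evolved as the paper describes:

```python
import random

def bpso_step(x, v, pbest, gbest, theta, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One binary-PSO update for a single particle.

    x, pbest, gbest are bit lists; v is the continuous velocity.
    Returns the new bit vector and the new velocity."""
    bits, new_v = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + c1 * random.random() * (pbest[d] - x[d])
              + c2 * random.random() * (gbest[d] - x[d]))
        vd = max(-vmax, min(vmax, vd))    # clamp velocity to [vmin, vmax]
        new_v.append(vd)
        yd = x[d] + vd                    # continuous position update
        bits.append(1 if yd > theta else 0)  # threshold back to a bit
    return bits, new_v
```

When pbest and gbest coincide with the current position and the velocity is zero, the particle stays put, which makes the update easy to sanity-check.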

3.3.  Probability Dominance

The concept of probability dominance was proposed in Hughes (2001). In PLREDA, this concept is implemented to determine the probability error of each selected individual. This information is used to group the population into several clusters before a probabilistic model is built.

Assume that A and B are two solutions in an objective space with m objective functions, and let f_i(A) and f_i(B) denote their ith objective values. In a noise-free situation, A is said to strictly dominate B if, in the minimization case, f_i(A) is smaller than f_i(B) for all objectives. On the other hand, A and B are mutually nondominated if neither solution has all of its objective values lower than those of the other. Figure 1 further illustrates the concept of dominance. In the figure, point Z is strictly dominated by point Y, while points X and Y are mutually nondominated.

Figure 1:

Concept of dominance.


In a noisy domain, the above statements may not correctly represent the true domination behavior. Even though A appears to strictly dominate B, the noise may have distorted the actual fitness values, such that B actually dominates A. In the selection process, a selection error occurs when the less fit individual is chosen. Therefore, the probability of making an error in the selection process can be utilized to improve the decision making process.

Hughes (2001) suggested a probabilistic selection model which employs probabilistic dominance to account for the noise effect in the objective space. The noise is assumed to be normally distributed. In this model, P(f_i(A) < f_i(B)) is the probability that f_i(A) dominates f_i(B), and it is calculated as:
P(f_i(A) < f_i(B)) = (1/2) [1 + erf((f_i(B) − f_i(A)) / (2σ))]    (8)
where σ is the standard deviation of the noise distribution and erf(·) is the error function. However, the calculation of the error function is time-consuming; therefore, the probability error (Pe) is approximated as follows.
formula
9
For m objectives, the final probability (Pe) that solution A dominates solution B is calculated as follows.
P_e = ∏_{i=1}^{m} P(f_i(A) < f_i(B))    (10)

3.4.  Restricted Boltzmann Machine in Noisy Environments

3.4.1.  Restricted Boltzmann Machine as Estimation of Distribution Algorithm (REDA)

In this paper, REDA was applied to build the probabilistic model. This algorithm was proposed by Tang et al. (2010). REDA estimates the joint probability distribution of the selected individuals by using an RBM. An RBM is a kind of neural network that learns the probability distribution in terms of energy equilibrium. The network consists of an input layer (visible layer) and a hidden layer. The network is simple and efficient in modeling, able to learn interdependencies among the decision variables, and easy to implement. The evolution process of REDA is illustrated in Figure 2. First of all, the selected candidate solutions are rendered into the RBM. The decision vectors of the candidate solutions are treated as the input vectors to the visible units in the RBM. The CD training (see Algorithm 2) is performed to obtain the weights and biases of the network.
Figure 2:

Evolution process using RBM.


In the CD training, two phases are carried out before the weights and biases of the network are updated. During the positive phase, the conditional probability of the hidden units given the visible states is computed according to Algorithm 2 (Step 1), and the states of the hidden units are then sampled. During the negative phase, the hidden units are stochastically fired to reconstruct the visible units by sampling according to Algorithm 2 (Step 3). Next, the hidden units are recomputed from the visible layer according to Algorithm 2 (Step 1). This process is called the one-step reconstruction of the visible and hidden units. The weights and biases of the network are then updated according to Algorithm 2 (Step 5), which completes one training epoch. The same procedure (positive and negative phases) is repeated until the maximum number of training epochs is reached. The output of the network is the marginal probability distribution of the decision variables, factorized as a product of independent univariate marginal distributions, one per decision variable. Finally, sampling is performed to generate N offspring. This completes a generation.
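The positive and negative phases described above can be sketched for a binary RBM as follows. This is a minimal CD-1 illustration with list-based weights and an assumed learning rate; it follows the standard contrastive divergence update (Hinton, 2002) rather than reproducing Algorithm 2 verbatim:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cd1_epoch(data, W, b_vis, b_hid, lr=0.1):
    """One CD-1 epoch over binary training vectors.

    W is an n-by-H weight matrix (list of lists); b_vis and b_hid are the
    visible and hidden biases. All parameters are updated in place."""
    n, H = len(W), len(W[0])
    for v in data:
        # Positive phase: P(h_j = 1 | v), then sample hidden states.
        ph = [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(n)))
              for j in range(H)]
        h = [1 if random.random() < p else 0 for p in ph]
        # Negative phase: reconstruct the visible units from h ...
        pv = [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(H)))
              for i in range(n)]
        v2 = [1 if random.random() < p else 0 for p in pv]
        # ... then recompute the hidden probabilities from the reconstruction.
        ph2 = [sigmoid(b_hid[j] + sum(v2[i] * W[i][j] for i in range(n)))
               for j in range(H)]
        # Update from the one-step statistics: data term minus model term.
        for i in range(n):
            b_vis[i] += lr * (v[i] - v2[i])
            for j in range(H):
                W[i][j] += lr * (v[i] * ph[j] - v2[i] * ph2[j])
        for j in range(H):
            b_hid[j] += lr * (ph[j] - ph2[j])
```

Running several such epochs over the selected population is what yields the weights and biases from which REDA's marginal probabilities are derived.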

In binary representation, the joint probability distribution at generation g is formulated as:
P(x, g) = ∏_{i=1}^{n} p_i(x_i, g)    (11)
where
formula
12
formula
13
formula
14
formula
15
and where x_i is the ith decision variable, n is the number of decision variables, H is the number of hidden units, P(x_i)^+ is the marginal cost of x_i when the cardinality of x_i = 1, P(x_i)^− is the marginal cost of x_i when the cardinality of x_i = 0, and E is the energy function of the RBM. This model has a limitation during the training process when the marginal probability distribution of any decision variable reaches a maximum value of 1.0 or a minimum value of 0.0. In order to alleviate this limitation, lower and upper bounds are added to the probability distribution. The final probability distribution is constructed as follows.
formula
16
where avg(P(x)) is the average cost of cardinality and ri is the number of different values that xi may take. In the binary case, ri is 2. Based on this probability distribution, the offspring are generated by sampling the probability distribution according to Equation (17).
x_i = 1 if rand(0, 1) < P(x_i), and x_i = 0 otherwise    (17)

3.4.2.  Likelihood Correction

In a noisy condition, the selected candidate solutions may not be the best solutions (best in terms of lower Pareto rank or larger crowding distance). Therefore, the probabilistic model built by REDA may not correctly represent the real distribution of the best solutions. In order to build a more appropriate probabilistic model, likelihood correction is proposed. This correction is based on the heuristic that if the modeled distribution can be brought close to the real distribution of the best solutions, the detrimental effect of the noise can be reduced. To approximate the real distribution, the probability of making an error in the selection process is incorporated into the probabilistic modeling. In binary tournament selection, if the two solutions in the tournament differ greatly in their objective values, for example if A dominates B by far, then the error in selecting A is small. On the other hand, if the two individuals in the tournament are close to each other in the objective space, then the selection error is larger. Therefore, if the probabilistic model built by REDA is based only on individuals with a small selection error, the model may avoid distortions caused by solutions with a large selection error.

However, the probability distribution may not come close to the real distribution if the number of individuals with a small selection error is too small. Thus, a method is designed to combine the distributions of the solutions with a small selection error and those with a large selection error. This combination is based on a penalty approach in which individuals with a smaller selection error are penalized less, while solutions with a larger selection error are penalized more heavily. This is because the real distribution is more likely to follow the distribution of the population with a smaller selection error than that with a larger selection error.

Thus, the first step is to determine the number of groups (Gp) and then sort the individuals into Gp clusters according to the predefined thresholds (Th_y) of the probability error (Pe), as in Equation (18).
formula
18
Pe is calculated by using Equation (10). The thresholds Th_y are determined through experimental investigation. In Equation (18), the function of Th_y is to define the criterion for grouping. For example, solutions with Pe less than Th_1 will be clustered into the first group, solutions with Pe between Th_1 and Th_2 will be clustered into the second group, and so forth. Afterward, Pe is reset to the value of the corresponding threshold (the penalty value). Then the distributions of the clusters are combined according to Equation (19).
formula
19
In Equation (19), the penalty assigned to each cluster is determined by the threshold of the probability error of the members in the cluster. The probability error Pe in cluster 1 is always treated as 0.0 because cluster 1 has the smallest selection error compared to other clusters; thus, the probability distribution contributed by the solutions in this cluster will not be penalized. The solutions in other clusters will be penalized (according to the threshold value) since these solutions may not be the fittest ones in the population.
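The grouping-and-penalty idea can be illustrated as follows. Since the exact forms of Equations (18) and (19) are not reproduced here, the weighting scheme below, in which each cluster's marginals are weighted by one minus its penalty threshold and the first cluster is never penalized, is an assumption for illustration only, not the paper's formula:

```python
def corrected_marginals(pop, pe, thresholds):
    """Illustrative likelihood-correction sketch.

    pop is a list of binary individuals, pe[k] is the selection-error
    probability of individual k, and thresholds is an increasing list
    whose last entry covers the full Pe range (e.g. ends at 1.0).
    Returns a combined univariate marginal per decision variable."""
    # Step 1: group individuals by the first threshold their Pe falls under.
    groups = [[] for _ in thresholds]
    for ind, e in zip(pop, pe):
        for g, th in enumerate(thresholds):
            if e <= th:
                groups[g].append(ind)
                break
    # Step 2: combine group marginals; cluster 1 gets penalty 0, and each
    # later cluster is penalized by the previous threshold (an assumption).
    penalties = [0.0] + list(thresholds[:-1])
    n = len(pop[0])
    num, den = [0.0] * n, 0.0
    for g, members in enumerate(groups):
        if not members:
            continue
        w = 1.0 - penalties[g]
        marg = [sum(m[i] for m in members) / len(members) for i in range(n)]
        for i in range(n):
            num[i] += w * marg[i]
        den += w
    return [x / den for x in num]
```

The effect is that low-error individuals pull the combined marginals toward their own distribution, while high-error individuals still contribute, but with reduced weight.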

4.  Implementation

This section presents the test functions, performance metrics, and other algorithms used in the comparison. The experimental settings of the implementation are also outlined. In the experimental studies, the assumption is made that there is advance knowledge of the presence of noise.

4.1.  Test Instances

Eight benchmark test instances, namely ZDT1, ZDT2, ZDT3, ZDT4, ZDT6, DTLZ1, DTLZ2, and DTLZ3, are used to test the performance of PLREDA in terms of converging to the true Pareto optimal front and maintaining a set of diverse solutions. All of the test functions have different characteristics and shapes of the Pareto optimal front. The ZDT problems (Zitzler et al., 2000) consist of two objective functions, while the DTLZ problems (Deb, Thiele, et al., 2002) possess three objective functions. Detailed information on these test problems can be found in their original papers.

4.2.  Performance Metrics

Three performance indicators are used for the numerical comparison between PLREDA and other state of the art algorithms. Let PF* be the set of evolved solutions and PF be the set of Pareto optimal solutions.

4.2.1.  Generational Distance (GD)

GD (Veldhuizen and Lamont, 1998) is defined as
GD = (1/N) √(Σ_{i=1}^{N} d_i²)    (20)
where N is the number of solutions in PF* and d_i is the minimum Euclidean distance in the objective space between the ith member of PF* and the members of PF. GD illustrates the convergence ability of the algorithm by measuring the closeness between the Pareto optimal front and the evolved Pareto front. Thus, a lower value of GD shows that the evolved Pareto front is closer to the Pareto optimal front.
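With the evolved front and a sampled Pareto optimal front given as lists of objective vectors, GD can be computed as sketched below, using the common root-of-summed-squares form with the nearest-neighbour distances just described:

```python
import math

def generational_distance(approx, pf):
    """GD of an approximation set against a sampled Pareto front:
    each evolved point contributes its distance to the nearest
    Pareto-optimal point."""
    dists = [min(math.dist(a, p) for p in pf) for a in approx]
    return math.sqrt(sum(d * d for d in dists)) / len(approx)
```

A set that lies exactly on the sampled front yields GD = 0, and a single point at unit distance yields GD = 1, which makes the metric easy to sanity-check.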

4.2.2.  Maximum Spread (MS)

This metric measures how well PF is covered by PF*. The mathematical formulation of the metric is shown below.

MS = [ (1/m) Σ_{i=1}^{m} ( (min(f_i^max, F_i^max) − max(f_i^min, F_i^min)) / (F_i^max − F_i^min) )² ]^{1/2}    (21)

where m is the number of objectives, F_i^max and F_i^min are the maximum and minimum of the ith objective in PF, respectively, and f_i^max and f_i^min are the maximum and minimum of the ith objective in PF*, respectively (Zitzler et al., 2000). A higher value of MS indicates that the evolved Pareto front has a better spread.
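Equation (21) can be sketched as follows, with the evolved and Pareto optimal fronts passed as lists of objective vectors:

```python
import math

def maximum_spread(evolved, pareto_front):
    """Maximum Spread (Eq. 21): how well the extent of the true
    front PF is covered by the evolved front PF*, per objective."""
    m = len(pareto_front[0])  # number of objectives
    total = 0.0
    for i in range(m):
        F_max = max(p[i] for p in pareto_front)
        F_min = min(p[i] for p in pareto_front)
        f_max = max(a[i] for a in evolved)
        f_min = min(a[i] for a in evolved)
        covered = min(f_max, F_max) - max(f_min, F_min)
        total += (covered / (F_max - F_min)) ** 2
    return math.sqrt(total / m)
```

A front that spans the full extent of PF in every objective yields MS = 1; partial coverage yields a value below 1.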

4.2.3.  Inverted Generational Distance (IGD)

IGD performs a calculation similar to that of GD. The difference is that GD computes the distance from each solution in PF* to PF, while IGD computes the distance from each solution in PF to PF*. This indicator therefore takes both convergence and diversity into consideration. A lower value of IGD implies that the algorithm has better performance.
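A minimal sketch of IGD, using the common averaged-distance formulation (the exact aggregation used in the paper may differ slightly):

```python
import math

def igd(evolved, pareto_front):
    """Inverted Generational Distance: average distance from each
    Pareto optimal solution to its nearest evolved solution."""
    total = sum(min(math.dist(p, a) for a in evolved) for p in pareto_front)
    return total / len(pareto_front)
```

Because every reference point in PF must be approached by some evolved solution, a low IGD requires both convergence and good coverage of the front.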

4.3.  Other Algorithms Used in Comparison

Seven algorithms were chosen for performance comparison with the PLREDA. Since PLREDA was developed from REDA, REDA is included in the comparison. Introducing likelihood correction into REDA forms LREDA; comparing against this algorithm shows the advantage of incorporating the likelihood correction. In addition, five state of the art algorithms are chosen for comparison. First, NSGAII (Deb, Pratap, et al., 2002) is a popular algorithm in evolutionary multi-objective optimization, known for its ability to generate promising solutions for most test problems. Incorporating the resampling mechanism into NSGAII yields RNSGAII. Next, NTSPEA (Büche et al., 2002) and MOPSEA (Hughes, 2001) are algorithms that were specifically proposed to handle noisy objective functions; the fundamental noise handling principle in MOPSEA is probability dominance. Finally, MOEARF (Goh and Tan, 2007) is a recently published noise handling MOEA.

4.4.  Experimental Settings

All algorithms were implemented in C++ and run on an Intel Core 2 Duo 3.0 GHz personal computer. The experimental settings are presented in Table 1. The parameter settings were determined through preliminary experimental studies or taken from previous works.

Table 1:
Parameter settings.
Representation: Binary representation with 15 bits per decision variable
Population size: 100 in ZDT problems and 200 in DTLZ problems
Archive size: 100 in ZDT problems and 200 in DTLZ problems
Selection: Binary tournament selection
Fitness evaluations: 40,000 in ZDT problems and 80,000 in DTLZ problems
Noise levels: 0%, 5%, 10%, 20%
Independent runs: 30
Crossover rate in NSGAII, RNSGAII, NTSPEA, MOPSEA, and MOEARF: 0.8 (Goh and Tan, 2007)
Mutation rate in NSGAII, RNSGAII, NTSPEA, MOPSEA, and MOEARF: 1/(number of variables × number of bits per variable) (Goh and Tan, 2007)
Re-sampling in RNSGAII:
Hidden units in REDA, LREDA, and PLREDA: 10 in ZDT problems and 5 in DTLZ problems, as suggested in Tang et al. (2010)
Training epochs in REDA, LREDA, and PLREDA: 20 in all test problems, as suggested in Tang et al. (2010)
Number of groups in LREDA and PLREDA:
Thresholds (Thy) in LREDA and PLREDA: Th1 = 0.25, Th2 = 0.5
in PLREDA: 0.99
c1 and c2 in PLREDA: 2 and 1.5
vmax and vmin in PLREDA: 1.0 and −1.0
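As an illustration of the representation row in Table 1, each decision variable is encoded with 15 bits. A minimal decoding sketch follows; the variable bounds and the use of a plain binary encoding are assumptions, since the exact encoding is not detailed here.

```python
def decode(bitstring, lower, upper, bits_per_var=15):
    """Decode a binary chromosome (15 bits per decision variable,
    as in Table 1) into real-valued variables in [lower, upper]."""
    n = len(bitstring) // bits_per_var
    scale = (1 << bits_per_var) - 1  # 2^15 - 1 = 32767 levels
    xs = []
    for i in range(n):
        gene = bitstring[i * bits_per_var:(i + 1) * bits_per_var]
        value = int("".join(map(str, gene)), 2)
        xs.append(lower + (upper - lower) * value / scale)
    return xs
```

The mutation rate entry in Table 1, 1/(number of variables × number of bits per variable), corresponds to flipping one bit per chromosome on average under this representation.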

5.  Computational Results

In this section, the studies are divided into four categories. The comparison results of the algorithms are presented in the first category. This is followed by an analysis of the scalability behavior of the algorithms. The possibility of hybridization with other evolutionary paradigms is discussed next. Finally, the computational time for the algorithms to perform a single simulation run is examined.

5.1.  Comparison Results

Table 2 shows the results in terms of the GD measurement obtained from the different algorithms. The first column indicates the test problems, with the number of decision variables in parentheses. The mean and standard deviation of the results over 30 independent runs are presented, and the best result in each test instance is highlighted in bold. It is obvious from the tabulation that PLREDA is able to obtain the best GD results in most of the test instances, followed by MOEARF and MOPSEA. In the noiseless condition, MOPSEA obtained the best results in four of the test instances, with MOEARF and PLREDA each obtaining two best results. When the noise levels are increased, PLREDA outperformed the other algorithms except on DTLZ2. In addition, we can see the improvement from REDA to LREDA in most of the test instances under noisy conditions. The performance of LREDA is further enhanced when it is hybridized with PSO (PLREDA). It was also observed that the hybridization between LREDA and PSO may affect the search ability in some noiseless circumstances (ZDT1, ZDT2, and ZDT3); however, the performance of PLREDA was outstanding in noisy environments.

Since GD measures the distance between the evolved Pareto front and the Pareto optimal front, the closer they are, the smaller the GD value. All of the algorithms were able to obtain a set of solutions near the Pareto optimal front in ZDT1, ZDT2, ZDT3, ZDT6, and DTLZ2. For the remaining test problems (ZDT4, DTLZ1, and DTLZ3), most of the algorithms were trapped in local optima, which is attributed to the fact that these three test problems contain many local optimal fronts. MOEARF was able to generate good solutions in ZDT4; however, its performance is poor in DTLZ1 and DTLZ3. PLREDA performed better than MOEARF in all DTLZ problems, even though it also failed to reach the Pareto optimal front.

Table 2:
Comparison of PLREDA to the different MOEAs with respect to the GD performance indicator
Problems (n)  Noise level (%)  PLREDA  LREDA  REDA  NSGAII  NTSPEA  MOPSEA  RNSGAII  MOEARF
ZDT1 (30) 0.00450.0016 0.00380.0003 0.00380.0003 0.00380.0003 0.00630.00058 0.00350.00026 0.00380.0003 0.00390.00042 
 0.05980.0351 0.06890.0117 0.07110.0125 0.11560.0180 0.11160.0174 0.09680.0138 0.10030.0158 0.04680.0052 
 10 0.07600.0339 0.13190.0188 0.13410.0245 0.19660.0325 0.19490.0324 0.14930.0255 0.15860.0278 0.09480.0136 
 20 0.15200.0697 0.24000.0511 0.26500.0487 0.33230.0469 0.33510.0498 0.24640.0340 0.27690.0533 0.17690.0213 
ZDT2 (30) 0.00390.0005 0.00380.0002 0.00380.0002 0.00380.0002 0.00630.0035 0.00360.0002 0.00380.0002 0.00420.0005 
 0.04290.0423 0.09960.0174 0.10810.0234 0.15130.0292 0.12960.0245 0.10840.0172 0.17570.0359 0.04940.0063 
 10 0.06140.0380 0.20360.0371 0.19420.0411 0.28330.0542 0.25630.0422 0.19160.0261 0.26940.0609 0.09710.0137 
 20 0.13860.1230 0.43600.0726 0.44120.0833 0.52250.0937 0.50420.1078 0.38750.0553 0.44930.0892 0.22180.0387 
ZDT3 (30) 0.00930.0017 0.00860.0008 0.00860.0008 0.00860.0008 0.00960.0009 0.00640.0006 0.00860.0023 0.00870.0011 
 0.03310.0104 0.04130.0050 0.03930.0060 0.05530.0112 0.05950.0177 0.04990.0097 0.07820.0151 0.04630.0077 
 10 0.05580.0383 0.07040.0204 0.07080.0209 0.11290.0404 0.12770.0471 0.07670.0245 0.09930.0250 0.07440.0211 
 20 0.12280.0688 0.23750.0299 0.22990.0353 0.27390.0545 0.27090.0488 0.12890.0703 0.23480.0435 0.17640.0553 
ZDT4 (10) 0.51301.6893 0.53320.2244 0.53320.2244 0.51690.1978 0.57770.1847 0.54930.2190 0.51690.1679 0.01480.0034 
 0.00470.0010 0.46580.2169 0.53240.1717 0.54320.2086 0.55330.2268 0.51310.1883 0.56860.1650 0.01050.0028 
 10 0.00420.0014 0.59420.2827 0.54050.1999 0.58570.1945 0.55380.2046 0.58260.2264 0.59270.1889 0.02620.0106 
 20 0.00660.0047 0.60400.2204 0.62320.2372 0.60900.2147 0.60070.1768 0.59390.2307 0.67010.2649 0.48611.0196 
ZDT6 (10) 0.01300.0010 0.01280.0010 0.01280.0010 0.01250.0011 0.04260.1194 0.37060.7834 0.01250.0011 0.01010.0043 
 0.01010.0027 0.06230.1702 0.07170.1682 0.02520.0016 0.02500.0026 0.79330.1971 0.98420.1275 0.06350.1391 
 10 0.00810.0011 0.23780.3061 0.29240.3158 0.32190.3582 0.69000.2574 1.09910.1726 1.16820.1895 0.60110.3988 
 20 0.00700.0021 0.80450.4053 0.83790.5210 1.37160.2406 1.43510.2195 1.71540.2660 1.55290.1691 1.07450.3266 
DTLZ1 (30) 57.701395.42 204.7350.023 204.7350.023 122.8035.134 128.6038.403 1066.2347.89 122.8035.134 486.70138.86 
 238.9177.411 345.5071.277 377.538470.179 698.10215.22 800.40176.85 1051.3173.33 1484.495.267 1237.0269.63 
 10 245.66362.950 399.8397.6119 457.1683.127 849.50198.22 777.20151.10 1004.8203.57 1510.683.930 1298.0179.37 
 20 275.41472.80 466.79109.74 472.8079.401 913.30217.38 840.40180.99 1034.9178.07 1551.466.218 1249.4253.14 
DTLZ2 (30) 0.06640.0069 0.10440.0123 0.10440.0123 0.06290.0033 0.07570.0044 0.05300.0068 0.06290.0033 0.07770.0127 
 0.14250.0237 0.20450.0345 0.21240.0416 0.16220.0393 0.16450.0641 0.10290.0199 0.49970.1680 0.29060.0586 
 10 0.22240.0591 0.29680.0654 0.31120.0591 0.26630.0729 0.23970.0667 0.14710.0230 0.51340.1353 0.39490.1275 
 20 0.46820.0775 0.48730.1003 0.49110.0804 0.51950.0969 0.62250.1999 0.27090.0326 0.65890.1325 0.60320.1373 
DTLZ3 (30) 94.1752137.75 224.9954.689 224.9954.689 128.5035.946 266.00252.70 1493.2295.54 128.5035.946 571.60336.39 
 427.72109.86 541.5895.543 554.49145.25 651.00251.07 1075.5310.33 1111.0171.45 1684.2105.79 935.00193.19 
 10 537.46160.78 566.62130.31 653.08112.36 608.90177.56 942.40236.77 1109.2163.71 1700.4120.75 898.20179.99 
 20 515.01156.10 570.19130.86 589.37146.69 653.60218.42 1071.4307.64 1047.4193.59 1644.9106.64 809.2339.23 

The good performance of PLREDA is due to the combination of REDA, PSO, and the likelihood correction feature. First, REDA is more robust than NSGAII, as its performance under noisy conditions is better. REDA, which performs the search by modeling the global probability distribution of the population, is less sensitive to noise since reproduction is based on global information rather than on individual solutions. Furthermore, likelihood correction tunes the probability distribution so that the distribution of the solutions more closely follows the one with the smaller selection error. The hybridization further enhances the ability of REDA to explore the search space, especially in noisy conditions. This hybridization is particularly important when REDA fails to model the promising regions of the search space; in that case, the PSO component provides opportunities to explore those regions, thus improving the search ability.

Resampling, as implemented in RNSGAII, is probably one of the simplest noise handling mechanisms. RNSGAII performs better than NSGAII in ZDT problems, but its performance is poor in DTLZ problems: even though resampling reduces the effect of noise, it incurs extra fitness evaluations, and since the evaluation budget in DTLZ problems is limited to 80,000, RNSGAII failed to converge by the end of the evolution. NTSPEA and MOPSEA, which are noise handling algorithms, show better results in noisy environments than NSGAII. This is expected, since NTSPEA and MOPSEA tackle noisy information by incorporating the domination-dependent lifetime and probability dominance, respectively, while NSGAII takes no extra measures when handling noisy data. For MOEARF, the noise handling mechanism is based on after-the-fact knowledge of favorable movements; thus, it was able to obtain good results in ZDT problems. However, the performance of MOEARF in DTLZ problems is poorer than that of all other algorithms except RNSGAII.
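The resampling mechanism attributed to RNSGAII can be sketched as follows: each solution is evaluated several times under noise and the objective values are averaged, trading extra fitness evaluations for a reduction in noise variance. The sample count `k` is an assumed parameter (its value in the experiments is not specified here).

```python
import statistics

def resampled_fitness(noisy_f, x, k=5):
    """Evaluate a solution k times under noise and average each
    objective. Averaging reduces the noise variance by a factor of k,
    at the cost of k fitness evaluations per solution -- the overhead
    that limits RNSGAII under a fixed evaluation budget."""
    samples = [noisy_f(x) for _ in range(k)]
    m = len(samples[0])  # number of objectives
    return tuple(statistics.fmean(s[i] for s in samples) for i in range(m))
```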

Table 3 shows the results in terms of the MS measurement obtained from the different algorithms. Generally, PLREDA was able to evolve the most diverse set of solutions in most of the test instances, followed by MOEARF and MOPSEA. The incorporation of likelihood correction also improved the diversity of REDA in some of the test instances. There was no clear difference in performance between REDA and NSGAII, and resampling was unable to maintain or improve the diversity of NSGAII. It was also observed that noise interfered with the diversity performance of all algorithms in all ZDT problems. For DTLZ problems, the negative influence of noise on diversity preservation was far smaller than for ZDT problems.

Table 3:
Comparison of PLREDA to the different MOEAs with respect to the MS performance indicator.
Problems (n)  Noise level (%)  PLREDA  LREDA  REDA  NSGAII  NTSPEA  MOPSEA  RNSGAII  MOEARF
ZDT1 (30) 0.99820.0014 0.99280.0016 0.99280.0016 0.99790.0007 0.99700.0007 0.98810.0045 0.99790.0009 0.97680.0144 
 0.94060.0281 0.92630.0230 0.92510.0217 0.91330.0267 0.91160.0229 0.91620.0297 0.92640.0292 0.96800.0161 
 10 0.91300.0399 0.88560.0358 0.89980.0303 0.87140.0445 0.85610.0362 0.85350.0546 0.89040.0308 0.93800.0161 
 20 0.87310.0447 0.83190.0582 0.82970.0437 0.81280.0432 0.76930.0464 0.78940.0586 0.83990.0322 0.89490.0432 
ZDT2 (30) 0.99730.0023 0.98770.0011 0.98770.0011 0.99340.0027 0.99010.0046 0.98170.0072 0.99340.0027 0.98120.0118 
 0.90730.0451 0.86460.0301 0.85130.0277 0.83000.0408 0.83810.0395 0.80220.0370 0.78650.0449 0.95650.0082 
 10 0.86290.0643 0.75170.0410 0.74780.0548 0.73180.0665 0.73010.0458 0.53400.1602 0.66320.1516 0.91690.0187 
 20 0.70560.1237 0.46990.1898 0.47780.1370 0.50860.2445 0.39790.2498 0.11020.1428 0.47840.2083 0.74120.2345 
ZDT3 (30) 0.99870.0022 0.99850.0007 0.99850.0007 0.99980.0001 0.98620.0240 0.97530.0415 0.99980.0001 0.97120.0537 
 0.97070.0158 0.96020.0106 0.96250.0106 0.89730.0370 0.88710.0391 0.89010.0557 0.89160.0384 0.94180.0061 
 10 0.95660.0215 0.94090.0136 0.94180.0131 0.87600.0196 0.82260.0668 0.77060.1480 0.86150.0824 0.92650.0138 
 20 0.91900.0550 0.86540.0677 0.88210.0575 0.81170.0561 0.75170.0836 0.64460.1646 0.82840.0643 0.87750.0587 
ZDT4 (10) 0.93020.1159 0.77740.0714 0.77740.0714 0.75450.0681 0.73520.0500 0.71760.0716 0. 75450.0681 0.95440.0182 
 0.93720.0217 0.75690.0820 0.73390.0596 0.74690.0642 0.73940.0710 0.73280.0618 0.73580.0468 0.97690.0175 
 10 0.91680.0221 0.71170.0665 0.71710.0707 0.73370.0626 0.71790.0571 0.69210.0720 0.71560.0494 0.96480.0303 
 20 0.85750.0653 0.68650.0700 0.67620.0554 0.70840.0765 0.65990.0680 0.63440.0903 0.69210.0663 0.86920.0944 
ZDT6 (10) 1.00000.0000 1.00000.0000 1.00000.0000 1.00000.0000 0.98940.0520 0.92230.1292 0.92320.1217 0.99630.0128 
 0.99930.0015 0.98790.0535 0.99040.0485 0.99460.0013 0.99250.0039 0.69290.0371 0.69990.0006 0.99480.0005 
 10 0.99900.0017 0.93360.1209 0.89760.1360 0.87690.1457 0.72890.0897 0.69250.0182 0.69670.0102 0.99470.0014 
 20 0.99820.0027 0.7751