Abstract

In this paper, we propose a multi-restart memetic algorithm framework for box-constrained global continuous optimisation. In this framework, an evolutionary algorithm (EA) and a local optimizer are employed as separate building blocks. The EA is used to explore the search space for very promising solutions (e.g., solutions in the attraction basin of the global optimum) through its exploration capability and the previous EA search history, and local search is used to improve these promising solutions to local optima. An estimation of distribution algorithm (EDA) combined with a derivative-free local optimizer, called NEWUOA (M. Powell, Developments of NEWUOA for minimization without derivatives. IMA Journal of Numerical Analysis, 28:649–664, 2008), is developed based on this framework and empirically compared with several well-known EAs on a set of 40 commonly used test functions. The main components of the specific algorithm include: (1) an adaptive multivariate probability model, (2) a multiple sampling strategy, (3) the decoupling of the hybridisation strategy, and (4) a restart mechanism. The adaptive multivariate probability model and the multiple sampling strategy are designed to enhance the exploration capability. The restart mechanism attempts to make the search escape from local optima by resorting to the previous search history. Comparison results show that the algorithm is comparable with the best known EAs, including the winner of the 2005 IEEE Congress on Evolutionary Computation (CEC2005), and significantly better than the others in terms of both solution quality and computational cost.

1.  Introduction

In this paper, we consider the following box constrained optimisation problem:
$$\min_{x \in \Omega} f(x) \qquad (1)$$

where $\Omega = \prod_{i=1}^{n}[a_i, b_i] \subset \mathbb{R}^n$ is the search space with $-\infty < a_i < b_i < +\infty$, and $n$ is the dimensionality of the optimisation problem. From the 1970s, population-based modern global optimisation methods, such as evolutionary programming (Fogel, 1999; Yao and Liu, 1999; Lee and Yao, 2004), genetic algorithms (Michalewicz, 1996; Tsai et al., 2004; Tu and Lu, 2004; Wang and Dang, 2007; Leung and Wang, 2001; Zhong et al., 2004), estimation of distribution algorithms (Larrañaga and Lozano, 2002; Zhang et al., 2003), and many others have been proposed and have achieved great success in theory and practice.

One of the main concerns in the design of evolutionary algorithms (EAs) is to balance the exploration and exploitation aspects (Bäck, 1996; Bäck et al., 1997; Torn and Zilinskas, 1989) for effective and efficient search. Among these EAs, memetic algorithms (MAs), that is, combinations of local search (LS) and EAs (Moscato, 1989), have been considered a promising paradigm. MAs aim to overcome the slow convergence of EAs in locating high-quality solutions by using LS to exploit the local fitness landscape. Readers are referred to Hart et al. (2005) for recent advances in MAs, and to Krasnogor and Smith (2000) or Krasnogor and Smith (2005) for a comprehensive review of MAs in solving optimisation problems. Also, see Krasnogor (2009) for a more detailed discussion of the balance between exploration and exploitation in MAs.

To design an efficient MA, the trade-off between the intensities of the EA and LS methods needs to be carefully tuned under a fixed computational budget. Various strategies have been proposed in the literature to address this trade-off, particularly for continuous optimisation. As summarised elsewhere (Nguyen, Ong, and Krasnogor, 2007; Nguyen et al., 2009), the basic issues addressed in these strategies include: (i) the LS frequency or, similarly, the LS probability (i.e., how often the LS is applied); (ii) the LS intensity (i.e., for how long to apply the LS); (iii) the subset of individuals selected to carry out LS; and (iv) the selection of memes (i.e., local optimizers) to be used. The study in Nguyen, Ong, and Krasnogor (2007) showed that the LS frequency and intensity should be balanced under a fixed computational budget, and that it is better to apply the LS to the best individuals. Theoretical analyses of the convergence and search speed of MAs can be found in Sudholt (2006, 2009). Extensive studies on the selection of memes, such as adaptive MAs (Ong and Keane, 2004; Ong et al., 2006; Houck et al., 2004) and co-evolving MAs (Smith, 2003, 2007), have shown that the memes employed have a major influence on the search performance of MAs. Interested readers are referred to Ong et al. (2006) and the references therein. In this paper, we focus on the development of an MA with a single LS.

Hart (1994) proposed several mechanisms to decide which solutions LS should be applied to, and how frequently, including a fixed frequency method, a fitness-based adaptive method, and a distribution-based adaptive method. In the fixed frequency method, LS is applied to all new offspring once every fixed number of generations. The fitness-based adaptive method biases the application of LS towards promising solutions through an LS probability. The distribution-based adaptive method applies LS only to solutions that are far away from each other, that is, without redundancy in the population.

In Bambha et al. (2004), simulated heating is applied to vary an LS parameter (which specifies the LS intensity) during the optimisation process. In Molina et al. (2005), the LS intensity and probability are decided according to the individual qualities in the population at each generation. An MA based on LS chains is developed in Molina et al. (2010) where previous LS parameters are inherited for the application of LS to newly generated individuals with fixed LS intensity, and the percentage of evaluations spent on LS is fixed. In Nguyen, Ong, Lim, et al. (2007), an adaptive method has been proposed for the selection of proper individuals to undertake LS. The probabilistic memetic framework developed in Nguyen et al. (2009) estimates the LS intensity for each individual at each generation. In Zhang et al. (2003), cheap and expensive LS algorithms are applied intelligently to solutions in different search stages. Cheap LS is applied to all new solutions to improve them within a limited number of steps. Expensive LS is only applied to the best solutions in every generation. The cheap LS can improve exploration ability, while the expensive LS will finally locate a near-global optimum.

Usually, for continuous optimisation problems, the memes used are classical LS algorithms, such as the methods appearing in the Schwefel libraries (Schwefel, 1995). Researchers have also proposed to use deliberately designed crossover operators, or even an EA, to take the role of LS in continuous domain optimisation, such as XLS (Satoh et al., 1996), SPX (Tsutsui et al., 1999; Tsutsui and Goldberg, 2002; Noman and Iba, 2008), and crossover LS (Jones, 1995; O'Reilly and Oppacher, 1995; Satoh et al., 1996; Lozano et al., 2004), among others. In Molina et al. (2010), CMA-ES (Hansen and Ostermeier, 2001) is applied as the LS algorithm. Although these methods may perform well, it is doubtful whether these heuristic-based local optimizers are more effective and efficient than classical LS algorithms for local improvement.

In any existing MA, an LS is coupled with an EA. Usually, the LS acts on selected individuals at frequent generations (sometimes every generation). This could lead (in some cases) to the MA placing too much emphasis on exploitation; in other words, the balance of exploration and exploitation may be shifted too far in favour of exploitation (Houck et al., 1997). To redress the balance, we believe that it may sometimes be preferable to reduce the relative contribution of LS within the MA. Moreover, the computational costs for the EA and the LS to add a new solution to the overall MA process may also be highly unbalanced: it is rather easy for EA reproduction operators to generate such a solution, while LS usually requires a higher cost (although the cost varies considerably with the LS intensity). In existing MAs, the imbalance is redressed by tuning the LS frequency and intensity. This tuning can make the development of an efficient MA rather complex.

In this paper, we attempt to moderate these problems in existing MAs. To realise this goal, we propose to isolate LS from the evolutionary search, but not vice versa. That is, LS will be applied in full strength to improve a limited number of very promising solutions found by an EA, not all or part of the population as in previous hybridisation strategies. Hence, we do not need to explicitly tune the LS frequency and intensity. The underlying idea of the algorithm design is to clearly distinguish exploration from exploitation. The task of the EA is limited to exploring the search space for very promising solutions, which is its advantage and should be strengthened. The task of the LS is to improve these very promising solutions efficiently to local optima. The less effort spent on the exploration for very promising solutions, the better. The design issues that arise are how to design an EA with good exploration capability and how to speed up an EA's exploration under a fixed computational budget. These problems are addressed in this paper.

The proposed algorithmic framework is described and discussed in Section 2. Section 3 describes the components of the exemplar algorithm, including the adaptive multivariate model, the multiple sampling strategy, and the embedding of these components. Section 4 gives the experimental results on a diverse set of commonly used test problems, including those of the 2005 IEEE Congress on Evolutionary Computation (CEC2005; Suganthan et al., 2005), and the comparison with some well-known EAs. The effects of the algorithmic components on the performance of the developed algorithm, including the adaptive multivariate model, the multiple sampling strategy, the selection operation, the population size, and the decoupled hybridisation strategy, as well as the scalability of the developed algorithm, are experimentally studied in this section. Important issues related to the algorithmic framework and the developed algorithm are discussed in Section 5. Section 6 concludes the paper.

2.  The Algorithmic Framework

Recall that one of the main merits of EAs is that they usually show good ability to explore the search space, while LSs are good at locating local optima. It is worth noting that the quality of the local optima found by an LS largely depends on the initial solution it starts from: although the parameters of the LS are also important, it is basically the location of the initial solution that decides the final solution quality. Imagine that the exploration ability of an EA is extensively increased so that it can provide a very promising solution (e.g., a solution located in the attraction basin of a high-quality optimum, or of the global optimum) in a short time; then a properly selected LS will improve the solution to the high-quality local optimum, or the global optimum, efficiently. If this idea can be realised, an efficient optimisation algorithm will be achieved.

The key design issue of an MA based on the above idea lies in designing an EA that is able to quickly locate very promising solutions and in selecting a proper LS. However, it is not always guaranteed that an EA can find a near-global-optimum solution. Also, a low exploration speed of the EA is not acceptable under a fixed computational budget.

In light of the above arguments, we propose a novel memetic framework, called the intelligent multi-restart memetic algorithm (IMMA). Figure 1(a) displays the flowchart of the proposed framework. In the framework, LS and EA are untangled and act separately. That is, no LS is applied inside the EA, which distinguishes it from the classical MA framework shown in Figure 1(b), where local search intertwines with the EA operations. In the developed framework, local search comes into play only when the EA is considered to have found a very promising solution. The framework requires an EA with strong exploration ability and a fast exploration speed. If the termination criterion (e.g., a fixed computational budget) has not been reached, the diversification scheme is applied so that the EA may escape from the local optima found so far. This multi-restart strategy is used to increase the possibility of finding near-global-optimum solutions, given the stochastic nature of EAs. Moreover, to enhance the efficiency of each new evolutionary search, it is particularly useful to exploit knowledge obtained from previously visited solutions. The abstract algorithmic framework shown in Algorithm 1 describes the idea in more detail.
Figure 1:

(a) The flowchart of the proposed novel framework and (b) the traditional MA framework as described in Eiben and Smith (2003).


In the algorithmic framework, the loop from Selection to Check Search Status until convergence is called a cycle, that is, a typical run of an EA. The EA is considered to have converged if the restart criterion is satisfied. The restart criterion stops the EA's process of exploring for attraction basins of high-quality solutions. After a cycle ends, local search is applied to improve the best solution found in the cycle. In Diversification, a new cycle begins following a finished cycle. We use $x_c$ to record the best solution found in each cycle (called the cycle best), $x_c^\ast$ is the local optimum resulting from the application of LS on $x_c$, and $x_g^\ast$ is the best solution among these local optima (called the global best).
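To make the cycle structure concrete, the following is a minimal runnable sketch of the framework, not the authors' pseudocode: the EA body is a deliberately simple truncation-select-and-resample loop, SciPy's Powell method stands in for the local search, the distance-based diversification of Section 3.4.2 is replaced by plain random restarts, and all function names and parameter defaults are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def imma_sketch(f, lo, hi, budget=20_000, M=60, K=30, T=5, seed=0):
    """Minimal sketch of the multi-restart cycle structure (illustrative)."""
    rng = np.random.default_rng(seed)
    n, evals = len(lo), 0
    x_g, f_g = None, np.inf          # global best x_g^* and its value
    local_optima = []                # one local optimum x_c^* per cycle
    while evals < budget:
        # (re)start: fresh random population (distance-based
        # diversification of Section 3.4.2 omitted for brevity)
        pop = lo + rng.random((M, n)) * (hi - lo)
        fit = np.array([f(x) for x in pop]); evals += M
        f_c, stagnant = fit.min(), 0
        # one cycle: evolve until the cycle best stagnates for T generations
        while stagnant < T and evals < budget:
            parents = pop[np.argsort(fit)[:K]]            # truncation selection
            a, b = parents.min(axis=0), parents.max(axis=0)
            children = a + rng.random((M, n)) * (b - a)   # uniform in parent range
            cfit = np.array([f(x) for x in children]); evals += M
            pool, pfit = np.vstack([pop, children]), np.r_[fit, cfit]
            keep = np.argsort(pfit)[:M]                   # replacement
            pop, fit = pool[keep], pfit[keep]
            stagnant = stagnant + 1 if fit.min() >= f_c else 0
            f_c = min(f_c, fit.min())
        # full-strength local search on the cycle best x_c
        res = minimize(f, pop[fit.argmin()], method="Powell")
        evals += res.nfev
        local_optima.append(res.x)
        if res.fun < f_g:                                 # update global best
            x_g, f_g = res.x, res.fun
    return x_g, f_g

# usage: 10-dimensional sphere function
x, v = imma_sketch(lambda z: float(np.sum(z**2)),
                   lo=np.full(10, -5.0), hi=np.full(10, 5.0))
```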

In each cycle, the EA components can be specified at will, although the critical design issue is how to increase the exploration ability so that a promising solution can be found quickly. The restart criterion plays an important role in deciding when a very promising solution has been found. In our implementation, we consider that the EA has found a possibly very promising solution if $x_c$ is not updated for T consecutive generations. This does not guarantee that the current $x_c$ is truly very promising; it may just be that the evolutionary search has been trapped in a large attraction basin. Telling exactly whether $x_c$ is promising or not depends on knowledge about the fitness landscape of the optimisation problem, which is not usually available. However, to choose between two solutions for the LS application, a direct (yet heuristic) approach is to choose the one with the smaller objective function value. A previous study (Nguyen, Ong, and Krasnogor, 2007) has already shown that applying LS to the best individuals can result in good algorithmic performance. Note that the EA is supposed to be developed with good exploration ability. Therefore, we may postulate that the current best solution is more likely to be located in the attraction basin of a high-quality optimum than the other visited solutions. The stagnation of the current evolutionary search also implies that the EA has reached the fine search stage; the use of the LS will make the exploitation much more efficient than the EA itself. Since we do not need the EA to carry out exploitation, T should be relatively small. A small T can also prevent the EA from expending computational resources when a large attraction basin is encountered.

In the algorithmic framework, at the end of each cycle, the current best solution (the cycle best, i.e., $x_c$) is used as the initial solution for the LS. During the search, we might consider not applying the LS to improve $x_c$ when its fitness is worse than those of previous cycles (i.e., $f(x_c) > f(x_g^\ast)$). However, it is still worth expending computational cost on improving $x_c$, since we do not have any criteria to tell its location in the fitness landscape. Alternatively, some criteria to decide intelligently whether to apply the local search could be developed. For example, if the minimal distance between $x_c$ and the visited local optima is less than a threshold, then it may not be wise to apply local search to improve $x_c$, since it may lead to the same local optimum. We do not include this consideration in our implementation, since there are no theoretical rules to decide the threshold: the threshold will depend on the optimisation problem.

If the stop criterion has not been met, we either maintain the current search, or restart a new search. To restart the search in the (c+1)th cycle, a new population of solutions can be randomly or heuristically generated. Note that as we already have some knowledge from previous search history, this may be useful for further searches in two ways. First, the new population can be generated intelligently by taking this knowledge into consideration. For example, we may want the generated population to be far away from the local optima identified so far, so that further searches can escape from them. Second, this knowledge may be used to guide the creation of offspring in the future searches. We will address these issues in the following sections.

To summarise, we see that developing an efficient MA based on the developed framework mostly depends on three factors: an EA with effective exploration capability, a proper LS, and an intelligent diversification scheme. Among these factors, the diversification scheme is expected to help the search escape from local optima through the use of the learned knowledge if the EA fails to provide a solution located in the attraction basin of the global optimum. This implies that the developed MA may not perform well on optimisation problems for which the learned knowledge provides no useful guidance towards the global optimum, for example, problems with very unstructured fitness landscapes.

3.  Exemplar Algorithmic Implementation

The previous section described a generic algorithmic framework. In this section, we describe one specific implementation of the framework, based on an estimation of distribution algorithm (EDA). We describe the various components that form our implementation, including an adaptive multivariate model and an offspring generation scheme, and the embedding of these components. To begin, we briefly describe the EDA.

3.1.  Estimation of Distribution Algorithm

The EDA was initially proposed by Mühlenbein and Paaß (1996). It maintains and evolves a set of individuals as in other EAs, but it does not use crossover or mutation operators to produce new offspring. Instead, an EDA produces offspring by sampling from a probabilistic model, which is constructed from statistical information extracted from promising solutions. Sampling new offspring from the probability model is expected to guide the search towards promising search areas. EDAs can be summarised as shown in Algorithm 2.
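As a hedged illustration of the select–model–sample loop of Algorithm 2, the sketch below uses a univariate Gaussian model; that model choice is an assumption made for brevity and is not the adaptive model used in this paper (Section 3.2).

```python
import numpy as np

def eda(f, lo, hi, M=60, K=30, generations=100, seed=0):
    """Generic EDA loop: select parents, fit p(x, t), sample offspring."""
    rng = np.random.default_rng(seed)
    n = len(lo)
    pop = lo + rng.random((M, n)) * (hi - lo)       # initial population
    for _ in range(generations):
        fit = np.array([f(x) for x in pop])
        parents = pop[np.argsort(fit)[:K]]          # promising solutions
        mu = parents.mean(axis=0)                   # fit the model p(x, t)
        sigma = parents.std(axis=0) + 1e-12
        pop = np.clip(rng.normal(mu, sigma, (M, n)), lo, hi)  # sample offspring
    fit = np.array([f(x) for x in pop])
    return pop[fit.argmin()], fit.min()

# usage: minimise the 5-dimensional sphere function
best, val = eda(lambda z: float(np.sum(z**2)),
                lo=np.full(5, -5.0), hi=np.full(5, 5.0))
```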

Existing EDAs for continuous optimisation can be classified with respect to the statistical model p(x, t) assumed. The probability models assumed include a Gaussian distribution (Larrañaga and Lozano, 2002), a Gaussian mixture (Bosman and Thierens, 2000), and a histogram (Tsutsui et al., 2001; Zhang et al., 2003). Readers are referred to Larrañaga and Lozano (2002) for a detailed description of EDAs.

3.2.  Adaptive Multivariate Model

In our exemplar algorithm, an adaptive multivariate probability model with $p(x,t)=\prod_{i=1}^{n} p(x_i,t)$ is assumed to represent the statistical information extracted from the parent set. Suppose that at generation t, the parent set consists of $\{x^1, \ldots, x^K\}$, where K is the size of the parent set. For the ith dimension, $p(x_i, t)$ is computed as described in Algorithm 3.

The distributions of the ith variable, xi, in the early and later stages of the search could have the profiles shown in Figure 2(a). In comparison, the marginal histogram model developed in Zhang et al. (2003) could have the profiles shown in Figure 2(b). In the marginal histogram model, the range of each variable is divided into a fixed number of subintervals of the same length, and the histogram of the selected solutions at each variable is used to represent the probability model. The difference between the developed model and the histogram model is that the range of each decision variable in the developed model adapts to the search environment.

Figure 2:

(a) The adaptive multivariate probability model and (b) the univariate marginal histogram model at early and later stages.


From Figure 2, we see that the flat distribution in the adaptive multivariate probability model could promote exploration of the search space. On the other hand, the developed model will inevitably shrink the search space, and so accelerate the speed of evolutionary search. However, the shrinkage will weaken the exploration capability, since there is no chance to explore the search space outside $[a_i, b_i]$ once the shrinkage has happened. To moderate this problem, we adopt two strategies. First, we expand the present search space $[a_i, b_i]$ to $[a_i - \delta_i, b_i + \delta_i]$, where $\delta_i$ is a small number which is proportional to $b_i - a_i$. Second, the uniform distribution is applied to the interval (step 2). This is to make the generated solutions as diversified as possible in the search space, so that good exploration can be expected. The time complexity to compute p(x, t) is $O(nK)$, which is linear in the dimension and the selected population size.
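A sketch of the model construction just described, under our reading of Algorithm 3: per-dimension ranges are taken from the parent set, expanded proportionally to $b_i - a_i$, and sampled uniformly. The expansion fraction `expand` is an assumed value for illustration.

```python
import numpy as np

def build_adaptive_model(parents, expand=0.05):
    """Per-dimension uniform model over the (expanded) parent range.

    Computing the per-dimension min/max over K parents costs O(nK)."""
    a = parents.min(axis=0)          # a_i: per-dimension minimum
    b = parents.max(axis=0)          # b_i: per-dimension maximum
    delta = expand * (b - a)         # expansion proportional to b_i - a_i
    return a - delta, b + delta      # interval [a_i - d_i, b_i + d_i]

def sample_from_model(lo_m, hi_m, count, rng):
    """Draw `count` points uniformly from the model (step 2)."""
    return lo_m + rng.random((count, len(lo_m))) * (hi_m - lo_m)
```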

3.3.  Guided Mutation and Multiple Sampling Strategy

The proximate optimality principle (POP), first stated by Glover and Laguna (1998), has been applied widely (explicitly or implicitly) in meta-heuristics proposed for combinatorial optimisation problems, such as in Zhang et al. (2005) for the maximum clique problem, in Stützle (1999) and Zhang et al. (2004) for the quadratic assignment problem, and many others. It was found in Glover (1998) that optimal solutions of the traveling salesman problem share about 80% of their components. The POP states that good solutions have similar structures (components). It has also been successfully applied to continuous optimisation (Sun et al., 2005), which indicates that the POP can be very helpful in real-parameter optimisation. Actually, the POP has been widely but implicitly applied in EAs, for example in mutation operators.

The explicit forms of the application of the POP are guided mutation (Zhang et al., 2005, 2004) and iterated local search (Stützle, 1999). Guided mutation has the advantage of generating offspring under the guidance of statistical information. It has been most successfully applied to combinatorial optimisation problems. However, it can easily be applied to continuous optimisation problems in cases in which a fully factorised multivariate probability model is used. The offspring generation scheme is summarised in Algorithm 4, given a template solution $x_g^\ast$ and a probability model p(x, t).

To create offspring with Algorithm GMSampling, the best solution $x_g^\ast$, that is, the solution with the least objective function value among all the visited solutions, is taken as the template solution. The fixing of some components (subset V) of $x_g^\ast$ takes advantage of the legacy of the best solution. The creation of the rest of the components (subset $\bar{V}$) is guided by the probability model p(x, t). The random selection of V makes it possible for the search to find components in $\bar{V}$ that are the same as those of the global or near-global optimum. The time complexity of sampling one individual is $O(n)$.

Parameter $\beta$, taking values from 0 to 1, controls the intensity of the fixing of the decision variables. It reflects to what extent the POP holds for a certain optimisation problem. It can be expected that a large $\beta$ value implies a quick evolutionary search, but a weak exploration capability, since a large number of variables will be copied from the best solution found so far. Moreover, the optimal $\beta$ setting for efficient exploration varies with the optimisation problem, and there is no a priori guidance on how to choose the best $\beta$ value in advance. Therefore, we propose to learn the parameter adaptively during the search in our implementation.
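A sketch of the guided-mutation sampling under our reading of Algorithm 4, in which each component is copied from the template with probability $\beta$ (forming the random subset V) and otherwise drawn from the model; the uniform model from the previous sketch stands in for p(x_i, t).

```python
import numpy as np

def gm_sampling(template, lo_m, hi_m, beta, rng):
    """Guided-mutation sampling of one offspring; O(n) per individual.

    Components in the random subset V are copied from the template
    (the global best); the rest are drawn from the uniform model."""
    n = len(template)
    x = lo_m + rng.random(n) * (hi_m - lo_m)   # drawn from the model p(x, t)
    V = rng.random(n) < beta                   # fixed components (subset V)
    x[V] = template[V]
    return x
```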

Recalling the weakness of the adaptive multivariate model described in Section 3.2, here we moderate this weakness through a multiple sampling strategy. Note that in almost all probability-model-based evolutionary algorithms, the number of individuals sampled from the probability model is less than, or equal to, the population size. As is known in statistics (e.g., Markov chain Monte Carlo theory; Robert and Casella, 2004), to truly represent the distribution p(x, t), a large set of data points needs to be sampled from it. Therefore, statistically speaking, the small sample size applied in most EDAs will result in high sampling noise, leading to a false guidance problem in the search procedure. That is, the sampling noise may mislead the search into wrong areas, even when p(x, t) is accurate. Once this has happened, it is hard to get back on the right track and, importantly, computational effort will be wasted. One straightforward way to reduce the sampling noise is to sample many more individuals and carry out replacement to construct the following generation. This is the so-called multiple sampling strategy.

It should be pointed out that if an optimisation problem does not obey the POP assumption (e.g., a problem with a discontinuous search domain), that is, if there is no commonality among the local optima and the global optimum, later searches cannot obtain any benefit from the best solution found so far. Consequently, the search may be inefficient, since it will rely entirely on the EA and the LS.

3.4.  The Specific Algorithm

In this section, we embed the developed components described in Sections 3.2 and 3.3 to build the algorithm used in this paper.

3.4.1.  The EDA

To construct an EDA, as seen in Algorithm 2, we first need to choose a selection method. Here, we apply the well-known truncation selection. In truncation selection, the fitness values of individuals in the population are sorted, and the individuals with the best fitness values are selected. We use the adaptive multivariate probability model described in Algorithm 3 to carry out modeling. The sampling method described in Algorithm 4 is applied to create new offspring. Note that there is a control parameter $\beta$ in the sampling method. We fix it within each cycle and adjust it adaptively at the end of each cycle as follows:

  • if the current cycle found a better solution than the previous ones

  • then set $\beta := \beta + \delta_\beta$;

  • else set $\beta := \max(\beta - \delta_\beta, 0)$;

  • end if

That is, when the current cycle has found a better solution (i.e., $f(x_c^\ast) < f(x_g^\ast)$), we will set a bigger $\beta$ value for the following cycle. The reasoning behind this is that we have more confidence that there are more similar components between the best solution and the global optimum under the POP assumption. If the current cycle cannot find a better solution, then it means our choice of $\beta$ is not proper. In this case, $\beta$ decreases by $\delta_\beta$ (ensuring that $\beta$ remains no less than zero). If no better solution is found in the new cycle of search, $\beta$ will be decreased again. Obviously, $\beta$ cannot be negative. In the first cycle, $\beta$ is set to zero. This is to make the algorithm explore the search space as much as possible. Moreover, if in some consecutive cycles there are no better solutions found, $\beta$ could decrease to zero, which means that new search areas are to be explored.
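The end-of-cycle adaptation rule can be written compactly as below; the step size 0.1 is an assumed value, since the original increment is not recoverable from the text.

```python
def update_beta(beta, improved, step=0.1):
    """End-of-cycle adaptation of beta (starts at 0 in the first cycle).

    `step` (here 0.1) is an assumed increment, not the paper's value."""
    if improved:                       # cycle improved on the global best
        return min(beta + step, 1.0)
    return max(beta - step, 0.0)       # decrease, but never below zero
```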

As discussed in Section 3.3, it is appropriate to sample a large number of offspring to reduce the sampling noise. In our implementation, we first introduce an indicator vector $i$ in Selection: if solution j is selected, $i_j := 1$; otherwise $i_j := 0$. We combine the sampling and replacement together as shown in Algorithm 5.
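Our reading of the combined sampling and replacement step (Algorithm 5) is sketched below, reusing `gm_sampling` from the earlier sketch: L·M offspring are drawn, and the next population keeps the best M individuals among the offspring and those marked by the indicator vector. The exact replacement rule in Algorithm 5 may differ.

```python
import numpy as np

def sample_and_replace(f, pop, fit, selected, template, lo_m, hi_m,
                       beta, L, rng):
    """Draw L*M offspring by guided mutation, then keep the best M
    individuals among the offspring and the selected parents."""
    M = len(pop)
    children = np.array([gm_sampling(template, lo_m, hi_m, beta, rng)
                         for _ in range(L * M)])
    cfit = np.array([f(x) for x in children])
    pool = np.vstack([pop[selected], children])    # `selected`: indicator vector
    pfit = np.r_[fit[selected], cfit]
    keep = np.argsort(pfit)[:M]
    return pool[keep], pfit[keep]
```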

3.4.2.  The Restart Mechanism

In our implementation, we apply a simple diversification scheme. That is, we randomly generate a set of candidate individuals whose size is larger than M. The M individuals that have the largest distances to $X^\ast$, the set of visited local optima, are selected to form the initial population of the (c+1)th cycle. The distance between a randomly generated individual x and the set of visited local optima is defined as:
$$d(x, X^\ast) = \min_{x^\ast \in X^\ast} \|x - x^\ast\|$$
where $\|\cdot\|$ is the Euclidean distance (but other distance measures may be used).
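A sketch of this diversification scheme follows; the size of the random candidate pool (here 5M) is an assumption, since the original value is not recoverable from the text.

```python
import numpy as np

def diversify(lo, hi, local_optima, M, pool_factor=5, seed=None):
    """Restart population: random candidates farthest from the visited
    local optima X* (Euclidean distance, as in the equation above)."""
    rng = np.random.default_rng(seed)
    cand = lo + rng.random((pool_factor * M, len(lo))) * (hi - lo)
    if not local_optima:
        return cand[:M]                      # first cycle: plain random start
    X = np.asarray(local_optima)
    # d(x, X*) = min over x* in X* of ||x - x*||
    d = np.min(np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2), axis=1)
    return cand[np.argsort(-d)[:M]]          # keep the M largest distances
```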

3.4.3.  The Local Search

To carry out local search, we use an iterative derivative-free unconstrained optimisation algorithm developed by Powell (2006, 2008), called NEWUOA. Of course, we do not rule out the application of other local search algorithms.

Essentially, NEWUOA is an iterative trust-region method. At the kth step, it constructs a local quadratic model $Q_k$ by interpolating a set of m solutions (initially sampled within a radius $\rho_{beg}$ of the initial solution $x_{ini}$). The local model is optimised within the trust region, which is a ball of radius $\Delta_k$. Based on the quality of the obtained optimal solution, the trust-region radius $\Delta_k$ is either expanded or contracted. The local model is updated by minimising the Frobenius norm of the change to the second-derivative matrix of $Q_k$. The local improvement continues until the step size falls below the current lower bound $\rho_k$, in which case $\rho_k$ is decreased. The algorithm terminates once $\rho_k = \rho_{end}$ is reached.

From this brief description of NEWUOA, we can see that the parameters of NEWUOA are $\rho_{beg}$, $\rho_{end}$, and an integer m, apart from the initial solution $x_{ini}$. m = 2n+1 is used in our application (as recommended in Powell, 2008), and $\rho_{end}$ (which can be used to control the final accuracy of the local optima) is set as $10^{-10}$. NEWUOA does not need any gradient information and hence is applicable to a large set of continuous optimisation problems. Readers are referred to Powell (2008) for a detailed description of NEWUOA.
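NEWUOA itself is distributed as Fortran code (Python wrappers exist, e.g., in the PDFO package). As a stand-in illustration of the local search step, SciPy's derivative-free Powell method can be used, with `xtol` playing a role loosely analogous to $\rho_{end}$:

```python
import numpy as np
from scipy.optimize import minimize

def polish(f, x_c, max_nfe=10_000):
    """Derivative-free local improvement of the cycle best x_c.

    SciPy's Powell method is used as a stand-in for NEWUOA; `xtol`
    is loosely analogous to rho_end (final accuracy)."""
    res = minimize(f, x_c, method="Powell",
                   options={"xtol": 1e-10, "maxfev": max_nfe})
    return res.x, res.fun, res.nfev
```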

4.  Experimental Studies

In this section, we experimentally study the performance of IMMA. The experimental studies are divided into three parts. First, we carry out experiments to study the effects of the algorithmic components and parameters on the performance of IMMA. The algorithmic parameters include the population size M, the selected population size K, the sampling multiplication number L, the restart criterion number T, and the expansion parameter $\delta_i$. The expansion in each dimension of the search space is set proportional to $b_i - a_i$, so as to eliminate the influence of the variable scales in different coordinates. Among these parameters, we found that the most important ones are M and L, while K and T can simply be set as K = M/2 and T = 5, respectively.

Second, we compared the developed algorithm with a diverse set of evolutionary algorithms from the literature. Finally, a scalability analysis was carried out to study the effect of the problem dimensionality on the performance of the algorithm.

4.1.  Test Suite and Performance Evaluation Criteria

We test our algorithm on a set of 40 commonly used test functions from the literature. Appendix A lists the descriptions of the functions and their respective characteristics, including the global optimum, dimensionality, and properties such as epistasis, modality, and discontinuity. The test functions f1 through f10, f26 through f34, and f37, which are the same as used in Noman and Iba (2008), are applied to carry out the component analysis. To evaluate the performance of the algorithms in the component analysis, the performance criteria applied in Noman and Iba (2008) and Suganthan et al. (2005) are adopted. The analysis is based on the statistics of the criteria over 25 trials. In all cases, the maximal number of fitness evaluations (NFE) to carry out the trials is 10,000n, where n is the dimension of the problem. The criteria can be summarised as follows:

  • Error Criterion. The function error value (FEV) is defined as $f(x) - f(x^\ast)$, the difference between the objective value (fitness) of a solution x and that of the global optimum $x^\ast$ of the function. We record the minimal FEV that the algorithm can find within the maximal NFE, and calculate the mean (AVGE) and standard deviation (SDE) of these FEVs. The notation AVGE ± SDE is used to summarise the criterion.

  • NFE Criterion. If the minimal FEV is less than an accuracy level within the maximal NFE, the trial is considered to be successful. We count the number of successful trials, and denote the number as SUC. The number of fitness evaluations required for the algorithm to be successful is recorded, and the mean and standard deviation are calculated and denoted by AVGN and SDN. The accuracy level is fixed to $10^{-6}$ for f1 through f5 and f26 through f40, to $10^{-2}$ for f6 through f16, and to $10^{-1}$ for f17 through f25, as used in Suganthan et al. (2005). The notation used here is AVGN ± SDN.

  • Convergence Graphs. The graphs show the evolution process of the algorithms, that is, the average error performance against the generations and/or NFEs.

  • Hypothesis Test. A hypothesis test (either the rank sum test or the Z test) is applied in the following experiments at the 5% significance level. A p value less than .05 suggests a significant difference.

NFEs used by IMMA include the NFEs consumed by NEWUOA and by the developed EDA. The comparisons with the various EAs use different performance criteria, which will be described in each context.

4.2.  Component Analysis

4.2.1.  Sensitivity to Population Size

The population size is an important factor in the performance of EAs. Large population sizes tend to encourage the preservation of diversity and hence improve the exploration ability of EAs. However, a large population size will make EAs exhaust the fitness evaluations too quickly. Therefore, there should be a trade-off between the exploration capability and the computational cost used for the exploration. To investigate the effect of the population size on the IMMA performance, we apply the IMMA with varied population size on the test functions with n = 30. The adaptive multivariate model and the multiple sampling strategy with L = 3 are applied. The population size is set as M = kn, where $k \in \{1, 2, 5, 10\}$. The experimental results are listed in Table 1, where the best results are shown in bold type. The upper part of the table lists the NFE criterion obtained for those functions for which all runs reach the accuracy levels, while the lower part of the table shows the error criterion obtained for those functions for which IMMA cannot reach the accuracy level in all 25 trials. Table 2 lists the average number of cycles in which IMMA reaches the accuracy level, the number of successful trials, and the p values obtained from the rank sum test between the results for M = 60 and the other population sizes.

Table 1:
The experimental results obtained by the IMMA with varying population size M for the test functions with n=30, while NFEs are shown for the functions that can be solved successfully in the 25 trials, and the best error values are shown for the hard functions.
Function | M = 30 | M = 60 | M = 150 | M = 300
AVGN ± SDN
f1 2.86e+03 2.08e+02 6.55e+03 6.27e+02 1.52e+04 5.83e+03 3.03e+04 1.17e+03 
f2 8.68e+03 9.55e+02 1.01e+04 1.03e+03 1.33e+04 1.75e+03 1.90e+04 3.70e+03 
f3 2.95e+04 2.05e+03 3.13e+04 2.57e+03 3.32e+04 2.24e+03 3.80e+04 2.83e+03 
f6 2.10e+04 1.84e+04 3.11e+04 3.03e+04 4.13e+04 3.02e+04 6.01e+04 2.97e+04 
f7 6.55e+03 4.09e+03 1.02e+04 6.28e+03 5.11e+04 1.06e+04 9.61e+04 5.16e+04 
f9 3.79e+04 1.12e+04 3.94e+04 1.46e+04 3.90e+04 9.43e+03 4.41e+04 7.93e+03 
f10 5.47e+04 1.45e+04 4.20e+04 1.61e+04 3.97e+04 1.63e+04 5.19e+04 1.64e+04 
f28 2.87e+03 1.36e+02 6.30e+03 3.56e+02 1.85e+04 1.36e+03 4.05e+04 3.70e+03 
f29 2.83e+04 8.77e+03 2.67e+04 6.21e+03 3.93e+04 9.28e+03 5.09e+04 1.72e+04 
f31 4.35e+04 1.19e+04 4.69e+04 1.46e+04 5.60e+04 1.66e+04 6.81e+04 1.23e+04 
f33 2.73e+04 1.23e+04 2.02e+04 9.15e+03 2.67e+04 8.68e+03 4.28e+04 3.43e+03 
f37 1.19e+04 7.70e+03 1.93e+04 1.65e+04 4.56e+04 5.93e+03 6.85e+04 4.63e+04 
AVGE ± SDE
f4 1.31e+05 2.35e+04 2.94e+04 6.16e+03 5.39e+04 3.59e+04 6.23e+04 5.35e+03 
f5 1.67e+03 3.78e+02 1.68e+03 3.27e+02 2.05e+03 2.20e+02 2.20e+03 1.97e+02 
f8 2.00e+01 1.28e–10 2.00e+01 1.12e–10 2.00e+01 1.77e–10 2.00e+01 1.37e–10 
f26 2.92e–01 4.00e–02 2.84e–01 4.73e–02 3.80e–01 7.07e–02 3.84e–01 6.88e–02 
f27 4.13e+01 5.32e+01 4.13e+01 4.97e+01 1.31e+02 7.13e+01 1.93e+02 1.08e+02 
f30 9.72e–14 7.77e–14 1.63e–13 3.26e–13 2.86e–03 4.37e–03 4.24e–03 5.88e–03 
f32 3.82e–04 1.01e–12 3.82e–04 7.00e–13 3.82e–04 7.59e–13 3.82e–04 1.34e–12 
f34 6.61e–13 4.42e–13 5.56e–13 4.21e–13 9.98e–08 2.76e–08 8.79e–04 3.04e–04 
Table 2:
The number of cycles and success trials to reach the accuracy level and the p values obtained from the rank sum test between the results obtained by the IMMA with M=60 and the other population sizes.
Function | M = 30 | M = 60 | M = 150 | M = 300 | p(M=30,60) | p(M=60,150) | p(M=60,300)
Number of cycles      
f1 1.0 1.0 1.0 1.0 <.001 <.001 <.001 
f2 1.0 1.0 1.0 1.0 <.001 <.001 <.001 
f3 1.0 1.0 1.0 1.0 .011 .010 <.001 
f6 1.3 1.0 1.0 1.0 0.167 .045 .002 
f7 2.1 1.5 1.0 1.0 .008 <.001 <.001 
f9 15.9 13.2 8.1 5.8 0.687 0.909 0.170 
f10 16.5 11.5 9.3 6.5 .007 0.620 .042 
f28 1.0 1.0 1.0 1.0 <.001 <.001 <.001 
f29 4.4 3.5 2.9 2.8 0.464 .000 <.001 
f31 14.1 10.1 7.9 6.6 0.376 .051 <.001 
f32 20.3 15.8 12.1 10.5 <.001 .001 <.001 
f33 3.8 2.6 2.5 1.1 .029 .017 <.001 
f37 3.5 2.1 1.6 1.2 .053 <.001 <.001 
Number of success trials      
f4 <.001 .003 <.001 
f5 0.843 <.001 <.001 
f8 1.000 1.000 1.000 
f26 0.525 <.001 .000 
f27 1.000 <.001 <.001 
f30 25 25 17 15 0.336 .003 .001 
f34 25 25 25 23 0.398 <.001 <.001 

The upper part of Table 1 indicates that the larger the population size, the more NFEs are required to reach the accuracy level. Moreover, the lower part of the table shows that (1) an IMMA with a small population size (M=30, 60) can find better solutions than an IMMA with a large population size except for f8,32 where the IMMAs with different M obtain similar results; (2) for f30,34, the IMMA with a small population size achieves a higher success rate than the IMMA with a large population size (M=150, 300). These observations show that a small population size is preferred over a large population size.

Moreover, comparing the IMMAs with population sizes M = 30 and 60 (i.e., k=1, 2), we see that the IMMA with k = 1 requires less computational cost to reach the accuracy level except for f10,29,33. However, the computational cost of the IMMA with k = 2 is still very competitive. Table 1 also indicates that the IMMA with k = 2 obtains better optimal solutions than the IMMA with k = 1 on the functions that cannot be solved successfully. Note that most of the functions in the upper part of Table 1 have no epistasis except for f6,7,10,37, while most of these functions in the lower part have high epistasis (see Table A1 in the appendix). This shows that the IMMA with k = 2 can deal with high-epistasis functions better than the IMMA with k = 1. Hence, k = 2 should be preferred in case we do not have any prior knowledge about the characteristics of the optimisation problems.

It can be seen from the upper part of Table 2 that the number of cycles tends to decrease as the population size increases. This indicates that, with a large population, a large number of iterations, and hence a large number of fitness evaluations, are required to complete a cycle. Consequently, only a small number of cycles (equivalently, local searches) can be carried out within a fixed number of fitness evaluations. The superior performance of the IMMA with a small population size implies that more local searches are required in order to achieve better performance.

Figure 3 displays the convergence graphs of IMMA on some selected functions, with the number of fitness evaluations on the x axis and the FEV on the y axis. From Figure 3, we can see that small population sizes (M = 30, 60) usually require fewer NFEs to reach the accuracy level than large population sizes (M = 150, 300) (e.g., Figure 3(a,b,c)), while Figure 3(d) shows that small population sizes usually obtain better FEVs.

Figure 3:

Convergence curve of IMMA with variations of population size for selected functions (n=30). The x axis shows the number of fitness evaluations. The y axis shows the error values.


4.2.2.  Effects of Selection

Another important factor affecting the performance of EAs is the selection operator. Note that under the IMMA framework, we need to speed up the exploration of the EA, and the selection method plays an important role in the convergence performance of the EA. In this section, we experimentally compare two algorithms, IMMA-P (with proportional selection) and IMMA-T (with truncation selection), on the functions with n = 30 to analyse the effect of different selection operators. In our implementation of fitness proportional selection, the probability that an individual with fitness f is selected is proportional to exp(−f) (in the case of minimisation).
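For concreteness, a minimal sketch of this proportional selection rule follows; the shift by the minimum fitness is ours, added for numerical stability, and leaves the selection probabilities unchanged.

```python
import numpy as np

def proportional_select(pop, fit, K, seed=None):
    """Select K parents with probability proportional to exp(-f).

    Subtracting fit.min() only rescales the weights by a constant
    factor, so the selection probabilities are unchanged."""
    rng = np.random.default_rng(seed)
    w = np.exp(-(fit - fit.min()))
    idx = rng.choice(len(pop), size=K, replace=True, p=w / w.sum())
    return pop[idx]
```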

To apply the two algorithms, the parameters are set as M = 60 and L = 3. Table 3 lists the best error criterion and the NFE criterion, where the best results are shown in bold type. In the table, columns pv and pn list the p values of the rank sum test on the best error criterion and the NFE criterion, respectively. From the table, it can be seen that at the 5% significance level, IMMA-P achieves better performance than IMMA-T on only one test function, f37, while on three functions, f2,8,29, the two algorithms perform similarly. Thus, we can conclude that IMMA-T outperforms IMMA-P on average in terms of both solution quality and computational cost. The experimental results confirm that truncation selection is significantly better than proportional selection under the IMMA framework. Figure 4 shows the convergence curves comparing IMMA-P and IMMA-T on some selected functions. The plots show that IMMA-T has improved convergence characteristics over IMMA-P.

Table 3:
Evaluation criteria for fitness proportional and truncation selection at n = 30. IMMA-P denotes the IMMA with proportional selection, while IMMA-T denotes the IMMA with truncation selection. Columns pv and pn list the p values of the rank sum test on the results of the best error criterion and the NFE criterion, respectively. — denotes that the rank sum test is not necessary, either because both algorithms are fully successful (no pv is available) or fully failed (no pn is available). * denotes that the algorithm stops at the maximum NFEs, that is, $3 \times 10^5$.
IMMA-P | IMMA-T
Function | AVGE ± SDE | AVGN ± SDN (SUC) | AVGE ± SDE | AVGN ± SDN (SUC) | pv | pn
f1 0.00e+00 0.00e+00 7.20e+03 4.74e+02(25) .00e+00 0.00e+00 6.55e+03 6.27e+02(25) — <.001 
f2 7.03e–12 1.36e–11 9.51e+03 1.19e+03(25) 6.22e–12 2.57e–12 1.00e+04 1.03e+03(25) — .133 
f3 3.03e–08 5.36e–07 3.95e+04 2.12e+03(25) 1.51e–08 1.54e–08 3.13e+04 2.57e+03(25) — <.001 
f28 2.11e–32 4.00e–32 8.72e+03 7.52e+03(25) 1.93e–34 2.85e–34 6.30e+03 3.56e+02(25) — <.001 
f29 2.20e–14 7.11e–15 2.78e+04 1.66e+04(25) 2.11e–14 7.80e–15 2.67e+04 6.21e+03(25) — .759 
f30 9.78e–14 8.75e–14 9.14e+03 7.23e+02(25) 1.63e–13 3.26e–13 7.68e+03 3.28e+03(25) — .040 
f32 4.29e–04 1.63e–04 3.13e+04 1.47e+04(25) 3.82e–04 7.00e–13 1.47e+04 3.74e+03(25) — <.001 
f33 8.42e–14 2.26e–13 9.42e+04 7.78e+04(25) 3.09e–14 3.02e–14 2.02e+04 9.15e+03(25) — <.001 
f37 2.96e–10 1.30e–09 1.37e+04 7.36e+03(25) 4.73e–11 4.06e–11 1.93e+04 1.65e+03(25) — .001 
f4 6.52e+04 9.10e+03 2.94e+04 6.16e+03 <.001 — 
f5 2.61e+03 8.78e+02 1.68e+03 3.27e+02 <.001 — 
f8 2.00e+01 1.88e–10 2.00e+01 1.12e–10 1.000 — 
f26 1.87e+00 2.12e–01 2.84e–01 4.73e–02 <.001 — 
f27 5.00e+02 9.51e+01 4.13e+01 4.97e+01 <.001 — 
f6 1.91e+03 8.78e+02 3.18e–10 2.56e–10 3.11e+04 3.03e+04(25) <.001 <0.000 
f7 3.54e+04 8.78e+03 1.31e–12 4.79e–12 1.02e+04 6.28e+03(25) <.001 <.001 
f9 2.43e+00 2.17e+00 2.28e+05 6.55e+04(7) 1.09e–03 3.03e–03 3.94e+04 1.46e+04(25) <.001 <.001 
f10 3.43e+01 6.87e–01 2.59e+05 9.85e+04(5) 1.90e–03 3.48e–03 4.20e+04 1.61e+04(25) <.001 <.001 
f31 3.46e+00 2.26e+00 2.90e+05 3.79e+04(2) 3.69e–11 1.10e–10 4.69e+04 1.46e+04(25) <.001 <.001 
f34 2.41e–02 5.22e–02 2.21e+05 5.05e+04(13) 5.56e–13 4.21e–13 4.02e+04 2.22e+04(25) .030 <.001 
Figure 4:

Convergence curves of the IMMA with different selection operations for selected functions f5,10,30,31 (n=30). The x axis shows the number of fitness evaluations. The y axis shows the best error values.


4.2.3.  Study on the Multiple Sampling Strategy

In this section, we experimentally analyse the effect of the multiple sampling strategy on the performance of IMMA. To carry out this study, we apply the IMMA with different sampling numbers L on the test functions. We are interested in whether the multiple sampling strategy is beneficial to the exploration capability of IMMA. In the experiments, the parameters are set as M = 60, and L is varied from 1 to 5. Table 4 lists the experimental results obtained by the IMMA with different L settings in terms of the best error values and NFEs. In the table, the best results are shown in bold type. The p values of the rank sum test on the results of L = 2 and L = 3 against the other L values are listed in Table 5. From Table 5 we see that:

  1. On the functions that cannot be solved successfully, the IMMA with sampling size L = 1 has the best performance in terms of error value on only one function (f27), but this performance difference is not significant. Moreover, it can be seen that the IMMA with a large sampling size (L=4, 5) does not necessarily result in better FEVs.

  2. In terms of efficiency (NFEs), the IMMAs with L=2, 3 require fewer fitness evaluations to reach the accuracy level on all the multimodal functions except f6,31 (functions f1 through f3 and f28 are unimodal).

Table 4:
NFEs obtained for the selected functions with n = 30 that can be successfully solved and the best error values for the functions that cannot be solved by the IMMA with varied sampling size L.
Function | L = 1 | L = 2 | L = 3 | L = 4 | L = 5
AVGN ± SDN
f1 3.08e+03 5.35e+02 5.15e+03 3.28e+02 6.55e+03 6.27e+02 8.00e+03 5.43e+02 9.31e+03 5.46e+02 
f2 1.28e+03 8.01e+02 1.38e+03 1.17e+03 1.01e+04 1.03e+03 1.16e+04 2.38e+03 1.26e+04 3.71e+03 
f3 3.21e+04 2.63e+03 3.04e+04 2.11e+03 3.13e+04 2.57e+03 3.27e+04 3.48e+03 3.29e+04 2.88e+03 
f6 1.43e+04 8.91e+03 2.18e+04 1.60e+04 3.11e+04 3.03e+04 4.74e+04 4.94e+04 3.27e+04 2.48e+04 
f7 1.71e+04 5.88e+03 1.77e+04 1.49e+04 1.02e+04 6.28e+03 6.19e+04 7.96e+04 4.64e+04 5.82e+04 
f9 4.11e+04 1.38e+04 3.51e+04 1.07e+04 3.94e+04 1.46e+04 3.99e+04 1.12e+04 4.25e+04 1.67e+04 
f10 4.96e+04 1.61e+04 4.66e+04 1.39e+04 4.20e+04 1.61e+04 4.63e+04 1.57e+04 4.42e+04 1.33e+04 
f28 3.09e+03 6.15e+02 4.83e+03 3.64e+02 6.30e+03 3.56e+02 7.93e+03 4.23e+02 9.17e+03 6.05e+02 
f29 2.69e+04 6.39e+03 2.69e+04 6.25e+03 2.67e+04 6.21e+03 2.64e+04 5.91e+03 2.70e+04 6.63e+03 
f30 2.65e+04 9.18e+03 2.11e+04 5.24e+04 7.68e+03 3.28e+03 3.57e+04 6.89e+04 3.48e+04 6.37e+04 
f31 4.07e+04 1.42e+04 4.40e+04 9.10e+03 4.69e+04 1.46e+04 5.24e+04 1.83e+04 5.38e+04 1.26e+04 
f32 1.48e+04 3.64e+03 1.37e+04 2.56e+03 1.47e+04 3.74e+03 1.72e+04 3.56e+03 1.95e+04 4.06e+03 
f33 3.02e+04 1.60e+04 2.43e+04 1.25e+04 2.02e+04 9.15e+03 2.01e+04 9.74e+03 2.38e+04 1.37e+04 
f34 3.91e+04 2.00e+04 3.79e+04 2.67e+04 4.02e+04 2.72e+04 3.79e+04 1.49e+04 4.16e+04 2.11e+04 
f37 2.04e+04 4.34e+03 2.53e+04 9.95e+03 1.93e+04 1.65e+04 2.12e+04 1.41e+04 2.39e+04 2.32e+04 
AVGE ± SDE
f4 2.10e+05 1.94e+04 1.11e+05 1.63e+04 2.94e+04 6.16e+03 2.80e+05 3.18e+04 2.52e+05 2.18e+04 
f5 2.47e+03 5.46e+02 1.74e+03 3.19e+02 1.68e+03 3.27e+02 1.77e+03 3.17e+02 1.69e+03 2.37e+02 
f8 2.00e+01 1.18e–10 2.00e+01 1.29e–10 2.00e+01 1.12e–10 2.00e+01 1.20e–10 2.00e+01 1.86e–10 
f26 3.92e–01 5.34e–02 3.40e–01 5.00e–02 2.84e–01 4.73e–02 2.94e–01 4.90e–02 2.96e–01 4.36e–02 
f27 3.75e+01 4.11e+01 3.79e+01 4.57e+01 4.13e+01 4.97e+01 4.93e+01 5.11e+01 3.92e+01 4.72e+01 
Table 5:
The p values obtained through the rank sum test on the experimental results obtained by L=2 and L=3 with the other L values.
p for L = 2 vs. | p for L = 3 vs.
Function | L = 1 | L = 3 | L = 4 | L = 5 | L = 1 | L = 2 | L = 4 | L = 5
f1 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 0.000 <0.001 
f2 0.727 <0.001 <0.001 <0.001 <0.001 <0.001 0.008 0.003 
f3 0.019 0.189 0.009 0.002 0.287 0.189 0.119 0.049 
f6 0.049 0.190 0.022 0.079 0.014 0.190 0.173 0.842 
f7 0.863 0.028 0.012 0.025 <0.001 0.028 0.004 0.005 
f9 0.098 0.242 0.134 0.074 0.682 0.242 0.899 0.497 
f10 0.493 0.291 0.932 0.539 0.110 0.291 0.356 0.605 
f28 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 
f29 0.982 0.947 0.805 0.926 0.929 0.947 0.858 0.875 
f30 0.614 0.215 0.405 0.412 <0.001 0.215 0.053 0.044 
f31 0.330 0.418 0.052 0.004 0.142 0.418 0.251 0.084 
f32 0.257 0.292 0.001 <0.001 0.956 0.292 0.025 <0.001 
f33 0.160 0.195 0.200 0.898 0.012 0.195 0.987 0.279 
f34 0.858 0.769 0.997 0.587 0.877 0.769 0.721 0.831 
f37 0.034 0.135 0.253 0.782 0.752 0.135 0.662 0.433 
f4 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 
f5 <0.001 0.518 0.742 0.535 <0.001 0.518 0.333 0.902 
f8 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 
f26 0.002 <0.001 0.003 0.003 <0.001 <0.001 0.470 0.360 
f27 0.974 0.803 0.414 0.922 0.771 0.803 0.580 0.880 

From these observations, we see that sampling relatively more individuals (e.g., L = 2, 3) from the probability model can indeed improve the exploration capability on average. However, a large L (e.g., 5) usually needs more fitness evaluations to reach the accuracy level. Therefore, we may conclude that the multiple sampling strategy with a moderate L can benefit the evolutionary search. Since it is difficult to develop a mathematical model to determine an optimal L balancing the exploration capability and the computational cost, we have to rely on experimental study. As seen from the results, the IMMA with L = 2, 3 achieves better results on almost all the functions. Hence, in practice, we can simply start the search for an optimal L from 2.

4.2.4.  The Superiority of the Adaptive Model

In this section, we investigate the effect of the adaptive multivariate model on the performance of IMMA. We compare two algorithms, IMMA-H and IMMA, to carry out the investigation. Both algorithms have the same algorithmic framework as described in Section 3.4; the difference is that IMMA-H embeds the histogram model (Zhang et al., 2003). The other parameters are set as M = 60 and L = 3. The comparison results are listed in Table 6, in which the best results are shown in bold type. In cases where both algorithms reached the accuracy level in all 25 trials, the rank sum test is performed on the NFE results and summarised in the pnh column. On the other hand, if both algorithms failed to reach the accuracy level in all 25 trials, the rank sum test is performed on the best error values and summarised in the pvh column.

Table 6:
Experimental results for IMMA-H and IMMA on the selected functions. The number in parentheses is the number of successful trials, while no parenthetical number means zero successful trials. The last two columns list the p values of the rank sum test on NFEs (pnh) and the best error values (pvh). — denotes that the rank sum test is not necessary because either both algorithms are fully successful (no pvh is available) or fully failed (no pnh is available). * denotes that the algorithm stops at the maximum NFEs, that is, $3 \times 10^5$.
IMMA-H | IMMA | p
Function | AVGE ± SDE | AVGN ± SDN (SUC) | AVGE ± SDE | AVGN ± SDN (SUC) | pvh | pnh
f1 0.00e+00 0.00e+00 6.89e+03 3.33e+02(25) 0.00e+00 0.00e+00 6.55e+03 6.27e+02(25) — .028 
f2 5.93e–12 3.05e–12 9.58e+03 1.23e+03(25) 6.22e–12 2.57e–12 1.01e+04 1.03e+03(25) — .156 
f3 2.25e–08 2.44e–08 3.06e+04 2.16e+03(25) 1.51e–08 1.54e–08 3.13e+04 2.57e+03(25) — .277 
f6 3.56e–10 4.80e–10 1.68e+04 9.04e+03(25) 3.18e–10 2.56e–10 3.11e+04 3.03e+04(25) — .033 
f7 3.16e–03 4.00e–03 7.08e+03 4.53e+03(25) 1.31e–12 4.79e–12 1.02e+04 6.28e+03(25) — .057 
f28 6.83e–33 8.28e–33 2.52e+03 4.03e+02(25) 1.93e–34 2.85e–34 6.30e+03 3.56e+02(25) — <.001 
f29 3.60e–14 5.65e–15 1.17e+04 6.39e+03(25) 2.61e–14 7.80e–15 2.67e+04 6.21e+03(25) — <.001 
f30 1.35e–13 1.01e–13 3.12e+03 4.10e+02(25) 1.63e–13 3.26e–13 7.68e+03 3.28e+03(25) — <.001 
f33 9.82e–14 2.14e–13 2.82e+04 1.88e+03(25) 3.09e–14 3.02e–14 2.02e+04 9.15e+03(25) — <.001 
f37 8.25e–11 1.30e–10 8.10e+03 3.58e+03(25) 4.73e–11 4.06e–11 1.93e+04 1.65e+04(25) — .003 
f4 3.85e+05 7.31e+04 2.94e+04 6.16e+03 <.001 — 
f5 5.37e+03 1.08e+03 1.68e+03 3.27e+02 <.001 — 
f8 2.00e+01 1.14e–10 2.00e+01 1.12e–10 1.000 — 
f26 4.28e+00 2.39e+00 2.84e–01 4.73e–02 <.001 — 
f27 5.32e+02 4.98e+01 4.13e+01 4.97e+01 <.001 — 
f9 9.04e+01 1.32e+01 1.09e–03 3.03e–03 3.94e+04 1.46e+04(25) <.001 <.001 
f10 2.36e+02 2.23e+01 1.90e–03 3.48e–03 4.20e+04 1.61e+04(25) <.001 <.001 
f31 4.18e+01 5.26e–01 3.69e–11 1.10e–10 4.69e+04 1.46e+04(25) <.001 <.001 
f32 2.66e–01 2.26e–01 1.31e+05 6.47e+04(5) 3.82e–04 7.00e–13 1.47e+04 3.74e+03(25) <.001 <.001 
f34 6.07e–02 8.43e–02 1.09e+05 1.04e+05(7) 5.56e–13 4.21e–13 4.02e+04 2.72e+04 (25) .001 .004 