Abstract
For a large-scale global optimization (LSGO) problem, divide-and-conquer is usually considered an effective strategy to decompose the problem into smaller subproblems, each of which can then be solved individually. Among these decomposition methods, variable grouping has been shown to be promising in recent years. Existing variable grouping methods usually assume the problem to be black-box (i.e., they assume that an analytical model of the objective function is unknown), and they attempt to learn an appropriate variable grouping that would allow for a better decomposition of the problem. In such cases, these variable grouping methods make no direct use of the formula of the objective function. However, it can be argued that many real-world problems are white-box problems, that is, the formulas of their objective functions are often known a priori. These formulas provide rich information which can then be used to design an effective variable grouping method. In this article, a formula-based grouping strategy (FBG) for white-box problems is first proposed. It groups variables directly via the formula of an objective function, which usually consists of a finite number of operations (i.e., the four arithmetic operations “$+$”, “$-$”, “$\times$”, “$\div$” and composite operations of basic elementary functions). FBG classifies the operations into two classes: one resulting in non-separable variables, and the other resulting in separable variables. With FBG, variables can be automatically grouped into a suitable number of non-interacting subcomponents, with variables in each subcomponent being interdependent. FBG can easily be applied to any white-box problem and can be integrated into a cooperative coevolution framework. Based on FBG, a novel cooperative coevolution algorithm with formula-based variable grouping (called CCF) is proposed in this article for decomposing a large-scale white-box problem into several smaller subproblems and optimizing them separately.
To further enhance the efficiency of CCF, a new local search scheme is designed to improve solution quality. To verify the efficiency of CCF, experiments are conducted on the standard LSGO benchmark suites of CEC'2008, CEC'2010, and CEC'2013, as well as on a real-world problem. Our results suggest that the performance of CCF is very competitive with that of the state-of-the-art LSGO algorithms.
1 Introduction
Many important classes of real-world optimization problems involve a large number of decision variables, for example, shape optimization, where many variables are often required to represent complex shapes (Sonoda et al., 2004; Vicini and Quagliarella, 1998), clustering (Xu and Wunsch, 2005), and feature selection (Lagrange et al., 2017), where a large number of features may be present. How to handle these large-scale global optimization (LSGO) problems effectively remains a challenging problem.
In recent years, new progress has been made toward handling LSGO problems in the field of evolutionary computation, for example, cooperative coevolution algorithms (Li and Yao, 2012; Omidvar, Li, and Yao, 2010b; Yang et al., 2008a; Potter and Jong, 1994; Yang et al., 2008b; Mei et al., 2014; Chen et al., 2010; Omidvar et al., 2014; Kazimipour et al., 2013; Dong et al., 2012; Leung and Wang, 2001), estimation of distribution algorithms (Valdez et al., 2013; Ahn et al., 2012), memetic algorithms (Iacca et al., 2012; Caraffini et al., 2013), social learning algorithms (Cheng and Jin, 2015a), and hybrid algorithms (LaTorre et al., 2013). Several studies conducted experiments on the CEC'2008 benchmark functions with up to 2000 or 5000 dimensions. More challenging LSGO benchmark suites have also been developed (Tang et al., 2009; Li et al., 2013), as well as further comparative studies (LaTorre et al., 2015).
For an LSGO problem, one common strategy is to adopt a divide-and-conquer approach, which divides the problem into several smaller subproblems that are then solved individually. Upon a successful decomposition, one can adopt a cooperative coevolution (CC) framework (Potter and Jong, 1994) to solve the subproblems and, therefore, the original problem. The key idea here is to decompose the LSGO problem into several smaller subcomponents (or subproblems), with each subcomponent seen as an independent species to be evolved separately by a subpopulation using an evolutionary algorithm (EA). Individuals in a subpopulation are evaluated by how well they collaborate with the best individuals of the other subcomponents via the objective function. More specifically, an individual in each species (subpopulation) is evaluated by concatenating it with the best individuals from the other subpopulations to form a complete candidate solution, which is then fed into the objective function.
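This collaboration-based evaluation can be sketched as follows (a minimal Python illustration; the sphere objective and the function names are hypothetical, not from the article):

```python
import numpy as np

def evaluate(individual, group, context, objective):
    """Evaluate a subcomponent's individual by inserting it into a complete
    candidate solution built from the best-known values (the 'context'
    vector) of all other subcomponents."""
    candidate = context.copy()
    candidate[group] = individual      # overwrite only this group's variables
    return objective(candidate)

# Hypothetical example: a 6-D sphere function; the subcomponent holds dims {1, 4}
objective = lambda x: float(np.sum(x ** 2))
context = np.ones(6)                   # best-so-far complete solution
group = [1, 4]
fitness = evaluate(np.zeros(2), group, context, objective)
# the two grouped variables are set to 0, the other four stay at 1
```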
In the early days of the development of CC, variables were either divided into fully separable one-dimensional subcomponents (Potter and Jong, 1994) or into subgroups fixed from the start of an optimization run (van den Bergh and Engelbrecht, 2004). Consequently, these methods cannot handle problems with nontrivial variable interactions effectively. To mitigate this problem, more effective grouping methods have been proposed in recent years. For example, Yang et al. (2008a) proposed a random grouping method, where variables are randomly shuffled into a number of subcomponents every few iterations during an optimization run. As a result, the likelihood of interacting variables being placed in one subcomponent is increased, which in turn improves the optimization performance. Unfortunately, this method is very limited, as it is effective only for subcomponents with a small number of interacting variables (Omidvar, Li, Yao, and Yang, 2010). A more sophisticated variable grouping method was proposed by Chen et al. (2010), where the grouping is detected, or learned, by iteratively checking pairwise dimensions via sampled points. Since the detection is based on directly comparing fitness values of the sampled points, the method often fails to detect the interaction between two variables over many iterations, thereby wasting much computational effort while keeping the grouping accuracy low. A more powerful variable grouping method (namely differential grouping, or DG for short) was proposed by Omidvar et al. (2014), which is able to learn and detect variable interactions for each pairwise variable comparison much more accurately and efficiently. Experiments on the CEC'2010 LSGO benchmark suite (Tang et al., 2009) have shown that DG is the state-of-the-art variable grouping method (Omidvar et al., 2014).
It is important to note that in DG a parameter $\epsilon$ must be specified to determine at what threshold value a pair of decision variables is considered interacting (or non-separable). The choice of this parameter may affect the variable grouping results. Furthermore, the condition given to verify variable interaction is only a sufficient condition rather than a sufficient and necessary one, which means that when the condition is not satisfied, we cannot be 100% certain whether or not the two variables involved interact with each other.
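The idea behind DG's pairwise check can be sketched as follows (a simplified illustration rather than the exact published pseudocode; the base-point placement, the default $\epsilon$, and the toy objective are our own choices):

```python
import numpy as np

def interacts(f, i, j, lb, ub, eps=1e-3):
    """Sketch of the pairwise interaction check underlying differential
    grouping: perturb x_i once with x_j at its lower bound and once with
    x_j moved to the middle of its range; if the two fitness differences
    disagree by more than eps, x_i and x_j are declared interacting."""
    x = lb.astype(float).copy()
    xi = x.copy(); xi[i] = ub[i]
    delta1 = f(xi) - f(x)                   # effect of perturbing x_i
    y = x.copy(); y[j] = (lb[j] + ub[j]) / 2.0
    yi = y.copy(); yi[i] = ub[i]
    delta2 = f(yi) - f(y)                   # same perturbation with x_j moved
    return abs(delta1 - delta2) > eps

# Toy objective: x0 interacts with x1 (product term) but not with x2
f = lambda x: x[0] ** 2 + x[0] * x[1] + x[2] ** 2
lb, ub = np.full(3, -5.0), np.full(3, 5.0)
```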
Note that the variable grouping strategy is a key decomposition technique that can be further used in the cooperative coevolutionary setting. The aforementioned variable grouping methods (including DG) do not assume that the analytical model of a problem is known in advance (which is equivalent to assuming that the problem is black-box). As a result, these methods do not make use of any information from the formula of an objective function, even if such information is available. However, for many real-world problems, the formulas of the objective functions are often known (we call these kinds of problems white-box problems). For such problems, the information contained within the formulas of the objective functions can be used to perform effective variable grouping. In other words, more information (than in the pure black-box case) can be used to facilitate variable grouping for white-box problems. One can harness this information to design more efficient and effective variable grouping methods, which is of great value for solving LSGO problems. To this end, we focus on using the known formulas of the objective functions to design a more effective variable grouping strategy, and then integrate this strategy into the CC framework for ultimately solving LSGO problems.
In this article, we propose a formula-based variable grouping strategy (called FBG) for white-box problems. The grouping strategy makes direct use of the characteristics of the objective function formula. Note that, generally speaking, a generic function consists of a finite number of the four arithmetic operations “$+$,” “$-$,” “$\times$,” and “$\div$,” as well as composite operations of basic elementary functions. FBG classifies these operations into two classes: the operations in one class result in non-separable variables, and the operations in the second class result in separable variables. In this way, any white-box LSGO problem can easily be divided into several smaller subproblems to be solved separately. Furthermore, to improve solution quality, a local search strategy is also developed. By integrating FBG and the local search strategy into the CC framework, we propose a new cooperative coevolutionary algorithm with a formula-based grouping strategy (called CCF). CCF offers the following key advantages: 1) it can exactly decompose a large-scale white-box problem into several smaller subproblems, provided the variables are separable or partially separable; 2) it can search the space of each decomposed subproblem independently; and 3) it can improve solution quality quickly via the local search scheme.
Experiments have been carried out on the LSGO benchmark suites of CEC'2008 (Tang et al., 2007), CEC'2010 (Tang et al., 2009), and CEC'2013 (Li et al., 2013), as well as on a challenging real-world LSGO problem. The performance of CCF is compared with that of several state-of-the-art algorithms using the following performance measures: the best solution, the worst solution, the mean solution, ERT (Expected Running Time) (Mersmann et al., 2015; Auger and Hansen, 2005), and ECDF (Empirical Cumulative Distribution Function) (Pál et al., 2012; Hansen et al., 2010). The results indicate that CCF is very effective and efficient in terms of variable grouping.
The remainder of the article is organized as follows. Section 2 describes the proposed FBG variable grouping method in detail, including the adopted initialization method and a local search algorithm, followed by a novel CC algorithm, namely CCF, which incorporates both FBG and the local search. Section 3 presents our experimental results and analysis. Finally, Section 4 concludes and provides future research directions.
In this article, we adopt the following nomenclature:

$k$: generation number;

$D$: dimensions of a test problem;

$x_k^*$: local optimum of an objective function in the $k$th generation;

$f_k^*$: function value at $x_k^*$;

$x*$: global optimum of an objective function;

FEs: number of function evaluations;

MaxFEs: maximum number of function evaluations;

NonSep: set of operations or functions resulting in variable interaction (e.g., some basic elementary functions and certain of the four arithmetic operations);

$M$: number of subcomponents (or subgroups).
2 The Proposed Algorithm
The proposed formula-based CC algorithm (CCF) is based on the general CC framework. The basic idea of CC is to adopt a divide-and-conquer strategy. The key steps of CC can be summarized as follows:
Problem decomposition: decompose a high-dimensional vector into several smaller subcomponents by using a specific variable grouping strategy.
Subcomponent optimization: evolve each subcomponent through a cooperative coevolutionary mechanism (Potter and Jong, 1994) with a population-based stochastic optimization method as the subcomponent optimizer. In order to apply such a population-based optimization method to a subcomponent, each individual in the population representing a subcomponent must be evaluated on how well it collaborates with the other subcomponents. This calls for merging the current candidate solution with the best solutions from all other subcomponents, and evaluating the merged solution as a whole with the original objective function. Subcomponents are then optimized in a round-robin manner.
For the problem decomposition part, CCF adopts the formula-based variable grouping, which we describe below. For the subcomponent optimization, we adopt an initialization method based on chaos and a local search method based on a revised version of the quasi-Newton method.
2.1 FormulaBased Variable Grouping (FBG)
The goal of variable grouping is to group interacting variables into the same subcomponent, while keeping the interdependency between different subcomponents to a minimum. In this way, a large-scale problem can be divided into relatively independent subproblems, each of lower dimensionality. However, grouping variables accurately is a challenging task. Especially for black-box problems, little or no information about the objective function can be obtained, making it very difficult to design an effective variable grouping strategy.
In contrast, for white-box problems, where more domain-specific information is readily available through the problem formulation, it is possible to leverage this information to design more effective variable grouping methods. This motivates us to propose a formula-based variable grouping method (FBG) for white-box LSGO problems.
FBG involves the following steps. First, construct a set NonSep whose elements are the arithmetic operations and basic elementary functions that lead to variable interaction. Second, search the expression of the objective function for the variables involved in the elements of NonSep. Finally, group the interacting variables (those connected by elements of NonSep) into the same subcomponent.
Note that the expression of a general objective function consists of a finite number of the four arithmetic operations “$+$,” “$-$,” “$\times$,” “$\div$,” and composite operations of basic elementary functions, for example, the power function $y^a$; the exponential functions $a^y$ and $e^y$; the logarithmic functions $\ln y$ and $\log_a y$; the trigonometric functions $\sin y$, $\cos y$, $\tan y$, $\cot y$, $\sec y$, and $\csc y$; and the inverse trigonometric functions $\arcsin y$, $\arccos y$, $\arctan y$, $\mathrm{arccot}\,y$, $\mathrm{arcsec}\,y$, and $\mathrm{arccsc}\,y$. We group variables based on the expression structure of the objective function according to the following three cases:
Case 1: Detecting variable interactions in the four arithmetic operations. If a function has the form $p(x) = a_1x_1 + a_2x_2 + \cdots + a_mx_m$, then each term $a_ix_i$ can be minimized/maximized independently; thus the variables $x_1, x_2, \ldots, x_m$ are separable. If, however, a function contains “$\times$” or “$\div$” of two variables, these two variables cannot be minimized/maximized independently, and are thus non-separable. We therefore put the two operations “$\times$” and “$\div$” into the set NonSep.
Case 2: Detecting variable interactions in basic elementary functions. For a basic elementary function $g(y)$ with $y \in \mathbb{R}$ and an $n$-dimensional function $h(x)$, if $g(y)$ is monotone and the variables in $h(x)$ are separable, then the variables in the composite function $g(h(x))$ are also separable (e.g., if $g(y) = e^y$ and $h(x) = x_1 + x_2 + \cdots + x_{10}$, then the variables in $g(h(x)) = e^{x_1 + x_2 + \cdots + x_{10}}$ are separable). Otherwise, the variables in $g(h(x))$ are non-separable; that is, when $g(y)$ is non-monotonic or $h(x)$ is non-separable, the composite function $g(h(x))$ is non-separable. Thus, the key property that makes a composite function non-separable in this case is that $g(y)$ is non-monotone, and we put the non-monotonic basic elementary functions (e.g., trigonometric functions, inverse trigonometric functions, and the power function with an even-integer exponent) into NonSep. For the composite function $g(h(x))$, the key issue is therefore to judge which basic elementary functions $g(y)$ are monotone when the variables in $h(x)$ are separable:
(2.1) For the power function $g(x) = x^a$, $x \in \mathbb{R}$: when $a = 2k+1$, $k \in \mathbb{N}$, or when $a > 0$ is a fraction (i.e., $a = \frac{b}{c} > 0$, $c \neq 0$) with $b$ and $c$ coprime and odd, $g(x)$ is monotonically increasing. Except for these two cases, the power function $x^a$ is put into NonSep.
(2.2) For the exponential function $g(x) = a^x$ ($a > 0$, $a \neq 1$, $x \in \mathbb{R}$): when $0 < a < 1$, $a^x$ is monotonically decreasing; when $a > 1$, $a^x$ is monotonically increasing. Thus, $a^x$ is not put into NonSep.
(2.3) For the logarithmic function $g(x) = \log_a x$ ($a > 0$, $a \neq 1$, $x > 0$): when $0 < a < 1$, $\log_a x$ is monotonically decreasing; when $a > 1$, $\log_a x$ is monotonically increasing. Thus, $\log_a x$ is not put into NonSep.
(2.4) Trigonometric functions are all periodic, hence non-monotonic, so they are put into NonSep.
(2.5) The inverse trigonometric functions (e.g., $g(x) = \arcsin x$, $\arccos x$, $\arctan x$, $\mathrm{arccot}\,x$, $\mathrm{arcsec}\,x$, or $\mathrm{arccsc}\,x$) are not one-to-one mappings over the whole real line; instead, each has just one monotonic zone. Since this leads to variable interaction, they are put into NonSep.
(2.6) The constant function $g(x) = c$ does not lead to any variable interaction, so it is not put into NonSep.
Case 3: Detecting variable interactions in a function obtained by applying one of the operations “$+$,” “$-$,” “$\times$,” and “$\div$” to two composite functions from Case 2. Note that the operations “$+$” and “$-$” do not change the separability of variables. If two composite functions from Case 2 are multiplied or divided, the variables in the two composite functions are non-separable, except for the multiplication of two exponential functions that are both monotonically increasing or both monotonically decreasing. Except for this case, composite functions linked by “$\times$” and “$\div$” are put into NonSep.
Through the above three cases, we can obtain set NonSep and determine the interacting variables which are linked by elements of set NonSep. Then these linked variables are put into a group.
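The three cases can be sketched programmatically. The following simplified Python illustration (not the authors' parser) applies the core FBG rules to a toy expression: sums keep variables separable; “$\times$,” “$\div$,” powers, and periodic functions couple the variables they touch; monotone exp/log preserve the grouping of their argument; and overlapping groups are merged at the end. Treating every power as non-separable is a coarser rule than Case 2.1 above.

```python
import ast

# Functions treated as non-monotone (couple their variables) vs. monotone
NONSEP_FUNCS = {"sin", "cos", "tan"}      # periodic, hence non-monotone
MONOTONE_FUNCS = {"exp", "log"}           # monotone, preserve separability

def variables(node):
    """All variable names under a node (function names excluded)."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and n.id not in NONSEP_FUNCS | MONOTONE_FUNCS}

def group_vars(node):
    """Return a list of variable groups for the separable summands of node."""
    if isinstance(node, ast.Expression):
        return group_vars(node.body)
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, (ast.Add, ast.Sub)):   # "+"/"-" keep separability
            return group_vars(node.left) + group_vars(node.right)
        vs = variables(node)                          # "*", "/", "**": couple all
        return [vs] if vs else []
    if isinstance(node, ast.UnaryOp):
        return group_vars(node.operand)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id in MONOTONE_FUNCS:            # monotone g(h(x))
            return group_vars(node.args[0])
        vs = variables(node)                          # non-monotone g couples h's vars
        return [vs] if vs else []
    if isinstance(node, ast.Name):
        return [{node.id}]
    return []                                         # constants

def merge(groups):
    """Union groups sharing a variable (the final FBG merging step)."""
    merged = []
    for g in groups:
        g = set(g)
        rest = []
        for m in merged:
            if m & g:
                g |= m
            else:
                rest.append(m)
        merged = rest + [g]
    return merged

expr = ast.parse("x1*x2 + sin(x2 + x3) + exp(x4 + x5) + x6", mode="eval")
result = merge(group_vars(expr))
# x1, x2, x3 end up in one group; x4, x5, x6 remain separable singletons
```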
In this research we have designed and developed a parser program to do the variable grouping using the above FBG strategy. The execution results of the FBG parser on the CEC'2008, CEC'2010, and CEC'2013 LSGO benchmarks indicate that FBG can find the correct subgroups of all these benchmark functions.
As an example, consider the function scanned by the parser below, a partially separable Ackley-based function with definitions as in Li et al. (2013):

$S = \{50, 25, 25, 100, 50, 25, 25, 700\}$;

$D = \sum_{i=1}^{|S|} S_i = 1000$;

$y = x - x^{opt}$;

$y_i = y(P[C_{i-1}+1] : P[C_i]),\ i \in \{1, \ldots, |S|\}$;

$z_i = \Lambda^{10}\, T_{asy}^{0.2}(T_{osz}(R_i y_i)),\ i \in \{1, \ldots, |S|-1\}$;

$z_{|S|} = \Lambda^{10}\, T_{asy}^{0.2}(T_{osz}(y_{|S|}))$;

$R_i$ is an $S_i \times S_i$ rotation matrix;

$P$ is a random permutation of the dimensions;

$x \in [-32, 32]^D$.
The FBG variable grouping method works as follows. The parser scans the input function string from left to right. First, $\sum_{i=1}^{|S|-1}$ is scanned; the parser expands this sum operation and obtains $w_1 f_{ackley}(z_1) + w_2 f_{ackley}(z_2) + \cdots + w_7 f_{ackley}(z_7)$. For simplicity, we illustrate only the variable grouping process for $w_1 f_{ackley}(z_1)$; the process for $w_2 f_{ackley}(z_2), \ldots, w_7 f_{ackley}(z_7)$ is exactly the same. Here, the parser reads $w_1$. Because $w_1$ is a constant, which does not result in variable interaction, the parser continues and invokes the predefined function $f_{ackley}$ with the parameter $z_1$, which is then expanded to $\Lambda^{10} T_{asy}^{0.2}(T_{osz}(R_1 y_1))$. Next, the parser scans $\Lambda^{10}$, which does not result in variable interaction, and then scans the $T_{asy}^{0.2}$ and $T_{osz}$ functions, which do result in variable interactions. Thus the variables in $R_1 y_1$ are extracted and passed to the Ackley function for further checking.
Thereafter, the parser begins to scan $f_{ackley}$. Because $20$ is a constant, which does not result in variable interaction, the parser continues to scan exp. Based on the FBG strategy, the parser identifies that exponential functions do not result in variable interaction, so it continues. When the parser scans $\sqrt{\frac{1}{S_1}\sum_{i=1}^{S_1} z_1(i)^2}$, which is stored as $(\frac{1}{S_1}\sum_{i=1}^{S_1}(z_1(i))^2)^{\frac{1}{2}}$, the parser reads “(”, and a temporary set TempSet is constructed to keep the candidate interacting variables. After scanning the constant $\frac{1}{S_1}$, which does not result in variable interaction, the parser checks the sum $\sum_{i=1}^{S_1}(z_1(i))^2$ and expands it into the variable-containing terms $z_1(i)$, for example, $\{x_{611}, x_{595}, x_{579}\}$. The variables involved in this sum are put into TempSet, that is, TempSet $= \{\text{all variables in } z_1\} = \{x_{611}, x_{595}, x_{579}, \ldots\}$. This process continues until “)” is scanned. Then the “power of 2” operation is scanned. Because this operation results in variable interaction, all the variables in TempSet are put as one group into NonsepGroups, in which each group contains interacting variables, and TempSet is then cleared. This procedure continues until the end of the input function. The final step is to update NonsepGroups by merging groups that share overlapping variables into one group. In this way, all variables are classified into the correct groups as described in Li et al. (2013).
In short, FBG can correctly group the interacting variables of all benchmark functions in the CEC'2008, CEC'2010, and CEC'2013 LSGO benchmark suites. The grouping results of FBG on the most recent CEC'2013 LSGO benchmark suite can be found in the supplementary materials.
Note that after the variables are classified into different groups by FBG, any evolutionary algorithm can be applied to each subcomponent of a white-box problem in the same way as for black-box problems in existing works.
2.2 Initialization Method
Initialization plays an important role in any population-based stochastic optimization algorithm. Traditionally, basic random number generators (RNGs) are commonly used to initialize the population of CC. A large and growing number of works have proposed new ways to generate the initial population (Kazimipour et al., 2013; Dong et al., 2012; Leung and Wang, 2001). In this article, we adopt the chaotic method (Dong et al., 2012) to generate the initial population, since this type of chaos-based initialization provides a notably uniform distribution of points. The detailed process of the chaos initialization method is as follows:
Generate the chaos factors by the following formula:
$$z_j(i) = \mu\,(1 - 2\,|z_j(i-1) - 0.5|),\qquad 0 \le z_j(0) \le 1,\quad i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, D,$$
where $z_j(i)$ denotes the $j$th chaos factor at iteration $i$. Specifically:

Set $i=0$ and randomly generate $D$ uniform factors $z_j(0)$ for $j=1,2,\ldots,D$.

Set $\mu =1$. Then, for $i=1,2,\ldots,m$, iteratively generate the chaos factors $z_j(i)$ for $j=1,2,\ldots,D$ by the above formula.
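Reading the chaos-factor formula as the tent map $z \leftarrow \mu\,(1 - 2\,|z - 0.5|)$ (an assumption on our part), a population initializer could be sketched as follows; the function name and the short warm-up are our own additions:

```python
import numpy as np

def chaos_init(n, dim, warmup=10, mu=1.0, seed=0):
    """Chaos-based initialization sketch: iterate z <- mu*(1 - 2*|z - 0.5|)
    on dim independent chaos factors and record one population member per
    iteration. Note: in double precision the pure tent map eventually
    degenerates toward 0, so the warm-up is kept short here."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, dim)               # z_j(0)
    for _ in range(warmup):                      # discard early transients
        z = mu * (1.0 - 2.0 * np.abs(z - 0.5))
    pop = np.empty((n, dim))
    for k in range(n):
        z = mu * (1.0 - 2.0 * np.abs(z - 0.5))
        pop[k] = z
    return pop          # factors lie in [0, 1]; scale to the search bounds as needed

pop = chaos_init(20, 10)
```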
2.3 A New Local Search Strategy
To further enhance the efficiency of the proposed algorithm CCF, which will be described in Section 2.4, we adopt a new local search scheme.
There exist several efficient local search methods, for example, the conjugate gradient method (Faires and Burden, 2003) and the Newton and quasi-Newton methods (Griewank and Toint, 1982). However, these methods require calculating the gradients of an objective function and hence are not suitable for solving non-differentiable problems. In order to retain the advantages of these local search methods while avoiding the computation of gradients, a revised version of the quasi-Newton algorithm is designed; its pseudocode is shown in Algorithm 1. The key novel idea is to use an approximate formula to estimate the gradient instead of computing it directly from the objective function.
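Algorithm 1 is not reproduced here; the following sketch illustrates its key idea under our own simplifications: a standard BFGS iteration in which the analytical gradient is replaced by a central-difference estimate (the step sizes, line-search constants, and names are ours, not the article's):

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference gradient estimate, replacing the analytical
    gradient required by the classical quasi-Newton method."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def quasi_newton(f, x0, iters=100, tol=1e-8):
    """BFGS with numerically estimated gradients (a sketch of the idea,
    not the article's exact Algorithm 1)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                  # inverse-Hessian approximation
    g = num_grad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        t = 1.0                         # backtracking (Armijo) line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = num_grad(f, x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                  # standard BFGS update of H
            rho = 1.0 / sy
            I = np.eye(len(x))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

x_best = quasi_newton(lambda x: float((x ** 2).sum()), np.array([3.0, -2.0]))
```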
In addition, for fully separable functions which can be decomposed into 1000 independent 1-D functions, such as $f_1$–$f_3$ in the CEC'2010 and CEC'2013 benchmark suites, we use a line-search strategy as the local search (Nocedal and Wright, 1999).
2.4 Cooperative Coevolution with FormulaBased Grouping Strategy (CCF)
By using FBG, a large-scale optimization problem can be decomposed into several smaller subproblems. We can then use an efficient optimization method to solve each subproblem through a cooperative coevolution mechanism (Potter and Jong, 1994). More specifically, first, for each subproblem, only the variables corresponding to this subproblem (subcomponent) are evolved, while the variables corresponding to all other subproblems (subcomponents) are fixed to their best-known values. Second, each individual of a subproblem is evaluated with the original objective function by combining it with the fixed best values of the other subcomponents; these individuals form the subpopulation for this subproblem. Third, all the subproblems are optimized in a round-robin fashion, one by one from the first subproblem to the last, by evolving the corresponding subpopulation using a chosen optimization method. The above process is iterated until the termination criterion is met.
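The round-robin procedure can be sketched as follows (a minimal Python illustration; the article uses SaNSDE as the subcomponent optimizer for larger groups, while a simple random-perturbation hill climber stands in here to keep the sketch short; all names are ours):

```python
import numpy as np

def cc_optimize(objective, groups, lb, ub, pop_size=20, cycles=30, seed=1):
    """Round-robin cooperative coevolution (sketch). Each group of variables
    is optimized in turn; trial individuals are evaluated by plugging them
    into the context vector holding the best-known values of all groups."""
    rng = np.random.default_rng(seed)
    context = rng.uniform(lb, ub)                 # best-known complete solution
    best = objective(context)
    for _ in range(cycles):
        for g in groups:                          # round-robin over subcomponents
            for _ in range(pop_size):
                trial = context.copy()
                step = rng.normal(0.0, 0.1, len(g)) * (ub[g] - lb[g])
                trial[g] = np.clip(context[g] + step, lb[g], ub[g])
                val = objective(trial)
                if val < best:                    # cooperate via the context vector
                    best, context = val, trial
    return context, best

# Toy run: 4-D sphere decomposed into three subcomponents
groups = [np.array([0, 1]), np.array([2]), np.array([3])]
lb, ub = np.full(4, -5.0), np.full(4, 5.0)
sol, val = cc_optimize(lambda x: float((x ** 2).sum()), groups, lb, ub)
```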
It is well known that Differential Evolution (DE) is a simple and effective population-based optimization method (Vesterstrom and Thomsen, 2004), but the performance of standard DE is sensitive to its control parameters (Gämperle et al., 2002). To overcome this, a self-adaptive DE with neighborhood search (SaNSDE) was developed, with a self-adapted crossover rate $CR$ and scaling factor $F$ (Qin et al., 2009). SaNSDE has been successfully applied to a variety of problems (e.g., Yang et al., 2008a,b). Thus, we choose SaNSDE as the evolutionary algorithm to optimize each subproblem containing more than ten variables. If a subproblem contains fewer than ten variables, using SaNSDE as its optimizer may waste a lot of computational effort; therefore, as mentioned in the previous subsection, we use the new local search or line-search scheme instead. By integrating FBG with SaNSDE and the new local search scheme, we propose a new cooperative coevolution algorithm with a formula-based variable grouping strategy (CCF). Its pseudocode is given in Algorithm 2.
3 Numerical Experiments
3.1 Benchmark Suite and Parameters Setting for CCF
In this section, the proposed algorithm CCF is first tested on two widely used LSGO benchmark suites with 1000 dimensions: the CEC'2010 LSGO benchmark suite (Tang et al., 2009) and the CEC'2013 LSGO benchmark suite (Li et al., 2013). We also carry out experiments on the earlier CEC'2008 benchmark suite (Tang et al., 2007) with 2000 and 5000 dimensions, respectively.
CEC'2010 benchmark functions:
The dimensions of the functions are fixed at 1000, where $f_1$–$f_3$ are fully separable functions, $f_4$–$f_{18}$ are partially additively separable functions, and $f_{19}$–$f_{20}$ are non-separable functions.
CEC'2013 benchmark functions:
The dimensions of the functions are fixed at 1000, where $f_1$–$f_3$ are fully separable functions, $f_4$–$f_{11}$ are partially additively separable functions, and $f_{12}$–$f_{15}$ are non-separable functions.
CEC'2008 benchmark functions:
The dimensions of the functions are scalable, where $f_1$, $f_4$, and $f_6$ are fully separable functions, and $f_2$, $f_3$, $f_5$, and $f_7$ are fully non-separable functions. We performed experiments with 2000 and 5000 dimensions, respectively.
CCF is compared with a number of existing LSGO algorithms on all or part of these benchmarks. In all cases, we conducted 25 independent runs for each test function, except for the 5000-dimensional test functions, for which we conducted 10 independent runs to be consistent with the experiments reported for the compared CSO algorithm.
We used the following performance evaluation methods to compare the proposed algorithm CCF with several stateoftheart algorithms:
We recorded the smallest objective function value (“Best”), the largest function value (“Worst”), the average function value (“Mean”), and the standard deviation (“Std”) over 25 runs, and compared these values with those obtained by the compared algorithms.
We compared the $p$-values obtained by the proposed algorithm with those obtained by the compared algorithms, where the $p$-value is from a two-tailed $t$-test at the 0.05 significance level.
We also used two performance measures, ERT (Expected Running Time) (Mersmann et al., 2015; Auger and Hansen, 2005) and ECDF (Empirical Cumulative Distribution Function) (Hansen et al., 2010; Pál et al., 2012), in the comparisons. ERT estimates the expected running time, in number of function evaluations, required to reach a certain accuracy level $\epsilon > 0$; to reach the same accuracy level, the lower the running time, the better the performance. ECDF shows the empirical cumulative proportion of problems that reach the required accuracy within the budget.
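For reference, ERT can be computed from per-run records as follows (a standard formulation: evaluations spent across all runs, with unsuccessful runs counted at the full budget, divided by the number of successful runs):

```python
def expected_running_time(run_fes, run_success, max_fes):
    """ERT = (FEs of successful runs + max_fes * #unsuccessful runs) / #successes.
    run_fes[i]  : evaluations used by run i (at success, or at budget exhaustion)
    run_success : parallel list of booleans (run reached the target accuracy)."""
    successes = sum(run_success)
    if successes == 0:
        return float("inf")             # no run reached the target
    total = sum(fe if ok else max_fes for fe, ok in zip(run_fes, run_success))
    return total / successes

# 3 runs: two reach the target (at 1e5 and 2e5 FEs), one fails within 3e6 FEs
ert = expected_running_time([1e5, 2e5, 3e6], [True, True, False], 3e6)
# ert = (1e5 + 2e5 + 3e6) / 2 = 1.65e6
```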
We set the maximum number of function evaluations MaxFEs to $3.0e+06$ (this is also used by the compared algorithms), and the population size $N=50$.
3.2 The Simulation Results on 1000D Problems
In Table 1, CCF is compared with five well-performing algorithms, including DECC-G (Yang et al., 2008a), CCVIL (Chen et al., 2010), DECC-DG (Omidvar et al., 2014), DECC-DML (Omidvar, Li, and Yao, 2010), and MA-SW-Chains (Molina et al., 2010), using the CEC'2010 benchmark suite, where MA-SW-Chains is the winner of the CEC'2010 LSGO competition. DECC-G is widely used as a baseline in LSGO algorithm comparisons (Chen et al., 2010; Omidvar et al., 2014), and CCVIL, DECC-DG, and DECC-DML are among the most representative LSGO algorithms evaluated on the CEC'2010 LSGO benchmarks. The results of the compared algorithms are taken directly from the corresponding references.
Table 1: Comparison between CCF and five algorithms on the CEC'2010 benchmark suite (1000 dimensions). The last column reports the $p$-value of the two-tailed $t$-test between CCF and MA-SW-Chains.

| P | | CCF | MA-SW-Chains | DECC-DG | CCVIL | DECC-G | DECC-DML | $p$-value (CCF/MA-SW-Chains) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f1 | mean | 0.00e+00 | 2.10e-14 | 5.47e+03 | 1.55e-17 | 2.93e-07 | 1.93e-25 | NaN |
| | std | 0.00e+00 | 1.99e-14 | 2.02e+04 | 7.75e-17 | 8.62e-08 | 1.86e-25 | |
| f2 | mean | 4.85e+01 | 8.10e+02 | 4.39e+03 | 6.71e-09 | 1.31e+03 | 2.17e+02 | 3.30e-43 |
| | std | 3.12e+01 | 5.88e+01 | 1.97e+02 | 2.31e-08 | 3.24e+01 | 2.98e+01 | |
| f3 | mean | 4.98e-12 | 7.28e-13 | 1.67e+01 | 7.52e-11 | 1.39e+00 | 1.18e-13 | 1.51e-32 |
| | std | 6.29e-13 | 3.40e-13 | 3.34e-01 | 6.58e-11 | 9.59e-02 | 8.22e-15 | |
| f4 | mean | 4.84e+10 | 3.53e+11 | 4.79e+12 | 9.62e+12 | 5.00e+12 | 3.58e+12 | 8.63e-29 |
| | std | 2.28e+10 | 3.12e+10 | 1.44e+12 | 3.43e+12 | 3.38e+12 | 1.54e+12 | |
| f5 | mean | 8.08e+07 | 1.68e+08 | 1.55e+08 | 1.76e+08 | 2.63e+08 | 2.98e+08 | 5.65e-19 |
| | std | 1.71e+07 | 1.04e+08 | 2.17e+07 | 6.47e+07 | 8.44e+07 | 9.31e+07 | |
| f6 | mean | 3.42e+06 | 8.14e+04 | 1.64e+01 | 2.94e+05 | 4.96e+06 | 7.93e+05 | 2.31e-01 |
| | std | 1.36e+07 | 2.84e+05 | 2.71e-01 | 6.09e+05 | 8.02e+05 | 3.97e+06 | |
| f7 | mean | 2.03e-10 | 1.03e+02 | 1.16e+04 | 8.00e+08 | 1.63e+08 | 1.39e+08 | 0.00e+00 |
| | std | 2.27e-12 | 8.70e+01 | 7.41e+03 | 2.48e+09 | 1.38e+08 | 7.72e+07 | |
| f8 | mean | 1.28e+06 | 1.41e+07 | 3.04e+07 | 6.50e+07 | 6.44e+07 | 3.46e+07 | 9.69e-22 |
| | std | 1.90e+06 | 3.68e+07 | 2.11e+07 | 3.07e+07 | 2.89e+07 | 3.56e+07 | |
| f9 | mean | 7.65e+06 | 1.41e+07 | 5.96e+07 | 6.66e+07 | 3.21e+08 | 5.92e+07 | 9.43e-22 |
| | std | 9.55e+05 | 1.15e+06 | 8.18e+06 | 1.60e+07 | 3.39e+07 | 4.71e+06 | |
| f10 | mean | 1.31e+04 | 2.07e+03 | 4.52e+03 | 1.28e+03 | 1.06e+04 | 1.25e+04 | 1.43e-37 |
| | std | 3.47e+02 | 1.44e+02 | 1.41e+02 | 7.95e+01 | 2.93e+02 | 2.66e+02 | |
| f11 | mean | 2.56e+01 | 3.80e+01 | 1.03e+01 | 3.48e+00 | 2.34e+01 | 1.80e-13 | 1.30e-18 |
| | std | 2.50e+00 | 7.35e+00 | 1.01e+00 | 1.91e+00 | 1.79e+00 | 9.88e-15 | |
| f12 | mean | 7.20e-03 | 3.62e-06 | 2.52e+03 | 8.95e+03 | 8.93e+04 | 3.79e+06 | 1.60e-01 |
| | std | 2.49e-02 | 5.92e-07 | 4.86e+02 | 5.39e+03 | 6.90e+03 | 1.50e+05 | |
| f13 | mean | 5.58e+01 | 1.25e+03 | 4.54e+06 | 5.72e+02 | 5.12e+03 | 1.14e+03 | 4.35e-36 |
| | std | 4.43e+01 | 5.72e+02 | 2.13e+06 | 2.55e+02 | 3.95e+03 | 4.31e+02 | |
| f14 | mean | 5.54e+07 | 3.11e+07 | 3.41e+08 | 1.74e+08 | 8.08e+08 | 1.89e+08 | 7.87e-19 |
| | std | 4.79e+06 | 1.93e+06 | 2.41e+07 | 2.68e+07 | 6.06e+07 | 1.49e+07 | |
| f15 | mean | 4.32e+03 | 2.74e+03 | 5.88e+03 | 2.65e+03 | 1.22e+04 | 1.54e+04 | 7.50e-28 |
| | std | 1.28e+02 | 1.22e+02 | 1.03e+02 | 9.34e+01 | 9.10e+02 | 3.59e+02 | |
| f16 | mean | 1.92e+01 | 9.98e+01 | 7.39e-13 | 7.18e+00 | 7.66e+01 | 5.08e-02 | 7.36e-36 |
| | std | 3.05e+00 | 1.40e+01 | 5.70e-14 | 2.23e+00 | 8.14e+00 | 2.54e-01 | |
| f17 | mean | 7.18e+02 | 1.24e+00 | 4.01e+04 | 2.13e+04 | 2.87e+05 | 6.54e+06 | 4.95e-12 |
| | std | 2.86e+02 | 1.25e-01 | 2.85e+03 | 9.16e+03 | 1.97e+04 | 4.63e+05 | |
| f18 | mean | 1.29e+03 | 1.30e+03 | 1.11e+10 | 1.33e+04 | 2.46e+04 | 2.47e+03 | 6.74e-01 |
| | std | 1.31e+02 | 4.36e+02 | 2.04e+09 | 1.00e+04 | 1.05e+04 | 1.18e+03 | |
| f19 | mean | 9.22e+05 | 2.85e+05 | 1.74e+06 | 3.52e+05 | 1.11e+06 | 1.59e+07 | 8.14e-31 |
| | std | 3.91e+04 | 1.78e+04 | 9.54e+04 | 2.04e+04 | 5.00e+04 | 1.72e+06 | |
| f20 | mean | 2.49e+03 | 1.07e+03 | 4.87e+07 | 1.11e+03 | 4.06e+03 | 9.91e+02 | 2.98e-21 |
| | std | 2.21e+02 | 7.29e+01 | 2.27e+07 | 3.04e+02 | 3.66e+02 | 3.51e+01 | |
From Table 1, we can see that CCF performed the best on eight of the CEC'2010 LSGO benchmark functions, while MA-SW-Chains performed the best on only five, DECC-DG and CCVIL on two each, and DECC-DML on three; DECC-G performed the worst. In pairwise terms, CCF wins on 11 of the 20 test functions against MA-SW-Chains, on 16 against DECC-DG, on 12 against CCVIL, on 18 against DECC-G, and on 14 against DECC-DML. These results suggest that CCF is more effective than the compared algorithms.
On the fully separable functions $f1$-$f3$, CCF performed the best on $f1$, whereas CCVIL performed the best on $f2$ and DECC-DML on $f3$; on $f3$, CCF is only slightly worse than DECC-DML. On the partially separable functions $f4$-$f18$, CCF performed better than MA-SW-Chains on 8 functions (but worse on 6), better than DECC-DG on 11, better than CCVIL on 10, better than DECC-G on 12, and better than DECC-DML on 10. These results indicate that CCF is more effective on the partially separable test functions. On the nonseparable function $f19$, CCF performs better than DECC-DG, DECC-G, and DECC-DML, but slightly worse than CCVIL and MA-SW-Chains. On the fully nonseparable test function $f20$, CCF performs better than DECC-DG and DECC-G, but worse than DECC-DML, CCVIL, and MA-SW-Chains. This is because the proposed FBG is of little or no use on these two functions (they are nonseparable), so only the adopted SaNSDE affects the performance of CCF; it is also possible that the SaNSDE adopted in CCF is less effective than the optimizers adopted in the compared approaches.
MA-SW-Chains is the top performer in the CEC'2010 LSGO competition and performed better than DECC-DG, DECC-G, DECC-DML, and CCVIL. Table 1 also compares CCF with MA-SW-Chains via two-tailed $t$-test $p$-values: CCF is better than MA-SW-Chains on ten test functions, worse on seven, and tied on three. These results again suggest that CCF is more effective than the compared algorithms.
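The two-tailed $t$-test statistics above can be reproduced from the reported means and standard deviations alone. A minimal sketch (Python; the function name is ours, and it assumes 25 independent runs per algorithm and Welch's unequal-variance formulation) computes the $t$ statistic and the Welch-Satterthwaite degrees of freedom, from which a $p$-value follows via the $t$ distribution:

```python
import math

def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and approximate degrees of freedom,
    computed from per-algorithm summary statistics."""
    v1, v2 = std1 ** 2 / n1, std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

For example, on $f5$ of Table 1 (CCF: mean 8.08e+07, std 1.71e+07; MA-SW-Chains: mean 1.68e+08, std 1.04e+08, 25 runs each), this gives $t \approx -4.1$, a clearly significant difference in CCF's favor.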
More recently, the CEC'2013 LSGO benchmark suite was also used for evaluating LSGO algorithms, since it extends the earlier suites with a few more challenging LSGO properties (Omidvar et al., 2015). Several LSGO algorithms, including SACC (Wei et al., 2013), MOS (LaTorre et al., 2013), and DECC-G (Yang et al., 2008a), were evaluated on this suite. We conducted experiments of CCF on this benchmark suite and compared CCF with these three algorithms. The key differences between SACC and CCF are as follows: 1) SACC uses the random grouping strategy, while CCF uses the proposed FBG; 2) SACC uses an auxiliary function to enhance its performance, while CCF uses the new local search based on the quasi-Newton method. MOS (LaTorre et al., 2013) is the top-performing algorithm in the CEC'2013 LSGO competition, and DECC-G is commonly used as a baseline in several studies (LaTorre et al., 2013; Omidvar et al., 2014). Comprehensive empirical studies of existing LSGO algorithms on the CEC'2013 benchmark suite are provided in LaTorre et al. (2015). The results of SACC on the CEC'2013 benchmark suite are provided in Wei et al. (2013); Table 2 compares these results directly with those obtained by CCF. Furthermore, to test the effectiveness of the proposed grouping scheme FBG, we replaced FBG in CCF with a random grouping strategy (Yang et al., 2008b), with the group size set to 100; the resulting algorithm is denoted CCR. Because MOS clearly outperforms CCR, SACC, and DECC-G, we further validate the performance of CCF by comparing it with MOS via two-tailed $t$-tests (significance level 0.05). These results are also presented in Table 2.
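For reference, the random grouping used to build CCR can be sketched as follows (Python; a minimal reading of the Yang et al. (2008b) strategy with a fixed group size, function name ours): shuffle the variable indices and slice them into equal-sized blocks.

```python
import random

def random_grouping(dim, group_size, rng=None):
    """Randomly partition variable indices 0..dim-1 into groups of
    group_size (the last group may be smaller), DECC-G style."""
    rng = rng or random.Random()
    idx = list(range(dim))
    rng.shuffle(idx)
    return [idx[i:i + group_size] for i in range(0, dim, group_size)]
```

With `dim=1000` and `group_size=100` (the setting used for CCR), this yields ten disjoint subcomponents that together cover all variables, but with no regard for actual variable interactions, which is exactly the weakness FBG is meant to address.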
P  CCF  CCR  SACC  MOS  DECC-G  p-value (CCF/MOS)
f1  Best  0.00e+00  0.00e+00  0.00e+00  0.00e+00  1.75e-13  NaN
Worst  0.00e+00  1.81e-03  6.81e-23  0.00e+00  2.45e-13
Mean  0.00e+00  2.58e-04  2.73e-24  0.00e+00  2.03e-13
Std  0.00e+00  6.84e-04  1.36e-23  0.00e+00  1.78e-14
f2  Best  1.79e+01  1.91e+02  2.88e+02  7.40e+02  9.90e+02  7.02e-31
Worst  7.61e+01  4.11e+03  2.72e+03  9.28e+02  1.07e+03
Mean  3.96e+01  2.11e+03  7.06e+02  8.32e+02  1.03e+03
Std  2.20e+01  1.73e+03  4.72e+02  4.48e+01  2.26e+01
f3  Best  3.97e-13  2.06e-13  9.24e-14  8.20e-13  2.63e-10  5.59e-20
Worst  4.58e-13  3.32e+00  3.76e+00  1.00e-12  3.16e-10
Mean  4.32e-13  3.76e-01  1.11e+00  9.17e-13  2.87e-10
Std  2.18e-14  9.96e-01  1.11e+00  5.12e-14  1.38e-11
f4  Best  2.20e+07  7.02e+09  8.48e+09  1.10e+08  7.58e+09  8.17e-07
Worst  1.14e+09  1.35e+11  1.71e+11  5.22e+08  6.99e+10
Mean  5.83e+08  6.09e+10  4.56e+10  1.74e+08  2.60e+10
Std  3.35e+08  5.46e+10  3.60e+10  7.87e+08  1.47e+10
f5  Best  2.60e+06  1.46e+06  3.36e+06  5.25e+06  7.28e+14  2.77e-18
Worst  4.67e+06  1.37e+07  1.40e+07  8.56e+06  7.28e+14
Mean  3.35e+06  8.31e+06  7.74e+06  6.94e+06  7.28e+14
Std  7.30e+05  4.90e+06  3.22e+06  8.85e+05  1.51e+05
f6  Best  1.17e+05  1.12e+05  1.57e+05  1.95e+01  6.96e-08  4.38e-01
Worst  1.52e+05  1.82e+05  6.00e+05  2.31e+05  1.10e+05
Mean  1.30e+05  1.46e+05  2.47e+05  1.48e+05  4.85e+04
Std  1.25e+04  2.56e+04  1.02e+05  6.43e+04  3.98e+04
f7  Best  3.84e+02  2.44e+07  1.72e+06  3.49e+03  1.96e+08  3.43e-10
Worst  1.09e+03  1.02e+09  1.18e+09  3.73e+04  1.78e+09
Mean  3.08e+03  4.65e+08  8.98e+07  1.62e+04  6.07e+08
Std  3.62e+03  4.22e+08  2.48e+08  9.10e+03  4.09e+08
f8  Best  5.53e+14  5.33e+13  1.47e+14  3.26e+12  1.43e+14  4.42e-09
Worst  2.35e+15  5.62e+15  3.08e+15  1.32e+13  7.75e+14
Mean  1.40e+15  2.14e+15  1.20e+15  8.00e+12  4.26e+14
Std  6.98e+14  1.77e+15  7.63e+14  3.07e+12  1.53e+14
f9  Best  2.52e+08  2.54e+08  2.29e+08  2.63e+08  2.20e+08  2.05e-12
Worst  3.81e+08  4.89e+08  1.01e+09  5.42e+08  6.55e+08
Mean  3.45e+08  3.75e+08  5.98e+08  3.83e+08  4.27e+08
Std  4.70e+07  7.97e+07  2.03e+08  6.29e+07  9.89e+07
f10  Best  1.34e+02  5.92e+06  1.38e+07  5.92e+02  9.29e+04  1.25e-11
Worst  2.37e+02  1.59e+07  7.75e+07  1.23e+06  1.73e+07
Mean  1.91e+02  1.02e+07  2.95e+07  9.02e+05  1.10e+07
Std  4.20e+01  3.16e+06  1.93e+07  5.07e+05  4.00e+06
f11  Best  8.36e+07  3.35e+08  8.12e+07  2.06e+07  4.68e+10  1.27e-06
Worst  1.55e+08  2.93e+11  2.30e+10  9.50e+07  7.16e+11
Mean  1.08e+08  1.01e+11  2.78e+09  5.22e+07  2.46e+11
Std  2.79e+07  1.28e+11  5.90e+09  2.05e+07  2.03e+11
f12  Best  2.20e+03  2.48e+03  2.43e+02  2.22e-01  9.80e+02  7.05e-36
Worst  3.59e+03  3.00e+03  1.72e+03  1.17e+03  1.20e+03
Mean  2.59e+03  2.71e+03  8.73e+02  2.47e+02  1.04e+03
Std  4.65e+02  1.81e+02  3.71e+02  2.54e+02  5.76e+01
f13  Best  3.54e+08  3.86e+09  6.72e+08  1.52e+06  2.09e+10  2.67e-17
Worst  3.41e+09  7.45e+09  3.40e+09  6.16e+06  4.64e+10
Mean  1.16e+09  5.21e+09  1.78e+09  3.40e+06  3.42e+10
Std  1.04e+09  1.30e+09  8.05e+08  1.06e+06  6.41e+09
f14  Best  7.22e+09  3.91e+08  8.21e+07  1.54e+07  1.91e+11  7.27e-03
Worst  3.69e+10  2.52e+11  1.10e+11  4.46e+07  1.04e+12
Mean  1.96e+10  4.75e+10  1.75e+10  2.56e+07  6.08e+11
Std  9.77e+09  9.21e+10  2.87e+10  7.94e+06  2.06e+11
f15  Best  2.91e+06  3.91e+08  1.26e+06  2.03e+06  4.63e+07  4.41e-18
Worst  3.87e+06  7.56e+06  4.90e+06  2.88e+06  7.15e+07
Mean  3.39e+06  5.32e+06  2.01e+06  2.35e+06  6.05e+07
Std  3.48e+05  2.06e+06  7.23e+05  1.94e+05  6.45e+06
It can be seen from Table 2 that both CCF and MOS perform best on 7 of the 15 test functions, while each of the other compared algorithms performs best on at most one or two. This indicates that CCF and MOS outperform the other compared algorithms.
In pairwise comparisons, CCF performs better than MOS on 7 test functions, worse on 7, and the same on one; on the fully separable functions $f1$-$f3$, CCF performs better than MOS on $f2$ and $f3$ and the same on $f1$. Note that MOS employs a strong local search, which explains why, among the compared algorithms, only MOS manages performance similar to CCF here. In short, the results indicate that FBG is effective. CCF obtains better results than DECC-G on 13 test functions, and almost all the results obtained by CCF are better than those of SACC and CCR. These results indicate that CCF performed better than SACC, CCR, and DECC-G.
From the two-tailed $t$-test results, CCF and MOS are statistically indistinguishable on two functions, $f1$ and $f6$, and CCF is clearly better than MOS on 6 functions. However, on the nonseparable functions, the results of CCF are worse than those of MOS. This again indicates that the proposed FBG in CCF is more effective on fully separable and partially additively separable functions, and that the SaNSDE adopted in CCF performs worse than MOS on nonseparable problems.
To further show the effectiveness of CCF, we used the performance index ERT (expected running time) to compare CCF with MA-SW-Chains (the top algorithm of the CEC'2010 LSGO competition) and with MOS (the top algorithm of the CEC'2013 LSGO competition) on two representative functions each (fully separable functions, to which FBG is applicable): $f2$ and $f3$ of the CEC'2010 benchmarks for the comparison with MA-SW-Chains, and $f2$ and $f3$ of the CEC'2013 benchmarks for the comparison with MOS. Since only the mean results of MA-SW-Chains on the CEC'2010 benchmarks after 3E+6 function evaluations are available (Molina et al., 2010), we assume that 3E+6 is the lowest number of function evaluations needed for it to reach that precision; Figures 1 and 2 therefore show only the ranks of the mean values of CCF and MA-SW-Chains at the target precisions. MOS reported the best, mean, and worst results on the CEC'2013 benchmarks after 3E+6 function evaluations, and we likewise assume that 3E+6 is the lowest number of function evaluations needed to reach that precision; Figures 3 and 4 show the consensus ranks (the sum of the ranks with respect to the best, mean, and worst values, respectively) of CCF and MOS at the target precisions. From Figures 1 and 2 we can see that CCF always ranks better than MA-SW-Chains on $f2$ and $f3$ of the CEC'2010 benchmarks at the different precisions, and from Figures 3 and 4 that CCF always ranks better than MOS on $f2$ and $f3$ of the CEC'2013 benchmarks at the different precisions.
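For a fixed target precision, ERT is the total number of function evaluations spent over all runs divided by the number of runs that reached the target. A minimal sketch (Python; function name ours, assuming failed runs consume the full budget):

```python
def expected_running_time(evals_to_target, budget):
    """ERT for one target precision. evals_to_target[i] is the
    evaluation count at which run i first reached the target,
    or None if the run failed within the budget."""
    successes = [e for e in evals_to_target if e is not None]
    if not successes:
        return float('inf')  # target never reached
    failures = len(evals_to_target) - len(successes)
    return (sum(successes) + budget * failures) / len(successes)
```

For instance, two successful runs at 100 and 300 evaluations plus one failed run with a budget of 1000 give an ERT of (100 + 300 + 1000) / 2 = 700 evaluations.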
Figures 5, 6, and 7 show the empirical cumulative distribution functions (ECDFs) of CCF on three function categories of the CEC'2013 benchmark suite, where the $x$-axis is the budget (function evaluations per dimension) on a logarithmic scale, and the $y$-axis is the proportion of test functions for which CCF reaches the required accuracy within the budget. The vertical black lines indicate the maximum number of function evaluations, and each of the other curves corresponds to one accuracy level. The three categories are the fully separable functions f1-f3, the additively separable functions f4-f11, and the fully nonseparable functions f12-f15. Within one category, the same accuracy levels $\epsilon = f - f_{opt}$ are used for all functions, but the levels differ from one category to another. For example, Figure 5 shows the ECDF for the fully separable functions f1-f3 at the accuracy levels $\epsilon = \{1e{+}1, 1e{-}1, 1e{-}5, 1e{-}8\}$. For the second and third categories, the objective values are large, and an order of magnitude alone (e.g., E+2) would be too coarse, since 1E+2 and 5E+2 differ substantially; we therefore report more precise levels, that is, when $f - f_{opt} < 5E{+}2$ we say that the accuracy level $\epsilon = 5E{+}2$ is reached rather than $\epsilon = E{+}2$. Figure 6 shows the ECDF for the additively separable functions f4-f11 at the accuracy levels $\epsilon = \{1.5e{+}15, 5e{+}8, 5e{+}6, 5e{+}2\}$, and Figure 7 shows the ECDF for the fully nonseparable functions f12-f15 at the accuracy levels $\epsilon = \{2e{+}10, 1.2e{+}9, 3.5e{+}5, 2.6e{+}3\}$.
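Each point of such an ECDF curve is the fraction of (function, accuracy-level) targets hit within a given budget. A minimal sketch of how one point is computed (Python; names ours, assuming the first-hit evaluation count of each target has been recorded):

```python
def ecdf_proportion(hit_budgets, budget):
    """Fraction of (function, accuracy-level) targets reached within
    `budget` evaluations. hit_budgets maps each target to the first
    evaluation count at which it was reached, or None if never reached."""
    hits = [b for b in hit_budgets if b is not None and b <= budget]
    return len(hits) / len(hit_budgets)
```

Sweeping `budget` over the evaluation axis and plotting the resulting proportions yields one curve per accuracy level, as in Figures 5 to 7.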
Figures 5, 6, and 7 provide an overall view of the accuracy levels that can be reached within each function group. For the fully separable group, CCF reaches a higher level of accuracy than for the other two groups. As the number of function evaluations increases, the proportion of functions reaching a given accuracy level increases; conversely, raising the target proportion, or tightening the accuracy level, requires more function evaluations. Figure 5 shows that CCF reaches all the given accuracy levels on a high proportion of f1-f3 within the given budget (1E+6 function evaluations), but Figures 6 and 7 show that only about 20% of the functions reach the accuracy level 5E+2 for f4-f11, and only about 25% reach the accuracy level 2.6E+3 for f12-f15, within the budget. These results indicate that a maximum of 1E+6 function evaluations may be adequate for f1-f3 but is not enough for f4-f11 and f12-f15, especially for f12-f15 (only about 25% reach the very coarse accuracy level 2.6E+3). They also indicate that f4-f11 and f12-f15 are very difficult problems, and that more function evaluations are needed to obtain better solutions.
Note that we cannot compare MA-SW-Chains and MOS via ECDF plots because the data needed for at least one of the two algorithms are not available.
To show the efficiency and effectiveness of the proposed variable grouping strategy intuitively, Figures 8 to 10 plot semi-log convergence curves, where the horizontal axis is the number of function evaluations and the vertical axis is the mean fitness value over 25 independent runs on a logarithmic scale. Because the experimental data of SACC and CCR are available and their variable grouping strategies differ from that of CCF, we select these two algorithms for comparison with CCF. We select three representative functions with different characteristics from the CEC'2013 benchmarks: the fully separable function $f2$, the partially additively separable function $f7$, and the nonseparable function $f13$. Figures 8 to 10 show the convergence curves for $f2$, $f7$, and $f13$, respectively, where the thick line is CCF, the thin line is SACC, and the dotted line is CCR.
3.3 The Simulation Results on 2000D and 5000D Problems
In this section, CCF is tested on the CEC'2008 LSGO benchmark suite (Tang et al., 2007) with 2000 and 5000 dimensions, and the results are compared with those obtained by CSO (Cheng and Jin, 2015a). CSO is currently the only algorithm whose authors have reported experiments on the CEC'2008 benchmark suite with 2000 and 5000 dimensions.
We conducted the experiments on f1-f6 of the CEC'2008 benchmarks and compared the results with those of CSO. Here f1, f4, and f6 are fully separable functions, whereas f2, f3, and f5 are fully nonseparable. We executed 25 independent runs of CCF for each test problem in 2000 dimensions and 10 independent runs in 5000 dimensions (the same as CSO). The maximum number of function evaluations for both CCF and CSO is set to $5000 \times D$, where $D$ is the dimensionality.
The comparison results are listed in Table 3. For the separable functions f1, f4, and f6, CCF is better than CSO on both the 2000- and 5000-dimensional problems, because CCF can correctly find the variable groups and the proposed local search method is more effective. For the fully nonseparable functions f2, f3, and f5, CCF performs worse than CSO: the variables of these three functions cannot be divided into subgroups and all fall into one group, so FBG offers no help and the performance of CCF depends entirely on SaNSDE, which apparently performs worse than CSO.
functions  CCF (2000D)  CSO (2000D)  CCF (5000D)  CSO (5000D)
f1  0.00e+00(0.00e+00)  1.66e-20(3.36e-22)  0.00e+00(0.00e+00)  1.43e-19(3.33e-21)
f2  1.70e+02(3.22e+00)  6.17e+01(1.31e+00)  1.81e+02(1.23e+00)  9.82e+01(9.78e-01)
f3  4.12e+03(2.57e+03)  2.10e+03(5.14e+01)  9.01e+03(1.40e+03)  7.30e+03(1.26e+02)
f4  2.35e+02(5.86e+01)  2.81e+03(3.69e+01)  6.06e+02(1.07e+02)  7.80e+03(8.73e+01)
f5  1.63e-02(4.91e-02)  3.33e-16(0.00e+00)  4.70e-03(1.24e-02)  4.44e-16(0.00e+00)
f6  6.66e-13(3.95e-14)  3.26e-12(5.43e-14)  1.65e-12(6.15e-14)  6.86e-12(5.51e-14)
3.4 The Lennard-Jones Potential Problem
The Lennard-Jones (LJ) problem is a fully nonseparable problem. Since the proposed algorithm CCF is mainly designed for separable and partially separable problems, when CCF is applied to the LJ problem, FBG should detect the nonseparability of the variables and classify all variables into one group, that is, it performs no decomposition. The performance then depends largely on how well the problem is solved by SaNSDE, which is adopted as the optimizer for each subproblem.
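To illustrate this degenerate grouping, the sketch below reduces a grouping decision to pairwise variable interactions and merges them with a union-find structure (Python; a simplified illustration of FBG's output, not the formula-parsing procedure itself, and the names are ours). For a fully nonseparable problem every pair of variables interacts, so a single group is returned:

```python
def group_variables(n_vars, interacting_pairs):
    """Merge variables linked by non-separable operations into
    subcomponents; each pair (i, j) marks two variables that appear
    together in a non-separable term of the objective formula."""
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in interacting_pairs:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Variables that never co-occur in a non-separable term stay in singleton groups (fully separable case), while a fully connected interaction set collapses everything into one group, which is the LJ situation described above.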
The LJ potential problem is a very difficult multimodal optimization problem: the number of local minima grows exponentially with the number of atoms $N$ (Hoare, 1979). The CEC'2011 EAs competition test suite (Das and Suganthan, 2010) includes the LJ potential problem with the number of atoms fixed to 10; even the top-performing algorithm GA-MPC (Elsayed et al., 2011b) could not locate the global minimum with a 100% success rate. Table 4 compares CCF with the four best-performing EAs of the CEC'2011 competition, namely GA-MPC (Elsayed et al., 2011b), DE-$\Lambda_{cr}$ (Reynoso-Meza et al., 2011), SAMO-DE (Elsayed et al., 2011a), and Adap.DE-171 (Asafuddoula et al., 2011), on the LJ problem with 10 atoms. The stopping criterion is reaching $1.5e{+}5$ function evaluations.
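The objective being minimized is the pairwise LJ potential; in reduced units ($\epsilon = \sigma = 1$) it can be evaluated as below (Python; a standard textbook form of the potential, not the competition implementation, and the function name is ours):

```python
import itertools

def lj_energy(coords, eps=1.0, sigma=1.0):
    """Lennard-Jones potential energy of an atomic cluster:
    E = sum over atom pairs of 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    e = 0.0
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(coords, 2):
        r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        s6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6
        e += 4.0 * eps * (s6 * s6 - s6)
    return e
```

A single pair at the equilibrium distance $r = 2^{1/6}\sigma$ contributes energy $-\epsilon$; the values reported in Table 4 are this sum over the 10-atom configurations found by each algorithm.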
It can be seen from Table 4 that CCF performs better than SAMO-DE and Adap.DE-171, and similarly to the champion algorithm GA-MPC.
  CCF  GA-MPC  DE-$\Lambda_{cr}$  SAMO-DE  Adap.DE-171
Best  -2.84e+01  -2.84e+01  -2.84e+01  -2.84e+01  -2.84e+01
Median  -2.75e+01  -2.74e+01  -2.75e+01  -2.73e+01  -2.72e+01
Worst  -2.65e+01  -2.71e+01  -2.64e+01  -2.61e+01  -1.82e+01
Mean  -2.73e+01  -2.77e+01  -2.77e+01  -2.71e+01  -2.68e+01
Std  6.78e-01  4.67e-01  4.90e-01  6.62e-01  1.97e+00
Note that the best-performing algorithms currently available for this problem rely on computing gradients from the LJ problem formulation (Daven et al., 1996; Northby, 1987; Wales and Doye, 1997; Xiang et al., 2004; Cheng and Jin, 2015a,b). To make a fair comparison with them, we also used the gradients of the LJ formula in our local search method instead of the current local search algorithm SaNSDE. In the following experiments on the LJ problem with more atoms, we build on the search algorithm of Northby (1987): we use a revised Northby algorithm as the local search method under the CC framework. There are three main modifications to the Northby algorithm (Northby, 1987):
1) The revised algorithm uses only the LJ potential energy as the objective function, while the Northby algorithm (Northby, 1987) uses both the nearest-neighbor potential and the LJ potential energy.
2) We reduced the lattice search from 250 times to 60 times to reduce the computational cost.
3) Instead of optimizing all configurations obtained in the lattice search phase, we select only some of them for optimization.
Note that an IC lattice consists of the sites forming a Mackay icosahedron at the center together with the sites of the next complete icosahedral shell, while in an FC lattice the sites of the outer shell are located on the faces at stacking-fault locations (Northby, 1987). For more details on IC and FC lattices, please refer to Northby (1987).
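The gradient-based refinement mentioned above needs the analytical gradient of the LJ energy; differentiating the pair term $4\epsilon(\sigma^{12}/r^{12} - \sigma^{6}/r^{6})$ with respect to the coordinates gives the sketch below (Python, reduced units; a generic derivation, not the exact routine of the revised Northby algorithm, and the function name is ours):

```python
def lj_gradient(coords, eps=1.0, sigma=1.0):
    """Analytical gradient of the LJ cluster energy with respect to
    the atom coordinates. For each pair, dE/dx_i = f * (x_i - x_j)
    with f = 24*eps*((sigma/r)^6 - 2*(sigma/r)^12) / r^2."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            s6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6
            f = 24.0 * eps * (s6 - 2.0 * s6 * s6) / r2
            for k in range(3):
                grad[i][k] += f * d[k]
                grad[j][k] -= f * d[k]  # Newton's third law
    return grad
```

The gradient vanishes for a pair at the equilibrium distance $2^{1/6}\sigma$, and feeding this gradient to any quasi-Newton or conjugate-gradient routine yields the kind of local refinement the cited gradient-based methods perform.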
We set the parameters $m=60$ and $n=20$ for problems with fewer than 150 atoms, and $m=120$ and $n=50$ otherwise. In the experiments, we compare CCF using the revised Northby algorithm as the local search method (denoted CCF-RN) with the two best-performing algorithms for the LJ problem, those of Northby (1987) and Romero et al. (1999). These two algorithms produced many best-known global minima, and their results listed in Table 5 are still the best to date. Table 5 compares CCF-RN with these two algorithms, where "-" means the value is not available.
N  CCF-RN Best  CCF-RN Worst  CCF-RN Mean  Northby (1987) Best  Romero et al. (1999) Best
50  -244.550  -244.550  -244.550  -244.550  -
60  -305.876  -305.876  -305.876  -305.876  -
70  -366.892  -366.892  -366.892  -366.892  -
80  -427.829  -427.829  -427.829  -428.084  -
90  -492.434  -492.434  -492.434  -492.434  -
100  -557.040  -557.040  -557.040  -557.040  -
110  -621.788  -619.770  -620.551  -621.788  -
120  -687.022  -686.689  -686.937  -687.022  -
130  -755.271  -755.271  -755.271  -755.271  -
140  -826.175  -826.175  -826.175  -826.175  -
150  -893.310  -893.310  -893.310  -893.310  -893.310
160  -957.110  -957.106  -957.107  -  -957.110
170  -1024.155  -1024.155  -1024.155  -  -1024.791
180  -1092.209  -1092.209  -1092.209  -  -1092.632
190  -1160.511  -1160.511  -1160.511  -  -1161.301
200  -1229.081  -1228.850  -1228.887  -  -1229.184
210  -1299.963  -1299.963  -1299.963  -  -1300.006
220  -1368.262  -1367.976  -1368.205  -  -1368.349
230  -1439.358  -1437.010  -1437.264  -  -1439.358
240  -1508.284  -1507.980  -1508.101  -  -1508.562
Note that Table 5 lists only the best potential energy values for the deterministic algorithms of Northby (1987) and Romero et al. (1999), whereas for CCF-RN we report the best, worst, and mean potential energy values obtained over 15 independent runs.
From Table 5, we can see that for 10 of the 11 cases with $N=50$ to $150$, the mean values obtained by CCF-RN are either identical or very close to those obtained by the Northby algorithm (Northby, 1987). As the number of atoms (cluster size) grows, more computational cost is needed and it becomes harder to locate the global minimum. To obtain better results, the algorithm in Romero et al. (1999) uses not only the IC and FC icosahedral-based structures but also decahedral and fcc (face-centered-cubic) geometric information. Table 5 shows that CCF-RN locates the global minimum for cluster size $N=150$ with a 100% success rate. Furthermore, for all cases with $N\geq 150$, the mean values obtained by CCF-RN are very close to the best known global minima, even though CCF-RN uses only the icosahedral-based configurations and its computational cost is lower than that of Romero et al. (1999). All of the above results indicate that CCF-RN performs competitively and reliably.
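For concreteness, the quantity reported in Table 5 is the total Lennard-Jones cluster potential, which in the standard reduced units ($\epsilon=\sigma=1$) is the sum of the pairwise terms $4(r_{ij}^{-12}-r_{ij}^{-6})$ over all atom pairs. A minimal Python sketch (the function name `lj_energy` is our own illustration, not part of the original CCF code) is:

```python
import itertools

def lj_energy(coords):
    """Total Lennard-Jones potential of an N-atom cluster in reduced
    units (epsilon = sigma = 1): E = 4 * sum_{i<j} (r_ij^-12 - r_ij^-6)."""
    energy = 0.0
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(coords, 2):
        r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        inv6 = 1.0 / r2 ** 3          # (sigma / r_ij)^6
        energy += 4.0 * (inv6 * inv6 - inv6)
    return energy
```

Because every pairwise term couples the coordinates of two atoms, all $3N$ decision variables interact, which is precisely why FBG classifies the LJ problem as fully nonseparable.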
From the above analysis of the performance of CCF on the LJ problem, we make the following observations: 1) FBG is effective, but CCF has limitations. For fully nonseparable LSGO problems, and especially for very hard ones such as the LJ problem, FBG can correctly identify the nonseparability of the variables and place all variables into a single group; however, CCF then becomes ineffective whenever the adopted local search strategy is ineffective, because CCF must rely on that strategy to optimize the whole problem (all variables fall into one group and cannot be divided into subgroups). A typical LSGO problem usually cannot be optimized effectively by a local search algorithm such as SaNSDE on its own. 2) To solve a very difficult LSGO problem such as the LJ problem, it is necessary to exploit as much problem information as possible (e.g., gradients) together with domain knowledge (e.g., the lattice structure of the atoms for the LJ problem) when designing the search algorithm.
4 Conclusions
In this article, we have developed a new formula-based variable grouping method (FBG), which can be used in conjunction with a cooperative coevolutionary algorithm for solving large-scale global optimization problems. The key merit of FBG is that it makes direct use of the formula of an objective function to carry out variable grouping effectively. In contrast, existing variable grouping strategies do not use the formulas of the objective functions, since they usually assume black-box optimization. Nevertheless, the formulas (expressions) of many LSGO problems are known in advance, and our proposed FBG method provides a new approach to exploiting them for effective variable grouping.
Since FBG leverages the information contained in the formula of the objective function, it can achieve more accurate variable grouping than existing methods, and it offers distinct advantages especially when the formulas are complex and nonintuitive. Moreover, the FBG parser provides an elegant and efficient way of automating the variable grouping process. Our experiments evaluating the FBG-based CC algorithm (i.e., CCF) on three benchmark suites (the CEC'2008, CEC'2010, and CEC'2013 benchmark suites) and a real-world problem (the LJ problem) demonstrate that FBG is an effective variable grouping method, which is an important step toward decomposing any large-scale global optimization problem.
Our future work will include developing a more sophisticated variable grouping strategy that covers more scenarios than those described in this article. Furthermore, for problems with weak interdependency between subproblems, how to design an effective variable grouping that enhances optimization performance remains an open question.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61472297 and No. U1404622).