## Abstract

For a large-scale global optimization (LSGO) problem, divide-and-conquer is usually considered an effective strategy to decompose the problem into smaller subproblems, each of which can then be solved individually. Among these decomposition methods, variable grouping has shown promise in recent years. Existing variable grouping methods usually assume the problem to be black-box (i.e., that an analytical model of the objective function is unknown), and they attempt to learn an appropriate variable grouping that allows for a better decomposition of the problem. In such cases, these variable grouping methods make no direct use of the formula of the objective function. However, many real-world problems are white-box problems; that is, the formulas of their objective functions are often known a priori. These formulas provide rich information which can be used to design an effective variable grouping method. In this article, a formula-based grouping strategy (FBG) for white-box problems is first proposed. It groups variables directly via the formula of an objective function, which usually consists of a finite number of operations (i.e., the four arithmetic operations “$+$”, “$-$”, “$×$”, “$÷$” and composite operations of basic elementary functions). FBG classifies these operations into two classes: one resulting in nonseparable variables, and the other resulting in separable variables. With FBG, variables can be automatically grouped into a suitable number of non-interacting subcomponents, with the variables in each subcomponent being interdependent. FBG can easily be applied to any white-box problem and can be integrated into a cooperative coevolution framework.
Based on FBG, a novel cooperative coevolution algorithm with formula-based variable grouping (called CCF) is proposed in this article for decomposing a large-scale white-box problem into several smaller subproblems and optimizing each of them separately. To further enhance the efficiency of CCF, a new local search scheme is designed to improve solution quality. To verify the efficiency of CCF, experiments are conducted on the standard LSGO benchmark suites of CEC'2008, CEC'2010, and CEC'2013, as well as a real-world problem. Our results suggest that the performance of CCF is very competitive compared with that of the state-of-the-art LSGO algorithms.

## 1  Introduction

Many important classes of real-world optimization problems involve a large number of decision variables, for example, shape optimization, where many variables are often required to represent complex shapes (Sonoda et al., 2004; Vicini and Quagliarella, 1998), clustering (Xu and Wunsch, 2005), and feature selection (Lagrange et al., 2017), where a large number of features could be present. How to handle these large-scale global optimization (LSGO) problems effectively remains an open challenge.

In recent years, new progress has been made toward handling LSGO problems in the field of evolutionary computation, for example, cooperative coevolution algorithms (Li and Yao, 2012; Omidvar, Li, and Yao, 2010b; Yang et al., 2008a; Potter and Jong, 1994; Yang et al., 2008b; Mei et al., 2014; Chen et al., 2010; Omidvar et al., 2014; Kazimipour et al., 2013; Dong et al., 2012; Leung and Wang, 2001), estimation of distribution algorithms (Valdez et al., 2013; Ahn et al., 2012), memetic algorithms (Iacca et al., 2012; Caraffini et al., 2013), social learning algorithms (Cheng and Jin, 2015a), and hybrid algorithms (LaTorre et al., 2013). Several studies conducted experiments on the CEC'2008 benchmark functions with up to 2000 or 5000 dimensions. More challenging LSGO benchmark suites have also been developed (Tang et al., 2009; Li et al., 2013), as well as further comparative studies (LaTorre et al., 2015).

For a LSGO problem, one common strategy is to adopt a divide-and-conquer approach which can divide a LSGO problem into several smaller subproblems and then solve them individually. Upon a successful decomposition, one can then adopt a cooperative coevolution (CC) framework (Potter and Jong, 1994) to solve the subproblems and, therefore, the original problem. The key idea here is to decompose the LSGO problem into several smaller subcomponents (or subproblems), with each subcomponent seen as an independent species to be evolved separately by a subpopulation using an evolutionary algorithm (EA). Individuals in a subpopulation are evaluated by how well they collaborate with the best individuals in other subcomponents via the objective function. More specifically, an individual in each species (subpopulation) is evaluated by concatenating itself with the best individuals from other subpopulations, to form a complete candidate solution, which is then fed into the objective function.
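The collaboration-based evaluation just described can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the group layout, context vector, and sphere objective are hypothetical:

```python
# Sketch of CC fitness evaluation: an individual only carries the variables of
# its own subcomponent; the remaining slots are filled from a context vector
# holding the best-known values of the other subcomponents.
def evaluate(individual, group_idx, groups, context, objective):
    x = list(context)                    # best collaborators for all variables
    for var, val in zip(groups[group_idx], individual):
        x[var] = val                     # substitute this subcomponent's genes
    return objective(x)

# Hypothetical usage: 4 variables split into two groups, sphere objective.
sphere = lambda x: sum(v * v for v in x)
groups = [[0, 1], [2, 3]]
context = [0.5, -0.5, 1.0, -1.0]         # best-so-far values per variable
fitness = evaluate([0.1, 0.2], 0, groups, context, sphere)
```

Only the two genes of group 0 vary here; the genes of group 1 are taken from the context vector.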

In the early days of the development of CC, variables were either divided into totally separable 1-dimensional subcomponents (Potter and Jong, 1994) or fixed subgroups from the start of an optimization run (van den Bergh and Engelbrecht, 2004). Consequently, these methods cannot handle problems with nontrivial variable interactions effectively. In order to mitigate this problem, more effective grouping methods have been proposed in recent years. For example, Yang et al. (2008a) proposed a random grouping method, where variables are randomly shuffled into a number of subcomponents after several iterations during an optimization run. As a result, the likelihood of interacting variables being placed in one subcomponent is increased, which helps to improve the optimization performance. Unfortunately, this method is limited in that it is effective only for subcomponents with a small number of interacting variables (Omidvar, Li, Yao, and Yang, 2010). A more sophisticated variable grouping method was proposed by Chen et al. (2010), where the grouping is detected or learned by checking pairwise dimensions via sampled points iteratively. Since the detection is based on directly comparing fitness values of the sampled points, the method often fails to detect the interaction between two variables over many iterations, thereby not only wasting much computational effort, but also keeping the grouping accuracy low. A more powerful variable grouping method (namely differential grouping, or DG for short) was proposed by Omidvar et al. (2014), which is able to learn and detect variable interactions for each pairwise variable comparison much more accurately and efficiently. Experiments on the CEC'2010 LSGO benchmark suite (Tang et al., 2009) have shown that DG is the state-of-the-art variable grouping method (Omidvar et al., 2014).
It is important to note that in DG a parameter $\epsilon$ must be specified to determine the threshold above which a pair of decision variables is considered interacting (or nonseparable). The choice of this parameter may affect the variable grouping results. Furthermore, the condition used to verify variable interaction is only a sufficient condition rather than a sufficient and necessary one: when the condition is not satisfied, we cannot be certain whether the two variables involved interact or not.
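The DG-style pairwise test can be illustrated as follows. This is a sketch of the idea only; the base point, perturbation size `delta`, and threshold `eps` are our own illustrative choices, not DG's exact settings:

```python
# Sketch of a DG-style pairwise check: perturb x_i by delta at two different
# values of x_j; if the induced changes in f differ by more than eps, x_i and
# x_j are declared interacting. As noted in the text, this condition is
# sufficient but not necessary.
def interacts(f, n, i, j, eps=1e-3, lb=-1.0, mid=0.0, delta=1.0):
    x = [lb] * n                 # base point at the lower bound
    xp = x[:]; xp[i] += delta
    d1 = f(xp) - f(x)            # effect of perturbing x_i with x_j = lb
    y = x[:]; y[j] = mid
    yp = y[:]; yp[i] += delta
    d2 = f(yp) - f(y)            # effect of perturbing x_i with x_j = mid
    return abs(d1 - d2) > eps

sphere = lambda x: sum(v * v for v in x)                       # fully separable
partial = lambda x: (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2 + x[2] ** 2
```

For `partial`, the test flags $x_0$ and $x_1$ as interacting but leaves $x_0$ and $x_2$, and all pairs of `sphere`, separable.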

Note that the variable grouping strategy is a key decomposition technique that can be further used in the cooperative coevolutionary setting. The aforementioned variable grouping methods (including DG) do not assume that the analytical model of a problem is known in advance (equivalent to assuming that the problem is black-box). As a result, these methods do not make use of any information from the formula of an objective function, even when such information is available. However, for many real-world problems, the formulas of the objective functions are known (we call these white-box problems). For such problems, the information contained in the formulas of the objective functions can be used to perform effective variable grouping. In other words, more information (than in the pure black-box setting) is available to facilitate variable grouping for white-box problems. One can harness this information to design more efficient and effective variable grouping methods, which is of great value for solving LSGO problems. To this end, we focus on using the known formulas of the objective functions to design a more effective variable grouping strategy, and then integrate this strategy into the CC framework to solve LSGO problems.

In this article, we propose a formula-based variable grouping strategy (called FBG) for white-box problems. The grouping strategy makes direct use of the characteristics of the objective function formula. Note that, generally speaking, a generic function consists of a finite number of the four arithmetic operations “$+$,” “$-$,” “$×$,” and “$÷$,” as well as composite operations of basic elementary functions. FBG classifies these operations into two classes: the operations in one class result in nonseparable variables, and the operations in the second class result in separable variables. In this way, any white-box LSGO problem can easily be divided into several smaller subproblems to be solved separately. Furthermore, to improve solution quality, a local search strategy is also developed. By integrating FBG and the local search strategy into the CC framework, we propose a new cooperative coevolutionary algorithm with a formula-based grouping strategy (called CCF). CCF offers the following key advantages: 1) it can exactly decompose a large-scale white-box problem into several smaller subproblems, provided the variables are separable or partially separable; 2) it can search each decomposed subproblem space independently; and 3) it can quickly improve solution quality via the local search scheme.

Experiments have been carried out on the LSGO benchmark suites of CEC'2008 (Tang et al., 2007), CEC'2010 (Tang et al., 2009), CEC'2013 (Li et al., 2013), as well as a challenging real-world LSGO problem. The performance of CCF is compared with that of several state-of-the-art algorithms using the following performance measures: the best solution, the worst solution, the mean solution, ERT (Expected Running Time) (Mersmann et al., 2015; Auger and Hansen, 2005), and ECDF (Empirical Cumulative Distribution Function) (Pál et al., 2012; Hansen et al., 2010). These results indicate that CCF is very effective and efficient in terms of variable grouping.

The remainder of the article is organized as follows. Section 2 describes the proposed FBG variable grouping method in detail, including an initialization method adopted and a local search algorithm. This is followed by a novel CC algorithm which incorporates both FBG and the local search, namely CCF. Section 3 presents our experimental results and analysis. Finally, Section 4 concludes and provides future research directions.

For clarity, the notations used throughout this article are listed below:

• $k$: generation number;

• $D$: dimension of a test problem;

• $x_k^*$: local optimum of an objective function in the $k$-th generation;

• $f_k^*$: function value at $x_k^*$;

• $x^*$: global optimum of an objective function;

• FEs: number of function evaluations;

• MaxFEs: maximum number of function evaluations;

• Non-Sep: set of operations or functions resulting in variable interaction (e.g., certain basic elementary functions and some of the four arithmetic operations);

• $M$: number of subcomponents (or subgroups).

## 2  The Proposed Algorithm

The proposed formula-based CC algorithm (CCF) is based on the general CC framework. The basic idea of CC is to adopt a divide-and-conquer strategy for a problem. The key steps of CC can be summarized as follows:

1. Problem decomposition: decompose a high-dimensional vector into several smaller subcomponents by using a specific variable grouping strategy.

2. Subcomponent optimization: evolve each subcomponent through a cooperative coevolutionary mechanism (Potter and Jong, 1994) with a population-based stochastic optimization method as the subcomponent optimizer. In order to execute such a population-based optimization method on a subcomponent, each individual in the population representing a subcomponent must be evaluated by how well it collaborates with the other subcomponents. This calls for merging the current candidate solution with the best solutions from all other subcomponents and evaluating the merged solution as a whole with the original objective function. Subcomponents are then optimized in a round-robin manner.
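The two steps above can be sketched as a minimal round-robin CC loop. This is a toy illustration under simplifying assumptions: a mutation-only hill climber stands in for a full evolutionary optimizer, and the groups, population size, and cycle count are arbitrary:

```python
import random

# Minimal round-robin CC loop (a sketch, not the authors' implementation):
# each subpopulation evolves its own variables while all others stay frozen
# at their best-known values in a shared context vector.
def cc_optimize(objective, groups, dim, pop_size=10, cycles=30, seed=1):
    rng = random.Random(seed)
    context = [rng.uniform(-1, 1) for _ in range(dim)]   # shared best-so-far
    pops = [[[rng.uniform(-1, 1) for _ in g] for _ in range(pop_size)]
            for g in groups]

    def full(gi, ind):
        # Merge one subcomponent's genes into the shared context vector.
        x = list(context)
        for var, val in zip(groups[gi], ind):
            x[var] = val
        return x

    for _ in range(cycles):
        for gi, pop in enumerate(pops):          # round-robin over subcomponents
            for ind in pop:                      # crude mutation-only "optimizer"
                trial = [v + rng.gauss(0, 0.1) for v in ind]
                if objective(full(gi, trial)) < objective(full(gi, ind)):
                    ind[:] = trial
            best = min(pop, key=lambda ind: objective(full(gi, ind)))
            for var, val in zip(groups[gi], best):
                context[var] = val               # publish the best collaborator
    return context, objective(context)

sphere = lambda x: sum(v * v for v in x)
solution, value = cc_optimize(sphere, [[0, 1], [2, 3]], dim=4)
```

Any real subcomponent optimizer (e.g., a DE variant) can replace the inner mutation loop without changing the surrounding CC machinery.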

For the problem decomposition part, CCF adopts the formula-based variable grouping, which we describe below. For the subcomponent optimization, we adopt an initialization method based on chaos and a local search method based on a revised version of the quasi-Newton method.

### 2.1  Formula-Based Variable Grouping (FBG)

The goal of variable grouping is to group interacting variables into the same subcomponent, and at the same time to keep the interdependency between different subcomponents to the minimum. In this way, a large-scale problem can be divided into some relatively separate subproblems each with lower dimensions. However, grouping variables accurately is a challenging task. Especially for black-box problems, little or no information about the objective function can be obtained, making it very difficult to design an effective variable grouping strategy.

In contrast, for white-box problems, where more domain-specific information is readily available through the problem formulation, it is possible to leverage this information to design more effective variable grouping methods. This motivates us to propose a formula-based variable grouping method (FBG) for white-box LSGO problems.

FBG involves the following steps: First, construct a set Non-Sep whose elements are the arithmetic operations and the basic elementary functions which would lead to variable interaction. Secondly, search the variables involved in the elements of set Non-Sep in the expression of the objective function. Finally, group the interactive variables (those variables connected by elements in Non-Sep) into the same subcomponent.

Note that an expression of a general objective function consists of a finite number of the four arithmetic operations “$+$,” “$-$,” “$×$,” “$÷$,” and composite operations of basic elementary functions, for example, the power function $y^a$; the exponential functions $a^y$ and $e^y$; the logarithmic functions $\ln y$ and $\log_a y$; the trigonometric functions $\sin y$, $\cos y$, $\tan y$, $\cot y$, $\sec y$, and $\csc y$; and the inverse trigonometric functions $\arcsin y$, $\arccos y$, $\arctan y$, $\operatorname{arccot} y$, $\operatorname{arcsec} y$, and $\operatorname{arccsc} y$. We group variables based on the expression structure of the objective function according to the following three cases:

Case 1: Detecting variable interactions in the four arithmetic operations. If a function has the form $p(x)=a_1x_1+a_2x_2+\cdots+a_mx_m$, then each term $a_ix_i$ can be minimized/maximized independently. Thus the variables $x_1,x_2,\ldots,x_m$ in this function are separable. In contrast, if a function contains “$×$” or “$÷$” applied to two variables, these two variables cannot be minimized/maximized independently; they are therefore nonseparable. We put the two operations “$×$” and “$÷$” into set Non-Sep.
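A small numeric check of Case 1 (with our own toy functions, not taken from any benchmark suite): for a purely additive function, minimizing each coordinate independently recovers the global grid minimum, while for a product of two variables it does not:

```python
import itertools

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

def global_min(f):
    # Exhaustive search over the 2-D grid.
    return min(itertools.product(grid, grid), key=lambda p: f(*p))

def coordinate_min(f, fixed):
    # Minimize each coordinate independently, the other held at `fixed`.
    x1 = min(grid, key=lambda v: f(v, fixed))
    x2 = min(grid, key=lambda v: f(fixed, v))
    return (x1, x2)

sep = lambda x1, x2: 2 * x1 + 3 * x2     # "+" only: separable
nonsep = lambda x1, x2: x1 * x2          # "×": nonseparable
```

For `sep` the two searches agree at $(-1,-1)$; for `nonsep` the coordinate-wise search (with the other variable fixed at 0) misses the true minimum.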

Case 2: Detecting variable interactions in basic elementary functions. For a basic elementary function $g(y)$ with $y\in\mathbb{R}$ and an $n$-dimensional function $h(x)$, if $g(y)$ is monotone and the variables in $h(x)$ are separable, then the variables in the composite function $g(h(x))$ are also separable (e.g., if $g(y)=e^y$ and $h(x)=x_1+x_2+\cdots+x_{10}$, then the variables in $g(h(x))=e^{x_1+x_2+\cdots+x_{10}}$ are separable). Otherwise, the variables in $g(h(x))$ are nonseparable; that is, when $g(y)$ is a nonmonotonic function or $h(x)$ is nonseparable, the composite function $g(h(x))$ is nonseparable. Thus, the key property that makes a composite function nonseparable in this case is that $g(y)$ is nonmonotone, and we put the nonmonotonic basic elementary functions (e.g., trigonometric functions, inverse trigonometric functions, and power functions with an even integer exponent) into set Non-Sep. Hence, for a composite function $g(h(x))$ whose inner function $h(x)$ has separable variables, the key issue is to determine which basic elementary functions $g(y)$ are monotone:

(2.1) For the power function $g(x)=x^a$, $x\in\mathbb{R}$: when $a=2k+1$, $k\in\mathbb{N}$, or when $a$ is a positive fraction (i.e., $a=\frac{b}{c}>0$, $c\neq 0$) with $b$ and $c$ coprime and odd, $g(x)$ is monotonically increasing. We put the power function $x^a$ into Non-Sep except in these two cases.

(2.2) For the exponential function $g(x)=a^x$ ($a>0$, $a\neq 1$, $x\in\mathbb{R}$): when $0<a<1$, $a^x$ is monotonically decreasing; when $a>1$, $a^x$ is monotonically increasing. Thus, $a^x$ is not put into Non-Sep.

(2.3) For the logarithmic function $g(x)=\log_a x$ ($a>0$, $a\neq 1$, $x>0$): when $0<a<1$, $\log_a x$ is monotonically decreasing; when $a>1$, $\log_a x$ is monotonically increasing. Thus, $\log_a x$ is not put into Non-Sep.

(2.4) For trigonometric functions, because all of them are periodic functions, they are put into Non-Sep.

(2.5) For the inverse trigonometric functions (e.g., $g(x)=\arcsin x$, $g(x)=\arccos x$, $g(x)=\arctan x$, $g(x)=\operatorname{arccot} x$, $g(x)=\operatorname{arcsec} x$, or $g(x)=\operatorname{arccsc} x$): their underlying trigonometric functions are not one-to-one mappings, so each inverse is defined only on a single monotone branch. Since this leads to variable interaction, they are put into Non-Sep.

(2.6) The constant function $g(x)=c$ does not lead to any variable interaction; thus it is not put into Non-Sep.

Case 3: Detecting variable interactions in a function obtained by applying one of the operations “$+$,” “$-$,” “$×$,” or “$÷$” to two composite functions from Case 2. Note that the operations “$+$” and “$-$” do not change the separability of variables. If two composite functions from Case 2 are multiplied or divided, the variables in these two composite functions are nonseparable, except for the multiplication of two exponential functions that are both monotonically increasing or both monotonically decreasing. Thus, we put two composite functions linked by “$×$” or “$÷$” into set Non-Sep except for the aforementioned case.

Through the above three cases, we can obtain set Non-Sep and determine the interacting variables which are linked by elements of set Non-Sep. Then these linked variables are put into a group.
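A sketch of this grouping step, assuming the objective's expression is already available as a nested-tuple syntax tree; the operator sets below are illustrative abbreviations of Cases 1–3, not FBG's full classification:

```python
NON_SEP_OPS = {"*", "/", "sin", "cos", "pow_even"}   # operations in Non-Sep
SEP_OPS = {"+", "-", "exp", "log"}                    # separability-preserving

def blocks(node):
    """Return the groups (sets) of mutually interacting variables in a subtree."""
    if isinstance(node, str):            # a variable leaf such as "x1"
        return [{node}]
    op, *args = node
    child = [b for a in args for b in blocks(a)]
    if op in NON_SEP_OPS:                # everything below this node interacts
        return [set().union(*child)] if child else []
    return child                          # separable op: keep children's groups

# e.g. x1*x2 + sin(x3 + x4) + x5
expr = ("+", ("*", "x1", "x2"), ("sin", ("+", "x3", "x4")), "x5")
groups = blocks(expr)
```

Here the multiplication fuses $x_1$ and $x_2$, the nonmonotone $\sin$ fuses $x_3$ and $x_4$, and $x_5$ remains a separable singleton.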

In this research we have designed and developed a parser program to do the variable grouping using the above FBG strategy. The execution results of the FBG parser on the CEC'2008, CEC'2010, and CEC'2013 LSGO benchmarks indicate that FBG can find the correct subgroups of all these benchmark functions.

To gain a better understanding of FBG, here we choose $f_6$ in the CEC'2013 benchmark suite (Li et al., 2013) as an example to illustrate the process of FBG, because $f_6$ is a typical test function containing various operations such as matrix operations, piecewise functions, and several basic elementary functions, as shown below:
$f_6(z)=\sum_{i=1}^{|S|-1}w_i\,f_{\text{ackley}}(z_i)+f_{\text{ackley}}(z_{|S|})$
(1)
where
• $S=\{50,25,25,100,50,25,25,700\}$;

• $D=\sum_{i=1}^{|S|}S_i=1000$;

• $y=x-x^{\text{opt}}$;

• $y_i=y(P[C_{i-1}+1]:P[C_i]),\ i\in\{1,\ldots,|S|\}$;

• $z_i=\Lambda^{10}T_{\text{asy}}^{0.2}(T_{\text{osz}}(R_iy_i)),\ i\in\{1,\ldots,|S|-1\}$;

• $z_{|S|}=\Lambda^{10}T_{\text{asy}}^{0.2}(T_{\text{osz}}(y_{|S|}))$;

• $R_i$ is an $S_i\times S_i$ rotation matrix;

• $P$ is a random permutation of the dimensions;

• $x\in[-32,32]^D$.

The FBG variable grouping method works as follows: the parser scans the input function string from left to right. First, $\sum_{i=1}^{|S|-1}$ is scanned; the parser expands this sum operation and obtains $w_1f_{\text{ackley}}(z_1)+w_2f_{\text{ackley}}(z_2)+\cdots+w_7f_{\text{ackley}}(z_7)$. For simplicity, we illustrate only the variable grouping process for $w_1f_{\text{ackley}}(z_1)$; the process for $w_2f_{\text{ackley}}(z_2),\ldots,w_7f_{\text{ackley}}(z_7)$ is exactly the same. Here, the parser reads $w_1$. Because $w_1$ is a constant which does not result in interaction of variables, the parser continues to invoke the predefined function $f_{\text{ackley}}$ with the parameter $z_1$, which is then expanded to $\Lambda^{10}T_{\text{asy}}^{0.2}(T_{\text{osz}}(R_1y_1))$. Following this, the parser scans $\Lambda^{10}$, which does not result in interaction of variables; it then scans the $T_{\text{asy}}^{0.2}$ and $T_{\text{osz}}$ functions, which do result in variable interactions. Thus the variables in $R_1y_1$ are extracted and passed to the Ackley function for further checking.

Thereafter, the parser begins to scan $f_{\text{ackley}}$. Because $-20$ is a constant which does not result in the interaction of variables, the parser continues to scan exp. Based on the FBG strategy, the parser identifies that exponential functions do not result in the interaction of variables, so it continues. When the parser scans $\sqrt{\frac{1}{S_1}\sum_{i=1}^{S_1}z_1(i)^2}$, which is stored as $\left(\frac{1}{S_1}\sum_{i=1}^{S_1}(z_1(i))^2\right)^{\frac{1}{2}}$, the parser reads “(”, and a temporary set Temp-Set is constructed to keep the candidate interacting variables. After scanning the constant $\frac{1}{S_1}$, which does not result in the interaction of variables, the parser checks the sum $\sum_{i=1}^{S_1}(z_1(i))^2$ and expands $z_1(i)$ into the variables it contains, for example, $\{x_{611},x_{595},x_{579},\ldots\}$. The variables involved in this sum are put into Temp-Set, i.e., Temp-Set $=\{\text{all variables in }z_1\}=\{x_{611},x_{595},x_{579},\ldots\}$. This process continues until “)” is scanned. Then the “power of 2” operation is scanned. Because this operation results in variable interaction, all the variables in Temp-Set are put as one group into Nonsep-Groups, in which each group contains interacting variables, and Temp-Set is then cleared. This procedure continues until the end of the input function. The final step is to update Nonsep-Groups by merging groups that share overlapping variables into one group. In this way, all variables are classified into the correct groups as described in Li et al. (2013).
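The final merging step can be sketched with a small union–find routine; the group contents below are illustrative, not taken from the $f_6$ run:

```python
# Merge groups that share at least one variable (the last FBG step above).
def merge_overlapping(groups):
    parent = {}

    def find(v):
        while parent[v] != v:          # path-halving union-find lookup
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for g in groups:
        for v in g:
            parent.setdefault(v, v)
        root = find(next(iter(g)))
        for v in g:
            parent[find(v)] = root     # union all members of this group
    merged = {}
    for v in parent:
        merged.setdefault(find(v), set()).add(v)
    return list(merged.values())

result = merge_overlapping([{"x1", "x2"}, {"x2", "x3"}, {"x4"}])
```

The shared variable `x2` causes the first two groups to collapse into one, leaving `{x1, x2, x3}` and `{x4}`.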

In short, FBG can correctly group the interactive variables of all benchmark test functions in the CEC'2008, CEC'2010, and CEC'2013 LSGO benchmark suites. The grouping results by FBG on the most recent CEC'2013 LSGO benchmark suite can be found in the supplementary materials.

Note that after the variables are classified into different groups by FBG, any evolutionary algorithm can be applied to each subcomponent of a white-box problem, in the same way as for black-box problems in existing works.

### 2.2  Initialization Method

Initialization plays an important role in any population-based stochastic optimization algorithm. Traditionally, basic random number generators (RNGs) are commonly used to initialize the population of CC. A large and growing number of works have proposed new ways to generate the initial population (Kazimipour et al., 2013; Dong et al., 2012; Leung and Wang, 2001). In this article, we adopt the chaotic method (Dong et al., 2012) to generate the initial population, since this type of chaos-based initialization provides a notably uniform distribution of points. The detailed process of the chaos initialization method is as follows:

1. Generate the chaos factors by the following formula:
$z_j(i)=\mu\left(1-2\left|z_j(i-1)-0.5\right|\right),\quad 0\le z_j(0)\le 1,\; i=1,2,\ldots,m,\; j=1,2,\ldots,D,$
(2)
where $z_j(i)$ denotes the $j$-th chaos factor at iteration $i$.
2. Set $i=0$ and randomly generate $D$ uniform factors $z_j(0)$ for $j=1,2,\ldots,D$.

3. Set $\mu=1$. Then, for $i=1,2,\ldots,m$, iteratively generate the chaos factors $z_j(i)$ for $j=1,2,\ldots,D$ by formula (2).

After that, these chaos factors can be used to generate an initial population as follows:
$X_i=(x_{i1},x_{i2},\ldots,x_{iD}),\quad i=1,2,\ldots,m$
(3)
where
$x_{ij}=x_j^{\min}+z_j(i)\left(x_j^{\max}-x_j^{\min}\right),\quad j=1,2,\ldots,D,$
(4)
and $x_j^{\min}$ and $x_j^{\max}$ are the lower and upper bounds of the $j$-th variable of the problem.
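A minimal sketch of this tent-map initialization (the bounds and population size below are illustrative):

```python
import random

def chaos_init(m, D, lower, upper, mu=1.0, seed=0):
    """Generate m individuals of dimension D via the chaotic tent map."""
    rng = random.Random(seed)
    z = [rng.random() for _ in range(D)]                  # z_j(0) ~ U(0, 1)
    pop = []
    for _ in range(m):
        z = [mu * (1 - 2 * abs(zj - 0.5)) for zj in z]    # tent-map iterate
        pop.append([lower[j] + z[j] * (upper[j] - lower[j])
                    for j in range(D)])                   # map into the bounds
    return pop

pop = chaos_init(m=5, D=3, lower=[-5.0] * 3, upper=[5.0] * 3)
```

Since each chaos factor stays in $[0,1]$, every generated coordinate lies within its variable's bounds.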

### 2.3  A New Local Search Strategy

To further enhance the efficiency of the proposed algorithm CCF, which will be described in Section 2.4, we adopt a new local search scheme.

There exist some efficient local search methods, for example, the conjugate gradient method (Faires and Burden, 2003) and the Newton and quasi-Newton methods (Griewank and Toint, 1982). However, these methods require calculating the gradient of the objective function and hence are not suitable for nondifferentiable problems. In order to retain the advantages of these local search methods while avoiding explicit gradient computation, a revised version of the quasi-Newton algorithm is designed; its pseudocode is shown in Algorithm 1. The key novel idea is to use an approximate formula to estimate the gradient instead of computing it analytically from the objective function.
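Algorithm 1 is not reproduced here; the following is a sketch of the same key idea under our own illustrative choices (forward-difference step `h`, backtracking constants, and a BFGS-style inverse-Hessian update), not the authors' exact algorithm:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Forward-difference gradient estimate, replacing the analytic gradient."""
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g

def quasi_newton(f, x0, iters=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                          # inverse-Hessian approximation
    g = num_grad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                         # quasi-Newton search direction
        t, fx = 1.0, f(x)
        while f(x + t * p) > fx + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5                       # backtracking line search
        s = t * p
        x_new = x + s
        g_new = num_grad(f, x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                     # BFGS update of the inverse Hessian
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x, f(x)

x, fx = quasi_newton(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])
```

On this smooth quadratic the iterate converges to $(1,-2)$ despite never evaluating an analytic gradient.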

In addition, for the fully separable functions which can be decomposed into 1000 independent 1-D functions such as f1–f3 in the CEC' 2010 and CEC' 2013 benchmark suites, we use the line-search strategy as the local search strategy (Nocedal and Wright, 1999).

### 2.4  Cooperative Coevolution with Formula-Based Grouping Strategy (CCF)

By using FBG, a large-scale optimization problem can be decomposed into several smaller subproblems. We can then use an efficient optimization method to solve each subproblem through a cooperative coevolution mechanism (Potter and Jong, 1994). More specifically, first, for each subproblem, only the variables corresponding to this subproblem (subcomponent) are evolved, while the variables corresponding to all other subproblems (subcomponents) are fixed to their best values. Second, for a specific subproblem, each individual is evaluated with the original objective function by fixing the values of all other subcomponents to their best values; these individuals form the subpopulation for this subproblem. Third, all the subproblems are optimized in a round-robin fashion, one by one from the first subproblem to the last, by evolving the corresponding subpopulation using a chosen optimization method. The above process is iterated until the termination criterion is met.

It is well known that differential evolution (DE) is a simple and effective population-based optimization method (Vesterstrom and Thomsen, 2004), but the performance of standard DE is sensitive to its control parameters (Gämperle et al., 2002). To overcome this, a self-adaptive DE with neighborhood search (SaNSDE) was developed, with a self-adapted crossover rate $CR$ and scaling factor $F$ (Qin et al., 2009). SaNSDE has been successfully applied to a variety of problems (e.g., Yang et al., 2008a,b). Thus, we choose SaNSDE as the evolutionary algorithm to optimize each subproblem containing more than ten variables. If a subproblem contains fewer than ten variables, using SaNSDE as its optimizer may waste considerable computation; therefore, we instead use the new local search or line search scheme described in the previous subsection. By integrating FBG with SaNSDE and the new local search scheme, we propose a new cooperative coevolution algorithm with formula-based variable grouping (CCF). Its pseudocode is given in Algorithm 2.

## 3  Numerical Experiments

### 3.1  Benchmark Suite and Parameters Setting for CCF

In this section, the proposed algorithm CCF is first tested on two widely-used LSGO benchmark suites with 1000 dimensions: the CEC'2010 LSGO benchmark suite (Tang et al., 2009) and CEC'2013 LSGO benchmark suite (Li et al., 2013). We also carry out the experiments on the early CEC'2008 benchmark suite (Tang et al., 2007) with 2000 and 5000 dimensions, respectively.

• CEC'2010 benchmark functions:

The dimensions of the functions are fixed at 1000, where $f_1$–$f_3$ are fully separable functions, $f_4$–$f_{18}$ are partially additively separable functions, and $f_{19}$–$f_{20}$ are nonseparable functions.

• CEC'2013 benchmark functions:

The dimensions of the functions are fixed at 1000, where $f_1$–$f_3$ are fully separable functions, $f_4$–$f_{11}$ are partially additively separable functions, and $f_{12}$–$f_{15}$ are nonseparable functions.

• CEC'2008 benchmark functions:

The dimensions of the functions are scalable, where $f_1$, $f_4$, and $f_6$ are fully separable functions, and $f_2$, $f_3$, $f_5$, and $f_7$ are fully nonseparable functions. We performed the experiments with dimensions 2000 and 5000, respectively.

CCF is compared with a number of existing LSGO algorithms on all or a part of these benchmarks. In all cases, we conducted 25 independent runs for each test function except for the 5000 dimensional test functions. For each test function with 5000 dimensions, we conducted 10 independent runs to be consistent with those conducted in the compared CSO algorithm.

We used the following performance evaluation methods to compare the proposed algorithm CCF with several state-of-the-art algorithms:

• We recorded the smallest objective function value (“Best”), the largest function value (“Worst”), the average function value (“Mean”), and the standard deviation (“Std”) over 25 runs, and compared these values with those obtained by the compared algorithms.

• We compared the $p$-values obtained by the proposed algorithm with those obtained by the compared algorithms, where each $p$-value comes from a two-tailed $t$-test at the 0.05 significance level.

• We also used two performance measures, ERT (Expected Running Time) (Mersmann et al., 2015; Auger and Hansen, 2005) and ECDF (Empirical Cumulative Distribution Function) (Hansen et al., 2010; Pál et al., 2012), in the comparisons. ERT estimates the expected running time as the number of function evaluations required to reach a given accuracy level $\varepsilon>0$; for the same accuracy level, the lower the running time, the better the performance. ECDF shows the empirical cumulative proportion of problems that reach the required accuracy within the given budget of function evaluations.
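The ERT measure can be computed as follows. This sketch follows the commonly used definition (total function evaluations over all runs divided by the number of successful runs); the run counts and budgets in the example are hypothetical:

```python
# ERT sketch: failed runs are charged the full evaluation budget max_fes.
def ert(run_fes, successes, max_fes):
    """run_fes[i]: FEs used by run i; successes[i]: whether run i reached eps."""
    n_succ = sum(successes)
    if n_succ == 0:
        return float("inf")            # no run succeeded within the budget
    total = sum(fe if ok else max_fes for fe, ok in zip(run_fes, successes))
    return total / n_succ

# e.g., two runs succeed after 1e5 and 2e5 FEs, one fails within MaxFEs = 3e6
value = ert([1e5, 2e5, 3e6], [True, True, False], 3e6)   # 1650000.0
```

The measure thus penalizes unreliable algorithms: each failed run adds the full budget to the numerator without increasing the denominator.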

We set the maximum number of function evaluations MaxFEs to $3.0e+06$ (this is also used by the compared algorithms), and the population size $N=50$.

### 3.2  The Simulation Results on 1000-D Problems

In Table 1, CCF is compared with five well-performing algorithms, including DECC-G (Yang et al., 2008a), CCVIL (Chen et al., 2010), DECC-DG (Omidvar et al., 2014), DECC-DML (Omidvar, Li, and Yao, 2010), and MA-SW-Chains (Molina et al., 2010), using the CEC'2010 benchmark suite, where MA-SW-Chains is the winner of the CEC'2010 LSGO competition. DECC-G is widely-used as a baseline in LSGO algorithm comparisons (Chen et al., 2010; Omidvar et al., 2014), and CCVIL, DECC-DG, and DECC-DML are among the most representative LSGO algorithms evaluated on the CEC'2010 LSGO benchmarks. The results of these compared algorithms have been given in the corresponding references, and we use their results directly for our comparisons.

Table 1:
Comparison of CCF with other algorithms on the CEC'2010 LSGO benchmark suite ($D=1000$).
| $P$ |  | CCF | MA-SW-Chains | DECC-DG | CCVIL | DECC-G | DECC-DML | $p$-value (CCF/MA-SW-Chains) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f1 | mean | 0.00e+00 | 2.10e-14 | 5.47e+03 | 1.55e-17 | 2.93e-07 | 1.93e-25 | NaN |
|  | std | 0.00e+00 | 1.99e-14 | 2.02e+04 | 7.75e-17 | 8.62e-08 | 1.86e-25 |  |
| f2 | mean | 4.85e+01 | 8.10e+02 | 4.39e+03 | 6.71e-09 | 1.31e+03 | 2.17e+02 | 3.30e-43 |
|  | std | 3.12e+01 | 5.88e+01 | 1.97e+02 | 2.31e-08 | 3.24e+01 | 2.98e+01 |  |
| f3 | mean | 4.98e-12 | 7.28e-13 | 1.67e+01 | 7.52e-11 | 1.39e+00 | 1.18e-13 | 1.51e-32 |
|  | std | 6.29e-13 | 3.40e-13 | 3.34e-01 | 6.58e-11 | 9.59e-02 | 8.22e-15 |  |
| f4 | mean | 4.84e+10 | 3.53e+11 | 4.79e+12 | 9.62e+12 | 5.00e+12 | 3.58e+12 | 8.63e-29 |
|  | std | 2.28e+10 | 3.12e+10 | 1.44e+12 | 3.43e+12 | 3.38e+12 | 1.54e+12 |  |
| f5 | mean | 8.08e+07 | 1.68e+08 | 1.55e+08 | 1.76e+08 | 2.63e+08 | 2.98e+08 | 5.65e-19 |
|  | std | 1.71e+07 | 1.04e+08 | 2.17e+07 | 6.47e+07 | 8.44e+07 | 9.31e+07 |  |
| f6 | mean | 3.42e+06 | 8.14e+04 | 1.64e+01 | 2.94e+05 | 4.96e+06 | 7.93e+05 | 2.31e-01 |
|  | std | 1.36e+07 | 2.84e+05 | 2.71e-01 | 6.09e+05 | 8.02e+05 | 3.97e+06 |  |
| f7 | mean | 2.03e-10 | 1.03e+02 | 1.16e+04 | 8.00e+08 | 1.63e+08 | 1.39e+08 | 0.00e+00 |
|  | std | 2.27e-12 | 8.70e+01 | 7.41e+03 | 2.48e+09 | 1.38e+08 | 7.72e+07 |  |
| f8 | mean | 1.28e+06 | 1.41e+07 | 3.04e+07 | 6.50e+07 | 6.44e+07 | 3.46e+07 | 9.69e-22 |
|  | std | 1.90e+06 | 3.68e+07 | 2.11e+07 | 3.07e+07 | 2.89e+07 | 3.56e+07 |  |
| f9 | mean | 7.65e+06 | 1.41e+07 | 5.96e+07 | 6.66e+07 | 3.21e+08 | 5.92e+07 | 9.43e-22 |
|  | std | 9.55e+05 | 1.15e+06 | 8.18e+06 | 1.60e+07 | 3.39e+07 | 4.71e+06 |  |
| f10 | mean | 1.31e+04 | 2.07e+03 | 4.52e+03 | 1.28e+03 | 1.06e+04 | 1.25e+04 | 1.43e-37 |
|  | std | 3.47e+02 | 1.44e+02 | 1.41e+02 | 7.95e+01 | 2.93e+02 | 2.66e+02 |  |
| f11 | mean | 2.56e+01 | 3.80e+01 | 1.03e+01 | 3.48e+00 | 2.34e+01 | 1.80e-13 | 1.30e-18 |
|  | std | 2.50e+00 | 7.35e+00 | 1.01e+00 | 1.91e+00 | 1.79e+00 | 9.88e-15 |  |
| f12 | mean | 7.20e-03 | 3.62e-06 | 2.52e+03 | 8.95e+03 | 8.93e+04 | 3.79e+06 | 1.60e-01 |
|  | std | 2.49e-02 | 5.92e-07 | 4.86e+02 | 5.39e+03 | 6.90e+03 | 1.50e+05 |  |
| f13 | mean | 5.58e+01 | 1.25e+03 | 4.54e+06 | 5.72e+02 | 5.12e+03 | 1.14e+03 | 4.35e-36 |
|  | std | 4.43e+01 | 5.72e+02 | 2.13e+06 | 2.55e+02 | 3.95e+03 | 4.31e+02 |  |
| f14 | mean | 5.54e+07 | 3.11e+07 | 3.41e+08 | 1.74e+08 | 8.08e+08 | 1.89e+08 | 7.87e-19 |
|  | std | 4.79e+06 | 1.93e+06 | 2.41e+07 | 2.68e+07 | 6.06e+07 | 1.49e+07 |  |
| f15 | mean | 4.32e+03 | 2.74e+03 | 5.88e+03 | 2.65e+03 | 1.22e+04 | 1.54e+04 | 7.50e-28 |
|  | std | 1.28e+02 | 1.22e+02 | 1.03e+02 | 9.34e+01 | 9.10e+02 | 3.59e+02 |  |
| f16 | mean | 1.92e+01 | 9.98e+01 | 7.39e-13 | 7.18e+00 | 7.66e+01 | 5.08e-02 | 7.36e-36 |
|  | std | 3.05e+00 | 1.40e+01 | 5.70e-14 | 2.23e+00 | 8.14e+00 | 2.54e-01 |  |
| f17 | mean | 7.18e+02 | 1.24e+00 | 4.01e+04 | 2.13e+04 | 2.87e+05 | 6.54e+06 | 4.95e-12 |
|  | std | 2.86e+02 | 1.25e-01 | 2.85e+03 | 9.16e+03 | 1.97e+04 | 4.63e+05 |  |
| f18 | mean | 1.29e+03 | 1.30e+03 | 1.11e+10 | 1.33e+04 | 2.46e+04 | 2.47e+03 | 6.74e-01 |
|  | std | 1.31e+02 | 4.36e+02 | 2.04e+09 | 1.00e+04 | 1.05e+04 | 1.18e+03 |  |
| f19 | mean | 9.22e+05 | 2.85e+05 | 1.74e+06 | 3.52e+05 | 1.11e+06 | 1.59e+07 | 8.14e-31 |
|  | std | 3.91e+04 | 1.78e+04 | 9.54e+04 | 2.04e+04 | 5.00e+04 | 1.72e+06 |  |
| f20 | mean | 2.49e+03 | 1.07e+03 | 4.87e+07 | 1.11e+03 | 4.06e+03 | 9.91e+02 | 2.98e-21 |
|  | std | 2.21e+02 | 7.29e+01 | 2.27e+07 | 3.04e+02 | 3.66e+02 | 3.51e+01 |  |

From Table 1, we can see that among all the compared algorithms, CCF performed the best on eight functions of the CEC'2010 LSGO benchmarks, while MA-SW-Chains performed the best on only five functions, DECC-DG and CCVIL on two functions each, DECC-DML on three functions, and DECC-G performed the worst. In pairwise comparisons, CCF wins on 11 of the 20 test functions against MA-SW-Chains, on 16 against DECC-DG, on 12 against CCVIL, on 18 against DECC-G, and on 14 against DECC-DML. These results suggest that CCF is more effective than the compared algorithms.

On the fully separable functions $f1$-$f3$, CCF performed the best on $f1$, CCVIL on $f2$, and DECC-DML on $f3$; on $f3$, CCF is only slightly worse than DECC-DML. On the partially separable functions $f4$-$f18$, CCF performed better than MA-SW-Chains on 8 functions (and worse on 6), better than DECC-DG on 11, CCVIL on 10, DECC-G on 12, and DECC-DML on 10. These results indicate that CCF is particularly effective on the partially separable test functions. On the nonseparable function $f19$, CCF performs better than DECC-DG, DECC-G, and DECC-DML, but slightly worse than CCVIL and MA-SW-Chains. On the fully nonseparable function $f20$, CCF performs better than DECC-DG and DECC-G, but worse than DECC-DML, CCVIL, and MA-SW-Chains. This is because the proposed FBG is of little or no use on these two nonseparable functions, so only the adopted SaNSDE affects the performance of CCF; it is also possible that SaNSDE is less effective than the optimizers adopted in the compared approaches.

MA-SW-Chains is the top performer of the CEC'2010 LSGO competition and performed better than DECC-DG, DECC-G, DECC-DML, and CCVIL. Table 1 therefore also compares CCF with MA-SW-Chains via two-tailed $t$-test $p$-values: CCF obtained better results than MA-SW-Chains on ten test functions, worse results on seven, and ties on three. These results again suggest that CCF is more effective than the compared algorithms.
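For reference, a Welch two-sample $t$-statistic of the kind behind such $p$-values can be computed directly from the reported means and standard deviations. The sketch below is an illustration only, using the $f2$ row of Table 1; the run count of 25 per algorithm is an assumption, as the exact per-run data is not restated here.

```python
import math

def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    v1, v2 = std1 ** 2 / n1, std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means/stds for f2 from Table 1; 25 runs per algorithm is an assumption.
t, df = welch_t(4.85e+01, 3.12e+01, 25,    # CCF
                8.10e+02, 5.88e+01, 25)    # MA-SW-Chains
print(abs(t) > 2.01)  # True: far beyond the 0.05 critical value (about 2.0)
```

The exact $p$-value depends on the per-run data and run counts, so this only sketches how such significance tests are derived from published summary statistics.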

More recently, the CEC'2013 LSGO benchmark suite has also been used for evaluating LSGO algorithms, since it extends the earlier suites with several more challenging LSGO properties (Omidvar et al., 2015). Several LSGO algorithms, including SACC (Wei et al., 2013), MOS (LaTorre et al., 2013), and DECC-G (Yang et al., 2008a), have been evaluated on this suite, and we compare CCF with these three algorithms. The key differences between SACC and CCF are as follows: 1) SACC uses the random grouping strategy, while CCF uses the proposed FBG; 2) SACC uses an auxiliary function to enhance its performance, while CCF uses the new local search based on the quasi-Newton method. MOS (LaTorre et al., 2013) is the top-performing algorithm of the CEC'2013 LSGO competition, and DECC-G is commonly used as a baseline in several studies (LaTorre et al., 2013; Omidvar et al., 2014). Comprehensive empirical studies of existing LSGO algorithms on the CEC'2013 benchmark suite are provided in LaTorre et al. (2015), and the results of SACC on this suite are provided in Wei et al. (2013). Table 2 compares these results directly with those obtained by CCF. Furthermore, to test the effectiveness of the proposed grouping scheme FBG, we replace FBG with a random grouping strategy (Yang et al., 2008b) in CCF; the resulting algorithm is denoted CCR (with the group size set to 100). Because MOS clearly outperforms CCR, SACC, and DECC-G, we further validate the performance of CCF by comparing it with MOS via two-tailed $t$-tests (significance level 0.05). These results are also presented in Table 2.
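The random grouping strategy used by CCR can be sketched as follows; this is a minimal illustration of the idea (shuffle the variable indices and cut them into equal-size groups), not the authors' implementation.

```python
import random

def random_grouping(dim, group_size):
    """Randomly partition decision-variable indices into equal-size groups,
    as in the random grouping strategy of Yang et al. (2008b)."""
    idx = list(range(dim))
    random.shuffle(idx)
    return [idx[i:i + group_size] for i in range(0, dim, group_size)]

# The CCR setting from the text: D = 1000 variables, group size 100.
groups = random_grouping(1000, 100)
print(len(groups), len(groups[0]))  # 10 100
```

Each cooperative coevolution cycle can re-invoke this to obtain a fresh random decomposition, whereas FBG derives one fixed decomposition from the objective formula.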

Table 2:
Comparison between CCF and other algorithms on the CEC'2013 LSGO benchmark suite ($D=1000$).
| P | | CCF | CCR | SACC | MOS | DECC-G | p-value (CCF/MOS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| f1 | Best | 0.00e+00 | 0.00e+00 | 0.00e+00 | 0.00e+00 | 1.75e-13 | NaN |
| | Worst | 0.00e+00 | 1.81e-03 | 6.81e-23 | 0.00e+00 | 2.45e-13 | |
| | Mean | 0.00e+00 | 2.58e-04 | 2.73e-24 | 0.00e+00 | 2.03e-13 | |
| | Std | 0.00e+00 | 6.84e-04 | 1.36e-23 | 0.00e+00 | 1.78e-14 | |
| f2 | Best | 1.79e+01 | 1.91e+02 | 2.88e+02 | 7.40e+02 | 9.90e+02 | 7.02e-31 |
| | Worst | 7.61e+01 | 4.11e+03 | 2.72e+03 | 9.28e+02 | 1.07e+03 | |
| | Mean | 3.96e+01 | 2.11e+03 | 7.06e+02 | 8.32e+02 | 1.03e+03 | |
| | Std | 2.20e+01 | 1.73e+03 | 4.72e+02 | 4.48e+01 | 2.26e+01 | |
| f3 | Best | 3.97e-13 | 2.06e-13 | 9.24e-14 | 8.20e-13 | 2.63e-10 | 5.59e-20 |
| | Worst | 4.58e-13 | 3.32e+00 | 3.76e+00 | 1.00e-12 | 3.16e-10 | |
| | Mean | 4.32e-13 | 3.76e-01 | 1.11e+00 | 9.17e-13 | 2.87e-10 | |
| | Std | 2.18e-14 | 9.96e-01 | 1.11e+00 | 5.12e-14 | 1.38e-11 | |
| f4 | Best | 2.20e+07 | 7.02e+09 | 8.48e+09 | 1.10e+08 | 7.58e+09 | 8.17e-07 |
| | Worst | 1.14e+09 | 1.35e+11 | 1.71e+11 | 5.22e+08 | 6.99e+10 | |
| | Mean | 5.83e+08 | 6.09e+10 | 4.56e+10 | 1.74e+08 | 2.60e+10 | |
| | Std | 3.35e+08 | 5.46e+10 | 3.60e+10 | 7.87e+08 | 1.47e+10 | |
| f5 | Best | 2.60e+06 | 1.46e+06 | 3.36e+06 | 5.25e+06 | 7.28e+14 | 2.77e-18 |
| | Worst | 4.67e+06 | 1.37e+07 | 1.40e+07 | 8.56e+06 | 7.28e+14 | |
| | Mean | 3.35e+06 | 8.31e+06 | 7.74e+06 | 6.94e+06 | 7.28e+14 | |
| | Std | 7.30e+05 | 4.90e+06 | 3.22e+06 | 8.85e+05 | 1.51e+05 | |
| f6 | Best | 1.17e+05 | 1.12e+05 | 1.57e+05 | 1.95e+01 | 6.96e-08 | 4.38e-01 |
| | Worst | 1.52e+05 | 1.82e+05 | 6.00e+05 | 2.31e+05 | 1.10e+05 | |
| | Mean | 1.30e+05 | 1.46e+05 | 2.47e+05 | 1.48e+05 | 4.85e+04 | |
| | Std | 1.25e+04 | 2.56e+04 | 1.02e+05 | 6.43e+04 | 3.98e+04 | |
| f7 | Best | 3.84e+02 | 2.44e+07 | 1.72e+06 | 3.49e+03 | 1.96e+08 | 3.43e-10 |
| | Worst | 1.09e+03 | 1.02e+09 | 1.18e+09 | 3.73e+04 | 1.78e+09 | |
| | Mean | 3.08e+03 | 4.65e+08 | 8.98e+07 | 1.62e+04 | 6.07e+08 | |
| | Std | 3.62e+03 | 4.22e+08 | 2.48e+08 | 9.10e+03 | 4.09e+08 | |
| f8 | Best | 5.53e+14 | 5.33e+13 | 1.47e+14 | 3.26e+12 | 1.43e+14 | 4.42e-09 |
| | Worst | 2.35e+15 | 5.62e+15 | 3.08e+15 | 1.32e+13 | 7.75e+14 | |
| | Mean | 1.40e+15 | 2.14e+15 | 1.20e+15 | 8.00e+12 | 4.26e+14 | |
| | Std | 6.98e+14 | 1.77e+15 | 7.63e+14 | 3.07e+12 | 1.53e+14 | |
| f9 | Best | 2.52e+08 | 2.54e+08 | 2.29e+08 | 2.63e+08 | 2.20e+08 | 2.05e-12 |
| | Worst | 3.81e+08 | 4.89e+08 | 1.01e+09 | 5.42e+08 | 6.55e+08 | |
| | Mean | 3.45e+08 | 3.75e+08 | 5.98e+08 | 3.83e+08 | 4.27e+08 | |
| | Std | 4.70e+07 | 7.97e+07 | 2.03e+08 | 6.29e+07 | 9.89e+07 | |
| f10 | Best | 1.34e+02 | 5.92e+06 | 1.38e+07 | 5.92e+02 | 9.29e+04 | 1.25e-11 |
| | Worst | 2.37e+02 | 1.59e+07 | 7.75e+07 | 1.23e+06 | 1.73e+07 | |
| | Mean | 1.91e+02 | 1.02e+07 | 2.95e+07 | 9.02e+05 | 1.10e+07 | |
| | Std | 4.20e+01 | 3.16e+06 | 1.93e+07 | 5.07e+05 | 4.00e+06 | |
| f11 | Best | 8.36e+07 | 3.35e+08 | 8.12e+07 | 2.06e+07 | 4.68e+10 | 1.27e-06 |
| | Worst | 1.55e+08 | 2.93e+11 | 2.30e+10 | 9.50e+07 | 7.16e+11 | |
| | Mean | 1.08e+08 | 1.01e+11 | 2.78e+09 | 5.22e+07 | 2.46e+11 | |
| | Std | 2.79e+07 | 1.28e+11 | 5.90e+09 | 2.05e+07 | 2.03e+11 | |
| f12 | Best | 2.20e+03 | 2.48e+03 | 2.43e+02 | 2.22e-01 | 9.80e+02 | 7.05e-36 |
| | Worst | 3.59e+03 | 3.00e+03 | 1.72e+03 | 1.17e+03 | 1.20e+03 | |
| | Mean | 2.59e+03 | 2.71e+03 | 8.73e+02 | 2.47e+02 | 1.04e+03 | |
| | Std | 4.65e+02 | 1.81e+02 | 3.71e+02 | 2.54e+02 | 5.76e+01 | |
| f13 | Best | 3.54e+08 | 3.86e+09 | 6.72e+08 | 1.52e+06 | 2.09e+10 | 2.67e-17 |
| | Worst | 3.41e+09 | 7.45e+09 | 3.40e+09 | 6.16e+06 | 4.64e+10 | |
| | Mean | 1.16e+09 | 5.21e+09 | 1.78e+09 | 3.40e+06 | 3.42e+10 | |
| | Std | 1.04e+09 | 1.30e+09 | 8.05e+08 | 1.06e+06 | 6.41e+09 | |
| f14 | Best | 7.22e+09 | 3.91e+08 | 8.21e+07 | 1.54e+07 | 1.91e+11 | 7.27e-03 |
| | Worst | 3.69e+10 | 2.52e+11 | 1.10e+11 | 4.46e+07 | 1.04e+12 | |
| | Mean | 1.96e+10 | 4.75e+10 | 1.75e+10 | 2.56e+07 | 6.08e+11 | |
| | Std | 9.77e+09 | 9.21e+10 | 2.87e+10 | 7.94e+06 | 2.06e+11 | |
| f15 | Best | 2.91e+06 | 3.91e+08 | 1.26e+06 | 2.03e+06 | 4.63e+07 | 4.41e-18 |
| | Worst | 3.87e+06 | 7.56e+06 | 4.90e+06 | 2.88e+06 | 7.15e+07 | |
| | Mean | 3.39e+06 | 5.32e+06 | 2.01e+06 | 2.35e+06 | 6.05e+07 | |
| | Std | 3.48e+05 | 2.06e+06 | 7.23e+05 | 1.94e+05 | 6.45e+06 | |

It can be seen from Table 2 that both CCF and MOS perform best on 7 of the 15 test functions, while each of the other compared algorithms performs best on at most one or two. This indicates that CCF and MOS outperform the other compared algorithms.

In pairwise comparisons, CCF performs better than MOS on 7 test functions, worse on 7, and the same on one. On the fully separable functions $f1$ to $f3$ in particular, CCF performs better than MOS on $f2$ and $f3$ and the same on $f1$. Note that MOS employs a strong local search, which explains why CCF only manages to match its overall performance here; in short, the results indicate that FBG is effective. CCF obtains better results than DECC-G on 13 test functions, and better results than SACC and CCR on almost all test functions. These results indicate that CCF performs better than SACC, CCR, and DECC-G.

From the two-tailed $t$-test results, the differences between CCF and MOS are not significant on the two functions $f1$ and $f6$, and the results of CCF are clearly better than those of MOS on 6 functions. However, on the nonseparable functions, the results of CCF are worse than those of MOS. This indicates that the proposed FBG in CCF is more effective for fully separable and partially additively separable functions, while the SaNSDE adopted in CCF performs worse than MOS on nonseparable problems.

To further demonstrate the effectiveness of CCF, we used the performance index ERT (expected running time) to compare CCF with MA-SW-Chains (the top algorithm of the CEC'2010 LSGO competition) and MOS (the top algorithm of the CEC'2013 LSGO competition) on two representative fully separable functions, to which FBG is applicable. We chose $f2$ and $f3$ of the CEC'2010 benchmarks for the comparison with MA-SW-Chains, and $f2$ and $f3$ of the CEC'2013 benchmarks for the comparison with MOS. Since MA-SW-Chains reported only mean values on the CEC'2010 benchmarks after 3E+6 function evaluations (Molina et al., 2010), we assume that 3E+6 is the lowest number of function evaluations for it to reach that precision; Figures 1 and 2 therefore show only the ranks of the mean values of CCF and MA-SW-Chains at the target precisions. MOS reported the best, mean, and worst results on the CEC'2013 benchmarks after 3E+6 function evaluations, and we likewise assume that 3E+6 is the lowest number of function evaluations for it to reach that precision; Figures 3 and 4 show the consensus ranks (the sum of the ranks with respect to the best, mean, and worst values, respectively) of CCF and MOS at the target precisions. Figures 1 and 2 show that CCF always ranks better than MA-SW-Chains on $f2$ and $f3$ of the CEC'2010 benchmarks at the different precisions, and Figures 3 and 4 show that CCF always ranks better than MOS on $f2$ and $f3$ of the CEC'2013 benchmarks at the different precisions.
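The ERT index used here divides the total number of function evaluations spent across all runs, successful or not, by the number of successful runs. A minimal sketch of this standard definition follows; the run data is invented for illustration.

```python
def ert(evals, successes, max_evals):
    """Expected running time to reach a target precision.

    evals: evaluations used by each run (capped at max_evals).
    successes: whether each run reached the target.
    ERT = total evaluations spent / number of successful runs.
    """
    total = sum(e if ok else max_evals for e, ok in zip(evals, successes))
    n_success = sum(successes)
    return total / n_success if n_success else float("inf")

# Hypothetical runs: three reach the target, one exhausts the 3e6 budget.
print(ert([1.2e6, 0.9e6, 3e6, 1.5e6], [True, True, False, True], 3e6))
# 2200000.0
```

An algorithm that never reaches the target gets an infinite ERT, which is why ranks rather than raw ERT values are compared when only summary results are available.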

Figure 1:

Ranks of CCF and MA-SW-Chains on f2 of the CEC'2010 benchmarks.


Figure 2:

Ranks of CCF and MA-SW-Chains on f3 of the CEC'2010 benchmarks.


Figure 3:

Ranks of CCF and MOS on f2 of the CEC'2013 benchmarks.


Figure 4:

Ranks of CCF and MOS on f3 of the CEC'2013 benchmarks.


Figures 5, 6, and 7 show the empirical cumulative distribution functions (ECDF) of CCF on the three function categories of the CEC'2013 benchmark suite, where the $x$-axis is the budget (function evaluations per dimension) on a log scale, and the $y$-axis is the proportion of test functions on which CCF reaches the required accuracy within the budget. The vertical black lines indicate the maximum number of function evaluations, and each of the other colored curves corresponds to one accuracy level. The three categories are: the fully separable functions f1–f3, the additively separable functions f4–f11, and the fully nonseparable functions f12–f15. The same accuracy level $ε=f-f_{opt}$ is used for all functions within one category, but the accuracy levels differ from one category to another. For example, Figure 5 shows the ECDF for the fully separable functions f1–f3 at the accuracy levels $ε=\{1E+1, 1E-1, 1E-5, 1E-8\}$. For the second and third categories, an order-of-magnitude level such as E+2 would be too coarse, since 1E+2 and 5E+2 differ considerably; we therefore use more precise levels, that is, the level $ε=5E+2$ (rather than $ε=E+2$) is considered reached when $f-f_{opt}<5E+2$. Figure 6 shows the ECDF for the additively separable functions f4–f11 at the accuracy levels $ε=\{1.5E+15, 5E+8, 5E+6, 5E+2\}$, and Figure 7 shows the ECDF for the fully nonseparable functions f12–f15 at the accuracy levels $ε=\{2E+10, 1.2E+9, 3.5E+5, 2.6E+3\}$.
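The proportion plotted on the $y$-axis of these ECDF curves can be computed as sketched below; the first-hit evaluation counts are invented for illustration.

```python
def ecdf_proportion(first_hits, budget):
    """Fraction of targets reached within `budget` evaluations.

    first_hits: evaluation count at which each (function, accuracy-level)
                target was first reached, or None if never reached.
    """
    reached = sum(1 for e in first_hits if e is not None and e <= budget)
    return reached / len(first_hits)

# Hypothetical first-hit counts for four targets of one function group;
# one target is never reached within the run.
first_hits = [2.0e5, 8.5e5, None, 4.0e5]
print(ecdf_proportion(first_hits, budget=1e6))  # 0.75
```

Sweeping the budget from small to the maximum number of evaluations and plotting this proportion produces one ECDF curve per accuracy level.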

Figure 5:

ECDF for fully separable functions f1–f3 of the CEC'2013 benchmarks.


Figures 5, 6, and 7 provide an overall view of the accuracy levels that can be reached within each function group. For the fully separable function group, CCF reaches a higher level of accuracy than for the other two groups. As the number of function evaluations increases, the proportion of functions reaching a given accuracy level increases; conversely, for a fixed proportion of functions, tighter accuracy levels require more function evaluations. Figure 5 shows that, within the given budget (1E+6 function evaluations), CCF reaches all the given accuracy levels on a high proportion of f1–f3. In contrast, Figures 6 and 7 show that, within the same budget, only about $20%$ of the functions f4–f11 reach the accuracy level 5E+2, and only about $25%$ of the functions f12–f15 reach the accuracy level 2.6E+3. These results indicate that setting the maximum number of function evaluations to 1E+6 may be adequate for f1–f3, but it is not enough for f4–f11 and f12–f15, especially f12–f15 (where only about $25%$ of the functions reach even the coarse accuracy level 2.6E+3). They also indicate that f4–f11 and f12–f15 are very difficult problems, and that more function evaluations are needed to obtain better solutions.

Figure 6:

ECDF for additively separable functions f4–f11 of the CEC'2013 benchmarks.


Figure 7:

ECDF for nonseparable functions f12–f15 of the CEC'2013 benchmarks.


Note that we cannot compare MA-SW-Chains and MOS via ECDF plots because the data required for at least one of the two algorithms is not available.

To show the efficiency and effectiveness of the proposed variable grouping strategy intuitively, Figures 8 to 10 plot the convergence curves of the fitness value on a semilog scale, where the horizontal axis is the number of function evaluations and the vertical axis is the mean fitness value over 25 independent runs on a logarithmic scale. Because the experimental data of SACC and CCR are available and their variable grouping strategies differ from that of CCF, we select these two algorithms for comparison. We select three representative functions with different characteristics from the CEC'2013 benchmarks: $f2$ (fully separable), $f7$ (partially additively separable), and $f13$ (nonseparable). Figures 8 to 10 show the convergence curves of the fitness values for $f2$, $f7$, and $f13$, respectively, where the thick line is CCF, the thin line is SACC, and the dotted line is CCR.
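Each convergence curve plots the best fitness found so far against the number of function evaluations. A minimal sketch of how such a monotone curve is extracted from a run's fitness history follows (the history values are invented for illustration).

```python
def best_so_far(fitness_history):
    """Monotone best-so-far curve used for minimization convergence plots."""
    best, curve = float("inf"), []
    for f in fitness_history:
        best = min(best, f)
        curve.append(best)
    return curve

# Hypothetical fitness values recorded at successive evaluation checkpoints.
print(best_so_far([9.0, 5.0, 7.0, 2.0, 3.0]))  # [9.0, 5.0, 5.0, 2.0, 2.0]
```

Averaging these curves over the 25 runs and plotting them against evaluations on a logarithmic $y$-axis yields the semilog curves of Figures 8 to 10.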

Figure 8:

Convergence curves on f2.


Figure 9:

Convergence curves on f7.


Figure 10:

Convergence curves on f13.


From Figures 8, 9, and 10, it can be seen that CCF converges much faster than the other two algorithms and can find better solutions more easily. This shows that the proposed grouping scheme FBG in CCF is effective.

### 3.3  The Simulation Results on 2000-D and 5000-D Problems

In this section, CCF is tested on the CEC'2008 LSGO benchmark suite (Tang et al., 2007) with 2000 and 5000 dimensions, and the results are compared with those obtained by CSO (Cheng and Jin, 2015a), currently the only algorithm for which results on the CEC'2008 benchmark suite with 2000 and 5000 dimensions are available.

We conducted the experiments on f1–f6 of the CEC'2008 benchmarks and compared the results with those of CSO. Here f1, f4, and f6 are fully separable functions, whereas f2, f3, and f5 are fully nonseparable. As with CSO, we executed 25 independent runs of CCF for each test problem at 2000 dimensions and 10 independent runs at 5000 dimensions. The maximum number of function evaluations of both CCF and CSO is set to $5000×D$, where $D$ is the dimensionality.

The comparison results are listed in Table 3. For the separable functions f1, f4, and f6, CCF is better than CSO on both the 2000- and 5000-dimensional problems, because CCF can correctly identify the variable groups and the proposed local search method is effective. For the fully nonseparable functions f2, f3, and f5, CCF performs worse than CSO: the variables of these functions cannot be divided into subgroups and all remain in one group, so FBG offers no help and the performance of CCF depends entirely on SaNSDE, which apparently performs worse than CSO.

Table 3:
Comparison with CSO on 2000 and 5000 dimensions.
| Function | CCF (2000D) | CSO (2000D) | CCF (5000D) | CSO (5000D) |
| --- | --- | --- | --- | --- |
| f1 | 0.00e+00 (0.00e+00) | 1.66e-20 (3.36e-22) | 0.00e+00 (0.00e+00) | 1.43e-19 (3.33e-21) |
| f2 | 1.70e+02 (3.22e+00) | 6.17e+01 (1.31e+00) | 1.81e+02 (1.23e+00) | 9.82e+01 (9.78e-01) |
| f3 | 4.12e+03 (2.57e+03) | 2.10e+03 (5.14e+01) | 9.01e+03 (1.40e+03) | 7.30e+03 (1.26e+02) |
| f4 | 2.35e+02 (5.86e+01) | 2.81e+03 (3.69e+01) | 6.06e+02 (1.07e+02) | 7.80e+03 (8.73e+01) |
| f5 | 1.63e-02 (4.91e-02) | 3.33e-16 (0.00e+00) | 4.70e-03 (1.24e-02) | 4.44e-16 (0.00e+00) |
| f6 | 6.66e-13 (3.95e-14) | 3.26e-12 (5.43e-14) | 1.65e-12 (6.15e-14) | 6.86e-12 (5.51e-14) |

### 3.4  The Lennard-Jones Potential Problem

The Lennard-Jones Potential Problem (LJ problem) is a potential energy minimization problem whose potential energy is defined as follows:
$\mathrm{Minimize}\quad E=4ε\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left[\left(\frac{σ}{r_{ij}}\right)^{12}-\left(\frac{σ}{r_{ij}}\right)^{6}\right],$
(5)
where $r_{ij}$ is the distance between atoms $i$ and $j$, $N$ is the number of atoms (cluster size), and $ε$ and $2^{1/6}σ$ are the pair equilibrium well depth and separation, respectively.
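As a concrete illustration, the LJ energy of Eq. (5) can be evaluated directly from atomic coordinates. The sketch below is an illustration, not the implementation used in the experiments; it adopts the reduced units $ε=σ=1$ common in LJ cluster studies.

```python
import math

def lj_energy(coords, eps=1.0, sigma=1.0):
    """Lennard-Jones potential energy of a cluster, Eq. (5).

    coords: sequence of (x, y, z) atomic positions.
    """
    n = len(coords)
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy

# Two atoms at the pair equilibrium separation 2^(1/6)*sigma:
# the energy equals the pair well depth -eps.
pair = [(0.0, 0.0, 0.0), (2.0 ** (1.0 / 6.0), 0.0, 0.0)]
print(round(lj_energy(pair), 9))  # -1.0
```

In a direct encoding, the decision vector concatenates the $3N$ coordinates, and every pair term couples six variables, which is what makes the problem fully nonseparable.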

The LJ problem is a fully nonseparable problem. Since the proposed algorithm CCF is mainly designed for separable and partially-separable problems, if CCF is applied to the LJ problem, FBG should be able to detect the nonseparability of the variables of the LJ problem and classify all variables into one group, that is, it does not make any decomposition. The performance will be largely dependent on how well the problem is solved by SaNSDE, which is adopted as an optimizer for each subproblem.

The LJ potential problem is a very difficult multimodal optimization problem: the number of local minima grows exponentially with the number of atoms $N$ (Hoare, 1979). The CEC'2011 EA competition test suite (Das and Suganthan, 2010) includes the LJ potential problem with the number of atoms fixed at 10, yet even the top-performing algorithm GA-MPC (Elsayed et al., 2011b) could not locate the global minimum with a 100% success rate. Table 4 compares CCF with the four best-performing EAs of the CEC'2011 competition, namely GA-MPC (Elsayed et al., 2011b), DE-$Λcr$ (Reynoso-Meza et al., 2011), SAMODE (Elsayed et al., 2011a), and Adap.DE171 (Asafuddoula et al., 2011), on the LJ problem with 10 atoms. The stopping criterion is reaching $1.5e+5$ function evaluations.

It can be seen from Table 4 that CCF performs better than SAMODE and Adap.DE171, and similarly to the champion algorithm GA-MPC.

Table 4:
Comparison of CCF with four best CEC2011 EAs competition algorithms on the LJ problem with 10 atoms in 25 independent runs.
| | CCF | GA-MPC | DE-$Λcr$ | SAMODE | Adap.DE171 |
| --- | --- | --- | --- | --- | --- |
| Best | -2.84e+01 | -2.84e+01 | -2.84e+01 | -2.84e+01 | -2.84e+01 |
| Median | -2.75e+01 | -2.74e+01 | -2.75e+01 | -2.73e+01 | -2.72e+01 |
| Worst | -2.65e+01 | -2.71e+01 | -2.64e+01 | -2.61e+01 | -1.82e+01 |
| Mean | -2.73e+01 | -2.77e+01 | -2.77e+01 | -2.71e+01 | -2.68e+01 |
| Std | 6.78e-01 | 4.67e-01 | 4.90e-01 | 6.62e-01 | 1.97e+00 |

Note that the best-performing algorithms currently available for this problem rely on computing gradients from the LJ problem formulation (Daven et al., 1996; Northby, 1987; Wales and Doye, 1997; Xiang et al., 2004; Cheng and Jin, 2015a,b). To make a fair comparison with these algorithms, we also used the gradients of the LJ formula in our local search method instead of the current local search algorithm SaNSDE. In the following experiments on the LJ problem with more atoms, we adopt the lattice search algorithm of Northby (1987), and use a revised Northby algorithm as the local search method under the CC framework. There are three main modifications to the Northby algorithm (Northby, 1987):

• The revised algorithm uses only the LJ potential energy as the objective function, while the Northby algorithm (Northby, 1987) uses both the nearest-neighbor potential and the LJ potential energy as the objective function.

• We reduced the lattice search from 250 times to 60 times in order to reduce the computational cost.

• Instead of optimizing all configurations obtained from the lattice search phase, we choose only some of the configurations for optimization.

Note that an IC lattice consists of the sites forming a Mackay icosahedron at the center together with the sites of the next complete icosahedral shell, while in an FC lattice the sites of the outer shell are located on the faces at stacking-fault positions (Northby, 1987). For more details about IC and FC lattices, please refer to Northby (1987).

We set the parameters $m=60$ and $n=20$ for problems with fewer than 150 atoms, and $m=120$ and $n=50$ otherwise. In the experiments, we compare CCF using the revised Northby algorithm as the local search method (denoted CCF-RN) with the two best-performing algorithms for the LJ problem, those of Northby (1987) and Romero et al. (1999). These two algorithms produced many best-known global minima, and their results listed in Table 5 are still the best results to date. Table 5 compares CCF-RN with these two algorithms, where “$-$” means the values are not available.

Table 5:
Comparison of CCF-RN with algorithms in Northby (1987) and Romero et al. (1999).
| N | CCF-RN Best | CCF-RN Worst | CCF-RN Mean | Northby (1987) Best | Romero et al. (1999) Best |
| --- | --- | --- | --- | --- | --- |
| 50 | -244.550 | -244.550 | -244.550 | -244.550 | - |
| 60 | -305.876 | -305.876 | -305.876 | -305.876 | - |
| 70 | -366.892 | -366.892 | -366.892 | -366.892 | - |
| 80 | -427.829 | -427.829 | -427.829 | -428.084 | - |
| 90 | -492.434 | -492.434 | -492.434 | -492.434 | - |
| 100 | -557.040 | -557.040 | -557.040 | -557.040 | - |
| 110 | -621.788 | -619.770 | -620.551 | -621.788 | - |
| 120 | -687.022 | -686.689 | -686.937 | -687.022 | - |
| 130 | -755.271 | -755.271 | -755.271 | -755.271 | - |
| 140 | -826.175 | -826.175 | -826.175 | -826.175 | - |
| 150 | -893.310 | -893.310 | -893.310 | -893.310 | -893.310 |
| 160 | -957.110 | -957.106 | -957.107 | - | -957.110 |
| 170 | -1024.155 | -1024.155 | -1024.155 | - | -1024.791 |
| 180 | -1092.209 | -1092.209 | -1092.209 | - | -1092.632 |
| 190 | -1160.511 | -1160.511 | -1160.511 | - | -1161.301 |
| 200 | -1229.081 | -1288.850 | -1228.887 | - | -1229.184 |
| 210 | -1299.963 | -1299.963 | -1299.963 | - | -1300.006 |
| 220 | -1368.262 | -1367.976 | -1368.205 | - | -1368.349 |
| 230 | -1439.358 | -1437.010 | -1437.264 | - | -1439.358 |
| 240 | -1508.284 | -1507.980 | -1508.101 | - | -1508.562 |

Note that Table 5 lists only the best potential energy values for the deterministic algorithms of Northby (1987) and Romero et al. (1999), whereas for CCF we report the best, worst, and mean potential energy values obtained by CCF-RN over 15 independent runs.

From Table 5, we can see that for 10 of the 11 cases with $N=50$ to 150, the mean values obtained by CCF-RN are identical or very close to those obtained by the Northby algorithm (Northby, 1987). As the number of atoms (cluster size) grows, more computational cost is needed and it becomes harder to locate the global minima. To obtain better results, the algorithm in Romero et al. (1999) uses not only the IC and FC icosahedral-based structures but also decahedral and fcc (face-centered-cubic) geometric information. Table 5 shows that CCF-RN locates the global minimum for cluster size $N=150$ with a 100% success rate, and that for all cases with $N≥150$ the mean values obtained by CCF-RN are very close to the best known global minima. Moreover, CCF-RN uses only the icosahedral-based configurations, and its computational cost is lower than that of Romero et al. (1999). These results indicate that CCF-RN performs competitively and reliably.

From the above analysis of the performance of CCF on the LJ problem, we make the following observations: 1) FBG is effective, but CCF has some limitations. For fully nonseparable LSGO problems, and especially for very hard ones such as the LJ problem, FBG correctly identifies the nonseparability of the variables and classifies all of them into a single group; however, CCF then becomes ineffective whenever the adopted local search strategy is ineffective, because in this case CCF must rely on the local search strategy to optimize the whole problem (all variables fall into one group and cannot be divided into subgroups). A typical LSGO problem usually cannot be effectively optimized by a local search algorithm such as SaNSDE alone. 2) To solve a very difficult LSGO problem such as the LJ problem, it is necessary to exploit as much problem information (e.g., gradients) and domain knowledge (e.g., the lattice structure of the atoms in the LJ problem) as possible when designing the search algorithm.
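To make the nonseparability concrete, the following sketch evaluates the LJ cluster potential in reduced units ($\epsilon = \sigma = 1$); it is an illustrative reimplementation for exposition, not the code used in our experiments. Every pairwise distance term couples six coordinates, so no additive split of the variables exists and FBG places all $3N$ coordinates in one group.

```python
import itertools

def lj_energy(coords):
    """Total Lennard-Jones potential energy of a cluster in reduced
    units (epsilon = sigma = 1): E = sum over pairs 4*(r^-12 - r^-6)."""
    energy = 0.0
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(coords, 2):
        r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        inv6 = 1.0 / r2 ** 3          # r^-6
        energy += 4.0 * (inv6 * inv6 - inv6)
    return energy

# Two atoms at the equilibrium pair distance r = 2^(1/6) sit at the
# bottom of the pair well, so the energy is approximately -1.
print(lj_energy([(0.0, 0.0, 0.0), (2 ** (1 / 6), 0.0, 0.0)]))
```

Because each coordinate appears in $N-1$ pair terms, perturbing a single variable changes the contribution of every other atom, which is exactly the interaction pattern FBG detects from the formula.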

## 4  Conclusions

In this article, we have developed a new formula-based variable grouping method (FBG), which can be used in conjunction with a cooperative coevolutionary algorithm for solving large-scale global optimization problems. The key merit of FBG is that it exploits the formula of an objective function to carry out the variable grouping task effectively. In contrast, existing variable grouping strategies do not use the formulas of the objective functions, since they usually assume black-box optimization. Nevertheless, the formulas (expressions) of many LSGO problems are known or given in advance. Our proposed FBG method thus provides a new approach to effective variable grouping.

Since FBG leverages the information contained in the formula of the objective function, it can achieve more accurate variable grouping than the existing methods, and it offers distinct advantages especially when the formulas are complex and nonintuitive. Moreover, the FBG parser provides an elegant and efficient way of automating the variable grouping process. Our experiments evaluating the FBG-based CC algorithm (i.e., CCF) on three benchmark suites (CEC'2008, CEC'2010, and CEC'2013) and a real-world problem (the LJ problem) have demonstrated that FBG is an effective variable grouping method, which is an important step in decomposing any large-scale global optimization problem.
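As an illustration of how such a parser can operate (a minimal sketch under simplifying assumptions, not our actual implementation), the following reads the objective as a Python expression via the standard `ast` module and merges variables with a union-find structure: operands of "$+$" and "$-$" stay in separate groups, while variables appearing together under "$×$", "$÷$", exponentiation, or inside a function call are merged into one nonseparable group. The function name `fbg_groups` and the treatment of every call as nonseparable are illustrative choices.

```python
import ast
from collections import defaultdict

def fbg_groups(formula):
    """Sketch of formula-based grouping: return the variable groups of a
    Python expression string, merging variables linked by nonseparable
    operations (*, /, **, or a function call such as sin(x1 + x2))."""
    tree = ast.parse(formula, mode="eval")

    # Names used as function heads (e.g., 'sin') are not decision variables.
    func_names = {n.func.id for n in ast.walk(tree)
                  if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}

    def variables(node):
        return {n.id for n in ast.walk(node)
                if isinstance(n, ast.Name) and n.id not in func_names}

    parent = {}                      # union-find forest over variable names
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    def union(a, b):
        parent[find(a)] = find(b)

    # A nonseparable operation merges every variable in its subtree.
    for node in ast.walk(tree):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Mult, ast.Div, ast.Pow))):
            vs = sorted(variables(node))
        elif isinstance(node, ast.Call):
            vs = sorted(variables(node))
        else:
            continue
        for v in vs[1:]:
            union(vs[0], v)

    for v in variables(tree):        # register variables never merged
        find(v)
    groups = defaultdict(set)
    for v in parent:
        groups[find(v)].add(v)
    return sorted(sorted(g) for g in groups.values())

print(fbg_groups("x1*x2 + sin(x3) + x4"))  # [['x1', 'x2'], ['x3'], ['x4']]
```

For example, `sin(x1 + x2) + x3` yields the groups `[['x1', 'x2'], ['x3']]`: the sum inside the composite function makes `x1` and `x2` nonseparable, while `x3` remains separable, matching the operation classification described above.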

Our future work will include developing a more sophisticated variable grouping strategy that can handle more scenarios than those described in this article. Furthermore, for problems with weak interdependency between subproblems, how to devise an effective variable grouping that enhances optimization performance remains an open question.

## Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61472297 and No. U1404622).

## References

Ahn, C. W., An, J., and Yoo, J.-C. (2012). Estimation of particle swarm distribution algorithms: Combining the benefits of PSO and EDAs. Information Sciences, 192:109–119.

Asafuddoula, M., Ray, T., and Sarker, R. (2011). An adaptive differential evolution algorithm and its performance on real world optimization problems. In 2011 IEEE Congress of Evolutionary Computation, pp. 1057–1062.

Auger, A., and Hansen, N. (2005). Performance evaluation of an advanced local search evolutionary algorithm. In The 2005 IEEE Congress on Evolutionary Computation, Vol. 2, pp. 1777–1784.

Caraffini, F., Neri, F., Iacca, G., and Mol, A. (2013). Parallel memetic structures. Information Sciences, 227:60–82.

Chen, W., Weise, T., Yang, Z., and Tang, K. (2010). Large-scale global optimization using cooperative coevolution with variable interaction learning. In International Conference on Parallel Problem Solving from Nature, pp. 300–309.

Cheng, R., and Jin, Y. (2015a). A competitive swarm optimizer for large scale optimization. IEEE Transactions on Cybernetics, 45(2):191–204.

Cheng, R., and Jin, Y. (2015b). A social learning particle swarm optimization algorithm for scalable optimization. Information Sciences, 291:43–60.

Das, S., and Suganthan, P. N. (2010). Problem definitions and evaluation criteria for CEC 2011 competition on testing evolutionary algorithms on real world optimization problems. Technical report, Jadavpur University, India and Nanyang Technological University, Singapore.

Daven, D., Tit, N., Morris, J. R., and Ho, K. (1996). Structure optimization of Lennard–Jones clusters by a genetic algorithm. Chemical Physics Letters, 256(1):195–200.

Dong, N., Wu, C.-H., Ip, W.-H., Chen, Z.-Q., Chan, C.-Y., and Yung, K.-L. (2012). An opposition-based chaotic GA/PSO hybrid algorithm and its application in circle detection. Computers & Mathematics with Applications, 64(6):1886–1902.

Elsayed, S. M., Sarker, R. A., and Essam, D. L. (2011a). Differential evolution with multiple strategies for solving CEC2011 real-world numerical optimization problems. In 2011 IEEE Congress of Evolutionary Computation, pp. 1041–1048.

Elsayed, S. M., Sarker, R. A., and Essam, D. L. (2011b). GA with a new multi-parent crossover for solving IEEE-CEC2011 competition problems. In 2011 IEEE Congress of Evolutionary Computation, pp. 1034–1040.

Faires, J., and Burden, R. (2003). Numerical methods. 3rd ed. Boston: PWS Publishing Company.

Gämperle, R., Müller, S. D., and Koumoutsakos, P. (2002). A parameter study for differential evolution. Vol. 10, pp. 293–298. Interlaken, Switzerland: WSEAS Press.

Griewank, A., and Toint, P. (1982). Local convergence analysis for partitioned quasi-Newton updates. Numerische Mathematik, 39(3):429–448.

Hansen, N., Auger, A., Ros, R., Finck, S., and Pošík, P. (2010). Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO '10, pp. 1689–1696.

Hoare, M. (1979). Structure and dynamics of simple microclusters. Advances in Chemical Physics, 40:49–135.

Iacca, G., Neri, F., Mininno, E., Ong, Y.-S., and Lim, M.-H. (2012). Ockham's razor in memetic computing: Three stage optimal memetic exploration. Information Sciences, 188:17–43.

Kazimipour, B., Li, X., and Qin, A. K. (2013). Initialization methods for large scale global optimization. In 2013 IEEE Congress on Evolutionary Computation, pp. 2750–2757.

Lagrange, A., Fauvel, M., and Grizonnet, M. (2017). Large-scale feature selection with Gaussian mixture models for the classification of high dimensional remote sensing images. IEEE Transactions on Computational Imaging, 3(2):230–242.

LaTorre, A., Muelas, S., and Peña, J. M. (2013). Large scale global optimization: Experimental results with MOS-based hybrid algorithms. In 2013 IEEE Congress on Evolutionary Computation, pp. 2742–2749.

LaTorre, A., Muelas, S., and Peña, J.-M. (2015). A comprehensive comparison of large scale global optimizers. Information Sciences, 316:517–549.

Leung, Y.-W., and Wang, Y. (2001). An orthogonal genetic algorithm with quantization for global numerical optimization. IEEE Transactions on Evolutionary Computation, 5(1):41–53.

Li, X., Tang, K., Omidvar, M., Yang, Z., and Qin, K. (2013). Benchmark functions for the CEC'2013 special session and competition on large scale global optimization. Technical report, RMIT University.

Li, X., and Yao, X. (2012). Cooperatively coevolving particle swarms for large scale optimization. IEEE Transactions on Evolutionary Computation, 16(2):210–224.

Mei, Y., Li, X., and Yao, X. (2014). Cooperative coevolution with route distance grouping for large-scale capacitated arc routing problems. IEEE Transactions on Evolutionary Computation, 18(3):435–449.

Mersmann, O., Preuss, M., Trautmann, H., Bischl, B., and Weihs, C. (2015). Analyzing the BBOB results by means of benchmarking concepts. Evolutionary Computation, 23(1):161–185.

Molina, D., Lozano, M., and Herrera, F. (2010). MA-SW-Chains: Memetic algorithm based on local search chains for large scale continuous global optimization. In IEEE Congress on Evolutionary Computation, pp. 1–8.

Nocedal, J., and Wright, S. J. (1999). Numerical optimization. New York: Springer-Verlag.

Northby, J. (1987). Structure and binding of Lennard–Jones clusters: 13 ≤ n ≤ 147. The Journal of Chemical Physics, 87(10):6166–6177.

Omidvar, M., Li, X., Yao, X., and Yang, Z. (2010). Cooperative co-evolution for large scale optimization through more frequent random grouping. In Proceedings of the 2010 IEEE Congress on Evolutionary Computation, pp. 1754–1761.

Omidvar, M. N., Li, X., Mei, Y., and Yao, X. (2014). Cooperative co-evolution with differential grouping for large scale optimization. IEEE Transactions on Evolutionary Computation, 18(3):378–393.

Omidvar, M. N., Li, X., and Tang, K. (2015). Designing benchmark problems for large-scale continuous optimization. Information Sciences, 316:419–436.

Omidvar, M. N., Li, X., and Yao, X. (2010). Cooperative co-evolution with delta grouping for large scale non-separable function optimization. In IEEE Congress on Evolutionary Computation, pp. 1–8.

Pál, L., Csendes, T., Markót, M. C., and Neumaier, A. (2012). Black box optimization benchmarking of the GLOBAL method. Evolutionary Computation, 20(4):609–639.

Potter, M. A., and De Jong, K. A. (1994). A cooperative coevolutionary approach to function optimization. In Proceedings of the Third Conference on Parallel Problem Solving from Nature, pp. 249–257.

Qin, A., Huang, V., and Suganthan, P. (2009). Differential evolution with strategy adaptation for global numerical optimization. IEEE Transactions on Evolutionary Computation, 13(2):398–417.

Reynoso-Meza, G., Sanchis, J., Blasco, X., and Herrero, J. M. (2011). Hybrid DE algorithm with adaptive crossover operator for solving real-world numerical optimization problems. In 2011 IEEE Congress on Evolutionary Computation, pp. 1551–1556.

Romero, D., Barrón, C., and Gómez, S. (1999). The optimal geometry of Lennard–Jones clusters: 148–309. Computer Physics Communications, 123(1):87–96.

Sonoda, T., Yamaguchi, Y., Arima, T., Olhofer, M., Sendhoff, B., and Schreiber, H.-A. (2004). Advanced high turning compressor airfoils for low Reynolds number condition, Part 1: Design and optimization. Journal of Turbomachinery, 126(3):350–359.

Tang, K., Li, X., Suganthan, P. N., Yang, Z., and Weise, T. (2009). Benchmark functions for the CEC'2010 special session and competition on large-scale global optimization. Technical report, Nature Inspired Computation and Applications Laboratory, USTC, China.

Tang, K., Yao, X., Suganthan, P. N., MacNish, C., Chen, Y. P., Chen, C. M., and Yang, Z. (2007). Benchmark functions for the CEC'2008 special session and competition on large scale global optimization. Technical report, Nature Inspired Computation and Applications Laboratory, USTC, China.

Valdez, S. I., Hernández, A., and Botello, S. (2013). A Boltzmann based estimation of distribution algorithm. Information Sciences, 236:126–137.

van den Bergh, F., and Engelbrecht, A. P. (2004). A cooperative approach to particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3):225–239.

Vesterstrom, J., and Thomsen, R. (2004). A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Proceedings of the 2004 Congress on Evolutionary Computation, pp. 1980–1987.

Vicini, A., and Quagliarella, D. (1998). Airfoil and wing design through hybrid optimization strategies. In 16th Applied Aerodynamics Conference, pp. 1–11.

Wales, D. J., and Doye, J. P. K. (1997). Global optimization by basin-hopping and the lowest energy structures of Lennard–Jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116.

Wei, F., Wang, Y., and Huo, Y. (2013). Smoothing and auxiliary functions based cooperative coevolution for global optimization. In 2013 IEEE Congress on Evolutionary Computation, pp. 2736–2741.

Xiang, Y., Jiang, H., Cai, W., and Shao, X. (2004). An efficient method based on lattice construction and the genetic algorithm for optimization of large Lennard–Jones clusters. The Journal of Physical Chemistry A, 108(16):3586–3592.

Xu, R., and Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678.

Yang, Z., Tang, K., and Yao, X. (2008a). Large scale evolutionary optimization using cooperative coevolution. Information Sciences, 178(15):2985–2999.

Yang, Z., Tang, K., and Yao, X. (2008b). Multilevel cooperative coevolution for large scale optimization. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 1663–1670.