Abstract

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. This paper derives a mathematical description of the dynamics of BBO based on ideas from statistical mechanics. Rather than trying to exactly predict the evolution of the population, statistical mechanics methods describe the evolution of statistical properties of the population fitness. This paper uses the one-max problem, which has only one optimum and whose fitness function is the number of 1s in a binary string, to derive equations that predict the statistical properties of BBO each generation in terms of those of the previous generation. These equations reveal the effect of migration and mutation on the population fitness dynamics of BBO. The results obtained in this paper are similar to those for the simple genetic algorithm with selection and mutation. The paper also derives equations for the population fitness dynamics of general separable functions, and we find that the results obtained for separable functions are the same as those for the one-max problem. The statistical mechanics theory of BBO is shown to be in good agreement with simulation.

1  Introduction

Mathematical models of biogeography describe the migration of species between habitats. Biogeography-based optimization (BBO) (Simon, 2008) is an evolutionary algorithm for global optimization, introduced in 2008, that extends biogeography theory to evolutionary algorithms (EAs). BBO has demonstrated good performance on various unconstrained and constrained benchmark functions (Boussaid et al., 2012; Du et al., 2009; Ergezer et al., 2009; Gong et al., 2010; Gong et al., 2011; Li and Yin, 2012; Ma et al., 2009; Ma and Simon, 2011). It has also been applied to real-world optimization problems, including economic load dispatch (Bhattacharya and Chattopadhyay, 2010), wireless network power allocation (Boussaid et al., 2011a), satellite image classification (Panchal et al., 2009), flexible job shop scheduling (Rahmati and Zandieh, 2012), power system optimization (Jamuna and Swarup, 2012; Kankanala et al., 2012; Lohokare et al., 2012; Rarick et al., 2009), antenna design (Singh et al., 2010), and others (Boussaid et al., 2011b; Chatterjee et al., 2012; Hadidi and Nazari, 2013). BBO probabilistically shares information between candidate solutions, like other EAs (Ahn, 2006; Vose, 1999; Yao et al., 1999). In BBO each candidate solution is composed of a set of independent variables, also called features. Each candidate solution immigrates features from other candidate solutions based on its immigration rate, and emigrates features to other candidate solutions based on its emigration rate.

BBO has a close relation with other EAs such as genetic algorithms (GAs), differential evolution (DE), and particle swarm optimization (PSO). This was highlighted in earlier work (Simon et al., 2011b), which also showed that in spite of commonality with other EAs, BBO has distinct features that give it unique properties. Because of these unique properties, it is useful to retain the distinction between BBO and other EAs. Another reason that it is useful to retain BBO as a distinct EA is that it easily allows the incorporation of behaviors from natural biogeography into BBO, such as the effect of geographical proximity on migration rates, nonlinear migration curves, the effect of species populations (including mortality and reproduction), predator/prey relationships, species mobilities, directional migration momentum, habitat area and isolation, and others (Ma, 2010; Costa e Silva et al., 2012; Goel et al., 2012).

Although BBO has shown good performance on various problems, it is not well understood for which problems it is effective, and why. The efficiency of BBO depends on the problem representation and the BBO tuning parameters. These parameters include the population size, the immigration curve shape, the emigration curve shape, and the mutation rate. For many problems, BBO can be very effective when a good representation is chosen and the parameters are set to appropriate values for the particular problem. When poor choices are made, BBO can perform like random search. If a mathematical model as a function of the BBO parameters could predict the improvement in fitness from one generation to the next, it could be used to find optimal values of the tuning parameters.

For example, consider a problem with very expensive fitness function evaluations; we may even need to do physical experiments to evaluate fitness. If we can find a model that predicts fitness improvement, we can use the model during early generations to tune BBO parameters or to predict if BBO will perform better than other EAs. So a mathematical model of the evolution of BBO would be useful to develop effective BBO modifications. More generally, a model of BBO dynamics could be useful in producing insights as to how the algorithm behaves and when it is likely to be effective.

There have been some attempts to describe the dynamics of BBO. The most well-developed method is the Markov chain model (Nix and Vose, 1992; Suzuki, 1995), which provides an exact model of the evolution of BBO (Simon et al., 2009; 2011a; 2011b). It gives the probability of arriving at any population, given the starting population, as the generation count approaches infinity. But it is of little practical use for real-world problems because the transition probabilities are never known in real applications. In this paper, we present a formalism based on ideas from statistical mechanics to study the dynamics of BBO. This method could lead to a better understanding of the evolution of BBO in realistic high-dimensional optimization problems.

Statistical mechanics is a branch of physics that applies probability theory to the study of the thermodynamic behavior of systems consisting of a huge number of interacting entities. It provides methods to calculate the average properties of these systems by treating the small-scale motions as random. It is usually used to compute thermodynamic properties of particles interacting with a heat bath, but it has recently been applied to other problems, for example, mathematical modeling for the genetic algorithm (Prügel-Bennett and Shapiro, 1994; Rattray, 1995; 1996; Rattray and Shapiro, 1996; Shapiro et al., 1994) and simulated annealing (Van Laarhoven and Aarts, 1987). However, the modeling of GAs and simulated annealing using statistical mechanics has been limited to specific problems. This paper extends that work to BBO and also, for the first time, uses statistical mechanics ideas to model the population dynamics for a general but separable fitness function.

The advantage of statistical mechanics for describing the dynamics of BBO is that we only assume knowledge of the cumulants of the fitness distribution and the average fitness correlation within the population. The cumulants are defined by the cumulant generating function, which is given by the natural logarithm of the Fourier transform of the sample fitness probability distribution function. The cumulants contain the same information as the distribution function. The first and second cumulants are respectively the mean and variance of the distribution, and the third cumulant is related to the skewness, which measures the asymmetry of the distribution. All cumulants except the first two are zero for a Gaussian distribution, so the higher-order cumulants measure the non-Gaussian content of the distribution.

In this paper, the dynamics of BBO are described in terms of the change of the cumulants due to migration and mutation. The cumulants are used for several reasons: they can readily be measured, they are more stable than other statistical parameters because they tend to self-average, and they provide interesting results concerning how BBO works.

Section 2 describes the BBO algorithm. Section 3 introduces statistical mechanics and derives the statistical mechanics approximation of BBO for the one-max problem. Section 4 uses simulations to study the effect of migration and mutation, and then compares BBO with GA based on statistical mechanics models. Section 5 derives the statistical mechanics approximation of BBO for separable functions to generalize the model and to further reveal how BBO works. Section 6 provides some concluding remarks and suggestions for future work.

2  Biogeography-Based Optimization

BBO is a new evolutionary optimization algorithm that is inspired by biogeography (Simon, 2008). In BBO a biogeography habitat denotes a candidate optimization problem solution. Each candidate solution is composed of a set of features, also called decision variables or independent variables, that are similar to genes in GAs. A set of biogeography habitats denotes a population of candidate solutions. Like other EAs, each candidate solution in BBO probabilistically shares decision variables with other candidate solutions to improve candidate solution fitness. This sharing process is analogous to migration in biogeography. That is, each candidate solution immigrates decision variables from other candidate solutions based on its immigration rate, and emigrates decision variables to other candidate solutions based on its emigration rate. BBO consists of two main steps: migration and mutation.

Migration is a probabilistic operator that is intended to improve a candidate solution yk. For each decision variable of a given candidate solution yk, the candidate solution's immigration rate λk is used to probabilistically decide whether or not to immigrate. If immigration is selected, then the emigrating candidate solution yj is probabilistically chosen based on its emigration rate μj. Migration is written as
\[
y_k(s) \leftarrow y_j(s)
\tag{1}
\]
where s is a decision variable. In BBO, each candidate solution yk has its own immigration rate λk and emigration rate μk. A good candidate solution has a relatively high emigration rate and low immigration rate, whereas the converse is true for a poor candidate solution. The immigration rate and emigration rate are based on particular migration curves, such as the linear migration curves shown in Figure 1, where the maximum immigration rate and maximum emigration rate are both assumed equal to 1. Nonlinear immigration and emigration rates were discussed by Ma (2010).
Figure 1:

Linear migration curves for BBO. λ is the immigration rate, μ is the emigration rate, and n is the best fitness. The maximum immigration rate and maximum emigration rate are both assumed equal to 1.


The probability of immigrating to yk and the probability of emigrating from yk, for k = 1, …, N, are calculated as
\[
\Pr(\text{immigration to } y_k) = \frac{\lambda_k}{\sum_{i=1}^{N}\lambda_i}, \qquad
\Pr(\text{emigration from } y_k) = \frac{\mu_k}{\sum_{i=1}^{N}\mu_i}
\tag{2}
\]
where N is the population size.
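For illustration, the selection probabilities in (2) can be realized with roulette-wheel selection over the migration rates. The following Python sketch is illustrative only; the function name and return convention are ours, not the paper's:

```python
import random

def roulette_select(rates, rng=random):
    """Select index i with probability rates[i] / sum(rates)."""
    total = sum(rates)
    r = rng.uniform(0, total)
    acc = 0.0
    for i, rate in enumerate(rates):
        acc += rate
        if r <= acc:
            return i
    return len(rates) - 1  # guard against floating-point round-off
```

With emigration rates proportional to fitness, fitter individuals are chosen as emigrants more often, which is exactly the bias that migration introduces.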

Mutation is a probabilistic operator that randomly modifies a decision variable of a candidate solution. The purpose of mutation is to increase diversity among the population, just as in other EAs. At the end of each generation, mutation of the kth candidate solution is implemented as shown in Algorithm 1.

In the mutation logic of Algorithm 1, m is the mutation rate, and each decision variable s has lower and upper search bounds. The logic mutates each independent variable with probability m. If mutation occurs for a given independent variable, then that independent variable is replaced with a uniformly distributed random number in its search domain.

Algorithm 1: Mutation of candidate solution zk.
    For each decision variable s
        If rand(0, 1) < m then
            Replace zk(s) with a uniformly distributed random number in the search domain of s
        End if
    Next decision variable

Algorithm 2: One generation of BBO. {yk} is the population of candidate solutions and {zk} is a temporary population.
    For each solution yk, compute the immigration rate λk and emigration rate μk
    {zk} ← {yk}
    For each solution zk
        For each decision variable s
            Use λk to probabilistically decide whether to immigrate to zk(s)
            If immigrating then
                Use {μj} to probabilistically select the emigrating solution yj
                zk(s) ← yj(s)
            End if
        Next decision variable
        Probabilistically mutate zk (Algorithm 1)
    Next solution
    {yk} ← {zk}
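As a sketch of the mutation logic, assuming real-valued decision variables with per-variable search bounds (the function and argument names below are illustrative, not from the paper):

```python
import random

def mutate(z, m, bounds, rng=random):
    """BBO mutation: each decision variable of candidate solution z is
    replaced, with probability m, by a uniformly distributed random
    value in its search domain. bounds[s] = (lower, upper) for s."""
    for s in range(len(z)):
        if rng.random() < m:
            lower, upper = bounds[s]
            z[s] = rng.uniform(lower, upper)
    return z
```

For binary problems such as one-max, the search domain of each variable is {0, 1}, and the uniform draw reduces to choosing 0 or 1 with equal probability.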

A description of one generation of BBO is given in Algorithm 2. Note that Algorithm 2 includes the use of a temporary population, so migration completes before the original population is replaced with the new population at the end of each generation.

Now we make several clarifications about Algorithm 2. First, a solution can emigrate to itself, which means that j might be chosen to be equal to k in the statement "zk(s) ← yj(s)". That is, when an independent variable in a candidate solution zk is replaced via migration, the new independent variable might be chosen to come from zk itself. In this case, the independent variable is not actually changed in zk. However, the probabilistic choice of the emigrating candidate solution allows this to happen on occasion.

Second, in this paper we assume that the migration rates λ and μ are independent of the population distribution. That is, the best fitness that is used to define the migration curves of Figure 1 is the best fitness for the problem, not the best fitness in the current population. Similarly, the worst fitness that is used to define the curves is the worst fitness for the problem. Using the worst fitness in this way is not much of a limitation because, in practice, we can assign an arbitrarily small fitness value to poor candidate solutions. Alternatives to these assumptions are commonly used in practice, but these assumptions are needed for the statistical mechanics model development.
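The migration and mutation steps of one BBO generation can be sketched as follows for binary strings. This is an illustrative Python sketch; the rate normalization μk = fk/(n+1) with λk = 1 − μk is an assumption chosen to match the linear curves of Figure 1 for the one-max problem, and all names are ours:

```python
import random

def roulette(w, rng):
    """Select index i with probability w[i] / sum(w)."""
    r, acc = rng.uniform(0, sum(w)), 0.0
    for i, wi in enumerate(w):
        acc += wi
        if r <= acc:
            return i
    return len(w) - 1

def bbo_generation(pop, fitness, n, m, rng=random):
    """One BBO generation for binary strings of length n. Migration
    writes into a temporary population z, which replaces the original
    population only at the end of the generation."""
    N = len(pop)
    f = [fitness(y) for y in pop]
    mu = [fi / (n + 1) for fi in f]       # emigration rates (assumed linear curve)
    lam = [1.0 - u for u in mu]           # immigration rates
    z = [list(y) for y in pop]            # temporary population
    for k in range(N):
        for s in range(n):
            if rng.random() < lam[k]:     # immigrate to z_k(s)?
                j = roulette(mu, rng)     # emigrant chosen as in Eq. (2)
                z[k][s] = pop[j][s]       # z_k(s) <- y_j(s)
        for s in range(n):                # mutation (Algorithm 1)
            if rng.random() < m:
                z[k][s] = rng.randint(0, 1)
    return z
```

Iterating this generation on a one-max population with a low mutation rate typically raises the mean fitness over the early generations.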

3  Statistical Mechanics Model of BBO

This section provides a preliminary foundation for the definition of the cumulants of a distribution function and then derives the statistical mechanics model of BBO under migration and mutation.

3.1  Preliminary Foundation

The cumulants of a probability distribution describe the shape of the distribution. Cumulant dynamics have already been developed for some evolutionary algorithms, such as the simple genetic algorithm (Reeves and Rowe, 2003). They are natural statistics to describe distributions that are close to Gaussian, since the higher-order cumulants are a measure of deviation from a Gaussian distribution. The first two cumulants are the mean and variance of the distribution, and the third and fourth cumulants are related to skewness and kurtosis, respectively. Populations in evolutionary algorithms are often initialized randomly, and fitness is a function of many variables; since the central limit theorem states that the distribution of a sum of random variables becomes more Gaussian as the number of variables increases, many real-world problems have an approximately Gaussian fitness distribution during the early generations.

Consider a scalar random variable X that can take non-negative integer values; for each such value k, there is a certain probability Pk that X takes the value k. The characteristic function of the probability distribution of X and its Taylor series can be written as
\[
\Phi(t) = \sum_{k=0}^{\infty} P_k e^{ikt} = \sum_{i=0}^{\infty} \frac{m_i}{i!}(it)^i
\tag{3}
\]
where the coefficients mi are called the moments of the probability distribution, given by
\[
m_i = E(X^i) = \sum_{k=0}^{\infty} k^i P_k
\tag{4}
\]
Expanding the natural logarithm of the characteristic function as a Taylor series gives
\[
\ln \Phi(t) = \sum_{i=1}^{\infty} \frac{\kappa_i}{i!}(it)^i
\tag{5}
\]
where the coefficients κi for i ≥ 1 are called the cumulants of the probability distribution. In particular, the relations between the first three cumulants and the moments of the probability distribution have simple expressions:
\[
\kappa_1 = m_1, \qquad \kappa_2 = m_2 - m_1^2, \qquad \kappa_3 = m_3 - 3m_1 m_2 + 2m_1^3
\tag{6}
\]
From these equations, it is seen that the first cumulant is the mean of the distribution, and the second cumulant is the variance. Since the Gaussian distribution is completely defined by its mean and variance, and all of its higher cumulants are zero, any distribution with very small values for its higher cumulants will look like a Gaussian. Namely, as long as the population fitness values are distributed approximately Gaussian, we can truncate the series expansion in (5) and use just the first few cumulants to approximate the fitness distribution.
The cumulants of a probability distribution have a linearity property that can be formulated as follows (Grinstead and Snell, 1997). If Xi, for i = 1, …, N, are independent random variables, then
\[
\kappa_j\!\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N}\kappa_j(X_i)
\tag{7}
\]
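The moment-to-cumulant relations in (6) are straightforward to evaluate from a sample. The following illustrative Python sketch computes the first three sample cumulants from the raw moments:

```python
def cumulants(xs):
    """First three cumulants from raw sample moments, per Eq. (6):
    k1 = m1, k2 = m2 - m1^2, k3 = m3 - 3*m1*m2 + 2*m1^3."""
    N = len(xs)
    m1 = sum(xs) / N
    m2 = sum(x * x for x in xs) / N
    m3 = sum(x ** 3 for x in xs) / N
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3
```

For a symmetric sample such as [0, 1] the third cumulant is zero, while an asymmetric sample such as [0, 0, 0, 1] gives a nonzero k3, reflecting the skewness of the distribution.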

An Example

To clarify the notation, consider the one-max problem, which has only one optimum. The fitness of an individual is the number of 1s in its binary string. With n denoting the length of the binary string and X the fitness value, the problem is defined as follows:
\[
X = f(x) = \sum_{j=1}^{n} x_j
\tag{8}
\]
where xj is bit j of a binary string. In this example, the length n of the binary string is set to 5, which is also the maximum fitness, and the population size N is set to 24. An initial population is randomly generated with each bit independent of the others. It might be the following:
formula
Before calculating the cumulants, the independence of the bits is checked to confirm the linearity property in (7). Considering bit 1 and bit 2, for example, we obtain
formula
We see that
formula
which is consistent with the fact that bit 1 and bit 2 are independent. Similarly, we can numerically confirm that all the other bits in the population are independent, which confirms that the cumulants have the linearity property shown in (7) at the first generation. After migration at the end of the first generation, each bit in a given individual will still be independent because migration to each individual bit is independent of migration to the other bits (see Algorithm 2). Similarly, after mutation in the first generation, each bit in a given individual will be independent because mutation of each individual bit is independent (see Algorithm 1).
In this example, the population distribution over the set of possible fitness values is obtained as
formula
Based on (3), the characteristic function of the probability distribution and its Taylor series are
formula
Based on (6), the cumulants of the probability distribution are obtained as
formula
We find that the mean is 2.414 and the variance is 1.324. This distribution is approximately Gaussian, as shown in Figure 2. From this example, it is seen that cumulants can approximate the population distribution by truncating its series expansion.
Figure 2:

Probability distribution of the fitness X. The solid line denotes a Gaussian distribution, and the dots denote the real probabilities of the population.


The argument is that after migration, each bit in a given individual will remain independent because migration to each bit is independent of migration to the other bits, as shown in Algorithm 2. However, this is a simplifying assumption: because of selection, BBO will introduce correlations between bits, and they will not remain independent for very long. Our hypothesis is that the variables will remain mostly independent, at least during the early generations, so that statistical mechanics will provide a good fitness approximation. This hypothesis is verified in Section 4.

3.2  BBO Migration

Cumulants of a BBO fitness probability distribution provide certain average properties. These cumulants describe the fitness as a whole each generation. The cumulant dynamics tell us the trajectory of the distribution of the population fitness during the evolutionary process. In the following sections the equations of the dynamic evolution of BBO are derived based on the cumulants of its fitness distribution, and the specific one-max problem described in (8) is used to study the properties of BBO. Although the one-max problem is a simple optimization problem, the dynamic equations of BBO can provide predictive power and insight into BBO dynamics. In Section 5 we extend this model to general but separable fitness functions.

Consider a single bit position xj. Suppose that its values are randomly distributed across the population, and consider the effect of migration on the values in this bit position. There are two ways in which a bit can be equal to 1 after an immigration trial (that is, after one iteration of the "for each decision variable s" loop in Algorithm 2). First, it can immigrate a 1 from another individual in the population (that is, another individual emigrates a 1 to this bit). Second, it can start as 1 and not immigrate. So the expected value of this bit after an immigration trial is computed by
\[
E(x_{j,t+1}) = \Pr(\text{emigrating bit is } 1)\Pr(\text{immigration}) + \Pr(x_{j,t} = 1)\Pr(\text{no immigration})
\tag{9}
\]
Consider the first probability on the right side, that is, the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose jth bit is equal to 1:
\[
\Pr(\text{emigrating bit is } 1) = \frac{\sum_{i \in S_j(1)} \mu_i}{\sum_{i=1}^{N} \mu_i}
\tag{10}
\]
where Sj(1) = {population indices of individuals whose jth bit is equal to 1}, and N denotes the size of the population. Now consider the third probability on the right side of (9): if immigration does not occur, the probability that the value of bit j is 1 after an immigration trial is equal to the probability that the value of bit j is 1 before the immigration trial, which can be written as
\[
\Pr(x_{j,t+1} = 1 \mid \text{no immigration}) = \Pr(x_{j,t} = 1)
\tag{11}
\]
where xj,t denotes the value of bit j before an immigration trial.
Now consider the probability that the value of bit j is 1 before the immigration trial:
\[
\Pr(x_{j,t} = 1) = E(x_{j,t})
\tag{12}
\]
Note that the mean value of bit j before the immigration trial is
\[
E(x_{j,t}) = \frac{N_j(1)}{N}
\tag{13}
\]
where Nj(1) denotes the number of individuals in the population whose jth bit is equal to 1. Combining this with (6), the first cumulant of the jth bit before the immigration trial is
\[
\kappa_1^{(j)}(t) = E(x_{j,t}) = \frac{N_j(1)}{N}
\tag{14}
\]
Next we consider the probabilities Pr(immigration) and Pr(no immigration) in (9), which are the average values of the immigration curve; that is, they are independent of any additional information about the fitness value of any particular individual. Assuming that the migration curves are as shown in Figure 1, the emigration rate μi of individual i with fitness fi is normalized as
\[
\mu_i = \frac{f_i}{n+1}
\tag{15}
\]
So the probability Pr(immigration), which is the population average of the immigration rate λi = 1 − μi, can be written as
\[
\Pr(\text{immigration}) = \frac{1}{N}\sum_{i=1}^{N}\lambda_i = 1 - \frac{1}{N}\sum_{i=1}^{N}\frac{f_i}{n+1}
\tag{16}
\]
Furthermore, the average fitness (1/N)Σi fi is the mean number of 1 bits in an individual, which is equal to the cumulant κ1(t). The probabilities Pr(immigration) and Pr(no immigration) can thus be written as
\[
\Pr(\text{immigration}) = 1 - \frac{\kappa_1(t)}{n+1}, \qquad \Pr(\text{no immigration}) = \frac{\kappa_1(t)}{n+1}
\tag{17}
\]
Note that if there is an immigration curve different from the one shown in Figure 1, we would need to recompute the average of the immigration curve to find Pr(immigration) and Pr(no immigration).
Substituting (10), (11), and (17) into (9), we write the expected bit value after an immigration trial as
\[
E(x_{j,t+1}) = \frac{\sum_{i \in S_j(1)}\mu_i}{\sum_{i=1}^{N}\mu_i}\left(1 - \frac{\kappa_1(t)}{n+1}\right) + E(x_{j,t})\,\frac{\kappa_1(t)}{n+1}
\tag{18}
\]
Note that we have normalized the emigration rate so that it has a minimum of 0 and a maximum of n/(n+1). Other normalizations are possible as long as the emigration rate is between 0 and 1, but this is a typical approach. Then we use (15) to obtain
\[
E(x_{j,t+1}) = \frac{\sum_{i \in S_j(1)} f_i}{\sum_{i=1}^{N} f_i}\left(1 - \frac{\kappa_1(t)}{n+1}\right) + E(x_{j,t})\,\frac{\kappa_1(t)}{n+1}
\tag{19}
\]
Next we note that
\[
\sum_{i=1}^{N} f_i = N\kappa_1(t)
\tag{20}
\]
Further notice that
\[
\sum_{i \in S_j(1)} f_i = \sum_{i=1}^{N} x_{j,t}^{(i)} f_i \approx N\,E(x_{j,t})\left(1 + \kappa_1(t) - E(x_{j,t})\right)
\tag{21}
\]
Assuming that the cumulants have the linear property described in (7), the cumulant of the binary string is equal to the sum of cumulants of each bit of the binary string:
\[
\kappa_i(t) = \sum_{j=1}^{n} \kappa_i^{(j)}(t)
\tag{22}
\]
Combining (14), (20), (21), and (22) gives
\[
\frac{\sum_{i \in S_j(1)} f_i}{\sum_{i=1}^{N} f_i} \approx \frac{E(x_{j,t})\left(1 + \kappa_1(t) - E(x_{j,t})\right)}{\kappa_1(t)}
\tag{23}
\]
Now the expected value of bit xj after an immigration trial can be written as
\[
E(x_{j,t+1}) \approx \frac{E(x_{j,t})\left(1 + \kappa_1(t) - E(x_{j,t})\right)}{\kappa_1(t)}\left(1 - \frac{\kappa_1(t)}{n+1}\right) + E(x_{j,t})\,\frac{\kappa_1(t)}{n+1}
\tag{24}
\]
Note that the subscripts t+1 and t denote quantities after and before migration, respectively. For the other cumulants of bit xj, we note that bits can only take the value 0 or 1, so
\[
x_j^k = x_j
\tag{25}
\]
for all values of k. So the moments of all orders are equal, namely,
\[
m_k^{(j)} = m_1^{(j)} = E(x_j)
\tag{26}
\]
Based on the relations between moments and cumulants described in (6), we obtain the equations of cumulants κ2 and κ3 of bit xj as follows:
\[
\kappa_2^{(j)}(t+1) = E(x_{j,t+1}) - E(x_{j,t+1})^2, \qquad
\kappa_3^{(j)}(t+1) = E(x_{j,t+1}) - 3E(x_{j,t+1})^2 + 2E(x_{j,t+1})^3
\tag{27}
\]
The cumulant approximations of the population after migration are
formula
28
The derivations of (27) and (28) are in the Appendix; note that κ2(t+1) and κ3(t+1) are approximations. Cumulants κ1 and κ2 denote the mean and the variance of the population fitness distribution, and the cumulant κ3 denotes the skewness, which measures the degree of asymmetry of the distribution.

Cumulant κ4 and other higher-order cumulants are not discussed in this section. κ4 denotes the kurtosis, which measures the peakedness of the distribution. EA researchers are typically more interested in the mean and the variance of fitness distributions, namely, κ1 and κ2, and κ4 only affects κ3 based on (28). In fact, it is apparent from (28) that the influence of κ4 on κ3 is relatively small. So we make the approximation that κ4 and the other higher-order cumulants are zero, which gives us an approximation of the migration dynamics using just three variables.

Equation (28) gives the migration dynamics of the statistical properties of BBO. Figure 3 shows the evolution of the BBO fitness distribution due to migration for the one-max problem, where the probability distributions of the population fitness X are shown every 4 generations, up to 20 generations. From the figure we see that, as expected, the mean fitness becomes larger as the generation count increases.

Figure 3:

Evolution of BBO fitness due only to migration. Probability distributions of the fitness X of the one-max problem are shown every 4 generations up to 20.


3.3  BBO Mutation

So far, we have considered only migration. In this section, mutation is added to the statistical mechanics model. Mutation in BBO is the operator that modifies a decision variable of a candidate solution with a small probability. It is the same as mutation in the GA (see Algorithm 1), so the theory of GA mutation presented by Reeves and Rowe (2003) can be applied directly to BBO. The results for the first three cumulants for the one-max problem, considering only mutation, are as follows:
formula
29
where m is the mutation rate, and the subscripts t+1 and t denote quantities after and before mutation, respectively. Note that these mutation-only equations were previously obtained for the one-max problem in the case that the variables were taken to be ±1 (Prügel-Bennett, 1997, Sec. 3). We combine these mutation equations with those of BBO migration derived in the previous section. This gives the following model for BBO with both migration and mutation:
formula
30
These equations give statistical properties of BBO fitness values after migration and mutation in terms of those at the previous generation. So it is possible to iterate these equations to obtain predictions for the evolution of BBO populations.

4  Simulation Results

This section confirms the BBO statistical mechanics approximation model derived in Section 3 with simulation and then compares the approximation results of BBO with those of the GA.

4.1  Comparisons between Theory and Simulation

This section uses the 100-bit one-max problem described in (8) to verify the statistical model of BBO with simulation. The statistical approximations of BBO are obtained by iterating (28) when considering only migration and by iterating (30) when considering both migration and mutation. We use κ4 = 0 in the iterations of (28) and (30). The simulation parameters of BBO are as follows: population size 50, maximum immigration rate and maximum emigration rate 1, and 200 Monte Carlo runs. Figures 4–6 show comparisons between theoretical (statistical approximation) and simulated BBO results.

Figure 4:

BBO approximation and simulation results of cumulant κ1 with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


Figure 5:

BBO approximation and simulation results of cumulant κ2 with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


Figure 6:

BBO approximation and simulation results of cumulant κ3 with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


Several things are notable about Figures 4–6. First, the approximation results of BBO show that as the generation count increases, the first cumulant κ1 increases and the second cumulant κ2 decreases; namely, the mean increases and the variance decreases with the generation count. This is consistent with the expected behavior of BBO: as the generation count increases, the mean fitness of the population increases, and the population becomes more uniform.

Second, the cumulant approximations and the simulation results match well. This indicates that the statistical mechanics approximation model of BBO is reasonable, and that the statistical mechanics approximation is valid for the one-max problem. This fact allows us to make quantitative conclusions from statistical mechanics approximations, and to use those approximations to obtain valid conclusions regarding the performance of BBO with various tuning parameters.

Third, as the mutation rate decreases, the approximation for the mean becomes larger; namely, BBO achieves better fitness values. This indicates that low mutation rates have better performance for the one-max problem. A high mutation rate of 0.1 results in too much exploration, and the population remains too widely distributed across the search space. However, a low mutation rate of 0.01 or 0.001 allows BBO to exploit better fitness values in the population. These results are similar to those obtained by Markov model analysis by Simon et al. (2011a).

4.2  Comparisons between BBO and GA

To further investigate the statistical properties of BBO, we compare the approximation model of BBO with that of a simple GA using proportional selection and mutation presented by Prügel-Bennett and Shapiro (1994) and Reeves and Rowe (2003). The statistical mechanics model of a GA with selection only is
formula
31
Recall that the mutation model is given in (29). So the three-parameter approximation equations for a simple GA with selection and mutation are
formula
32

We see that the approximation equations of BBO in (30) are similar to those of the GA in (32); the difference is an additional coefficient, which depends on the problem size n, in some terms of the BBO model. As the problem size n becomes large, the BBO approximation approaches the GA approximation. These equations can be used to compare the statistical approximations of BBO and the GA for the 100-bit one-max problem, as shown in Figures 7–9.

Figure 7:

Comparison of the approximation values of cumulant κ1 between BBO and GA with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


Figure 8:

Comparison of the approximation values of cumulant κ2 between BBO and GA with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


Figure 9:

Comparison of the approximation values of cumulant κ3 between BBO and GA with mutation rates m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.


From Figures 7–9, several things are notable about the approximation comparisons between BBO and the GA. First, the trends of the cumulants are very similar for BBO and the GA. When the generation count increases, the mean (the first cumulant κ1) increases, and the variance (the second cumulant κ2) decreases. The only qualitative difference we see is that the skewness (the third cumulant κ3) of the GA suddenly changes slope during the first few generations, while the skewness of BBO changes more smoothly. These comparisons indicate that the expected behavior of BBO is similar to that of the GA, a point already implied by (30) and (32). It is not too surprising that BBO is similar to the GA, because many EAs can be expressed in terms of each other. Simon et al. (2011b) compared the optimization performance of BBO and the GA and reported similar results. On the other hand, the figures show that differences do exist between BBO and the GA, which indicates that BBO has its own particular characteristics. A BBO individual uses its own fitness to decide how likely it is to accept features from other candidate solutions. This simple and intuitive idea does not have an analogy in genetics but is motivated by biogeography. This implies that it is useful to retain the distinction between BBO and the GA, and that the biogeographical foundation of BBO can motivate many extensions.

Second, Figure 7 shows that the mean fitness of BBO is smaller than that of GA, and Figure 8 shows that the fitness variance of BBO is larger than that of GA. This indicates that GA might have better convergence properties than BBO at high mutation rates on the one-max problem.

We should be wary of drawing general conclusions about BBO versus GA comparisons from these results because the BBO and GA versions used here are basic versions, and the results in this section are limited to the 100-bit one-max problem.

5  Separable Fitness Functions

In this section we extend the analysis of Section 3, which was limited to the one-max problem, to a general separable function $f$, which is written as
$f(x) = \sum_{j=1}^{n} f_j(x_j)$  (33)
where
formula
34
where $x_j$ is bit $j$ of a binary string, and $n$ is the length of the binary string and also the number of features in each candidate solution.
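The separable form above can be made concrete with a short sketch; the helper name and the two example per-bit functions are ours, chosen for illustration:

```python
def separable(fs, x):
    """Evaluate a separable fitness function f(x) = sum_j f_j(x_j),
    where fs is a list of per-bit functions and x is a bit list."""
    return sum(f(b) for f, b in zip(fs, x))

n = 8
# One-max is the special case f_j(x_j) = x_j for every j.
one_max = [lambda b: b] * n
# A weighted sum is another separable example: f_j(x_j) = (j + 1) * x_j.
weighted = [(lambda j: (lambda b: (j + 1) * b))(j) for j in range(n)]

x = [1, 0, 1, 1, 0, 0, 1, 0]
separable(one_max, x)    # -> 4, the number of 1s
separable(weighted, x)   # -> 1 + 3 + 4 + 7 = 15
```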
Now we consider the effect of migration on the fitness in a single bit position $x_j$. The expected fitness of this bit after an immigration trial is computed by
$E[f_j(x_j)] = f_j(1)\Pr(x_j = 1) + f_j(0)\Pr(x_j = 0)$  (35)
First, we analyze the probability of $x_j = 1$ as follows:
$\Pr(x_j = 1) = \Pr(\text{immigration})\Pr(\text{emigrant bit} = 1) + \Pr(\text{no immigration})\Pr(x_j^{-} = 1)$  (36)
Consider the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose $j$th bit is equal to 1:
$\Pr(\text{emigrant bit} = 1) = \dfrac{\sum_{k \in S_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k}$  (37)
where $S_j(1)$ = {population indices of individuals whose $j$th bit is equal to 1}, and $N$ denotes the size of the population. We write
formula
38
where $x_j^{-}$ denotes the value of bit $j$ before an immigration trial. We write the following probabilities Pr(immigration) and Pr(no immigration), which are the same as (17), assuming that the immigration curve is as shown in Figure 1:
formula
39
Substituting (37), (38), and (39) into (36), we obtain
formula
40
Now consider the probability that the value of bit $j$ is 1 before the immigration trial:
$\Pr(x_j^{-} = 1) = \dfrac{|S_j(1)|}{N}$  (41)
where $|S_j(1)|$ denotes the number of individuals in the population whose $j$th bit is equal to 1.
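The two probabilities just described can be computed directly from a population snapshot. A small sketch, with illustrative (unnormalized) emigration rates and function names of our own choosing:

```python
def emigrant_bit_prob(population, mu, j):
    """Probability that a roulette-wheel-selected emigrant has bit j = 1:
    the emigration-rate mass of individuals whose bit j is 1, divided by
    the total emigration-rate mass of the population."""
    total = sum(mu)
    ones = sum(mu[k] for k, ind in enumerate(population) if ind[j] == 1)
    return ones / total

def bit_one_prob(population, j):
    """Fraction of the population whose bit j equals 1."""
    return sum(ind[j] for ind in population) / len(population)

pop = [[1, 0], [1, 1], [0, 1], [0, 0]]
mu = [1.0, 4.0, 3.0, 2.0]        # emigration rates, one per individual
emigrant_bit_prob(pop, mu, 0)    # (1 + 4) / 10 = 0.5
bit_one_prob(pop, 0)             # 2 / 4 = 0.5
```

Because the emigration rates appear in both the numerator and the denominator, the result is unchanged if the rates are rescaled, which is why the normalization of the rates does not affect the emigrant-bit probability.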
The emigration rate $\mu_k$, based on the linear migration rate function of Figure 1, is normalized as
formula
42
Then we can write
formula
43
and
formula
44
Now we substitute (41), (43), and (44) into (40) to obtain
formula
45
Combining (35) and (41), we write the expected fitness of bit $x_j$ before an immigration trial as
formula
46
Note that the subscript $t$ denotes a quantity before migration. For ease of notation, we introduce shorthand symbols for the two quantities above. Equation (46) can then be written as
formula
47
The expected fitness of bit $x_j$ after an immigration trial can be written as follows, which we obtain by substituting (45) into (35):
formula
48
Note that the subscript t+1 denotes a quantity after migration. Now we can use (47) to obtain
formula
49
Next, we use (7) to obtain
formula
50
Furthermore, using (41) and (47), we obtain
formula
51
Substituting (51) into (50), we obtain
formula
52
Next, we again use (41) and (47) to compute
formula
53
We use (45), (47), and (49) to obtain
formula
54
The derivations of equations (53) and (54) are in the Appendix. Substituting (51) and (53) into (54), we obtain
formula
55
which gives
formula
56
Note that the derivation of (56) is the same as that of (28), as shown in the Appendix. Comparing (52) and (56) with the first two equations of (28), we find that the cumulants $\kappa_1$ and $\kappa_2$ are the same. That is, the dynamics of the mean and the variance are identical for all separable functions. The skewness (the third cumulant $\kappa_3$) of separable functions is not derived here but is left for future work. Note that higher-order cumulants quantify the non-Gaussian content of the distribution.
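The three cumulants tracked throughout this paper are, for a population fitness sample, the mean, the variance, and the third central moment. A small helper (the naming is ours) makes the quantities being predicted concrete:

```python
def cumulants(fitness):
    """First three cumulants of a population fitness sample:
    k1 = mean, k2 = variance, k3 = third central moment, which
    measures the skewness of the fitness distribution."""
    n = len(fitness)
    k1 = sum(fitness) / n
    k2 = sum((f - k1) ** 2 for f in fitness) / n
    k3 = sum((f - k1) ** 3 for f in fitness) / n
    return k1, k2, k3

cumulants([2.0, 4.0, 4.0, 6.0])   # -> (4.0, 2.0, 0.0): symmetric, so k3 = 0
```

For the first three orders, the cumulants coincide with the central moments, so this direct computation matches the quantities whose dynamics are modeled by the recursions above.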

These results indicate that formulations of the dynamics of BBO based on statistical mechanics can model how BBO works not only for the simple one-max problem but also for general but separable fitness functions. Many complex, nonseparable functions have been approximated as separable functions (Simon, 2011).

Figures 10 and 11 show comparisons between approximation results based on (52) and (56) and experimental results for the 30-dimensional sphere function, where each independent variable is restricted to $\{0, 1\}$. We use $\kappa_3 = 0$ when iterating (56). Figure 10 shows that the mean approximation based on (52) matches the empirical results quite well. Figure 11 shows that the variance approximation based on (56) matches very well for high mutation rates; when the mutation rate is low, the variance approximation still matches the empirical results, but not as closely as for high mutation rates.

Figure 10:

BBO approximation and simulation results of the first cumulant $\kappa_1$ with mutation rate m = 0.1, 0.01, and 0.001 for the 30-bit sphere function.

Figure 11:

BBO approximation and simulation results of the second cumulant $\kappa_2$ with mutation rate m = 0.1, 0.01, and 0.001 for the 30-bit sphere function.

6  Conclusion

This paper presents a formalism, based on ideas from statistical mechanics, for studying the dynamics of BBO on the one-max problem and on general separable functions. The derived model describes the evolution of the statistical properties of the BBO population fitness. The approximation ability of the statistical mechanics model has been explored, and the model describes the effect of the BBO migration and mutation operators. These new theoretical results are confirmed with simulation results, and the theoretical results for BBO are compared with those of a simple GA.

This paper has obtained an approximate statistical mechanics model of BBO for separable problems, which can be used to study the effect of BBO tuning parameters on optimization performance. The numerical results in this paper are limited to a 100-bit one-max problem, for which they suggest that GA outperforms BBO, and a 30-bit sphere problem. However, the results could conceivably be useful as a general tool for BBO tuning-parameter trade-off studies.

For future work we suggest several important directions. First, the analysis of this paper is based on the one-max problem and general but separable functions. Future work could explore how well the model predicts performance on separable benchmark functions; how well it predicts performance on nonseparable benchmark functions; and how it could be used to estimate cumulants on more general and complicated optimization problems, including continuous optimization problems and the traveling salesman problem.

Second, this paper uses linear migration curves. It is also of interest to combine the statistical mechanics approximation with nonlinear migration curves (Ma, 2010) to explore the evolution of BBO populations in that case. This could provide insight into optimal BBO migration curves.
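For concreteness, the linear curves used in this paper and one nonlinear alternative of the kind studied by Ma (2010) can be written as functions of the fitness rank $k$ out of $n$. The sinusoidal (cosine) form below is a sketch of one such nonlinear model; the exact scaling constants are our assumption, not taken from that reference:

```python
import math

def linear_rates(k, n):
    """Linear migration model: immigration falls and emigration
    rises linearly with fitness rank k (0 = worst, n = best)."""
    lam = 1.0 - k / n   # immigration rate
    mu = k / n          # emigration rate
    return lam, mu

def sinusoidal_rates(k, n):
    """Sinusoidal migration model: cosine-shaped curves that are
    flatter near the rank extremes than the linear model."""
    lam = 0.5 * (1.0 + math.cos(math.pi * k / n))
    mu = 0.5 * (1.0 - math.cos(math.pi * k / n))
    return lam, mu

linear_rates(5, 10)       # -> (0.5, 0.5)
sinusoidal_rates(0, 10)   # -> (1.0, 0.0): worst rank, maximum immigration
```

In both models the two curves cross at the middle rank; the difference is how quickly the rates change near the best and worst individuals, which is exactly the behavior a statistical mechanics treatment of nonlinear curves would need to capture.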

The third important direction for future work is to examine BBO performance variations with population size, immigration and emigration curves, and other BBO parameters, based on the statistical mechanics model of BBO obtained in this paper. The fourth direction is to generalize the statistical mechanics models of BBO and GA into a unified framework and to use the methods of this paper to model other evolutionary algorithms. The fifth direction is to further develop the optimization ability of BBO by using principles from natural biogeography theory to modify the BBO algorithm.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Nos. 0826124, 1344954, and by the National Natural Science Foundation of China under Grant Nos. 61305078, 61074032, 61179041. The comments of the anonymous reviewers were instrumental in improving this paper from its original version.

References

Ahn, C. (2006). Advances in evolutionary algorithms: Theory, design and practice. New York: Springer.
Bhattacharya, A., and Chattopadhyay, P. (2010). Hybrid differential evolution with biogeography-based optimization for solution of economic load dispatch. IEEE Transactions on Power Systems, 25:1955–1964.
Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2011a). Hybridizing biogeography-based optimization with differential evolution for optimal power allocation in wireless sensor networks. IEEE Transactions on Vehicular Technology, 60:2347–2353.
Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2011b). Two-stage update biogeography-based optimization using differential evolution algorithm (DBBO). Computers and Operations Research, 38:1188–1198.
Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2012). Biogeography-based optimization for constrained optimization problems. Computers and Operations Research, 39:3293–3304.
Chatterjee, A., Siarry, P., Nakib, A., and Blanc, R. (2012). An improved biogeography-based optimization approach for segmentation of human head CT-scan images employing fuzzy entropy. Engineering Applications of Artificial Intelligence, 25:1698–1709.
Costa e Silva, M., Coelho, L., and Lebensztajn, L. (2012). Multiobjective biogeography-based optimization based on predator-prey approach. IEEE Transactions on Magnetics, 48:951–954.
Du, D., Simon, D., and Ergezer, M. (2009). Biogeography-based optimization combined with evolutionary strategy and immigration refusal. In Proceedings of IEEE Conference on Systems, Man, and Cybernetics, pp. 1023–1028.
Ergezer, M., Simon, D., and Du, D. (2009). Oppositional biogeography-based optimization. In Proceedings of IEEE Conference on Systems, Man, and Cybernetics, pp. 1035–1040.
Goel, L., Gupta, D., and Panchal, V. (2012). Extended species abundance models of biogeography based optimization. In Proceedings of International Conference on Computational Intelligence, Modelling and Simulation, pp. 7–12.
Gong, W., Cai, Z., and Ling, C. X. (2011). DE/BBO: A hybrid differential evolution with biogeography-based optimization for global numerical optimization. Soft Computing, 15:645–665.
Gong, W., Cai, Z., Ling, C. X., and Li, H. (2010). A real-coded biogeography-based optimization with mutation. Applied Mathematics and Computation, 216:2749–2758.
Grinstead, C., and Snell, J. (1997). Introduction to probability. Providence, RI: American Mathematical Society.
Hadidi, A., and Nazari, A. (2013). Design and economic optimization of shell-and-tube heat exchangers using biogeography-based (BBO) algorithm. Applied Thermal Engineering, 51:1263–1272.
Jamuna, K., and Swarup, K. (2012). Multi-objective biogeography-based optimization for optimal PMU placement. Applied Soft Computing, 12:1503–1510.
Kankanala, P., Srivastava, S., Srivastava, A., and Schulz, N. (2012). Optimal control of voltage and power in a multi-zonal MVDC shipboard power system. IEEE Transactions on Power Systems, 27:642–650.
Li, X., and Yin, M. (2012). Multi-operator based biogeography-based optimization with mutation for global numerical optimization. Computers and Mathematics with Applications, 64:2833–2844.
Lohokare, M., Panigrahi, B., Pattnaik, S., Devi, S., and Mohapatra, A. (2012). Neighborhood search-driven accelerated biogeography-based optimization for optimal load dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42:641–652.
Ma, H. (2010). An analysis of the equilibrium of migration models for biogeography-based optimization. Information Sciences, 180:3444–3464.
Ma, H., Ni, S., and Sun, M. (2009). Equilibrium species counts and migration model tradeoffs for biogeography-based optimization. In Proceedings of IEEE Conference on Decision and Control, pp. 3306–3310.
Ma, H., and Simon, D. (2011). Blended biogeography-based optimization for constrained optimization. Engineering Applications of Artificial Intelligence, 24:517–525.
Nix, A., and Vose, M. (1992). Modeling genetic algorithms with Markov chains. Annals of Mathematics and Artificial Intelligence, 5:79–88.
Panchal, V., Singh, P., Kaur, N., and Kundra, H. (2009). Biogeography-based satellite image classification. International Journal of Computer Science and Information Security, 6:269–274.
Prügel-Bennett, A. (1997). Modelling evolving populations. Journal of Theoretical Biology, 185:81–95.
Prügel-Bennett, A., and Shapiro, J. L. (1994). An analysis of genetic algorithms using statistical mechanics. Physical Review Letters, 72:1305–1324.
Rahmati, S., and Zandieh, M. (2012). A new biogeography-based optimization (BBO) algorithm for the flexible job shop scheduling problem. International Journal of Advanced Manufacturing Technology, 58:1115–1129.
Rarick, R., Simon, D., Villaseca, F. E., and Vyakaranam, B. (2009). Biogeography-based optimization and the solution of the power flow problem. In Proceedings of IEEE Conference on Systems, Man, and Cybernetics, pp. 1029–1034.
Rattray, M. (1995). The dynamics of a genetic algorithm under stabilizing selection. Complex Systems, 9:213–234.
Rattray, M. (1996). Modeling the dynamics of genetic algorithms using statistical mechanics. Unpublished doctoral dissertation, University of Manchester, England.
Rattray, M., and Shapiro, J. L. (1996). The dynamics of a genetic algorithm for a simple learning problem. Journal of Physics A, 29:7451–7473.
Reeves, C., and Rowe, J. (2003). Genetic algorithms: Principles and perspectives. Norwell, MA: Kluwer.
Shapiro, J. L., Prügel-Bennett, A., and Rattray, M. (1994). A statistical mechanical formulation of the dynamics of genetic algorithms. In Proceedings of Workshop on Evolutionary Computing, pp. 17–27. Lecture Notes in Computer Science, vol. 865.
Simon, D. (2008). Biogeography-based optimization. IEEE Transactions on Evolutionary Computation, 12:702–713.
Simon, D. (2011). A probabilistic analysis of a simplified biogeography-based optimization algorithm. Evolutionary Computation, 19:167–188.
Simon, D., Ergezer, M., and Du, D. (2009). Population distributions in biogeography-based optimization algorithms with elitism. In Proceedings of IEEE Conference on Systems, Man, and Cybernetics, pp. 1017–1022.
Simon, D., Ergezer, M., Du, D., and Rarick, R. (2011a). Markov models for biogeography-based optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41:299–306.
Simon, D., Rarick, R., Ergezer, M., and Du, D. (2011b). Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms. Information Sciences, 181:1224–1248.
Singh, U., Singla, H., and Kamal, T. (2010). Design of Yagi-Uda antenna using biogeography based optimization. IEEE Transactions on Antennas and Propagation, 58:3375–3379.
Suzuki, J. (1995). A Markov chain analysis on simple genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 25:655–659.
Van Laarhoven, P. J., and Aarts, E. H. (1987). Simulated annealing: Theory and applications. New York: Springer.
Vose, M. D. (1999). The simple genetic algorithm: Foundations and theory. Cambridge, MA: MIT Press.
Wang, L., and Xu, Y. (2011). An effective hybrid biogeography-based optimization algorithm for parameter estimation of chaotic systems. Expert Systems with Applications, 38:15103–15109.
Yao, X., Liu, Y., and Lin, G. (1999). Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 3:82–102.

Appendix

First we derive the cumulants $\kappa_1$ and $\kappa_2$ of bit $j$ for the one-max problem in Eq. (27). Based on (6) and (26), they are obtained as follows:
formula
Next we derive the cumulants $\kappa_1$, $\kappa_2$, and $\kappa_3$ of the population after migration in Eq. (28):
formula
where the approximation is valid if $\kappa_2 < \kappa_1$ (that is, the variance is less than the mean), in which case the square term has only a small effect on the final result.
formula
where the approximation is valid if $\kappa_2 < \kappa_1$ and $\kappa_3 < \kappa_1$ (that is, the variance and skewness are both less than the mean).
Finally, we derive the cumulants $\kappa_1$ and $\kappa_2$ of bit $j$ for the separable function in Eqs. (53) and (54), which are obtained as follows:
formula