## Abstract

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. This paper derives a mathematical description of the dynamics of BBO based on ideas from statistical mechanics. Rather than trying to exactly predict the evolution of the population, statistical mechanics methods describe the evolution of statistical properties of the population fitness. This paper uses the one-max problem, which has only one optimum and whose fitness function is the number of 1s in a binary string, to derive equations that predict the statistical properties of BBO each generation in terms of those of the previous generation. These equations reveal the effect of migration and mutation on the population fitness dynamics of BBO. The results obtained in this paper are similar to those for the simple genetic algorithm with selection and mutation. The paper also derives equations for the population fitness dynamics of general separable functions, and we find that the results obtained for separable functions are the same as those for the one-max problem. The statistical mechanics theory of BBO is shown to be in good agreement with simulation.

## 1 Introduction

Mathematical models of biogeography describe the migration of species between habitats. Biogeography-based optimization (BBO) (Simon, 2008) is a new evolutionary algorithm for global optimization, introduced in 2008, that is an extension of biogeography theory to evolutionary algorithms (EAs). BBO has demonstrated good performance on various unconstrained and constrained benchmark functions (Boussaid et al., 2012; Du et al., 2009; Ergezer et al., 2009; Gong et al., 2010; Gong et al., 2011; Li and Yin, 2012; Ma et al., 2009; Ma and Simon, 2011). It has also been applied to real-world optimization problems, including economic load dispatch (Bhattacharya and Chattopadhyay, 2010), wireless network power allocation (Boussaid et al., 2011a), satellite image classification (Panchal et al., 2009), flexible job shop scheduling (Rahmati and Zandieh, 2012), power system optimization (Jamuna and Swarup, 2012; Kankanala et al., 2012; Lohokare et al., 2012; Rarick et al., 2009), antenna design (Singh et al., 2010), and others (Boussaid et al., 2011b; Chatterjee et al., 2012; Hadidi and Nazari, 2013; Van Laarhoven and Aarts, 1987). BBO probabilistically shares information between candidate solutions, like other EAs (Ahn, 2006; Vose, 1999; Yao et al., 1999). In BBO each candidate solution is composed of a set of independent variables, also called features. Each candidate solution immigrates features from other candidate solutions based on its immigration rate, and emigrates features to other candidate solutions based on its emigration rate.

BBO has a close relation with other EAs such as genetic algorithms (GAs), differential evolution (DE), and particle swarm optimization (PSO). This was highlighted in earlier work (Simon et al., 2011b), which also showed that in spite of commonality with other EAs, BBO has distinct features that give it unique properties. Because of these unique properties, it is useful to retain the distinction between BBO and other EAs. Another reason that it is useful to retain BBO as a distinct EA is that it easily allows the incorporation of behaviors from natural biogeography into BBO, such as the effect of geographical proximity on migration rates, nonlinear migration curves, the effect of species populations (including mortality and reproduction), predator/prey relationships, species mobilities, directional migration momentum, habitat area and isolation, and others (Ma, 2010; Costa e Silva et al., 2012; Goel et al., 2012).

Although BBO has shown good performance on various problems, it is not well understood for which problems it is effective, and why. The efficiency of BBO depends on the problem representation and the BBO tuning parameters. These parameters include the population size, the immigration curve shape, the emigration curve shape, and the mutation rate. For many problems, BBO can be very effective when a good representation is chosen and the parameters are set to appropriate values for the particular problem. When poor choices are made, BBO can perform like random search. If a mathematical model as a function of the BBO parameters could predict the improvement in fitness from one generation to the next, it could be used to find optimal values of the tuning parameters.

For example, consider a problem with very expensive fitness function evaluations; we may even need to do physical experiments to evaluate fitness. If we can find a model that predicts fitness improvement, we can use the model during early generations to tune BBO parameters or to predict if BBO will perform better than other EAs. So a mathematical model of the evolution of BBO would be useful to develop effective BBO modifications. More generally, a model of BBO dynamics could be useful in producing insights as to how the algorithm behaves and when it is likely to be effective.

There have been some attempts to describe the dynamics of BBO. The most well-developed method is the Markov chain model (Nix and Vose, 1992; Suzuki, 1995), which is an exact model of the evolution of BBO (Simon et al., 2009; 2011a; 2011b). It gives the probability of arriving at any population given the starting population as the generation count approaches infinity. But it is of little practical use for real-world problems because transition probabilities are never known in real applications. In this paper, we present a formalism based on ideas from statistical mechanics to study the dynamics of BBO. This method could lead to better understanding of the evolution of BBO in realistic high-dimensional optimization problems.

Statistical mechanics is a branch of physics that applies probability theory to the study of the thermodynamic behavior of systems consisting of a huge number of interacting entities. It provides methods to calculate the average properties of these systems by treating the small-scale motions as random. It is usually used to compute thermodynamic properties of particles interacting with a heat bath, but it has recently been applied to other problems, for example, mathematical modeling for the genetic algorithm (Prügel-Bennett and Shapiro, 1994; Rattray, 1995; 1996; Rattray and Shapiro, 1996; Shapiro et al., 1994) and simulated annealing (Van Laarhoven and Aarts, 1987). However, the modeling of GAs and simulated annealing using statistical mechanics has been limited to specific problems. This paper extends that work to BBO and also, for the first time, uses statistical mechanics ideas to model the population dynamics for a general but separable fitness function.

The advantage of statistical mechanics for describing the dynamics of BBO is that we only assume knowledge of the cumulants of the fitness distribution and the average fitness correlation within the population. The cumulants are defined by the cumulant-generating function, which is the natural logarithm of the Fourier transform of the sample fitness probability distribution function. The cumulants contain the same information as the distribution function. The first and second cumulants are respectively the mean and variance of the distribution, and the third cumulant is related to the skewness and measures the asymmetry of the distribution. All cumulants except the first two are zero for a Gaussian distribution, so the higher-order cumulants measure the non-Gaussian content of the distribution.

In this paper, the dynamics description of BBO is computed in terms of the change of the cumulants due to the migration and mutation. The cumulants are used for a number of reasons. Cumulants can readily be measured, they are more stable than other statistical parameters because they tend to self-average, and they provide interesting results concerning how BBO works.

Section 2 describes the BBO algorithm. Section 3 introduces statistical mechanics and derives the statistical mechanics approximation of BBO for the one-max problem. Section 4 uses simulations to study the effect of migration and mutation, and then compares BBO with GA based on statistical mechanics models. Section 5 derives the statistical mechanics approximation of BBO for separable functions to generalize the model and to further reveal how BBO works. Section 6 provides some concluding remarks and suggestions for future work.

## 2 Biogeography-Based Optimization

BBO is a new evolutionary optimization algorithm that is inspired by biogeography (Simon, 2008). In BBO a biogeography habitat denotes a candidate optimization problem solution. Each candidate solution is composed of a set of features, also called decision variables or independent variables, that are similar to genes in GAs. A set of biogeography habitats denotes a population of candidate solutions. Like other EAs, each candidate solution in BBO probabilistically shares decision variables with other candidate solutions to improve candidate solution fitness. This sharing process is analogous to migration in biogeography. That is, each candidate solution immigrates decision variables from other candidate solutions based on its immigration rate, and emigrates decision variables to other candidate solutions based on its emigration rate. BBO consists of two main steps: migration and mutation.

*Migration* is a probabilistic operator that is intended to improve a candidate solution $y_k$. For each decision variable of a given candidate solution $y_k$, the candidate solution's immigration rate $\lambda_k$ is used to probabilistically decide whether or not to immigrate. If immigration is selected, then the emigrating candidate solution $y_j$ is probabilistically chosen based on its emigration rate $\mu_j$. Migration is written as $y_k(s) \leftarrow y_j(s)$, where $s$ is a decision variable. In BBO, each candidate solution $y_k$ has its own immigration rate $\lambda_k$ and emigration rate $\mu_k$. A good candidate solution has a relatively high emigration rate and low immigration rate, whereas the converse is true for a poor candidate solution. Immigration and emigration rates are based on particular migration curves, such as the linear migration curves presented in Figure 1, where the maximum immigration rate and the maximum emigration rate are both assumed equal to 1. Nonlinear immigration and emigration rates were discussed by Ma (2010).
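As a concrete illustration, the linear migration curves of Figure 1 (with both maximum rates equal to 1) can be sketched as follows. The function name and the fitness-normalization convention are ours, not the paper's:

```python
def linear_migration_rates(f, f_worst, f_best, i_max=1.0, e_max=1.0):
    """Linear migration curves: immigration decreases and emigration
    increases linearly as fitness goes from f_worst to f_best."""
    r = (f - f_worst) / (f_best - f_worst)  # normalized fitness in [0, 1]
    return i_max * (1.0 - r), e_max * r     # (immigration rate, emigration rate)

# A good solution immigrates rarely and emigrates often; a poor one, the converse.
lam_good, mu_good = linear_migration_rates(9.0, 0.0, 10.0)
lam_poor, mu_poor = linear_migration_rates(1.0, 0.0, 10.0)
```

Note that the curves here are normalized against the problem's best and worst fitness values, consistent with the assumption made later in this section.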

*Mutation* is a probabilistic operator that randomly modifies a decision variable of a candidate solution. The purpose of mutation is to increase diversity among the population, just as in other EAs. At the end of each generation, mutation of the *k*th candidate solution is implemented as shown in Algorithm 1.

In the mutation logic of Algorithm 1, $\mathrm{rand}(a, b)$ is a uniformly distributed random number between *a* and *b*, $p_m$ is the mutation rate, and $s_{\min}$ and $s_{\max}$ are the lower and upper search bounds of decision variable *s*. The logic mutates each independent variable with probability $p_m$. If mutation occurs for a given independent variable, then that independent variable is replaced with a random number in its search domain.
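A minimal sketch of this mutation logic (the function name and signature are our own; the paper's Algorithm 1 is the authoritative statement):

```python
import random

def mutate(candidate, p_m, s_min, s_max, rng=random):
    # Each independent variable mutates with probability p_m; a mutated
    # variable is replaced by a uniform random value in [s_min, s_max].
    return [rng.uniform(s_min, s_max) if rng.random() < p_m else s
            for s in candidate]

unchanged = mutate([1.0, 2.0, 3.0], p_m=0.0, s_min=-5.0, s_max=5.0)
scrambled = mutate([1.0, 2.0, 3.0], p_m=1.0, s_min=-5.0, s_max=5.0)
```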

A description of one generation of BBO is given in Algorithm 2. Note that Algorithm 2 includes the use of a temporary population, so migration completes before the original population is replaced with the new population at the end of each generation.
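The generation loop of Algorithm 2 (migration into a temporary population, then mutation, then wholesale replacement) might be sketched as follows for the one-max problem. All names, and the roulette-wheel selection of the emigrating solution, are our own sketch rather than the paper's code:

```python
import random

def bbo_generation(pop, fitness, f_worst, f_best, p_m, rng):
    n, N = len(pop[0]), len(pop)
    fits = [fitness(ind) for ind in pop]
    # Linear migration curves (Figure 1), defined against the problem's
    # best and worst fitness, as assumed in the text.
    lam = [1.0 - (f - f_worst) / (f_best - f_worst) for f in fits]
    mu = [(f - f_worst) / (f_best - f_worst) for f in fits]
    temp = []  # temporary population: migration completes before replacement
    for k in range(N):
        z = list(pop[k])
        for s in range(n):
            if rng.random() < lam[k]:  # immigrate this decision variable?
                # choose the emigrant j in proportion to its emigration rate
                j = rng.choices(range(N), weights=[m + 1e-9 for m in mu])[0]
                z[s] = pop[j][s]
            if rng.random() < p_m:     # mutate: replace with a random bit
                z[s] = rng.randint(0, 1)
        temp.append(z)
    return temp

rng = random.Random(1)
n = 20
pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(50)]
mean_before = sum(map(sum, pop)) / len(pop)
for _ in range(30):
    pop = bbo_generation(pop, sum, 0, n, 0.001, rng)
mean_after = sum(map(sum, pop)) / len(pop)
```

Note that a solution may be chosen as its own emigrant (`j == k`), in which case the decision variable is unchanged, matching the clarification below.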

Now we make several clarifications about Algorithm 2. First, a solution can emigrate to itself, which means that *j* might be chosen to be equal to *k* in the statement "$z_k(s) \leftarrow z_j(s)$." That is, when an independent variable in a candidate solution $z_k$ is replaced via migration, the new independent variable might be chosen to come from $z_k$ itself. In this case, the independent variable is not actually changed in $z_k$. However, the probabilistic choice of the emigrating candidate solution allows this to happen on occasion.

Second, in this paper we assume that the migration rates $\lambda$ and $\mu$ are independent of the population distribution. That is, the best fitness that is used to define the migration rates (Figure 1) is the best fitness for the problem, not the best fitness in the current population. Similarly, the worst fitness that is used to define the migration rates is the worst fitness for the problem. Using the worst fitness for the problem is not much of a limitation because, in practice, we can assign an arbitrarily small fitness value to poor candidate solutions. Alternatives to these assumptions are commonly used in practice, but these assumptions are needed for the statistical mechanics model development.

## 3 Statistical Mechanics Model of BBO

This section provides a preliminary foundation for the definition of the cumulants of a distribution function and then derives the statistical mechanics model of BBO under migration and mutation.

### 3.1 Preliminary Foundation

The cumulants of a probability distribution describe the shape of the distribution. Cumulant dynamics have already been developed for some evolutionary algorithms, such as the simple genetic algorithm (Reeves and Rowe, 2003). Cumulants are natural statistics for distributions that are close to Gaussian, since the higher-order cumulants measure the deviation from a Gaussian distribution. The first two cumulants are the mean and variance of the distribution, and the third and fourth cumulants are related to skewness and kurtosis, respectively. Populations in evolutionary algorithms are often initialized randomly, and fitness is a function of many variables; since the central limit theorem states that a sum of many random variables becomes more nearly Gaussian as the number of variables increases, many real-world problems have an approximately Gaussian fitness distribution during the early generations.

Consider a discrete random variable *X* that can take non-negative integer values. For each *k*, there is a certain probability $P_k$ that *X* takes the value *k*. The characteristic function of the probability distribution of *X* and its Taylor series can be written as

$$\phi(\omega) = \mathbb{E}\!\left[e^{i\omega X}\right] = \sum_{k} P_k\, e^{i\omega k} = \sum_{k=0}^{\infty} \frac{(i\omega)^k}{k!}\, m_k,$$

where the coefficients $m_k$ are the moments of the probability distribution, given by

$$m_k = \mathbb{E}\!\left[X^k\right].$$

Expanding the natural logarithm of the characteristic function as a Taylor series gives

$$\ln \phi(\omega) = \sum_{k=1}^{\infty} \frac{(i\omega)^k}{k!}\, \kappa_k,$$

where the coefficients $\kappa_k$ for $k \geq 1$ are the cumulants of the probability distribution. In particular, the relations between the first three cumulants and the moments of the probability distribution have simple expressions:

$$\kappa_1 = m_1, \qquad \kappa_2 = m_2 - m_1^2, \qquad \kappa_3 = m_3 - 3 m_1 m_2 + 2 m_1^3.$$

From these equations, it is seen that the first cumulant is the mean of the distribution and the second cumulant is the variance. Since the Gaussian distribution is completely defined by its mean and variance, and all of its higher cumulants are zero, any distribution with very small values for its higher cumulants will look like a Gaussian. Namely, as long as the population fitness is distributed approximately Gaussian, we can truncate the series expansion as shown in (5) and use just the first few cumulants to approximate the fitness distribution.
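These moment-cumulant relations are easy to check numerically; a sketch (sample-based, so the computed values are estimates):

```python
import random

def first_three_cumulants(xs):
    N = len(xs)
    m1 = sum(xs) / N
    m2 = sum(x * x for x in xs) / N
    m3 = sum(x ** 3 for x in xs) / N
    # kappa_1 = m_1, kappa_2 = m_2 - m_1^2, kappa_3 = m_3 - 3 m_1 m_2 + 2 m_1^3
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3

rng = random.Random(0)
gaussian = [rng.gauss(5.0, 2.0) for _ in range(200_000)]
k1, k2, k3 = first_three_cumulants(gaussian)
# For a Gaussian sample: kappa_1 near the mean 5, kappa_2 near the
# variance 4, and kappa_3 (the non-Gaussian content) near 0.
```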

Moreover, if $X_i$ is an independent random variable for $i = 1, \ldots, n$, then the cumulants of the sum $X = \sum_{i=1}^{n} X_i$ are the sums of the individual cumulants:

$$\kappa_k(X) = \sum_{i=1}^{n} \kappa_k(X_i).$$

#### An Example

Consider the one-max problem. With *n* as the length of a binary string and *X* as the possible fitness values, the problem is defined as follows:

$$X = \sum_{j=1}^{n} x_j,$$

where $x_j \in \{0, 1\}$ is bit *j* of a binary string. In this example, the length of the binary string *n* is set to 5, which is also the maximum fitness, and the size of the population *N* is set to 24. An initial population is randomly generated with each bit independent of the others.
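In code, the example setup (n = 5, N = 24) is simply:

```python
import random

rng = random.Random(7)
n, N = 5, 24  # string length (also the maximum fitness) and population size
population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(N)]
fitnesses = [sum(ind) for ind in population]  # one-max: X = number of 1 bits
```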

The argument is that after migration, each bit in a given individual will remain independent because migration to each bit is independent of migration to the other bits, as shown in Algorithm 2. However, this is a simplistic assumption. Because of selection, BBO will introduce correlations between bits, and they will not remain independent for very long. Our hypothesis here is that the variables will remain mostly independent, at least during the early generations, so that statistical mechanics will provide a good fitness approximation. This hypothesis is verified in Section 4.

### 3.2 BBO Migration

Cumulants of a BBO fitness probability distribution provide certain average properties. These cumulants describe the fitness as a whole each generation. The cumulant dynamics tell us the trajectory of the distribution of the population fitness during the evolutionary process. In the following sections the equations of the dynamic evolution of BBO are derived based on the cumulants of its fitness distribution, and the specific one-max problem described in (8) is used to study the properties of BBO. Although the one-max problem is a simple optimization problem, the dynamic equations of BBO can provide predictive power and insight into BBO dynamics. In Section 5 we extend this model to general but separable fitness functions.

Consider an arbitrary bit $x_j$ of the population. Suppose that the population is randomly distributed, and consider the effect of migration on the values in this bit position. There are two ways in which a bit can be equal to 1 after an immigration trial (that is, after one iteration of the "for each decision variable *s*" loop in Algorithm 2). First, it can immigrate a 1 from another individual in the population (that is, another individual emigrates a 1 to this bit). Second, it can start as 1 and not immigrate. So the expected value of this bit after an immigration trial is computed by

$$\mathbb{E}(x_j) = \Pr(\text{emigrating bit is } 1)\Pr(\text{immigration}) + \Pr(x_{j,t} = 1)\Pr(\text{no immigration}).$$

Consider the first probability on the right side, that is, the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose *j*th bit is equal to 1:

$$\Pr(\text{emigrating bit is } 1) = \frac{\sum_{i \in S_j(1)} \mu_i}{\sum_{i=1}^{N} \mu_i},$$

where $S_j(1)$ = {population indices of individuals whose *j*th bit is equal to 1}, and *N* denotes the size of the population. Now consider the third probability on the right side of (9). If immigration does not occur, the probability that the value of bit *j* is 1 after an immigration trial is equal to the probability that the value of bit *j* is 1 before the immigration trial, which can be written as $\Pr(x_{j,t} = 1)$, where $x_{j,t}$ denotes the value of bit *j* before an immigration trial.

Next, consider the probability that the value of bit *j* is 1 before the immigration trial. Note that the mean value of bit *j* before the immigration trial is

$$\mathbb{E}(x_{j,t}) = \Pr(x_{j,t} = 1) = \frac{n_j(1)}{N},$$

where $n_j(1)$ denotes the number of individuals in the population whose *j*th bit is equal to 1. Combining this with (6), the first cumulant of the *j*th bit before the immigration trial is

$$\kappa_1(x_{j,t}) = \frac{n_j(1)}{N}.$$

The expected value of bit $x_j$ after an immigration trial can then be written by substituting the above probabilities into (9). Note that the subscripts $t+1$ and *t* denote quantities after and before migration, respectively. For the other cumulants of bit $x_j$, we note that bits can only take the value 0 or 1, so $x_j^k = x_j$ for all values of *k*. So the moments of all orders are equal, namely,

$$m_k(x_j) = \mathbb{E}\!\left(x_j^k\right) = \mathbb{E}(x_j) \quad \text{for all } k \geq 1.$$

Based on the relations of moments and cumulants described in (6), we obtain the equations of cumulants $\kappa_2$ and $\kappa_3$ as follows:

$$\kappa_2(x_j) = \mathbb{E}(x_j) - \mathbb{E}(x_j)^2, \qquad \kappa_3(x_j) = \mathbb{E}(x_j) - 3\,\mathbb{E}(x_j)^2 + 2\,\mathbb{E}(x_j)^3.$$

By (7), the cumulants of the population fitness are the sums of these bit cumulants over *j*.

The fourth cumulant $\kappa_4$ and other higher-order cumulants are not discussed in this section. $\kappa_4$ is related to the kurtosis, which measures the peakedness of the distribution. EA researchers are typically more interested in the mean and the variance of fitness distributions, namely $\kappa_1$ and $\kappa_2$, and $\kappa_4$ only affects $\kappa_3$ based on (28). In fact, it is apparent from (28) that the influence of $\kappa_4$ on $\kappa_3$ is relatively small. So we make the approximation here that $\kappa_4$ and the other higher-order cumulants are zero, which gives us an approximation of the migration dynamics using just three variables.

Equation (28) gives the migration dynamics of the statistical properties of BBO. Figure 3 shows the evolution of the BBO fitness distribution due to migration for the one-max problem, where the probability distributions of the population fitness *X* are shown every 4 generations, up to 20 generations. From the figure we see that, as expected, the mean of the fitness distribution becomes larger as the generation count increases.

### 3.3 BBO Mutation

In the mutation dynamics, *m* is the mutation rate, and the subscripts *t*+1 and *t* denote the quantities after and before mutation, respectively. Note that these mutation-only equations were previously obtained for the one-max problem in the case that the variables were taken to be $\pm 1$ (Prügel-Bennett, 1997, Sec. 3). We combine these mutation equations with those of BBO migration derived in the previous section, which gives the model in (30) for BBO with both migration and mutation. These equations give the statistical properties of BBO fitness values after migration and mutation in terms of those at the previous generation, so it is possible to iterate them to obtain predictions for the evolution of BBO populations.
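As an illustration of the mutation term's qualitative effect (a simplified per-bit recursion of our own, not Eq. (30) itself): if each bit is independently replaced by a uniform random bit with probability *m*, the per-bit mean evolves as $E_{t+1} = (1 - m) E_t + m/2$, which relaxes toward 1/2, the random-string average:

```python
def mean_bit_after_mutation(E, m):
    # With probability m the bit is replaced by a uniform random bit
    # (mean 1/2); with probability 1 - m it keeps its current mean.
    return (1.0 - m) * E + 0.5 * m

E = 1.0  # start from an all-ones bit position
history = []
for _ in range(200):
    E = mean_bit_after_mutation(E, m=0.1)
    history.append(E)
# Mutation alone drives the mean bit value toward 0.5.
```

This is why mutation opposes the convergence pressure of migration: migration concentrates the population on fit strings, while mutation pulls the distribution back toward that of a random population.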

## 4 Simulation Results

This section confirms the BBO statistical mechanics approximation model derived in Section 3 with simulation and then compares the approximation results of BBO with those of the GA.

### 4.1 Comparisons between Theory and Simulation

This section uses the 100-bit one-max problem described in (8) to verify the statistical model of BBO with simulation. The statistical approximations of BBO are obtained by iterating (28) when considering only migration, and by iterating (30) when considering both migration and mutation. We use $\kappa_4 = 0$ in the iterations of (28) and (30). The simulation parameters of BBO are as follows: population size 50, maximum immigration rate and maximum emigration rate 1, and 200 Monte Carlo runs. Figures 4-6 show comparisons between theoretical (statistical approximation) and simulated BBO results.

Several things are notable about Figures 4–6. First, the approximation results of BBO show that as the generation count increases, the first cumulant increases and the second cumulant decreases; namely, the mean increases and the variance decreases with the generation count. This is consistent with the expected behavior of BBO: as the generation count increases, the mean fitness of the population increases, and the population becomes more uniform.

Second, the cumulant approximations and the simulation results match well. This indicates that the statistical mechanics approximation model of BBO is reasonable, and that the statistical mechanics approximation is valid for the one-max problem. This fact allows us to make quantitative conclusions from statistical mechanics approximations, and to use those approximations to obtain valid conclusions regarding the performance of BBO with various tuning parameters.

Third, as the mutation rate decreases, the approximation for the mean becomes larger; namely, BBO achieves better fitness values. This indicates that low mutation rates have better performance for the one-max problem. A high mutation rate of 0.1 results in too much exploration, and the population remains too widely distributed across the search space. However, a low mutation rate of 0.01 or 0.001 allows BBO to exploit better fitness values in the population. These results are similar to those obtained by Markov model analysis by Simon et al. (2011a).

### 4.2 Comparisons between BBO and GA

We see that the approximation equations of BBO in (30) are similar to those of the GA in (32); the difference is an additional coefficient in some terms of the BBO model. As the problem size *n* becomes large, the BBO approximation approaches the GA approximation. These equations can be used to compare the statistical approximations of BBO and the GA for the 100-bit one-max problem, as shown in Figures 7-9.

From Figures 7–9, several things are notable about the approximation comparisons between BBO and the GA. First, the trends of the cumulants are very similar for BBO and the GA: as the generation count increases, the mean (the first cumulant $\kappa_1$) increases, and the variance (the second cumulant $\kappa_2$) decreases. The only qualitative difference we see is that the skewness (the third cumulant $\kappa_3$) of the GA suddenly changes slope during the first few generations, while the skewness of BBO changes more smoothly. These comparisons indicate that the expected behavior of BBO is similar to that of the GA, as already implied by Eqs. (30) and (32). It is not too surprising that BBO is similar to the GA, because many EAs can be expressed in terms of each other. Simon et al. (2011b) compared the optimization performance of BBO and the GA and reported similar results. On the other hand, the figures show that differences do exist between BBO and the GA, which indicates that BBO has its own characteristics that are distinct from the GA. A BBO individual uses its own fitness to decide how likely it is to accept features from other candidate solutions; this simple and intuitive idea has no direct analogy in genetics but is motivated by biogeography. This implies that it is useful to retain the distinction between BBO and the GA, and that the biogeographical foundation of BBO can motivate many extensions.

Second, Figure 7 shows that the mean fitness of BBO is smaller than that of GA, and Figure 8 shows that the fitness variance of BBO is larger than that of GA. This indicates that GA might have better convergence properties than BBO for high mutation rates for the one-max problem.

We should be wary of drawing general conclusions about BBO versus GA comparisons from these results because the BBO and GA versions used here are basic versions, and the results in this section are limited to the 100-bit one-max problem.

## 5 Separable Fitness Functions

Consider a general separable fitness function

$$X = \sum_{j=1}^{n} f_j(x_j),$$

where $x_j$ is bit *j* of a binary string, and *n* is the length of this binary string and also the number of features in each candidate solution.
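A separable fitness function decomposes into independent per-bit contributions; a minimal sketch (the component functions below are illustrative, not from the paper):

```python
def separable_fitness(x, components):
    # f(x) = sum_j f_j(x_j): each bit contributes independently.
    return sum(f_j(x_j) for f_j, x_j in zip(components, x))

# One-max is the special case f_j(x_j) = x_j.
one_max = [lambda b: b] * 5
# A weighted variant is still separable.
weighted = [lambda b, w=j + 1: w * b for j in range(5)]

f1 = separable_fitness([1, 0, 1, 1, 0], one_max)
f2 = separable_fitness([1, 0, 1, 1, 0], weighted)
```

Because migration and mutation act on each bit independently, the per-bit analysis of Section 3 carries over to any fitness of this form.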

Consider an arbitrary bit $x_j$ of the population. The expected fitness contribution of this bit after an immigration trial is computed from (35). First, we analyze the probability of $x_j = 1$. Consider the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose *j*th bit is equal to 1:

$$\Pr(\text{emigrating bit is } 1) = \frac{\sum_{i \in S_j(1)} \mu_i}{\sum_{i=1}^{N} \mu_i},$$

where $S_j(1)$ = {population indices of individuals whose *j*th bit is equal to 1}, and *N* denotes the size of the population. As before, $x_{j,t}$ denotes the value of bit *j* before an immigration trial. We write the probabilities Pr(immigration) and Pr(no immigration), which are the same as (17), assuming that the immigration curve is as shown in Figure 1. Substituting (37), (38), and (39) into (36), we obtain the probability of $x_j = 1$ after the immigration trial. Now consider the probability that the value of bit *j* is 1 before the immigration trial:

$$\Pr(x_{j,t} = 1) = \frac{n_j(1)}{N},$$

where $n_j(1)$ denotes the number of individuals in the population whose *j*th bit is equal to 1.

We write the expected fitness contribution of bit $x_j$ before an immigration trial as in (45), where the subscript *t* denotes a quantity before migration. For ease of notation we abbreviate the migration quantities, so that Equation (46) can be written more compactly as (47). The expected fitness of bit $x_j$ after an immigration trial is obtained by substituting (45) into (35), where the subscript *t*+1 denotes a quantity after migration. Now we can use (47) to obtain (49). Next, we use (7) to obtain (50). Furthermore, using (41) and (47), we obtain (51). Substituting (51) into (50), we obtain (52). Next, we again use (41) and (47) to compute (53), and we use (45), (47), and (49) to obtain (54); the derivations of (53) and (54) are given in the Appendix. Substituting (51) and (53) into (54), we obtain (55), which gives (56). Note that the derivation of (56) is the same as that of (28), as shown in the Appendix. Comparing (52) and (56) with the first two equations of (28), we find that the cumulants $\kappa_1$ and $\kappa_2$ are the same; namely, the dynamics of the mean and the variance are identical for all separable functions. The skewness (the third cumulant $\kappa_3$) of separable functions is not derived here but is relegated to future work. Note that higher-order cumulants quantify the non-Gaussian content of the distribution.

These results indicate that formulations of the dynamics of BBO based on statistical mechanics can model how BBO works not only for the simple one-max problem but also for general but separable fitness functions. Many complex, nonseparable functions have been approximated as separable functions (Simon, 2011).

Figures 10 and 11 show comparisons between approximation results based on (52) and (56) and experimental results for the 30-dimensional sphere function, where each independent variable is restricted to {0, 1}. We use $\kappa_3 = 0$ when iterating (56). Figure 10 shows that the mean approximation based on (52) matches the empirical results quite well. Figure 11 shows that the variance approximation based on (56) matches very well for high mutation rates; when the mutation rate is low, the variance approximation still matches the empirical results well, but not as well as for high mutation rates.

## 6 Conclusion

This paper presents formalisms for studying the dynamics of BBO for the one-max problem and general but separable functions, based on ideas from statistical mechanics. The derived model describes the evolution of the statistical properties of the BBO population fitness. The approximation ability of the statistical mechanics model has been explored, and describes the effect of the BBO migration and mutation operators. These new theoretical results are confirmed with simulation results. In addition, the theoretical results for BBO are compared with a simple GA.

This paper has obtained an approximate statistical mechanics model of BBO for separable problems, which can be used to study the effect of BBO tuning parameters on optimization performance. The numerical results in this paper are limited to a 100-bit one-max problem, for which the model implies that the GA outperforms BBO, and a 30-bit sphere problem. However, the results in this paper could conceivably be useful as a general tool for BBO tuning parameter trade-off studies.

For future work we suggest several important directions. First, the analysis of this paper is based on the one-max problem and general but separable functions. Future work could explore how well the model predicts performance on separable benchmark functions; how well it predicts performance on nonseparable benchmark functions; and how it could be used to estimate cumulants on more general and complicated optimization problems, including continuous optimization problems and the traveling salesman problem.

Second, this paper uses linear migration curves. It is also of interest to combine the statistical mechanics approximation with nonlinear migration curves (Ma, 2010) to explore the evolution of BBO populations in that case. This could provide insight into optimal BBO migration curves.

The third important direction for future work is to examine BBO performance variations with population size, immigration and emigration curves, and other BBO parameters, based on the statistical mechanics model of BBO obtained in this paper. The fourth direction is to generalize the statistical mechanics models of BBO and GA into a unified framework and to use the methods of this paper to model other evolutionary algorithms. The fifth direction is to further develop the optimization ability of BBO by using principles from natural biogeography theory to modify the BBO algorithm.

## Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Nos. 0826124 and 1344954, and by the National Natural Science Foundation of China under Grant Nos. 61305078, 61074032, and 61179041. The comments of the anonymous reviewers were instrumental in improving this paper from its original version.

## References

## Appendix

This appendix derives the higher-order cumulants of bit *j* for the one-max problem in Eq. (27). Based on (6) and (26), and since all moments of a bit equal its mean $E = \mathbb{E}(x_j)$, they are obtained as follows:

$$\kappa_1 = E, \qquad \kappa_2 = E - E^2, \qquad \kappa_3 = E - 3E^2 + 2E^3.$$