## Abstract

To maintain population diversity in genetic algorithms (GAs), an appropriate population diversity measure is required. However, commonly used population diversity measures designed for permutation problems do not consider the dependencies between the variables of the individuals in the population. To investigate the effectiveness of considering high-order dependencies, we propose three types of population diversity measures that address high-order dependencies between the variables. The first is formulated as the entropy of the probability distribution of individuals estimated from the population based on an $m$th-order Markov model. The second is an extension of the first. The third is similar to the first, but it is based on a variable-order Markov model. The proposed population diversity measures are incorporated into the evaluation function of a GA for the traveling salesman problem to maintain population diversity. Experimental results demonstrate the effectiveness of the three types of high-order entropy-based population diversity measures compared with the commonly used population diversity measures.

## 1 Introduction

The maintenance of population diversity is recognized as an important factor for fully exercising the capability of evolutionary algorithms (EAs), and a wide variety of population management strategies for promoting diversity inside the population have been proposed. A survey of methodologies for promoting population diversity in EAs can be found in Squillero and Tonda (2016).

Several population diversity management methodologies utilize measures of population diversity, which can be used to analyze the behavior of EAs (Yao, 1993; Tsai et al., 2004; Wang et al., 2010), to select individuals to maintain population diversity in a positive manner (Maekawa et al., 1996; Zhang et al., 2006; Nagata, 2006; Nagata and Kobayashi, 2013), and as a trigger to activate diversification procedures (Tsujimura and Gen, 1998; Vallada and Ruiz, 2010). The pairwise Hamming distance (the average of the Hamming distance between all possible pairs of the population members) is the most commonly used measure of population diversity. Another commonly used population diversity measure is based on entropy. In information theory, entropy, defined as $-\sum_{s\in S} p_s \log p_s$, is a measure of the uncertainty of a probability distribution $p_s\ (s\in S)$, where $S$ is the set of all possible events. This definition, however, cannot be directly used to measure population diversity because the population size is typically considerably smaller than the number of all possible solution candidates in the search space $S$. Therefore, to the best of our knowledge, the entropy-based population diversity measures proposed in previous works are all defined as the sum of the entropies of the univariate marginal distributions of all variables. For example, let the solution space $S$ be defined as $(x_1,\dots,x_n)$, where $x_i$ is a variable taking values in a discrete set $A_i$. The entropy of the $i$-th variable is defined as $H_i=-\sum_{j\in A_i} p_{ij}\log p_{ij}$, where $p_{ij}$ is the probability that $x_i$ has a value $j$ in the population. Then, the commonly used entropy-based population diversity measure is defined as $H=\sum_{i=1}^{n} H_i$. In this article, we refer to an entropy-based population diversity measure defined in this manner as an *independent entropy measure*.
In previous works, the independent entropy measure was incorporated into EAs applied to the knapsack problem (Mori et al., 1996), binary quadratic programming problem (Wang et al., 2010), traveling salesman problem (Yao, 1993; Maekawa et al., 1996; Tsujimura and Gen, 1998; Tsai et al., 2004; Nagata, 2006; Nagata and Kobayashi, 2013), and others (Zhang et al., 2006).

The independent entropy measure (and other commonly used population diversity measures), however, cannot consider the dependencies between the variables of the individuals in the population, which creates situations where population diversity cannot be evaluated appropriately. As an extreme example, consider the $n$-dimensional binary solution space where half of the population members are “$00\cdots 00$” and the other half are “$11\cdots 11$.” The value of the independent entropy measure of this population is virtually the same as that of a randomly generated population because $p_{i0}\simeq p_{i1}\simeq 0.5\ (i=1,\dots,n)$ for both populations, even though “true” population diversity is extremely low in the former. Therefore, our motivation herein is to design a more appropriate entropy-based population diversity measure by considering dependencies between the variables of the individuals in the population. We refer to such a population diversity measure as a *high-order entropy measure*.
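The extreme example can be checked numerically. The following is a minimal sketch, assuming individuals are stored as equal-length tuples; the function name `independent_entropy` and the population sizes are illustrative choices, not taken from the original GA.

```python
import math
import random
from collections import Counter

def independent_entropy(population):
    """Sum over positions i of the entropy of the marginal distribution of x_i."""
    n = len(population[0])
    size = len(population)
    total = 0.0
    for i in range(n):
        counts = Counter(ind[i] for ind in population)
        total -= sum((c / size) * math.log(c / size) for c in counts.values())
    return total

n, size = 20, 100
# Extreme population: half all-zeros, half all-ones.
extreme = [(0,) * n] * (size // 2) + [(1,) * n] * (size // 2)
# Randomly generated population (seeded for repeatability).
rng = random.Random(0)
uniform = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]

h_extreme = independent_entropy(extreme)   # exactly n * log(2)
h_uniform = independent_entropy(uniform)   # at most n * log(2)
distinct_extreme = len(set(extreme))       # number of distinct individuals
```

Both populations score close to the maximum $n\log 2$, although the first contains only two distinct individuals.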

In this article, we propose several high-order entropy measures for the traveling salesman problem (TSP) to investigate the advantages of using entropy-based population diversity measures that consider the dependencies between the variables. We first formulate high-order entropy measures based on a fixed-order Markov model, where we assume that the probability of observing each vertex at a certain position in an individual (tour) of the population depends on the sequence of $m\ (\geq 1)$ preceding vertices. We further extend this model into a variable-order Markov model, where the value of $m$ varies depending on the situation.

We tested the proposed high-order entropy measures on a genetic algorithm (GA) developed by Nagata and Kobayashi (2013), which is known as one of the most effective heuristic algorithms for the TSP. In this GA, one important feature for achieving top performance is to maintain population diversity by evaluating offspring solutions with an evaluation function that incorporates a population diversity measure in addition to the original evaluation function (tour length). An independent entropy measure is used for evaluating the population diversity. In this article, we run this GA after replacing the original independent entropy measure in the evaluation function with each of the proposed high-order entropy measures.

Preliminary reports on the high-order entropy measures proposed in this article were presented in the author's previous works (Nagata and Ono, 2013; Nagata, 2016). This article provides a full description and instructive analysis of the proposed high-order entropy measures; Section 6.3 and the Appendix are completely new, and the other parts significantly extend the contents of the conference proceedings. The remainder of this article is organized as follows. In Section 2, we describe the background of this study. In Section 3, we propose two types of high-order entropy measures based on a fixed-order Markov model. In Section 4, we propose a high-order entropy measure based on a variable-order Markov model. In Section 5, we describe the GA framework into which the proposed population diversity measures are incorporated. Computational results are presented in Section 6. Finally, conclusions are provided in Section 7.

## 2 Background

We first consider commonly used population diversity measures (independent entropy measure and pairwise Hamming distance) in a general case and then describe the independent entropy measure for the TSP. We also refer to the difficulty of designing an entropy-based population diversity measure that considers the dependencies between the variables.

### 2.1 Commonly Used Population Diversity Measures

The similarity between the independent entropy measure $H_{\mathrm{ind}}$ and the pairwise Hamming distance $D$ was discussed in Wineberg and Oppacher (2003). As suggested in Nagata and Kobayashi (2013), one advantage of the independent entropy measure over the pairwise Hamming distance is the sensitivity to the change of rare elements in the population. This feature makes $H_{\mathrm{ind}}$ a more appropriate population diversity measure than $D$.

### 2.2 TSP Case

Let an asymmetric TSP (ATSP) be defined on a complete directed graph $(V,E)$ with a set of vertices $V=\{1,\dots,n\}$ and a set of edges $E=\{(i,j)\mid i,j\in V\}$. In the asymmetric case, the distance (or cost) between two vertices depends on the travel direction. If the distance is the same in both directions, the TSP is called a symmetric TSP (STSP).

In the majority of GAs applied to the TSP, an individual is represented as the order of vertices on the tour. It is also possible to represent an individual such that the variable $x_i$ represents the vertex subsequent to vertex $i$ in the tour, which is well suited for the definition of the population diversity measures $D$ and $H_{\mathrm{ind}}$. Using this representation, $P(X_i=l)\ (l\in\{1,2,\dots,n\})$ is defined as the probability distribution of the vertices subsequent to vertex $i$ in the population, and the population diversity measures $H_{\mathrm{ind}}$ and $D$ are defined according to Eqs. (1) and (3), respectively. The STSP case requires care: we must consider both travel directions for each tour because the population diversity should not depend on the travel direction. Therefore, $P(X_i=l)\ (l\in\{1,2,\dots,n\})$ is defined as the probability distribution of the vertices linked to vertex $i$ in the population. The population diversity measure $H_{\mathrm{ind}}$ defined in this manner is used in GAs (Maekawa et al., 1996; Tsai et al., 2004; Nagata, 2006; Nagata and Kobayashi, 2013) for the STSP to control the diversity of the population.
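The STSP variant can be sketched as follows, assuming each tour is a tuple of vertices read cyclically and that each vertex's neighbor counts are normalized to a probability distribution. The function name and the normalization are assumptions made for illustration; Eq. (1) itself is not reproduced here.

```python
import math
from collections import Counter

def stsp_independent_entropy(tours):
    """Sum over vertices i of the entropy of the distribution of vertices linked to i."""
    n = len(tours[0])
    neighbor_counts = {v: Counter() for v in tours[0]}
    for tour in tours:
        for pos, v in enumerate(tour):
            # Count both travel directions: predecessor and successor on the cyclic tour.
            neighbor_counts[v][tour[pos - 1]] += 1
            neighbor_counts[v][tour[(pos + 1) % n]] += 1
    total = 0.0
    for counts in neighbor_counts.values():
        denom = sum(counts.values())  # 2 * number of tours
        total -= sum((c / denom) * math.log(c / denom) for c in counts.values())
    return total

forward = (0, 1, 2, 3, 4)
backward = tuple(reversed(forward))
h_forward = stsp_independent_entropy([forward, forward])
h_mixed = stsp_independent_entropy([forward, backward])
```

Because both neighbors of every vertex are counted, a tour and its reversal contribute identically, so `h_forward` and `h_mixed` coincide.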

### 2.3 Difficulty in Considering Dependencies

A Bayesian network could also be useful for modeling the joint probability distribution. It is represented as $P(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}P(x_i\mid x_{\mathrm{parent}(i)})$, where each variable $x_i$ is conditional only on its parent variables $x_{\mathrm{parent}(i)}$ (if the set of parents is empty, $P(x_i\mid x_{\mathrm{parent}(i)})$ means a prior probability distribution $P(x_i)$). However, it is typically difficult to detect an appropriate conditional dependency between the variables, which is represented as a directed acyclic graph (DAG), in advance. Moreover, it is difficult to compute Eq. (4) efficiently even if a DAG is given in advance.

## 3 High-Order Entropy Measures Based on a Fixed-Order Markov Model

We propose two high-order entropy measures based on a fixed-order Markov model to measure population diversity of GAs for the TSP.

### 3.1 High-Order Entropy Measure $H_m$

In information theory, $H_m$ is known as the entropy rate of an $m$th-order Markov information source modeled by the conditional probability distributions $P(s_{m+1}\mid s_1,\dots,s_m)$, where the entropy rate of a data source is defined as the average information per symbol obtained from the data source. A central theorem of information theory states that the entropy rate of a data source indicates the average number of bits per symbol required to encode it. Therefore, the existence of the same sequence consisting of up to $m+1$ vertices in the population decreases the value of $H_m$; this effect is more prominent when the length of the high-frequency sequences increases.

As the value of $m$ is increased, $H_m$ captures higher-order dependencies in the sequences of symbols included in the population. In this sense, we want to increase the value of $m$. However, $H_m$ is of no value if the value of $m$ is overly large because it is unlikely to obtain a sufficient number of samples (sequences of symbols) from the population necessary to estimate the conditional probability distributions $P(s_{i+m}\mid s_i,\dots,s_{i+m-1})=\frac{P(s_1,\dots,s_{m+1})}{P(s_1,\dots,s_m)}=\frac{N(s_1,\dots,s_{m+1})}{N(s_1,\dots,s_m)}$ for computing $H_m$; the estimated conditional probability distribution is unreliable if the number of samples of the denominator is small. Therefore, there is a tradeoff between the potential ability to capture higher-order dependencies and the estimation accuracy of the conditional probability distributions, and we need to determine an appropriate value of $m$.
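The count-based estimate can be sketched as follows. The sketch assumes the standard conditional-entropy form of the entropy rate of an $m$th-order Markov source, $-\sum P(s_1,\dots,s_{m+1})\log P(s_{m+1}\mid s_1,\dots,s_m)$, reads each tour cyclically in one direction only (the ATSP view), and normalizes counts by their total. The function names are hypothetical, and the scaling may differ from the definition used in the GA.

```python
import math
from collections import Counter

def count_blocks(tours, length):
    """Count every block of `length` consecutive vertices on each cyclic tour."""
    counts = Counter()
    for tour in tours:
        n = len(tour)
        for start in range(n):
            counts[tuple(tour[(start + k) % n] for k in range(length))] += 1
    return counts

def high_order_entropy(tours, m):
    """Estimate -sum P(s_1..s_{m+1}) * log P(s_{m+1} | s_1..s_m) from counts."""
    joint = count_blocks(tours, m + 1)    # N(s_1, ..., s_{m+1})
    context = count_blocks(tours, m)      # N(s_1, ..., s_m)
    total = sum(joint.values())
    h = 0.0
    for block, n_joint in joint.items():
        conditional = n_joint / context[block[:-1]]
        h -= (n_joint / total) * math.log(conditional)
    return h

h_identical = high_order_entropy([(0, 1, 2, 3)] * 5, 2)
h_two_tours = high_order_entropy([(0, 1, 2, 3), (0, 2, 1, 3)], 1)
```

With identical tours every conditional probability is one, so the estimate is zero. In the two-tour case, six of the eight blocks of length two have conditional probability 1/2, which gives $0.75\log 2$.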

### 3.2 High-Order Entropy Measure $H_m^{\mathrm{adj}}$

One might think that $\overline{H_{m+1}}$ can also be used as a population diversity measure. This is equivalent to the entropy of the probability distribution $P(s_1,\dots,s_{m+1})$ defined under the assumption that blocks of $m+1$ consecutive symbols $s_1,\dots,s_{m+1}$ appear in the population according to this probability distribution and that these occurrences are independent of each other. However, these occurrences are actually correlated and the definition of $\overline{H_{m+1}}$ neglects such dependencies. In information theory, $\overline{H_{m+1}}$ is known as the entropy of the *adjoint source* of the $(m+1)$-th *extension* of the original Markov source. From this point onward, we call $\overline{H_{m+1}}$ the high-order entropy measure $H_m^{\mathrm{adj}}$, to distinguish it from $H_m$.
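The relation between the two measures can be made explicit with a short derivation. It is a sketch under two assumptions that are not stated in the text above: $H_m$ takes the standard conditional-entropy form, and $\overline{H_k}$ denotes the entropy of the distribution of blocks of $k$ consecutive symbols, with marginals satisfying $\sum_{s_{m+1}}P(s_1,\dots,s_{m+1})=P(s_1,\dots,s_m)$.

$$
\begin{aligned}
H_m &= -\sum_{s_1,\dots,s_{m+1}} P(s_1,\dots,s_{m+1})\,\log\frac{P(s_1,\dots,s_{m+1})}{P(s_1,\dots,s_m)} \\
&= -\sum_{s_1,\dots,s_{m+1}} P(s_1,\dots,s_{m+1})\log P(s_1,\dots,s_{m+1}) + \sum_{s_1,\dots,s_m} P(s_1,\dots,s_m)\log P(s_1,\dots,s_m) \\
&= \overline{H_{m+1}} - \overline{H_m}.
\end{aligned}
$$

Under these assumptions, $H_m^{\mathrm{adj}}=\overline{H_{m+1}}$ exceeds $H_m$ by the block entropy $\overline{H_m}$, which is one way to see that the two measures weight repeated sequences differently.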

## 4 High-Order Entropy Measure Based on a Variable-Order Markov Model

We propose a high-order entropy measure based on a variable-order Markov model to measure population diversity of GAs for the TSP.

### 4.1 Motivation

To capture higher-order dependencies without suffering from a lack of sufficient statistics, we model the probability of the occurrence of symbols (vertices) appearing in individuals in the population as a variable-order Markov process. The variable-order Markov model was first suggested in Rissanen (1983) for data compression purposes. Later variants of variable-order Markov models have been successfully applied to areas such as statistical analysis, classification, and prediction (Shmilovici and Ben-Gal, 2007; Begleiter et al., 2004; Ben-Gal et al., 2003). In a variable-order Markov process, the probability distribution of observing the next symbol $s_0$ depends on the preceding symbols of variable length $k$. The basic idea is to determine the value of $k$ adaptively such that the number of samples $N(s_{-k},\dots,s_{-1})$ is a sufficient statistic for estimating the conditional probability distribution $P(s_0\mid s_{-k},\dots,s_{-1})=\frac{N(s_{-k},\dots,s_{-1},s_0)}{N(s_{-k},\dots,s_{-1})}$.

### 4.2 A High-Order Entropy Measure $H_m^{\mathrm{vari}}$

A *context tree* (Rissanen, 1983) is useful. Let $\tilde{s}_c$ be the reverse sequence of $s_c$, and define $\tilde{S}=\{\tilde{s}_c\mid s_c\in S\}$. Then, the elements of $\tilde{S}$ are represented as the leaf nodes of a context tree $\tilde{S}$ (we use this symbol to refer to the context tree as well) as illustrated in Figure 3, where each number represents the number of occurrences of the corresponding sequence in the population. Note that the symbol “$\#$” means any symbol other than the symbols of its sibling nodes. For a given sequence $\{s_{-m},\dots,s_{-2},s_{-1}\}$ ($s_0$ is observed next), the conditioning part is determined by tracing the context tree $\tilde{S}$ as far as possible from top to bottom according to $s_{-1},s_{-2},\dots,s_{-m}$. For example, if $\{s_{-3},s_{-2},s_{-1}\}=\{h,c,a\}$ ($m=3$), the conditioning part is determined as $\{s_{-2},s_{-1}\}=\{c,a\}$ and the conditional probability distribution of observing the next symbol $s_0$ is given by $P(S_0=s_0\mid S_{-2}=c,S_{-1}=a)$, or $P(s_0\mid c,a)$ for short. In another example, if $\{s_{-3},s_{-2},s_{-1}\}=\{d,b,a\}$, the conditioning part is determined as $\{s_{-2},s_{-1}\}=\{\#,a\}$ (“$\#$” means any symbol other than $c$ and $f$) and the conditional probability distribution of observing the next symbol $s_0$ is given by $P(S_0=s_0\mid S_{-2}=\#,S_{-1}=a)$, or $P(s_0\mid \#,a)$. This conditional probability distribution is estimated by $\frac{P(\#,a,s_0)}{P(\#,a)}=\frac{P(a,s_0)-P(c,a,s_0)-P(f,a,s_0)}{P(a)-P(c,a)-P(f,a)}=\frac{N(a,s_0)-N(c,a,s_0)-N(f,a,s_0)}{N(a)-N(c,a)-N(f,a)}$.
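The two lookups in these examples can be sketched in code. The nested-dictionary tree, the counts, and the function names below are hypothetical stand-ins chosen to mirror the examples; they are not the contents of Figure 3.

```python
from collections import Counter

def find_context(tree, history):
    """Trace the tree along s_{-1}, s_{-2}, ...; `history` is (s_{-m}, ..., s_{-1})."""
    node, context = tree, []
    for symbol in reversed(history):
        if symbol in node:
            context.append(symbol)
            node = node[symbol]
        elif "#" in node:
            context.append("#")   # "#" is always a leaf, so the trace stops here
            break
        else:
            break
    return tuple(reversed(context))

def conditional_probability(counts, tree, context, s0):
    """Estimate P(s0 | context) from sequence counts N(...)."""
    if context and context[0] == "#":
        suffix = context[1:]
        node = tree
        for symbol in reversed(suffix):
            node = node[symbol]
        siblings = [s for s in node if s != "#"]
        # Subtract the counts already covered by the sibling contexts.
        num = counts[suffix + (s0,)] - sum(counts[(s,) + suffix + (s0,)] for s in siblings)
        den = counts[suffix] - sum(counts[(s,) + suffix] for s in siblings)
    else:
        num, den = counts[context + (s0,)], counts[context]
    return num / den

# Hypothetical tree: "a" is expanded into "c", "f", and "#"; "b" is a leaf.
tree = {"a": {"c": {}, "f": {}, "#": {}}, "b": {}}
counts = Counter({("a",): 30, ("c", "a"): 10, ("f", "a"): 8,
                  ("a", "x"): 12, ("c", "a", "x"): 5, ("f", "a", "x"): 3})

ctx_hca = find_context(tree, ("h", "c", "a"))
ctx_dba = find_context(tree, ("d", "b", "a"))
p_other = conditional_probability(counts, tree, ctx_dba, "x")  # (12-5-3)/(30-10-8)
p_ca = conditional_probability(counts, tree, ctx_hca, "x")     # 5/10
```

Keeping "#" as an ordinary dictionary key keeps the trace to a single loop; it assumes "#" never occurs as a vertex label.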

Next, we describe the method to determine $\tilde{S}$ (and equivalently $S$), which is essentially equivalent to the learning algorithms used in Rissanen (1983), Ron et al. (1996), Mächler and Bühlmann (2004), and Schulz et al. (2008). Let $\tilde{N}(s_{-1},\dots,s_{-k})$ be the number of occurrences of the reverse sequence of symbols $\{s_{-k},\dots,s_{-1}\}$ in the population, i.e., $\tilde{N}(s_{-1},\dots,s_{-k})=N(s_{-k},\dots,s_{-1})$. As with tree $T$, let $\tilde{T}$ be a tree for storing the values of $\tilde{N}(s_{-1},\dots,s_{-k})\ (k\leq m)$. Figure 4 illustrates an example of the tree $\tilde{T}$, which corresponds to $T$ presented in Figure 2 (e.g., $N(a,f,b)=\tilde{N}(b,f,a)=4$). Note that in the case of the STSP, $T=\tilde{T}$ (except for the maximum depth of the tree), but the displayed example shows the case of the ATSP (because it is easier to understand). For a current population, the context tree $\tilde{S}$ is constructed by the following procedure, where $ratio$ is a parameter taking a value between zero and one.

**Construction of context tree $\tilde{S}$**

1. $\tilde{S}$ is initialized as the perfect tree of depth one, i.e., $\tilde{S}=\{s_{-1}\mid s_{-1}\in L\}$.
2. For each of the leaf nodes $\{s_{-1},\dots,s_{-k}\}\in\tilde{S}$ with $k<m$, if there exist one or more symbols $s'_{-(k+1)}\in L$ such that $ratio\times N_p\leq\tilde{N}(s_{-1},\dots,s_{-k},s'_{-(k+1)})$, this node is expanded to generate the new leaf node(s) $\{s_{-1},\dots,s_{-k},s'_{-(k+1)}\}$. Expansions of the leaf nodes are iterated until no further expansion is possible.
3. For every node that is already expanded, generate a child node with the symbol “$\#$”.

According to the above procedure, the context tree $\tilde{S}$ presented in Figure 3 is obtained from $\tilde{T}$ presented in Figure 4 (if $ratio\times N_p=8$).
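The construction procedure can be sketched as follows, assuming the counts $\tilde{N}$ are available as a dictionary keyed by reversed sequences and the tree is stored as nested dictionaries. The function name, the data layout, and the example counts are hypothetical; they do not reproduce Figure 4.

```python
def build_context_tree(rev_counts, symbols, m, ratio, n_pop):
    """Build the context tree as nested dicts from counts of reversed sequences."""
    threshold = ratio * n_pop
    # Step 1: perfect tree of depth one.
    tree = {s: {} for s in symbols}
    frontier = [((s,), tree[s]) for s in symbols]
    # Step 2: expand leaves while some extended sequence is frequent enough.
    while frontier:
        path, node = frontier.pop()
        if len(path) >= m:
            continue
        for s in symbols:
            if rev_counts.get(path + (s,), 0) >= threshold:
                node[s] = {}
                frontier.append((path + (s,), node[s]))
        # Step 3: every expanded node also receives a "#" child.
        if node:
            node["#"] = {}
    return tree

# Hypothetical counts of reversed sequences (s_{-1}, s_{-2}, ...).
rev_counts = {("a", "c"): 10, ("a", "b"): 3, ("a", "c", "b"): 9, ("b", "a"): 8}
tree = build_context_tree(rev_counts, ["a", "b", "c"], m=3, ratio=0.25, n_pop=32)
```

Here $ratio\times N_p=8$, so the contexts with counts 10, 9, and 8 are expanded while the context with count 3 is absorbed by the "#" child of its parent.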

The aim behind the expansion of a leaf node $\{s_{-1},\dots,s_{-k}\}$ (Step 2 of the above procedure) is to capture the higher-order dependency expressed as the conditional probability distribution $P(s_0\mid s'_{-(k+1)},s_{-k},\dots,s_{-1})$ only when it is judged to have a sufficient statistic for estimating this conditional probability distribution. The parameter $ratio$ balances the tradeoff between the potential ability to capture higher-order dependencies and the estimation accuracy of the conditional probability distributions.

In Rissanen (1983), Ron et al. (1996), Mächler and Bühlmann (2004), and Schulz et al. (2008), the context tree constructed by the above algorithm is then pruned based on the magnitude of the effect on the stochastic model. However, we do not use this pruning procedure because we confirmed in a preliminary experiment that the performance of the GA deteriorated when it was introduced.

## 5 GA Framework

To evaluate the ability of the proposed population diversity measures $H_m$, $H_m^{\mathrm{adj}}$, and $H_m^{\mathrm{vari}}$, we perform the GA proposed in Nagata and Kobayashi (2013) using each of the three types of population diversity measures with different values of $m$ and $ratio$. This GA is one of the most effective heuristic algorithms for the TSP. One important factor for achieving top performance is to maintain population diversity by evaluating offspring solutions based on the change in population diversity when they are selected to survive in the population as well as the tour length. The independent entropy measure $H_{\mathrm{ind}}$ was originally used for evaluating population diversity.

Algorithm 1 depicts the GA framework. The population consists of $N_p$ individuals. The initial population is generated by a greedy local search algorithm with the *2-opt* neighborhood (Line 1). At each generation (Lines 3–8) of the GA, each of the population members is selected, once as parent $p_A$ and once as parent $p_B$, in random order (Lines 3 and 5). For each pair of parents, the edge assembly crossover (EAX) operator generates $N_{ch}$ (e.g., 30) offspring solutions (Line 6). Then, a best solution is selected from the generated offspring solutions and $p_A$ in terms of a given evaluation function, and the selected individual replaces the population member selected as $p_A$ (Line 7). Therefore, no replacement occurs if all offspring solutions are worse than $p_A$. Note that only parent $p_A$ is replaced to better maintain population diversity because EAX typically generates offspring solutions similar to $p_A$. Generations are repeated until a termination condition is met (Line 9).
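One generation of this framework can be sketched as follows. The `crossover` and `evaluate` callables are placeholders (EAX and the actual evaluation function are not implemented here), and the convention that a lower score is better is an assumption of the sketch.

```python
import random

def run_generation(population, crossover, evaluate, n_ch, rng):
    """One generation: every member serves once as p_A and once as p_B."""
    order = list(range(len(population)))
    rng.shuffle(order)
    for k, idx_a in enumerate(order):
        idx_b = order[(k + 1) % len(order)]
        p_a, p_b = population[idx_a], population[idx_b]
        offspring = [crossover(p_a, p_b, rng) for _ in range(n_ch)]
        # min() keeps the first minimal element, so p_A survives ties
        # and no replacement occurs unless an offspring is strictly better.
        population[idx_a] = min([p_a] + offspring, key=evaluate)
    return population

# Toy usage with integers as "individuals" and |x| as the score to minimize.
def toy_crossover(a, b, rng):
    return (a + b) // 2 + rng.choice([-1, 0, 1])

start = [9, -7, 5, 3]
result = run_generation(list(start), toy_crossover, abs, 5, random.Random(1))
```

Only the slot of $p_A$ is overwritten, so each member is replaced at most once per generation and its score never gets worse.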

For every offspring solution $y$, we must compute $\Delta L(y)$ and $\Delta H(y)$ to obtain the value of $Eval(y)$. However, the computational cost of $\Delta H(y)$ for $H_m$, $H_m^{\mathrm{adj}}$, and $H_m^{\mathrm{vari}}$ is not negligible, especially for large values of $m$. Although this problem is partially alleviated by computing $\Delta H(y)$ only when $\Delta L(y)\leq 0$, we require an efficient algorithm for computing $\Delta H(y)$. An outline of the efficient computation of $\Delta H(y)$ is presented in the Appendix.

When the population diversity measure $H_m^{\mathrm{vari}}$ is used, the context tree $\tilde{S}$ should be updated each time $x_{r(i)}$ is replaced. However, we decided to update the context tree at the beginning of each generation (before Line 3) because we confirmed that the results did not change significantly using this update method and an efficient algorithm for the immediate update of the context tree is complicated (although it can be implemented).

## 6 Experimental Results

We now present experimental results to analyze the ability of the proposed high-order entropy measures. The GA was implemented in C++ on Ubuntu 14.04, and the program code was executed on PCs with an Intel Core i7-4790 CPU (3.60 GHz).

### 6.1 Experimental Settings

To investigate the ability of the three types of high-order entropy measures $H_m$, $H_m^{\mathrm{adj}}$, and $H_m^{\mathrm{vari}}$, we performed the GA described in Section 5 using each of the population diversity measures in the evaluation function (14). We used the default configuration of the GA except for the population diversity measure ($H_{\mathrm{ind}}$ was used in the original GA), where $N_p=300$ and $N_{ch}=30$ in the default configuration (Nagata and Kobayashi, 2013). In addition, we tested $N_p=50$ to investigate the effect of the population size. We also tested the GA using either the pairwise Hamming distance $D$ (Eq. (3)) or no population diversity measure (set $T=0$ in the evaluation function) to evaluate the baseline ability of $H_{\mathrm{ind}}$. The population diversity measures tested are summarized as follows:

- Greedy (no population diversity measure), $D$, $H_{\mathrm{ind}}$
- $H_m\ (m=2,3,4,5,6,8)$
- $H_m^{\mathrm{adj}}\ (m=2,3,4,5,6,8)$
- $H_m^{\mathrm{vari}}\ (m=6;\ ratio=0.02,0.05,0.1,0.2,0.4)$

Note that $H_1$ and $H_1^{\mathrm{adj}}$ are equivalent to $H_{\mathrm{ind}}$. For $H_m^{\mathrm{vari}}$, the value of $m$ was set to six, and we show the results of $H_6^{\mathrm{vari}}$ with different values of the parameter $ratio$ because the amount of data for possible combinations of $m$ and $ratio$ is very large.

For each population diversity measure, we performed 30 independent runs of the GA on instances in the following three well-known benchmark sets for the STSP, where we selected all 54 instances of sizes ranging from 4,000 to 30,000.

- **TSPLIB** The most widely used collection of TSP instances drawn from industrial applications and from geographic problems featuring the locations of cities on maps (available at http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/).
- **National TSPs** A collection of TSP instances that are based on the real-world locations of cities in selected countries (available at http://www.math.uwaterloo.ca/tsp/vlsi/index.html).
- **VLSI TSPs** A collection of TSP instances that are based on VLSI circuit design (available at http://www.math.uwaterloo.ca/tsp/vlsi/index.html).

### 6.2 Results

Table 1 lists the results of the GA using different population diversity measures under the population size $N_p=300$, where detailed results only for some selected population diversity measures including Greedy and $H_{\mathrm{ind}}$ are presented to avoid displaying a huge amount of data; for each of $H_m$, $H_m^{\mathrm{adj}}$, and $H_6^{\mathrm{vari}}$, the parameter value of $m$ or $ratio\ (=0.1)$ that achieved relatively good results is selected. Each line presents the instance name (Instance), the optimal or best-known solution (Opt. or UB), the number of runs that succeeded in finding the optimal or best-known solution (Su), and the average error of the best tour lengths in the 30 runs from the optimal or best-known solutions (A-Err), where one unit in the column A-Err is $10^{-5}$%. Full results for all population diversity measures are provided in the online supplementary file, available at https://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00268.

Instance | Opt. (UB) | Greedy Su | Greedy A-Err | $H_{\mathrm{ind}}$ Su | $H_{\mathrm{ind}}$ A-Err | $H_3$ Su | $H_3$ A-Err | $H_6^{\mathrm{adj}}$ Su | $H_6^{\mathrm{adj}}$ A-Err | $H_6^{\mathrm{vari}}$ Su | $H_6^{\mathrm{vari}}$ A-Err |
---|---|---|---|---|---|---|---|---|---|---|---|
fnl4461 | 182566 | 0 | 858^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
rl5915 | 565530 | 2 | 1294^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 23 | 82^{†} |
rl5934 | 556045 | 2 | 4130^{†} | 23 | 342 | 28 | 64^{*} | 26 | 164 | 27 | 109 |
pla7397 | 23260728 | 0 | 894^{†} | 3 | 251 | 0 | 585^{†} | 0 | 425^{†} | 0 | 502^{†} |
rl11849 | 923288 | 0 | 835^{†} | 24 | 12 | 25 | 12 | 28 | 2 | 28 | 5 |
usa13509 | 19982859 | 0 | 816^{†} | 14 | 18 | 23 | 9^{*} | 16 | 13 | 25 | 7^{*} |
brd14051 | 469385 | 0 | 820^{†} | 29 | 2 | 22 | 15^{†} | 26 | 7 | 24 | 12^{†} |
d15112 | 1573084 | 0 | 512^{†} | 18 | 8 | 14 | 11 | 23 | 4 | 16 | 4 |
d18512 | 645238 | 0 | 747^{†} | 18 | 7 | 21 | 12 | 25 | 3^{*} | 19 | 10 |
ca4663 | 1290319 | 1 | 555^{†} | 27 | 24 | 30 | 0^{*} | 30 | 0^{*} | 30 | 0^{*} |
pm4951 | 114855 | 0 | 4600^{†} | 24 | 87 | 9 | 255^{†} | 21 | 240 | 25 | 142 |
tz6117 | 394718 | 0 | 717^{†} | 12 | 99 | 18 | 72 | 14 | 99 | 19 | 72 |
ar6723 | 837479 | 0 | 1822^{†} | 25 | 19 | 24 | 23 | 23 | 27 | 25 | 19 |
ho7103 | 177092 | 0 | 717^{†} | 24 | 22 | 21 | 39 | 19 | 43 | 11 | 84^{†} |
eg7146 | 172386 | 0 | 446^{†} | 23 | 19 | 21 | 34 | 29 | 3^{*} | 30 | 0^{*} |
ym7663 | 238314 | 0 | 1759^{†} | 29 | 2 | 22 | 288^{†} | 27 | 6 | 28 | 92 |
ei8246 | 206171 | 0 | 994^{†} | 13 | 79 | 28 | 9^{*} | 30 | 0^{*} | 30 | 0^{*} |
ja9847 | 491924 | 0 | 2598^{†} | 6 | 83 | 12 | 43^{*} | 9 | 58^{*} | 15 | 48^{*} |
gr9882 | 300899 | 0 | 614^{†} | 12 | 163 | 13 | 55 | 16 | 46 | 19 | 37^{*} |
kz9976 | 1061881 | 0 | 1283^{†} | 27 | 4 | 29 | 3 | 29 | 1 | 30 | 0^{*} |
fi10639 | 520527 | 0 | 796^{†} | 15 | 33 | 25 | 10^{*} | 29 | 0^{*} | 25 | 10^{*} |
mo14185 | (427377) | 0 | 627^{†} | 13 | 28 | 18 | 18 | 24 | 8^{*} | 21 | 13^{*} |
it16862 | 557315 | 0 | 1065^{†} | 3 | 57 | 9 | 44 | 6 | 20^{*} | 6 | 24^{*} |
vm22775 | 569288 | 0 | 1330^{†} | 0 | 116 | 0 | 138 | 0 | 115 | 1 | 92 |
sw24978 | 855597 | 0 | 1014^{†} | 14 | 29 | 10 | 31 | 15 | 18 | 14 | 30 |
bgb4355 | 12723 | 3 | 4034^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
bgd4396 | 13009 | 19 | 486^{†} | 27 | 76 | 25 | 128 | 30 | 0^{*} | 30 | 0^{*} |
frv4410 | 10711 | 9 | 2116^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
bgf4475 | 13221 | 7 | 1159^{†} | 25 | 126 | 30 | 0^{*} | 30 | 0^{*} | 30 | 0^{*} |
xqd4966 | 15316 | 1 | 1654^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
fqm5087 | 13029 | 7 | 1074^{†} | 30 | 0 | 29 | 25 | 29 | 25 | 30 | 0 |
fea5557 | 15445 | 17 | 841^{†} | 29 | 21 | 30 | 0 | 30 | 0 | 30 | 0 |
xsc6880 | 21535 | 0 | 3451^{†} | 22 | 216 | 27 | 46^{*} | 28 | 30^{*} | 25 | 77 |
bnd7168 | 21834 | 18 | 305^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
lap7454 | 19535 | 4 | 784^{†} | 30 | 0 | 30 | 0 | 30 | 0 | 30 | 0 |
ida8197 | 22338 | 1 | 1910^{†} | 29 | 14 | 28 | 29 | 30 | 0 | 30 | 0 |
dga9698 | 27724 | 2 | 1262^{†} | 28 | 36 | 30 | 0 | 30 | 0 | 30 | 0 |
xmc10150 | 28387 | 1 | 1585^{†} | 15 | 258 | 24 | 82^{*} | 30 | 0^{*} | 25 | 58^{*} |
xvb13584 | 37083 | 1 | 1240^{†} | 21 | 80 | 27 | 26^{*} | 27 | 26^{*} | 29 | 8^{*} |
xrb14233 | (45462) | 0 | 1649^{†} | 2 | 461 | 4 | 381^{*} | 9 | 300^{*} | 15 | 227^{*} |
xia16928 | (52850) | 0 | 1255^{†} | 19 | 113 | 19 | 145 | 17 | 145 | 20 | 88 |
pjh17845 | (48092) | 0 | 1919^{†} | 12 | 138 | 19 | 97 | 16 | 97 | 16 | 97 |
frh19289 | (55798) | 1 | 1409^{†} | 27 | 23 | 29 | 5 | 30 | 0^{*} | 30 | 0^{*} |
fnc19402 | (59287) | 0 | 1388^{†} | 19 | 67 | 20 | 73 | 26 | 22^{*} | 21 | 50 |
ido21215 | (63517) | 0 | 1548^{†} | 20 | 104 | 22 | 68 | 28 | 15^{*} | 19 | 78 |
fma21553 | (66527) | 0 | 1332^{†} | 12 | 135 | 22 | 45^{*} | 14 | 95 | 23 | 35^{*} |
lsb22777 | (60977) | 1 | 754^{†} | 19 | 65 | 26 | 21^{*} | 26 | 27^{*} | 29 | 5^{*} |
xrh24104 | (69294) | 0 | 1279^{†} | 28 | 9 | 29 | 4 | 27 | 14 | 30 | 0 |
bbz25234 | (69335) | 0 | 1413^{†} | 23 | 38 | 28 | 14^{*} | 29 | 4^{*} | 28 | 9^{*} |
irx28268 | (72607) | 0 | 1221^{†} | 28 | 9 | 27 | 13 | 26 | 18 | 27 | 13 |
fyg28534 | (78562) | 0 | 1251^{†} | 12 | 106 | 19 | 50^{*} | 18 | 59^{*} | 24 | 29^{*} |
icx28698 | (78088) | 0 | 1494^{†} | 1 | 217 | 4 | 170^{*} | 13 | 93^{*} | 10 | 102^{*} |
boa28924 | (79622) | 0 | 1486^{†} | 4 | 121 | 5 | 117 | 3 | 117 | 5 | 108 |
ird29514 | (80353) | 0 | 2152^{†} | 4 | 215 | 12 | 99^{*} | 14 | 87^{*} | 19 | 58^{*} |
Statistical test | | | W=0: L=54 | | - | | W=17: L=4 | | W=22: L=1 | | W=22: L=4 |

For each population diversity measure $H$, we performed the one-sided Wilcoxon rank sum test for the null hypothesis that the median of the distribution of the tour length (of the best solution of each run) obtained with $H$ is greater than that with $H^{ind}$. If the null hypothesis was rejected at a significance level of 0.05, the corresponding value in the column A-Err is marked with an asterisk (indicating that $H$ was significantly superior to $H^{ind}$). Conversely, if the opposite null hypothesis was rejected, the corresponding value is marked with a dagger (indicating that $H$ was significantly worse than $H^{ind}$). For each population diversity measure, the numbers of asterisks and daggers are presented in the bottom line (Statistical test) of the table, where “W=a: L=b” indicates that the numbers of asterisks and daggers are $a$ and $b$, respectively.

We performed the same one-sided Wilcoxon rank sum test between $H^{ind}$ and every other population diversity measure for the population sizes $N_p=50$ and 300, and the summarized results (W and L) are presented in Table 2. Note that when the GA using $H^{ind}$ finds optimal (or best-known) solutions in most of the 30 runs, we cannot detect a statistically significant difference even if the GA using the population diversity measure $H$ finds optimal (or best-known) solutions in all 30 runs. This situation is particularly noticeable when the number of vertices is small (e.g., $n<10{,}000$). Therefore, the results in Table 2 are presented separately for all 54 instances, the set of 27 instances with $n<10{,}000$, and the set of 27 instances with $10{,}000\le n$. When the population size is 300, the numbers of instances for which a statistically significant difference can be detected (if the optimal (best-known) solution is found in all runs) are 15 ($n<10{,}000$) and 24 ($10{,}000\le n$). On the other hand, this situation does not occur when the population size is 50.
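The test described above, and the ceiling effect that limits it, can be illustrated with a small sketch. The function below is a minimal pure-Python rank sum test based on the normal approximation with tie and continuity corrections. The approximation, the function name `rank_sum_less`, and the error values in the usage lines are assumptions made for illustration; they are not taken from the experiments, and the test variant used to produce the tables may differ.

```python
import math
from collections import Counter


def rank_sum_less(x, y):
    """One-sided p-value for the alternative 'values in x tend to be smaller than in y'.

    Normal approximation of the rank sum (Mann-Whitney U) statistic,
    with tie correction and continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))

    # Assign the average rank to every group of tied values.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # ranks i+1 .. j
        i = j

    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2.0
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if variance == 0.0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2.0 + 0.5) / math.sqrt(variance)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


# Hypothetical errors over 30 runs (0 means the optimal solution was found).
h_runs = [0] * 30                           # measure H: optimal in all runs
baseline_28 = [0] * 28 + [3, 5]             # baseline: optimal in 28 runs
baseline_25 = [0] * 25 + [1, 2, 3, 4, 5]    # baseline: optimal in 25 runs

p_28 = rank_sum_less(h_runs, baseline_28)
p_25 = rank_sum_less(h_runs, baseline_25)
print(p_28 > 0.05, p_25 < 0.05)
```

Under these assumptions, a measure that reaches the optimum in all 30 runs is not significantly better than a baseline that already succeeds in 28 runs, whereas the difference becomes detectable when the baseline succeeds in only 25 runs.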

Population diversity | Parameter | $N_p=300$, all, W | $N_p=300$, all, L | $N_p=300$, $n<10^4$, W | $N_p=300$, $n<10^4$, L | $N_p=300$, $10^4\le n$, W | $N_p=300$, $10^4\le n$, L | $N_p=50$, all, W | $N_p=50$, all, L | $N_p=50$, $n<10^4$, W | $N_p=50$, $n<10^4$, L | $N_p=50$, $10^4\le n$, W | $N_p=50$, $10^4\le n$, L |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Greedy | - | 0 | 54 | 0 | 27 | 0 | 27 | 0 | 54 | 0 | 27 | 0 | 27 |
$D$ | - | 0 | 43 | 0 | 22 | 0 | 21 | 0 | 54 | 0 | 27 | 0 | 27 |
$H_m$ | $m=2$ | 11 | 4 | 3 | 2 | 8 | 2 | 11 | 0 | 3 | 0 | 8 | 0 |
$H_m$ | $m=3$ | 17 | 4 | 6 | 3 | 11 | 1 | 6 | 10 | 3 | 3 | 3 | 7 |
$H_m$ | $m=4$ | 15 | 3 | 7 | 3 | 8 | 0 | 6 | 14 | 5 | 4 | 1 | 10 |
$H_m$ | $m=5$ | 13 | 11 | 5 | 8 | 8 | 3 | 3 | 31 | 3 | 10 | 0 | 21 |
$H_m$ | $m=6$ | 14 | 13 | 7 | 6 | 7 | 7 | 1 | 42 | 1 | 15 | 0 | 27 |
$H_m$ | $m=8$ | 9 | 21 | 4 | 7 | 5 | 14 | 0 | 49 | 0 | 22 | 0 | 27 |
$H_m^{adj}$ | $m=2$ | 11 | 1 | 4 | 0 | 7 | 1 | 13 | 0 | 5 | 0 | 8 | 0 |
$H_m^{adj}$ | $m=3$ | 17 | 2 | 5 | 1 | 12 | 1 | 26 | 0 | 10 | 0 | 16 | 0 |
$H_m^{adj}$ | $m=4$ | 23 | 1 | 8 | 1 | 15 | 0 | 29 | 0 | 13 | 0 | 16 | 0 |
$H_m^{adj}$ | $m=5$ | 22 | 1 | 6 | 0 | 16 | 1 | 33 | 0 | 17 | 0 | 16 | 0 |
$H_m^{adj}$ | $m=6$ | 22 | 1 | 7 | 1 | 15 | 0 | 24 | 0 | 15 | 0 | 9 | 0 |
$H_m^{adj}$ | $m=8$ | 18 | 4 | 7 | 4 | 11 | 0 | 20 | 1 | 14 | 0 | 6 | 1 |
$H_6^{vari}$ ($r=ratio$) | $r=0.02$ | 17 | 5 | 8 | 4 | 9 | 1 | 1 | 42 | 1 | 16 | 0 | 26 |
$H_6^{vari}$ ($r=ratio$) | $r=0.05$ | 20 | 5 | 8 | 4 | 12 | 1 | 3 | 27 | 3 | 7 | 0 | 20 |
$H_6^{vari}$ ($r=ratio$) | $r=0.1$ | 22 | 4 | 8 | 3 | 14 | 1 | 13 | 3 | 8 | 1 | 5 | 2 |
$H_6^{vari}$ ($r=ratio$) | $r=0.2$ | 19 | 5 | 6 | 3 | 13 | 2 | 31 | 2 | 13 | 1 | 18 | 1 |
$H_6^{vari}$ ($r=ratio$) | $r=0.4$ | 17 | 3 | 4 | 2 | 13 | 1 | 21 | 2 | 6 | 2 | 15 | 0 |

Table 2 indicates that the GA using the independent entropy measure $H^{ind}$ clearly outperforms the GA using either no population diversity measure or the pairwise Hamming distance $D$. In the following, we first compare the results of the three types of high-order entropy measures when the population size is 300.

**Results of $H_m$** Table 2 shows that the results of $H_m$ gradually improve as the value of $m$ increases from one ($H_1$ is equivalent to $H^{ind}$) to three or four. For greater values of $m$, however, the results of the statistical test gradually deteriorate as $m$ increases. These results demonstrate that the ability to evaluate population diversity can be improved by considering high-order dependencies between consecutive symbols (vertices) in the population. As we expected, however, there is a tradeoff between the potential ability to capture higher-order dependencies and the estimation accuracy of the conditional probability distributions required for computing $H_m$. The experimental results indicate that $m=3$ or 4 achieves an appropriate tradeoff between these for this population size ($N_p=300$).

**Results of $H_m^{adj}$** Table 2 shows that the results of $H_m^{adj}$ gradually improve as the value of $m$ increases from one ($H_1^{adj}$ is equivalent to $H^{ind}$) to four, five, or six. However, the result of $H_8^{adj}$ is worse than that of $H_6^{adj}$, though it is still better than that of $H^{ind}$. As can be predicted (see Section 3.2) and as the results confirm, the best value of $m$ for $H_m^{adj}$ is greater than that for $H_m$. More importantly, the best result of $H_m^{adj}$ (obtained with $m=4$, 5, or 6) is superior to the best result of $H_m$ (obtained with $m=3$ or 4). This suggests that $H_m^{adj}$ (with an appropriate value of $m$) is a better population diversity measure than $H_m$. We analyze the reason for this in the next subsection.

**Results of $H_6^{vari}$** In preliminary experiments for selected instances, better results were obtained when $m=6$ than when $m=4$, but there was no significant difference in results between $m=6$ and $m=8$. Table 2 shows that the GA using $H_6^{vari}$ outperforms the GA using $H^{ind}$ for all values of $ratio$ ranging from 0.02 to 0.4, where the best value of $ratio$ is around 0.1. When $H_6^{vari}$ ($ratio=0.1$) is compared with $H_m$ and $H_m^{adj}$, the result of $H_6^{vari}$ ($ratio=0.1$) is better than the results of $H_3$ and $H_4$ (the best results of $H_m$). This indicates that the variable-order Markov model introduced to estimate $H_m$ (see Section 4.1) works as expected. However, the result of $H_6^{vari}$ ($ratio=0.1$) is slightly worse than the results of $H_4^{adj}$, $H_5^{adj}$, and $H_6^{adj}$ (the best results of $H_m^{adj}$).

**Results on hard instances** The result of $H^{ind}$ is poor (e.g., $Su<5$) for some instances. This is because these instances have a deceptive structure, i.e., some edges of the optimal solution are not included in a majority of near-optimal solutions. In general, finding the optimal solution for such instances is more difficult. Table 1 shows that the results of $H^{ind}$ for these instances are improved to some degree by using any of the high-order entropy measures $H_3$, $H_6^{adj}$, and $H_6^{vari}$, except for instance pla7397.

**Effect of the population size** Next, we describe the results of the three types of high-order entropy measures when the population size is 50, focusing mainly on the effect of the population size. Table 2 shows that the use of $H_m$ improves the result of $H^{ind}$ only when $m=2$, whereas the result of $H^{ind}$ is improved by using $H_m$ ($m=2,3,4,5$) when the population size is 300. This makes sense because a smaller population provides fewer samples (sequences of symbols), which makes it more difficult to estimate the conditional probability distributions accurately as $m$ grows. On the other hand, the superiority of $H_m^{adj}$ over $H^{ind}$ is retained for large values of $m$ even with the small population size ($N_p=50$). As for the high-order entropy measure $H_6^{vari}$, the best value of $ratio$ is around 0.2, which is greater than the best value (0.1) when $N_p=300$. Given the role of $ratio$ in constructing the context tree $\tilde{S}$, this is an expected result. Therefore, the value of $ratio$ should be set appropriately depending on the population size.

**Execution time** Next, we discuss the difference in the computation time of the GA when using the different population diversity measures. Table 3 lists the average computation time in seconds for the GA using the different population diversity measures. To avoid displaying a huge amount of data, results are presented only for the six instances listed in the table, where two instances ($10{,}000<n$) were randomly selected from each of the three benchmark sets. In addition, the bottom line (Ave. ratio) of the table shows the ratio of the execution time to that of the GA using $H^{ind}$, averaged over the six instances. The table shows that the execution time tends to increase as the value of $m$ increases in the population diversity measures $H_m$ and $H_m^{adj}$. This is mainly because of the increase in the computational effort of calculating $\Delta H(y)$ in the evaluation function (14). Note that the execution times of $H_m$ are smaller than those of $H_m^{adj}$ (for the same value of $m$) even though the calculation of $H_m^{adj}$ is simpler than that of $H_m$. The reason for this is that the number of generations of the GA required to complete the search was smaller when $H_m$ was used than when $H_m^{adj}$ was used. As for the population diversity measure $H_6^{vari}$ with various values of $ratio$, the results are similar to those of $H_6^{adj}$.

Instance | $H^{ind}$ | Greedy | $D$ | $H_6^{vari}$ (0.02) | $H_6^{vari}$ (0.05) | $H_6^{vari}$ (0.1) | $H_6^{vari}$ (0.2) | $H_6^{vari}$ (0.4) |
---|---|---|---|---|---|---|---|---|
usa13509 | 2429 | 1097 | 2442 | 3458 | 3704 | 3620 | 3475 | 3068 |
d15112 | 3482 | 1932 | 3640 | 5743 | 5543 | 5127 | 4841 | 4284 |
it16862 | 2958 | 1547 | 3009 | 5201 | 4803 | 4600 | 4341 | 4104 |
pjh17845 | 1636 | 947 | 1677 | 2803 | 2685 | 2600 | 2497 | 2351 |
fma21553 | 2053 | 1094 | 2084 | 3518 | 3354 | 3285 | 3154 | 2982 |
sw24978 | 5930 | 3351 | 6142 | 10388 | 9769 | 9386 | 8744 | 8199 |
Ave. ratio | - | 0.53 | 1.02 | 1.69 | 1.61 | 1.55 | 1.47 | 1.36 |

Instance | $H_m$, $m=2$ | $H_m$, $m=3$ | $H_m$, $m=4$ | $H_m$, $m=5$ | $H_m$, $m=6$ | $H_m$, $m=8$ | $H_m^{adj}$, $m=2$ | $H_m^{adj}$, $m=3$ | $H_m^{adj}$, $m=4$ | $H_m^{adj}$, $m=5$ | $H_m^{adj}$, $m=6$ | $H_m^{adj}$, $m=8$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
usa13509 | 2402 | 2352 | 2451 | 2614 | 2726 | 3069 | 2619 | 2717 | 2951 | 3218 | 3564 | 4471 |
d15112 | 3562 | 3550 | 3634 | 3982 | 4161 | 4901 | 3718 | 3762 | 4106 | 4567 | 4967 | 6632 |
it16862 | 3077 | 3087 | 3208 | 3406 | 3720 | 4294 | 3162 | 3290 | 3649 | 4139 | 4614 | 5936 |
pjh17845 | 1675 | 1729 | 1866 | 1969 | 2108 | 2488 | 1759 | 1861 | 2014 | 2257 | 2443 | 3033 |
fma21553 | 2109 | 2133 | 2248 | 2385 | 2582 | 3037 | 2214 | 2294 | 2528 | 2789 | 3072 | 3821 |
sw24978 | 6259 | 6065 | 6433 | 6849 | 7340 | 8382 | 6295 | 6729 | 7504 | 8169 | 9168 | 11600 |
Ave. ratio | 1.03 | 1.03 | 1.08 | 1.15 | 1.23 | 1.42 | 1.07 | 1.12 | 1.23 | 1.36 | 1.50 | 1.90 |
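The Ave. ratio row can be checked against the timings listed above. The short sketch below assumes that the row is the mean of the six per-instance ratios (rather than the ratio of the summed times), which is the reading that reproduces the tabulated values; the helper name `average_ratio` is hypothetical.

```python
# Average computation times in seconds, copied from the table, in the order
# usa13509, d15112, it16862, pjh17845, fma21553, sw24978.
hind = [2429, 3482, 2958, 1636, 2053, 5930]
greedy = [1097, 1932, 1547, 947, 1094, 3351]
h8_adj = [4471, 6632, 5936, 3033, 3821, 11600]


def average_ratio(times, baseline):
    """Mean over instances of (time with a measure) / (time with the baseline)."""
    return sum(t / b for t, b in zip(times, baseline)) / len(times)


print(round(average_ratio(greedy, hind), 2))   # GA without a diversity measure
print(round(average_ratio(h8_adj, hind), 2))   # GA with the adjusted measure, m = 8
```

Under this reading, the GA without a diversity measure needs roughly half the time of the GA using the independent entropy measure, and the adjusted measure with $m=8$ needs roughly twice the time.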

### 6.3 Analysis

Figure 7 indicates that each of $H_1$, $H_3$, $H_4$, and $H_6$ is maintained at the highest value for every value of $L$ when that same diversity measure is incorporated into the evaluation function (14) of the GA. However, maintaining the value of $H_m$ at the highest level for a small value of $m$ does not necessarily lead to maintaining $H_k$ ($m<k$) at a high level. For example, if $H^{ind}$ (equivalently $H_1$) is incorporated into the evaluation function, the population evolves such that duplication of sequences of length two in the population is suppressed without regard to any increase in the duplication of longer sequences. On the other hand, maintaining the value of $H_m$ at the highest level for a large value of $m$ (e.g., $m=6$) leads to “overfitting,” in which the population becomes specialized in keeping the value of $H_m$ high. For example, if it is possible to exclude any duplication of sequences of length $m+1$ in the population, such a population is preferred in order to maintain the value of $H_m$ as high as possible, even if the duplication of a certain shorter sequence increases excessively.

To alleviate the overfitting problem of $H_m$ for a large value of $m$, not only the value of $H_m$ but also the values of $H_k$ ($k=1,\ldots,m-1$) should be maintained at a high level. The high-order entropy measure $H_m^{adj}$ is suitable for this purpose because $H_m^{adj}$ is equivalent to $H_1+\cdots+H_m$, and therefore the population evolves such that each of the values of $H_k$ ($k\le m$) is maintained high, although there is no guarantee that all values are maintained near their highest levels. Fortunately, as can be observed from Figure 7, when $H_6^{adj}$ is incorporated into the evaluation function, the values of $H_k$ ($k\le 6$) are all maintained near their highest levels, especially for $3\le k$. Therefore, we conclude that this is a reason why the GA with the high-order entropy measure $H_6^{adj}$ achieves superior results compared with the GA using each of $H_m$ ($m=1,\ldots,6$).
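The relation between the individual measures and their sum can be made concrete with a small sketch. It assumes that $H_k$ is estimated as the empirical conditional entropy of a symbol given its $k$ predecessors, counted over all directed circular subsequences of the tours in the population (the ATSP view), and it takes the adjusted measure literally as the sum of these values. The exact definitions and any scaling constants are given in earlier sections, so the function names and the toy population here are illustrative only.

```python
import math
from collections import Counter


def count_subsequences(population, length):
    """Count every directed circular subsequence of the given length."""
    counts = Counter()
    for tour in population:
        n = len(tour)
        for i in range(n):
            counts[tuple(tour[(i + j) % n] for j in range(length))] += 1
    return counts


def h_k(population, k):
    """Empirical conditional entropy of a symbol given its k predecessors."""
    context = count_subsequences(population, k)
    full = count_subsequences(population, k + 1)
    total = sum(full.values())
    return -sum(c / total * math.log(c / context[s[:-1]])
                for s, c in full.items())


def h_adj(population, m):
    """Sum H_1 + ... + H_m, used here as a stand-in for the adjusted measure."""
    return sum(h_k(population, k) for k in range(1, m + 1))


# Toy population of two tours over four vertices (hypothetical data).
population = [[0, 1, 2, 3], [0, 2, 1, 3]]
for k in (1, 2, 3):
    print(k, h_k(population, k))
print("sum", h_adj(population, 3))
```

Because every term of the sum must stay high for the total to stay high, a population that removes all duplicated long sequences at the cost of heavily duplicated short ones is penalized through the low-order terms.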

## 7 Conclusions

We proposed three types of entropy-based population diversity measures to evaluate population diversity of a GA for the TSP. These measures consider high-order dependencies between variables of individuals in the population (high-order entropy measures). To derive these, we considered dependencies between consecutive variables and assumed that an individual is represented as a circular sequence of symbols, which is well suited for the TSP. Under these conditions, the entropy of the probability distribution of individuals in the population (used as a population diversity measure) is defined as the entropy rate of a Markov process estimated from the sequences of symbols sampled from the population.

The high-order entropy measure $H_m$ is equivalent to the entropy rate of the $m$-th-order Markov process. It has the potential ability to capture dependencies between consecutive variables of length up to $m+1$. We demonstrated that $H_m$ with an appropriate value of $m$ ($=3$ or 4) is significantly superior to $H^{ind}$, the commonly used entropy-based population diversity measure that does not consider dependencies between variables, in the ability to evaluate population diversity. Although the high-order entropy measure $H_m^{adj}$ is defined in a somewhat ad hoc manner, it is essentially equivalent to $H_1+H_2+\cdots+H_m$. It reduces the overfitting problem of $H_m$ for a large value of $m$ (e.g., $m=6$) while considering dependencies between consecutive variables of length up to $m+1$. Consequently, $H_m^{adj}$ with an appropriate value of $m$ ($=4$, 5, or 6) further improves on $H_m$ in the ability to measure population diversity. The high-order entropy measure $H_m^{vari}$ is equivalent to the entropy rate of the variable-order Markov process. It also reduces the overfitting problem of $H_m$ with an appropriate parameter setting (e.g., $m=6$ and $ratio=0.1$) and improves on $H_m$. Overall, the high-order entropy measure $H_m^{adj}$ with an appropriate value of $m$ ($=4$, 5, or 6) is the best population diversity measure among all the population diversity measures tested.

We have demonstrated the effectiveness of considering high-order dependencies between variables of individuals in evaluating population diversity, at least for the TSP. The development of other high-order entropy measures and their application to other combinatorial optimization problems remain future research directions.

## Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 17K00342.

## Notes

^{1} This strategy is useful because many of the offspring solutions generated by EAX improve (or do not change) the tour length of $x_{r(i)}$. Otherwise, offspring solutions should be evaluated simply by $\Delta L(y)-T\Delta H(y)$ while reducing the value of $T$ in the course of the search.

## Appendix: Efficient Computation of $\Delta H(y)$

An efficient computation of $\Delta H(y)$ in the evaluation function (14) is crucial for the execution of the GA using the proposed high-order entropy measures ($H_m$, $H_m^{adj}$, and $H_m^{vari}$). We present an outline of the efficient computation of $\Delta H(y)$ in the ATSP case, which includes the STSP case as a special case (see Section 3.1).

The computations of $\Delta H_m(y)$ and $\Delta H_m^{adj}(y)$ are similar; hence, we only describe the method to compute $\Delta H_m(y)$. For every sequence $\{s_1,\ldots,s_{m+1}\}$ to remove and add, the changes to $N(s_1,\ldots,s_m)$ and $N(s_1,\ldots,s_{m+1})$ are accumulated in $\Delta N(s_1,\ldots,s_m)$ and $\Delta N(s_1,\ldots,s_{m+1})$, respectively. We store these values in the tree $T$. Therefore, if a sequence $\{s_1,\ldots,s_k\}$ to be added does not exist in $T$, we must create a temporary new node of $T$ with $N(s_1,\ldots,s_k)=0$ to store the necessary data ($T$ must be restored after computing $\Delta H_m(y)$).
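The accumulation of count changes described above can be sketched as follows. The sketch makes several assumptions: dictionaries stand in for the tree $T$ (a missing key plays the role of a temporary node with count zero), $H_m$ is taken as the empirical conditional entropy over directed circular subsequences, and the total number of subsequences does not change when one tour is replaced by another of the same length. Under these assumptions $H_m$ can be written in terms of sums of $N\log N$ over the length-$(m+1)$ and length-$m$ counts, so only the entries whose counts change need to be revisited. All function names are hypothetical, and this is not the implementation used in the experiments.

```python
import math
from collections import Counter


def windows(tour, length):
    """All directed circular subsequences of the given length in one tour."""
    n = len(tour)
    return [tuple(tour[(i + j) % n] for j in range(length)) for i in range(n)]


def xlogx(c):
    return c * math.log(c) if c > 0 else 0.0


def entropy_from_counts(n_ctx, n_full, total):
    """Full recomputation of H_m from the count tables (for comparison only)."""
    return -sum(c / total * math.log(c / n_ctx[s[:-1]])
                for s, c in n_full.items() if c > 0)


def delta_h_m(n_ctx, n_full, removed, added, total):
    """Change in H_m when `removed` windows leave and `added` windows enter.

    n_ctx, n_full: current counts of length-m and length-(m+1) sequences.
    removed, added: length-(m+1) windows of the old and the new tour.
    """
    d_ctx, d_full = Counter(), Counter()
    for seqs, sign in ((removed, -1), (added, +1)):
        for seq in seqs:
            d_full[seq] += sign        # accumulate the change for s_1..s_{m+1}
            d_ctx[seq[:-1]] += sign    # accumulate the change for s_1..s_m
    change = 0.0
    for seq, d in d_full.items():
        c = n_full.get(seq, 0)         # missing entry acts as a count of zero
        change -= xlogx(c + d) - xlogx(c)
    for seq, d in d_ctx.items():
        c = n_ctx.get(seq, 0)
        change += xlogx(c + d) - xlogx(c)
    return change / total


m = 2
population = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [0, 1, 3, 2, 4]]
n_ctx = Counter(w for t in population for w in windows(t, m))
n_full = Counter(w for t in population for w in windows(t, m + 1))
total = sum(n_full.values())

old_tour, new_tour = population[0], [0, 3, 1, 2, 4]
fast = delta_h_m(n_ctx, n_full,
                 windows(old_tour, m + 1), windows(new_tour, m + 1), total)
print(fast)
```

For simplicity the usage lines pass every window of the old and the new tour; windows that occur in both cancel out in the accumulated changes, so an implementation that enumerates only the windows overlapping the modified edges would visit far fewer entries.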

Now, $\Delta H_m^{vari}(y)$ is computed by calculating the differences in $N_I(s_c)$ and $N_I^{+}(s_c,s_0)$ when $p_A$ is replaced with $y$. We denote these differences as $\Delta N_I(s_c)$ and $\Delta N_I^{+}(s_c,s_0)$, respectively. Let $S_I$ and $S_I^{+}$ be the sets of the nodes $s_c$ and $(s_c,s_0)$ whose values of $N_I(s_c)$ and $N_I^{+}(s_c,s_0)$ change, respectively. At the beginning, an initialization procedure is performed as follows: $S_I=\emptyset$, $S_I^{+}=\emptyset$, and $\Delta N_I(s)=\Delta N_I^{+}(s)=0$ ($s\in S$). For every sequence $\{s_1,\ldots,s_{m+1}\}$ to remove or add (if $p_A$ is replaced with $y$), the following procedure is performed.

(1) Trace $S$ according to the sequence $\{s_1,\ldots,s_{m+1}\}$, and for each sequence $\{s_1,\ldots,s_k\}$ ($h\le k\le m+1$), perform procedures (2)–(3). The value of $h$ is explained earlier in this appendix.

(2) If $\{s_1,\ldots,s_k\}$ is a Class 1 node of $S$, add this sequence to $S_I$ and increment (decrement) $\Delta N_I(s_1,\ldots,s_k)$ by one if this sequence is added (removed). Otherwise, if $\{s_2,\ldots,s_k\}$ is a Class 2 node of $S$, add $\{s_2,\ldots,s_k\}$ to $S_I$ and increment (decrement) $\Delta N_I(s_2,\ldots,s_k)$ by one if $\{s_1,\ldots,s_k\}$ is added (removed).

(3) If $\{s_1,\ldots,s_{k-1}\}$ is a Class 1 node of $S$, add $\{s_1,\ldots,s_k\}$ to $S_I^{+}$ and increment (decrement) $\Delta N_I^{+}(s_1,\ldots,s_k)$ by one if $\{s_1,\ldots,s_k\}$ is added (removed). Otherwise, if $\{s_2,\ldots,s_{k-1}\}$ is a Class 2 node of $S$, add $\{s_2,\ldots,s_k\}$ to $S_I^{+}$ and increment (decrement) $\Delta N_I^{+}(s_2,\ldots,s_k)$ by one if $\{s_1,\ldots,s_k\}$ is added (removed).

After the parent solution $p_A$ is replaced with the selected offspring solution, we need to update the trees $S$ and $\tilde{S}$ accordingly. This can be done efficiently, but we omit the details because the procedure is complicated to explain. In the actual implementation, the values of $N_I(s_c)$ were updated immediately, but the structure of the trees was reconstructed at the beginning of each generation of the GA (see Section 5).