Abstract

Sorting unsigned permutations by reversals is a difficult problem; indeed, it was proved to be NP-hard by Caprara (1997). Because of this high complexity, many approximation algorithms for computing the minimal reversal distance have been proposed, culminating in the currently best-known theoretical ratio of 1.375. In this article, two memetic algorithms to compute the reversal distance are proposed. The first one uses the technique of opposition-based learning, leading to an opposition-based memetic algorithm; the second one improves the previous algorithm by applying the heuristic of two-breakpoint elimination, leading to a hybrid approach. Several experiments were performed with sets of one hundred randomly generated permutations, single benchmark permutations, and permutations based on biological data. The results of the experiments showed that the proposed OBMA and Hybrid-OBMA algorithms achieve the best results for practical cases, that is, for permutations of length up to 120. Also, Hybrid-OBMA was shown to improve on the results of OBMA for permutations of length greater than or equal to 60. The applicability of our proposed algorithms was checked by processing permutations based on biological data, in which case OBMA gave the best average results for all instances.

1  Introduction

The problem of sorting permutations by reversals is of great importance to the field of bioinformatics. The reversal distance is used for determining the evolutionary distance between two organisms and thus for building the phylogenetic tree of several organisms. The complexity of the problem depends on whether the orientation (sign) of the genes is taken into account. If the orientation of the genes in the genome of the organisms is not considered, that is, organisms are modeled by unsigned permutations, it is hard to find an optimal number of reversals to sort the sequence; in fact, this version of the problem was shown to be NP-hard by Caprara (1997), and is known as Sorting Unsigned Permutations by Reversals (SUPBR). In the case that the orientation of the genes is considered, that is, when the organisms are modeled by signed permutations, the problem is known to be in the class P, and is known as Sorting Signed Permutations by Reversals (SSPBR). The algorithmic and algebraic properties of both problems are indeed complex, and many problems related to the combinatorics of signed and unsigned permutations remain under intensive investigation (see e.g., Grusea and Labarre, 2013 and de Lima and Ayala-Rincón, 2018).

The SUPBR problem is an optimization problem whose objective is to minimize the number of reversals needed to transform one organism into another. Organisms are represented by their genomes, given as sequences of genes, with each different gene represented by a different natural number. Thus, if the orientation of the genes is not considered, organisms are represented as unsigned permutations that can be written as sequences of different naturals. A reversal is an operation that reverses a subsequence of contiguous genes. The reversal distance between two organisms (with the same genes) is the minimum number of reversals needed to transform their sequences into each other. For instance, Figure 1 shows a sequence of reversals for transforming the Tobacco permutation into the Lobelia fervens permutation (this famous example is taken from Bafna and Pevzner, 1993). The Lobelia fervens chloroplast genes are represented by increasing naturals, and the Tobacco chloroplast genes keep the same natural number representation, listed in the order in which they appear in that organism. Notice that there are many other feasible sequences of reversals (not necessarily optimal) for transforming the Tobacco permutation into the Lobelia fervens permutation. The sequence presented is an optimal solution with only four reversals, which is the reversal distance between these two organisms.

Figure 1:

Sequence of reversals for transforming Tobacco into Lobelia fervens (Bafna and Pevzner, 1993).

Many algorithms were proposed for the SUPBR problem, such as approximation algorithms and evolutionary algorithms. The most frequently implemented approximation algorithm is the one proposed by Christie (1998) with ratio 1.5, but the best known ratio is 1.375 proposed by Berman et al. (2002). As far as we know, the latter does not have an implementation, so it is only of theoretical interest. The first evolutionary approach is the genetic algorithm (GA) proposed in Auyeung and Abraham (2003). Soncco-Álvarez and Ayala-Rincón (2013) improved this GA by using a hybrid approach, and after that a parallel version was proposed in Soncco-Álvarez et al. (2013) with the best known results for the SUPBR problem. More recently, a memetic algorithm (MA) was proposed in Soncco-Álvarez and Ayala-Rincón (2014).

The main contributions of this article are two new algorithms for solving SUPBR that are based on the MA proposed by Soncco-Álvarez and Ayala-Rincón (2014): the Opposition-Based Memetic Algorithm (OBMA) and its hybrid version (Hybrid-OBMA). The novelty of OBMA is the application of a technique called Opposition-Based Learning (OBL) (Tizhoosh and Ventresca, 2008) in the stages of generation of the initial population and restart of the population. The novelty of Hybrid-OBMA is the inclusion of a heuristic at the initial stage of OBMA. Several experiments were performed using different types of permutations. The experiments using sets of one hundred randomly generated permutations of the same length showed that OBMA and Hybrid-OBMA have the best results for practical cases, that is, for the sets of permutations of length up to 120. The experiments using benchmarks and biologically based permutations confirmed that OBMA and Hybrid-OBMA are suitable for practical permutations. All the results of the experiments are supported by statistical tests.

The article is organized as follows. Section 2 gives the formal definitions and terminology, presents the related work and, briefly, the standard GA, Hybrid-GA, MA, and an introduction to OBL. Section 3 presents the new algorithms: OBMA and Hybrid-OBMA, discussing their time complexity (assuming as basic operation a single evaluation of the fitness function). Section 4 presents experiments and results. Section 5 discusses some results and finally, conclusions are given in Section 6.

2  Background

2.1  Definitions and Terminology

Before getting into the central theme of the article it is necessary to give some definitions, most of which are similar to those given by Bafna and Pevzner (1993), Kececioglu and Sankoff (1993), Christie (1998), and Hannenhalli and Pevzner (1995).

Different genes in an organism are represented by natural numbers. The order of the genes of an organism can be interpreted, in string notation, as an unsigned permutation π = π1, π2, ..., πn, that is, a bijection of the set {1, 2, ..., n} into itself, where n is the length of the permutation. In this formal setting, permutations are seen as elements of the symmetric group Sn, to which a myriad of results from group theory and combinatorics of permutations can be applied.

Example 01. Let π=5,3,2,6,4,1 be an unsigned permutation of length 6, where this sequence of numbers represents the order of the genes of an organism.

Permutations of length n are extended by adding initial and final pivots, π0 = 0 and πn+1 = n+1, to the beginning and the end of the permutation represented as a string. A reversal is a special permutation, denoted ρi..j, for 1 ≤ i ≤ j ≤ n, that reverses the elements of π at positions within the interval [i, j]:
ρi..j(k) = k, if k < i or k > j, and ρi..j(k) = j − (k − i), if i ≤ k ≤ j.

In this notation, k represents any position of a permutation of length n; when k lies outside the closed interval [i, j] its image remains k, and otherwise it changes to j − (k − i). According to this definition, the elements of any permutation π at positions within the interval [i, j] are reversed by the action of the reversal, written in functional notation as ρi..j ∘ π, where ∘, as usual, denotes the composition of functions.

Example 02. Let π = 0,5,3,2,6,4,1,7 be the extended permutation of the previous example. The resulting permutation after applying the reversal ρ4..6 is π' = 0,5,3,2,1,4,6,7. The elements from position 4 to 6 are those to which the reversal was applied.
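The following minimal C sketch (an illustration only, not the implementation used in this work) applies a reversal to an extended unsigned permutation and reproduces Example 02.

    #include <stdio.h>

    /* Reverse the elements of pi at positions i..j (pivots at positions 0 and n+1). */
    static void apply_reversal(int *pi, int i, int j) {
        while (i < j) {
            int tmp = pi[i];
            pi[i] = pi[j];
            pi[j] = tmp;
            i++; j--;
        }
    }

    int main(void) {
        int pi[] = {0, 5, 3, 2, 6, 4, 1, 7};      /* extended permutation, n = 6 */
        apply_reversal(pi, 4, 6);                 /* rho_{4..6}                  */
        for (int k = 0; k < 8; k++) printf("%d ", pi[k]);
        printf("\n");                             /* prints 0 5 3 2 1 4 6 7      */
        return 0;
    }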

The reversal distance between two permutations π and σ is the minimum number of reversals needed to transform π into σ. Since this problem is equivalent to transforming σ⁻¹ ∘ π into ι, namely the identity permutation, which is the permutation sorted in increasing order (ιk = k, for 0 ≤ k ≤ n+1), one can express the reversal distance problem as the problem of finding the reversal distance between a permutation and ι. This is the problem SUPBR, mentioned in the introduction.

Example 03. Let ρ4..6, ρ1..4, ρ4..5 be a sequence of 3 reversals applied to the unsigned permutation π of the previous example. Indeed, the length of this sequence is the minimum possible for transforming π into the identity permutation. Thus, the reversal distance for π is 3.
π   = 0,5,3,2,6,4,1,7   (apply ρ4..6)
π'  = 0,5,3,2,1,4,6,7   (apply ρ1..4)
π'' = 0,1,2,3,5,4,6,7   (apply ρ4..5)
ι   = 0,1,2,3,4,5,6,7

Let i ∼ j denote the property |i − j| = 1. In an extended permutation π, consecutive elements πi and πj are those such that 0 ≤ i, j ≤ n+1 and i ∼ j. Consecutive elements in a permutation π are called adjacent if πi ∼ πj, and they are said to form a breakpoint if πi ≁ πj. Note that the only permutation that does not have breakpoints is the identity permutation.

Let ρ be a reversal that transforms π into π', and denote the number of breakpoints of a permutation π by b(π). Then b(π) − b(π') ∈ {−2, −1, 0, 1, 2}, and r-reversals are those reversals that eliminate r breakpoints simultaneously, for r ∈ {0, 1, 2}.
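Under the definitions above, the number of breakpoints of an extended permutation can be counted with a simple scan, as in the following C sketch (an illustration, not the authors' code).

    #include <stdio.h>
    #include <stdlib.h>

    /* b(pi): number of consecutive positions whose values do not differ by 1. */
    static int breakpoints(const int *pi, int n) {       /* pi has n + 2 entries */
        int b = 0;
        for (int k = 0; k <= n; k++)
            if (abs(pi[k] - pi[k + 1]) != 1) b++;
        return b;
    }

    int main(void) {
        int pi[] = {0, 5, 3, 2, 6, 4, 1, 7};
        printf("b(pi) = %d\n", breakpoints(pi, 6));      /* prints b(pi) = 6 */
        return 0;
    }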

The breakpoint graph (cycle graph) G(π) of the permutation π is an edge-colored graph, derived from the adjacencies and breakpoints in π, that has n+2 vertices, one for each element of π including its pivots. Two vertices πi and πj are joined by a gray edge (depicted as a dashed line) if they are not consecutive and πi ∼ πj, and are joined by a black edge (depicted as a continuous line) if they form a breakpoint, that is, they are consecutive but πi ≁ πj. Thus, for instance, one can observe that the unique permutation without edges is the identity, since it has no breakpoints and all its consecutive vertices are adjacent.

Example 04. Consider again the extended permutation π = 0,5,3,2,6,4,1,7, whose breakpoint graph G(π) is shown in Figure 2. For instance, the elements 5 and 3 form a breakpoint, so there exists a black edge between them. Besides, elements 5 and 6 are joined by a gray edge.
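For illustration, the edges of G(π) can be enumerated directly from the definitions; the following C sketch (not the authors' code) lists the black and gray edges of the permutation of Example 04.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int pi[] = {0, 5, 3, 2, 6, 4, 1, 7};   /* extended permutation of Example 04 */
        int n = 6, m = n + 2, pos[8];
        for (int k = 0; k < m; k++) pos[pi[k]] = k;
        for (int k = 0; k + 1 < m; k++)        /* black edges: breakpoints           */
            if (abs(pi[k] - pi[k + 1]) != 1)
                printf("black: (%d,%d)\n", pi[k], pi[k + 1]);
        for (int v = 0; v + 1 < m; v++)        /* gray edges: values differing by 1  */
            if (abs(pos[v] - pos[v + 1]) != 1) /* that are not at consecutive places */
                printf("gray:  (%d,%d)\n", v, v + 1);
        return 0;
    }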

Figure 2:

Breakpoint graph G(π) and a cycle decomposition. Black edges are represented as continuous lines, and gray edges as dashed lines. There are three cycles represented by the colors green, blue, and red.

A cycle in G(π) is called an alternating cycle if the colors of its edges alternate. Cycles in G(π) arise in a natural manner, since each vertex has exactly zero, one, or two incident black edges and the same number of gray edges; indeed, for any breakpoint formed by a vertex πi there is a black edge, and a gray edge to a nonconsecutive vertex πj such that πj ∼ πi. Thus, the possibility of entering a vertex using either a black or a gray edge always gives the possibility of leaving the vertex using a gray or a black edge, respectively. As a consequence, G(π) can be decomposed into edge-disjoint alternating cycles. Since G(π) is generated from an unsigned permutation, there may exist many different cycle decompositions. Let c(π) denote the maximum number of cycles in any cycle decomposition of G(π).

Example 05. Figure 2 shows a cycle decomposition of the breakpoint graph of the extended permutation π=0,5,3,2,6,4,1,7. This cycle decomposition consists of three cycles that are colored with green, blue, and red colors. Figure 3 shows the simultaneous elimination of two breakpoints, after applying the 2-reversal ρ4..6 over π.

Figure 3:

Resulting breakpoint graph after applying a 2-reversal (ρ4..6) over the breakpoint graph of Figure 2.

Regarding signed permutations (π), the genes are interpreted either as positive (+πi) or negative (−πi) elements. Thus, for each unsigned permutation π, choosing either a positive or a negative sign for each element, 2^n different signed permutations are built, which are called the signed versions of π. A reversal ρi..j over π, unlike in the unsigned case, also changes the signs of the elements at positions within the interval [i, j]:
ρi..j(k) = k, if k < i or k > j, and ρi..j(k) = −(j − (k − i)), if i ≤ k ≤ j.

Note that the identity permutation ι consists only of positive elements sorted in increasing order. In this case, the problem of determining the reversal distance between a permutation π and the identity permutation is known as SSPBR.

The notion of breakpoint graph can be extended to signed permutations by transforming a signed permutation into an unsigned permutation in which each positive element +πi is mapped into (2πi − 1, 2πi) and each negative element −πi into (2πi, 2πi − 1); additionally, the initial and final pivots are given by π0 = 0 and π2n+1 = 2n+1, respectively. This class of permutations is a subgroup of the group of permutations of size 2n, that is, the symmetric group S2n. The transformation leads to permutations whose breakpoint graphs are such that each vertex has degree at most two, with at most one black edge and one gray edge (see Figure 4a). Thus, there exists just one cycle decomposition. As a consequence, SSPBR is easier to treat than SUPBR.
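The transformation just described is straightforward to implement; the following C sketch (not the authors' code) maps the signed permutation πb of Figure 4b into its unsigned doubled representation.

    #include <stdio.h>

    /* Map a signed permutation of length n into its unsigned doubled
     * representation with pivots 0 and 2n+1 (u must have 2n + 2 entries). */
    static void doubled(const int *spi, int n, int *u) {
        u[0] = 0; u[2 * n + 1] = 2 * n + 1;
        for (int i = 0; i < n; i++) {
            int x = spi[i];
            if (x > 0) { u[2*i + 1] = 2*x - 1; u[2*i + 2] = 2*x;      }
            else       { u[2*i + 1] = -2*x;    u[2*i + 2] = -2*x - 1; }
        }
    }

    int main(void) {
        int pi_b[] = {2, -5, -1, 3, 4, 6};   /* pi_b of Figure 4b */
        int u[14];
        doubled(pi_b, 6, u);
        for (int k = 0; k < 14; k++) printf("%d ", u[k]);
        printf("\n");   /* prints 0 3 4 10 9 2 1 5 6 7 8 11 12 13 */
        return 0;
    }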

Figure 4:

Two signed permutations πa and πb, and their corresponding breakpoint graphs and cycle decompositions.

2.2  Related Work

Before the complexity of SUPBR was known, a 2-approximation algorithm was proposed by Kececioglu and Sankoff (1993). Afterwards, this approximation ratio was improved to 1.75 by Bafna and Pevzner (1993), to 1.5 by Christie (1998), and finally to 1.375 by Berman et al. (2002). The last algorithm is of theoretical interest but very difficult to implement, while the 1.5-approximation algorithm has been extensively implemented and used as a control mechanism for comparing the quality of outputs provided by heuristic algorithms. Indeed, there exists an implementation by Auyeung and Abraham (2003) (as proposed by Christie), namely the Au-1.5 implementation, whose results were also used for comparisons in Ghaffarizadeh et al. (2011). Further, theoretical inconsistencies were detected in the presentation of Au-1.5, which were reported and fixed in Soncco-Álvarez and Ayala-Rincón (2012), leading to a new implementation (namely, the Fixed-1.5 implementation) that was also used in Soncco-Álvarez and Ayala-Rincón (2013) and Soncco-Álvarez et al. (2013).

For the case of SSPBR, Hannenhalli and Pevzner (1995) proved that the problem is in P and proposed a polynomial time algorithm with running time O(n^4) for finding the sorting sequence of reversals and O(n^2) for finding just the reversal distance. After this algorithm was proposed, further improvements were made. Regarding the computation of the sorting sequence, Berman and Hannenhalli (1996) proposed an O(n^2 α(n)) algorithm, where α(n) is the inverse of Ackermann's function; then, this complexity was improved to O(n^2) by Kaplan et al. (2000). After this work, Bergeron (2001) proposed a simplified presentation of Hannenhalli and Pevzner's method, but with a complexity of O(n^3). Subsequently, algorithms were proposed with running time O(n^{3/2} √log(n)), by Tannier et al. (2007), and O(n log(n) + kn), by Swenson et al. (2009), where k is the total number of corrections of “unsafe” reversals performed by this algorithm. These corrections are necessary because the algorithm can get stuck: it may apply unsafe reversals that generate permutations in which all elements are positively oriented. As far as we know, the last two algorithms are the best known for finding the sorting sequence. Regarding the computation of the reversal distance only, Berman and Hannenhalli (1996) improved the complexity to O(n α(n)), and finally Bader et al. (2001) discovered a linear time algorithm.

Moreover, for the case of SUPBR many evolutionary algorithms were proposed. The first of this class was the standard Genetic Algorithm proposed by Auyeung and Abraham (2003), namely Au-GA, in which the search space consists of the 2^n signed permutations that are generated from a given unsigned permutation of size n by assigning either a positive or a negative sign to each element. This method uses a population size of n^2 and, for the fitness calculation, uses Kaplan et al.'s (2000) algorithm for SSPBR, leading to an overall time complexity of O(n^5). Then, Ghaffarizadeh et al. (2011) proposed an Evolutionary Algorithm (EA) that uses a population size of n log(n), with an overall time complexity of O(n^4 log^2(n)). These two methods were compared with the Au-1.5 implementation, outperforming its results. Also, Ghaffarizadeh's EA was reported to outperform Au-GA for permutations of length up to 110. After that, Soncco-Álvarez and Ayala-Rincón (2012) proposed a genetic approach in which the search space consists of reversals that eliminate zero, one, or two breakpoints, which essentially are pairs of contiguous elements in the permutation that are not consecutive; indeed, a breakpoint is a well-known notion in the literature on approximation algorithms for permutation distance problems. In this algorithm the initial population is formed by individuals of length zero, and in each generation their lengths are incremented by adding new genes (reversals) until a valid solution that sorts the input permutation is reached. The complexity of this algorithm is in O(n^3 log(n)). This was the first approach compared with the Fixed-1.5 implementation, which provides better results than the Au-1.5 implementation; although its results do not outperform those of the Fixed-1.5 implementation, they are closer to them than those of all previous approaches. Further, the same authors, Soncco-Álvarez and Ayala-Rincón (2013), implemented a version of Au-GA, namely SA-GA, using a population size of just n log(n) and computing the fitness with Bader et al.'s (2001) algorithm, leading to an overall time complexity of O(n^3 log(n)). This was the first algorithm to outperform the results computed by the Fixed-1.5 implementation (and consequently, also those of Au-GA, Ghaffarizadeh's EA, and the genetic approach in Soncco-Álvarez and Ayala-Rincón (2012)). In the same work an improved GA, known as the Hybrid-GA, was presented; it is based on SA-GA and integrates a preprocessing step inspired by the heuristic of elimination of 2-reversals used by several approximation algorithms, such as those in Bafna and Pevzner (1993) and Christie (1998). These 2-reversals eliminate two breakpoints simultaneously in the breakpoint graph of the initial unsigned permutation, until this is no longer possible. Results of the Hybrid-GA were reported to outperform those of the SA-GA. Later, a parallel version of SA-GA, namely the Parallel-GA, was proposed in Soncco-Álvarez et al. (2013); as far as we know, it provides the highest quality results, but it uses more computational resources, since in each generation it works with an overall population of the number of processors (23) times n log(n), that is, each processor deals with a different population of size n log(n).

Recently, the authors proposed a Memetic Algorithm (MA) in Soncco-Álvarez and Ayala-Rincón (2014) which is based on the SA-GA. The contribution of that approach is twofold: (1) the inclusion of a local search as a way to improve the solutions and (2) the inclusion of a new stage for restarting the population when it reaches a degenerate state, that is, when the elements of the population have high similarity.

2.3  Genetic Algorithms (GA)

Auyeung and Abraham (2003) proposed the first standard GA to deal with SUPBR (Au-GA). In this approach the search space is formed by the 2^n signed versions of the unsigned permutation to be sorted by the GA. For example, if the unsigned permutation to be sorted is π = 3,1,2, then the search space is made up of the family of eight signed permutations {±3, ±1, ±2}, where ± stands for either the sign + or −. Note that the permutations π' = −3,+1,−2 and π'' = +3,−1,−2 represent two different individuals; in general, the individuals in the search space differ only in the signs and not in the elements.

This approach is based mainly on two straightforward observations:

  • A sorting sequence of reversals for any of the signed versions of a given unsigned permutation is also a valid sorting sequence of reversals for the unsigned permutation.

  • One of the sorting sequences of reversals for the signed versions of a given unsigned permutation is an optimal sorting sequence for the unsigned permutation.

Based on the previous observations, the fitness function of an individual (signed permutation) is defined as its reversal distance for the SSPBR problem, for which there exists a linear time algorithm (Bader et al., 2001). Thus, the genetic algorithm tries to find the minimum reversal distance in the search space consisting of signed permutations, which hopefully will be the reversal distance of the initial unsigned permutation; otherwise, this distance will still be a valid number of reversals for sorting the unsigned permutation.

As an example, consider the following two individuals (signed permutations): πa = −2,−5,−1,3,−4,−6 and πb = 2,−5,−1,3,4,6, whose cycle decompositions are shown in Figures 4a and 4b, respectively. Note that b(πa) = 7, c(πa) = 2, b(πb) = 5, and c(πb) = 1. In this case, the reversal distance is exactly b(π) − c(π), that is, d(πa) = 5 and d(πb) = 4; these values represent the fitness of the individuals πa and πb.
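The following C sketch illustrates this fitness computation on the doubled representation: it counts breakpoints b and cycles c and returns b − c. Note that in general b − c is only a lower bound on the SSPBR distance (hurdles and fortresses may increase it); the algorithms in this article use Bader et al.'s (2001) linear-time routine for the exact distance, so this sketch is an illustration rather than the actual fitness function.

    #include <stdio.h>
    #include <stdlib.h>

    static int fitness_lower_bound(const int *spi, int n) {
        int m = 2 * n + 2;
        int *u   = malloc(m * sizeof *u);      /* doubled unsigned permutation */
        int *blk = malloc(m * sizeof *blk);    /* blk[v] = black partner or -1 */
        int *vis = calloc(m, sizeof *vis);
        u[0] = 0; u[m - 1] = m - 1;
        for (int i = 0; i < n; i++) {          /* +x -> (2x-1,2x), -x -> (2x,2x-1) */
            int x = spi[i];
            if (x > 0) { u[2*i + 1] = 2*x - 1; u[2*i + 2] = 2*x;      }
            else       { u[2*i + 1] = -2*x;    u[2*i + 2] = -2*x - 1; }
        }
        for (int v = 0; v < m; v++) blk[v] = -1;
        int b = 0;
        for (int k = 0; k + 1 < m; k++)        /* black edges = breakpoints */
            if (abs(u[k] - u[k + 1]) != 1) {
                b++; blk[u[k]] = u[k + 1]; blk[u[k + 1]] = u[k];
            }
        int c = 0;
        for (int v = 0; v < m; v++) {          /* count the alternating cycles */
            if (vis[v] || blk[v] < 0) continue;
            c++;
            int w = v;
            do {
                vis[w] = 1; w = blk[w];        /* follow the black edge        */
                vis[w] = 1; w ^= 1;            /* gray edge joins 2x and 2x+1  */
            } while (w != v);
        }
        free(u); free(blk); free(vis);
        return b - c;
    }

    int main(void) {
        int pi_a[] = {-2, -5, -1, 3, -4, -6};
        int pi_b[] = { 2, -5, -1, 3,  4,  6};
        printf("d(pi_a) >= %d, d(pi_b) >= %d\n",
               fitness_lower_bound(pi_a, 6), fitness_lower_bound(pi_b, 6));
        /* prints d(pi_a) >= 5 and d(pi_b) >= 4, matching the values above */
        return 0;
    }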

The selection of the best individuals for the crossover is performed by sorting all the individuals by their fitness values. The number of selected individuals is defined by the parameter Percentage for selection; so, according to this parameter, a percentage of the best individuals is selected for crossover. Similarly, the replacement is performed by replacing the worst individuals of the population, with the number of individuals to be replaced defined by the parameter Percentage for replacement.

The crossover is performed using a single-point crossover, that is, a random point is chosen and the signs after that point are swapped between the two individuals. For instance, Figures 5a and 5b show the resulting offspring σa and σb, respectively, after applying the crossover operator to the permutations πa and πb, where the crossover point is between the elements 1 and 3. The reversal distance of the first offspring is d(σa) = 3. This new individual has greater chances of being considered in the next generation of the GA, since the fitness is improved. The reversal distance of the second offspring is d(σb) = 6; thus, this individual will likely be discarded in the next generations.
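A C sketch of this single-point crossover on signs (an illustration, not the authors' code); it reproduces the offspring σa and σb of Figure 5.

    #include <stdio.h>
    #include <stdlib.h>

    /* Swap the signs of the two parents from position `point` onward. */
    static void crossover(int *a, int *b, int n, int point) {
        for (int i = point; i < n; i++) {
            int sa = (a[i] > 0) ? +1 : -1;
            int sb = (b[i] > 0) ? +1 : -1;
            a[i] = sb * abs(a[i]);
            b[i] = sa * abs(b[i]);
        }
    }

    int main(void) {
        int pa[] = {-2, -5, -1, 3, -4, -6};   /* pi_a of Figure 4 */
        int pb[] = { 2, -5, -1, 3,  4,  6};   /* pi_b of Figure 4 */
        crossover(pa, pb, 6, 3);              /* point between elements 1 and 3 */
        for (int i = 0; i < 6; i++) printf("%d ", pa[i]);
        printf("\n");                         /* sigma_a: -2 -5 -1 3 4 6  */
        for (int i = 0; i < 6; i++) printf("%d ", pb[i]);
        printf("\n");                         /* sigma_b: 2 -5 -1 3 -4 -6 */
        return 0;
    }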

Figure 5:

Resulting offspring (σa and σb) after applying the crossover operator over the permutations πa and πb of Figure 4, where the crossover point is between the elements 1 and 3.

The mutation is performed by changing the sign of an element from positive to negative or vice versa. This process is applied to every element of an individual with a certain probability. For instance, Figure 6 shows the resulting individual σc after applying the mutation operator to the individual σb = 2,−5,−1,3,−4,−6, where the elements 5 and 3 are those whose sign is modified. In this case, the reversal distance for σc is 5, which improves on the fitness of σb (d(σb) = 6).
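A C sketch of the sign-flip mutation (an illustration, not the authors' code); the mutation probability is exaggerated here so that flips are visible, whereas the GA uses values around 0.01–0.02 (see Table 1).

    #include <stdio.h>
    #include <stdlib.h>

    /* Flip the sign of each element independently with probability p_m. */
    static void mutate(int *pi, int n, double p_m) {
        for (int i = 0; i < n; i++)
            if ((double)rand() / RAND_MAX < p_m)
                pi[i] = -pi[i];
    }

    int main(void) {
        srand(42);                                /* arbitrary seed              */
        int sigma_b[] = {2, -5, -1, 3, -4, -6};   /* sigma_b of Figure 5         */
        mutate(sigma_b, 6, 0.5);                  /* exaggerated p_m for display */
        for (int i = 0; i < 6; i++) printf("%d ", sigma_b[i]);
        printf("\n");                             /* output depends on the seed  */
        return 0;
    }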

Figure 6:

Resulting individual (σc) after applying the mutation operator over permutation σb of Figure 5, where the modified elements are 5 and 3.

Following Auyeung and Abraham's (2003) approach in Au-GA, the standard GA was implemented in Soncco-Álvarez and Ayala-Rincón (2013) (SA-GA); however, instead of Kaplan et al.'s (2000) O(n^2) algorithm, SA-GA uses Bader et al.'s (2001) O(n) algorithm for the fitness calculation. Also, the counting sort algorithm is used in the selection stage. In all the experiments SA-GA is used as the implementation of the standard GA. Algorithm 1 presents the pseudocode of the standard GA.

[Algorithm 1: pseudocode of the standard GA (SA-GA).]

Also, an improvement over SA-GA was proposed in Soncco-Álvarez and Ayala-Rincón (2013) that involves the inclusion of a preprocessing step in which 2-reversals are applied until no further simultaneous elimination of two breakpoints is possible. This heuristic of elimination of 2-reversals was applied by several approximation algorithms, such as those introduced in Bafna and Pevzner (1993) and Christie (1998). Algorithm 2 shows the pseudocode of this algorithm, denoted as Hybrid-GA; a sketch of the preprocessing heuristic is given after it.

[Algorithm 2: pseudocode of the Hybrid-GA.]
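The following C sketch illustrates the preprocessing heuristic with a naive O(n^2)-per-step scan (not the authors' implementation): a reversal ρi..j removes two breakpoints when (πi−1, πi) and (πj, πj+1) are breakpoints while |πi−1 − πj| = 1 and |πi − πj+1| = 1.

    #include <stdio.h>
    #include <stdlib.h>

    static void reverse(int *pi, int i, int j) {
        while (i < j) { int t = pi[i]; pi[i] = pi[j]; pi[j] = t; i++; j--; }
    }

    /* Greedily apply 2-reversals until none is found; returns how many were applied. */
    static int apply_2_reversals(int *pi, int n) {        /* pi has n + 2 entries */
        int applied = 0, found = 1;
        while (found) {
            found = 0;
            for (int i = 1; i <= n && !found; i++)
                for (int j = i; j <= n && !found; j++)
                    if (abs(pi[i-1] - pi[i]) != 1 && abs(pi[j] - pi[j+1]) != 1 &&
                        abs(pi[i-1] - pi[j]) == 1 && abs(pi[i] - pi[j+1]) == 1) {
                        reverse(pi, i, j);
                        applied++; found = 1;
                    }
        }
        return applied;
    }

    int main(void) {
        int pi[] = {0, 5, 3, 2, 6, 4, 1, 7};              /* extended, n = 6      */
        int k = apply_2_reversals(pi, 6);
        printf("%d 2-reversal(s) applied:", k);           /* here: 1 (rho_{2..4}) */
        for (int i = 0; i < 8; i++) printf(" %d", pi[i]); /* 0 5 6 2 3 4 1 7      */
        printf("\n");
        return 0;
    }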

The parameters used by SA-GA and Hybrid-GA are the same and are the following:

  • Crossover probability that describes how often the crossover operation will be performed over two selected parents. This parameter is used at the stage of lines 5 and 6 of Algorithms 1 and 2, respectively.

  • Mutation probability that describes how often the mutation operation will be performed over each element of an individual. This parameter is used at the stage of lines 6 and 7 of Algorithms 1 and 2, respectively.

  • Number of points of crossover that indicates how many places (points) will be used for the crossover operation. This parameter is used at the stage of lines 5 and 6 of Algorithms 1 and 2, respectively.

  • Percentage for selection that indicates how many of the best individuals will be selected for crossover and mutation. This parameter is used at the stage of lines 4 and 5 of Algorithms 1 and 2, respectively.

  • Percentage for replacement that indicates how many of the worst individuals will be replaced by the new offspring. This parameter is used at the stage of lines 8 and 9 of Algorithms 1 and 2, respectively.

The exact values of these parameters and how they were set is discussed in Section 4 (see Table 1).

2.4  Parallel Genetic Algorithm (Parallel-GA)

Soncco-Álvarez et al. (2013) proposed a parallel genetic algorithm, which is based on an independent-run model according to Sudholt's (2015) classification. In this model each slave process has its own instance of a GA, and after each generation the best result is sent to a master process, which determines the best solution among the slave processes. The population size of each instance of a GA is n log n; therefore, since there are 23 slave processes, the overall population of the Parallel-GA is 23 times n log n. Algorithm 3 shows the pseudocode of the Parallel-GA. It is necessary to stress that in this parallelization of the GA there is no migration of individuals among processes.

[Algorithm 3: pseudocode of the Parallel-GA.]

2.5  Memetic Algorithms (MA)

Memetic algorithms are optimization techniques that combine evolutionary algorithms with one or more phases of local search, and may even include exact methods and approximation algorithms (Moscato, 1989; Krasnogor et al., 2006); in this regard, MAs have shown potential applicability for treating discrete optimization problems (Hao, 2012). The application of local search to an individual involves generating a promising neighbor solution; this process is repeated a fixed number of times. This technique accelerates the discovery of new good solutions and therefore improves the overall population (Moscato and Cotta, 2003; Krasnogor et al., 2006).

Here it is important to stress that the memetic algorithms can be considered as a subset of a broader subject known as Memetic Computing (MC) (Chen et al., 2011; Neri and Cotta, 2012). This subject studies algorithmic structures composed of many operators, called memes, that interact and evolve in order to solve optimization problems. An automatic design of MC structures, for continuous optimization problems, was proposed by Caraffini et al. (2014), where the selection of the operators is performed based on an analysis of the features of the problem being addressed.

The authors proposed in Soncco-Álvarez and Ayala-Rincón (2014) a memetic algorithm to deal with SUPBR based on SA-GA, in which the local search procedure was embedded into the algorithm. The SUPBR problem can be seen as a discrete optimization problem, since one is dealing with a search space consisting of signed permutations. The proposed local search procedure is based on the mutation of a random position of a signed permutation, that is, changing the sign of the element. Algorithm 4 presents the pseudocode of the local search process.

[Algorithm 4: pseudocode of the local search procedure.]
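A C sketch of this local search (not the authors' code). To keep the sketch self-contained, a simple breakpoint count on the doubled representation stands in for the fitness; the MA itself evaluates the exact SSPBR distance with Bader et al.'s (2001) algorithm.

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in fitness: breakpoints of the doubled representation (n <= 31). */
    static int standin_fitness(const int *spi, int n) {
        int m = 2 * n + 2, u[64], b = 0;
        u[0] = 0; u[m - 1] = m - 1;
        for (int i = 0; i < n; i++) {
            int x = spi[i];
            if (x > 0) { u[2*i + 1] = 2*x - 1; u[2*i + 2] = 2*x;      }
            else       { u[2*i + 1] = -2*x;    u[2*i + 2] = -2*x - 1; }
        }
        for (int k = 0; k + 1 < m; k++)
            if (abs(u[k] - u[k + 1]) != 1) b++;
        return b;
    }

    /* Flip the sign of a random position; keep the change only if it improves. */
    static void local_search(int *spi, int n, int steps) {
        int best = standin_fitness(spi, n);
        for (int s = 0; s < steps; s++) {
            int i = rand() % n;
            spi[i] = -spi[i];
            int f = standin_fitness(spi, n);
            if (f < best) best = f;           /* greedy replacement */
            else          spi[i] = -spi[i];   /* undo the flip      */
        }
    }

    int main(void) {
        srand(7);
        int spi[] = {2, -5, -1, 3, -4, -6};
        local_search(spi, 6, 3);              /* 3 local search steps */
        for (int i = 0; i < 6; i++) printf("%d ", spi[i]);
        printf("\n");
        return 0;
    }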

According to the classification proposed in de Oca et al. (2012), the local search (shown in Algorithm 4) can be classified as:

  • Stochastic, because the generation of a new trial individual is done in a random way.

  • Single-solution, because just one solution is processed.

  • Greedy, because a replacement is performed as soon as a solution outperforming the current solution is found.

This local search procedure was applied in the following stages of the MA:

  • Generation of the initial population.

  • Restarting of the population.

Additionally, a new stage of pure local search was included after the breeding cycle. Note that when the population reaches a degenerate state, that is, when its individuals have high similarity, it is restarted. As a measure of similarity, the Shannon entropy was used, as defined in Shannon (1948); a sketch of one such measure is given after Algorithm 5. Algorithm 5 presents the pseudocode of the MA.

[Algorithm 5: pseudocode of the MA.]
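The following C sketch shows one plausible way to measure population similarity with Shannon entropy: for each position, the empirical probability of a positive sign among the individuals is turned into a binary entropy and the values are averaged. The exact formulation used in the MA may differ; this is an assumption made only for illustration.

    #include <stdio.h>
    #include <math.h>

    /* Average, over the positions, of the binary entropy of the sign distribution. */
    static double population_entropy(int pop[][6], int popsize, int n) {
        double h = 0.0;
        for (int j = 0; j < n; j++) {
            int plus = 0;
            for (int i = 0; i < popsize; i++)
                if (pop[i][j] > 0) plus++;
            double p = (double)plus / popsize;
            if (p > 0.0 && p < 1.0)
                h += -(p * log2(p) + (1.0 - p) * log2(1.0 - p));
        }
        return h / n;            /* in [0,1]; 0 means every sign is unanimous */
    }

    int main(void) {             /* compile with -lm */
        int pop[3][6] = {{ 2, -5, -1, 3, -4, -6},
                         { 2, -5, -1, 3, -4,  6},
                         {-2, -5, -1, 3, -4, -6}};
        printf("entropy = %.3f\n", population_entropy(pop, 3, 6));
        return 0;
    }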

The MA shares the same parameters used by SA-GA and Hybrid-GA, and further includes the following new parameters:

  • Percentage for local search that indicates how many of the best individuals will be selected for applying the local search operator. This parameter is used at the stage of line 2 of Algorithm 5.

  • Number of local search steps that indicates the number of steps (iterations) of the local search procedure for one permutation. This parameter is used at line 2 of Algorithm 4.

  • Percentage for preservation that indicates how many of the best individuals will be preserved. The remaining part of the population will be restarted. This parameter is used at the stage of line 12 of Algorithm 5.

  • Minimum entropy threshold that indicates the minimum Shannon entropy value that the population can achieve. This parameter is used at line 10 of Algorithm 5.

2.6  The Opposition-Based Learning (OBL)

The intuitive idea behind OBL is that, in the worst case of a search process, an individual would be on the opposite side of the optimal solution in the search space; thus, applying the concept of opposition would give good results. The opposite point is obtained by a central symmetry (also known as a point reflection) whose center is the middle point of the hyper-rectangle in which the search is performed. Tizhoosh and Ventresca (2008) defined the concept of opposite number and the concepts of type-I and type-II opposite points (AlQunaieer et al., 2010).

  • Let x be a real number in the interval [a, b]. The opposite number x˘ is defined in the following way:
    x˘ = a + b − x.    (1)
  • Let P = (x1, x2, ..., xN) be an N-dimensional point with xi ∈ R and xi ∈ [ai, bi]; then a type-I opposite point is defined by P˘ = (x1˘, x2˘, ..., xN˘), where
    xi˘ = ai + bi − xi,  i = 1, 2, ..., N.    (2)
  • Let f be an arbitrary function from R^n to R with image in the interval [ymin, ymax]. For every N-dimensional point P = (x1, ..., xn), the type-II opposite point is defined by
    f˘(x1, ..., xn) = ymin + ymax − f(x1, ..., xn).    (3)

In the context of optimization algorithms, while type-I OBL is based on opposite points in the search space, type-II OBL regards the opposite of the fitness value. Also, it is important to point out that type-I opposition requires knowing the search space in order to apply a linear definition of opposition, whereas type-II opposition requires a priori knowledge of the fitness function (Salehinejad et al., 2014). Motivated by these reasons, and for the sake of simplicity, this work uses the type-I opposition operator to compute the opposite of a given signed permutation.

For instance, Figure 7 shows the application of the type-I opposition operator over the signed permutation πa=-2,-5,1,-3,-4,-6. The resulting permutation is πb=2,5,-1,3,4,6, where all its elements have the sign swapped. Also, in this case the fitness (reversal distance) was improved from d(πa)=5 to d(πb)=4.
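A C sketch of the type-I opposition operator as used on signed permutations (an illustration, not the authors' code): every sign is flipped, which corresponds to Equation (1) with a = −x and b = +x for an element whose value ranges over {−x, +x}.

    #include <stdio.h>

    /* Type-I opposition on a signed permutation: flip every sign. */
    static void opposite(const int *spi, int n, int *out) {
        for (int i = 0; i < n; i++)
            out[i] = -spi[i];
    }

    int main(void) {
        int pi_a[] = {-2, -5, 1, -3, -4, -6};   /* pi_a of Figure 7 */
        int pi_b[6];
        opposite(pi_a, 6, pi_b);
        for (int i = 0; i < 6; i++) printf("%d ", pi_b[i]);
        printf("\n");                           /* prints 2 5 -1 3 4 6 */
        return 0;
    }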

Figure 7:

A signed permutation πa and its opposite signed permutation πb.

Initially, OBL was proposed for extending GAs, Reinforcement Learning, and Neural Networks in Tizhoosh (2005). Further, OBL has been applied in several optimization algorithms (Xu et al., 2014), such as Differential Evolution (DE), Particle Swarm Optimization, Biogeography-Based Optimization, Harmony Search, Ant Colony System, and Artificial Bee Colony. Among these algorithms, DE is the most prevalent, perhaps due to the results given in the classical paper by Rahnamayan et al. (2008), where DE was improved by the inclusion of OBL in the generation of the initial population and in the generation jumping.

3  The Opposition-Based Learning Integrated with Memetic Algorithms and the Hybrid Approach

3.1  Opposition-Based Memetic Algorithm (OBMA) for SUPBR

As previously mentioned, OBL is applied in the MA to improve the individuals of the initial population and after restarting the population. OBL is combined with the local search as explained below:

  • firstly, OBL is applied to a signed permutation π;

  • then, if the fitness of the permutation π is improved, π is replaced by the permutation generated by OBL;

  • otherwise, the local search is applied to the permutation π.

Algorithm 6 shows the application of OBL over a signed permutation.

[Algorithm 6: application of OBL over a signed permutation.]
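The following C sketch illustrates the OBL step just described (not the authors' Algorithm 6): the opposite individual replaces the original only if it improves the fitness; on failure the caller would fall back to the local search of Algorithm 4. As in the local search sketch, a breakpoint count on the doubled representation stands in for the real fitness.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in fitness: breakpoints of the doubled representation (n <= 31). */
    static int standin_fitness(const int *spi, int n) {
        int m = 2 * n + 2, u[64], b = 0;
        u[0] = 0; u[m - 1] = m - 1;
        for (int i = 0; i < n; i++) {
            int x = spi[i];
            if (x > 0) { u[2*i + 1] = 2*x - 1; u[2*i + 2] = 2*x;      }
            else       { u[2*i + 1] = -2*x;    u[2*i + 2] = -2*x - 1; }
        }
        for (int k = 0; k + 1 < m; k++)
            if (abs(u[k] - u[k + 1]) != 1) b++;
        return b;
    }

    /* Returns 1 if the opposite individual was accepted; on 0 the caller
     * should apply the local search of Algorithm 4 instead. */
    static int obl_step(int *spi, int n) {
        int opp[32];
        for (int i = 0; i < n; i++) opp[i] = -spi[i];   /* type-I opposition */
        if (standin_fitness(opp, n) < standin_fitness(spi, n)) {
            memcpy(spi, opp, n * sizeof *spi);
            return 1;
        }
        return 0;
    }

    int main(void) {
        int spi[] = {-2, -5, 1, -3, -4, -6};
        int accepted = obl_step(spi, 6);
        printf("accepted = %d:", accepted);
        for (int i = 0; i < 6; i++) printf(" %d", spi[i]);
        printf("\n");                 /* here the opposite is accepted */
        return 0;
    }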

It can be observed that OBMA takes advantage both of the local search, for exploiting known solutions, and of OBL, for exploring the search space. Algorithm 7 presents the pseudocode of OBMA.

[Algorithm 7: pseudocode of OBMA.]

The OBMA shares the same parameters used by SA-GA, Hybrid-GA, and MA, but also includes a new parameter:

  • Number of times fitness invariant, that indicates the maximum number of generations that the best fitness is allowed to stay invariant before restarting the population. This parameter is used at line 15 of Algorithm 7.

3.2  Hybrid-OBMA for SUPBR

As seen in previous work (Soncco-Álvarez and Ayala-Rincón, 2013), the preprocessing step of applying reversals that simultaneously eliminate two breakpoints gives good results when applied to the standard GA (SA-GA); in the same way, this hybrid approach is applied to OBMA with the aim of improving the quality of the results. Algorithm 8 presents the pseudocode of Hybrid-OBMA, which shares the same parameters as OBMA.

[Algorithm 8: pseudocode of Hybrid-OBMA.]

3.3  Comparison of the Time Complexity of the Algorithms

The analysis of the time complexity of the algorithms was conducted for each generation, and assuming as basic operation a single evaluation of the fitness function. Let p be the population size, and k the number of iterations in the local search.

  • From line 7 of SA-GA (Algorithm 1), the number of individuals of the offspring is bounded by p and for each individual a single evaluation of the fitness function is performed. Then, the time complexity per generation of SA-GA is at most p. The same complexity is shared by Hybrid-GA (Algorithm 2), this time due to line 8.

  • From lines 10 and 13 of the MA (Algorithm 5), the number of applications of the local search is bounded by p for each line, and in each application the fitness function is evaluated k times. Then, the time complexity per generation of the MA is at most 2kp.

  • Regarding OBMA (Algorithm 7): from line 11, the number of applications of the local search is bounded by p, and for each application the fitness function is evaluated k times. From line 18, the number of applications of (OBL + local search) is bounded by p, and in the worst scenario there will be one evaluation of the fitness function for OBL and k evaluations for the local search. Then, the time complexity per generation of OBMA is at most kp + (1 + k)p = (2k + 1)p. The same complexity is shared by Hybrid-OBMA (Algorithm 8), this time due to lines 12 and 19.

From the previous analysis, clearly the OBMA and Hybrid-OBMA have more evaluations of the fitness function per generation than SA-GA, Hybrid-GA, and MA.

4  Experiments and Results

All the algorithms proposed in this article were implemented in the C language and executed on an OS X platform with an Intel Core i7 processor operating at 3.4 GHz and on an Ubuntu Linux platform with two Intel Xeon E5-2620 processors operating at 2.4 GHz. The sequential algorithms were executed on the former and the parallel algorithm on the latter. The source code of the algorithms is available at http://genoma.cic.unb.br.

Algorithm 9 presents the pseudocode of the generator of random unsigned permutations that were used in the first and second experiments.

[Algorithm 9: pseudocode of the random unsigned permutation generator.]
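The authors' Algorithm 9 is not reproduced here; as an assumption, the following C sketch uses a standard Fisher–Yates shuffle, which draws an unsigned permutation of a given length uniformly at random.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Draw an unsigned permutation of {1, ..., n} uniformly at random. */
    static void random_permutation(int *pi, int n) {
        for (int i = 0; i < n; i++) pi[i] = i + 1;
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = pi[i]; pi[i] = pi[j]; pi[j] = t;
        }
    }

    int main(void) {
        srand((unsigned)time(NULL));
        int pi[10];
        random_permutation(pi, 10);
        for (int i = 0; i < 10; i++) printf("%d ", pi[i]);
        printf("\n");
        return 0;
    }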

The parameters of SA-GA (and Hybrid-GA) are the same as those stated in Soncco-Álvarez and Ayala-Rincón (2013). Also, the parameters for the MA are the same as those stated in Soncco-Álvarez and Ayala-Rincón (2014). For OBMA (and Hybrid-OBMA) the parameters were established in the same way as in the latter reference, by performing a sensitivity analysis in the following way:

  • First, each parameter is converted into a discrete set, that is, {0.1, 0.2, ..., 0.9}.

  • Then, the discrete set for each parameter is reduced by running OBMA exhaustively (e.g., one-hundred times). The reduced set of a parameter represents those values that have the best average results for OBMA.

  • Finally, the elements of these reduced sets are combined in order to find the best configuration of parameters by running OBMA exhaustively (e.g., one hundred times). The configuration of parameters that has the best average result is chosen as the parameter setting.

Table 1 presents all parameter settings for the algorithms SA-GA, Hybrid-GA, MA, OBMA, and Hybrid-OBMA. Note that Hybrid-GA and Hybrid-OBMA have the same parameters as SA-GA and OBMA, respectively. It is important to stress here that the number of local search steps for OBMA (and Hybrid-OBMA) was chosen to be 3, based on a sensitivity analysis. For more than three steps, the decrease in the number of reversals does not differ significantly from that obtained using just three steps. Thus, three was the best choice, since increasing the number of steps also increases the number of evaluations of the fitness function.

Table 1:
Parameter settings for SA-GA, MA, and OBMA.
Parameter                       SA-GA (a)   MA (b)   OBMA
Crossover probability           0.90        0.98     0.98
Mutation probability            0.02        0.01     0.01
Num. points of crossover
% for selection                 0.60        0.96     0.96
% for replacement               0.60        0.60     0.60
% for local search              --          0.94     0.94
# local search steps
% of preservation               --          0.98     0.40
Min. entropy threshold          --          0.20     0.15
# of times fitness invariant    --          --       14

(a) Parameters taken from Soncco-Álvarez and Ayala-Rincón (2013).

(b) Parameters taken from Soncco-Álvarez and Ayala-Rincón (2014).

Three different kinds of inputs were used in the experiments:

  1. Sets of randomly generated permutations of different lengths, each of them consisting of one hundred unsigned permutations of the same length. Algorithm 9 was used for generating these permutations.

  2. Single unsigned permutations previously proposed in the literature as benchmarks.

  3. Permutations built from biological data.

4.1  Experiment Using Sets of One-Hundred Unsigned Permutations

This experiment was performed in a similar way to previous works (e.g., Auyeung and Abraham, 2003; Soncco-Álvarez and Ayala-Rincón, 2012, 2013; Soncco-Álvarez et al., 2013; Soncco-Álvarez and Ayala-Rincón, 2014). The number of individuals and the number of evaluations of the fitness function were fixed in order to obtain a fair comparison of the performance of the algorithms SA-GA, Hybrid-GA, MA, OBMA, Hybrid-OBMA, and Parallel-GA. Assuming that the length of the input permutation is n, the population size was fixed to n log n for all algorithms. For the case of the Parallel-GA, the population size was divided by 23, which is the number of slave processes. The configuration of this experiment follows.

  • For each set of one hundred randomly generated permutations of length i, with i ∈ {10, 20, ..., 150}:

    • Initially, choose a permutation of length i, execute OBMA for 200 generations, and record the number of evaluations of the fitness function. This number of evaluations, say k, will be used as the stop criterion for the other algorithms when using as input permutations of length i. The number of 200 generations was set based on early experiments, which showed convergence of OBMA after this number of generations.

    • Afterwards, for each permutation in a set:

      • Execute all the algorithms 50 times for the current permutation. In each execution, stop the algorithms when they reach k evaluations of the fitness function.

      • Calculate the average of 50 outputs (number of reversals) for each algorithm. This value, for an algorithm, represents the result for the current permutation.

    • Finally, for all algorithms calculate the average and standard deviation of the results for the one hundred permutations of length i.

Table 2 presents the results of this experiment, where the bold numbers are the best values for each length. From this table, the following can be observed: OBMA has the best values for permutations of length up to 50; for permutations of length from 60 to 120, Hybrid-OBMA has the best values; and for permutations of length greater than or equal to 130, Hybrid-GA has the best values, with SA-GA having the second best values.

Table 2:
Average number of reversals and standard deviation for the experiment with sets of one hundred unsigned permutations.
          SA-GA          H.GA           MA             OBMA           H.OBMA         P.GA
Len.      Avg.    S.D.   Avg.    S.D.   Avg.    S.D.   Avg.    S.D.   Avg.    S.D.   Avg.    S.D.
10 5.74 0.861 5.77 0.849 5.73 0.863 5.73 0.863 5.76 0.851 5.73 0.863 
20 13.18 1.064 13.22 1.07 13.13 1.054 13.12 1.047 13.17 1.055 13.16 1.068 
30 20.73 1.105 20.73 1.101 20.59 1.091 20.55 1.084 20.59 1.079 20.77 1.084 
40 28.6 1.237 28.61 1.232 28.38 1.211 28.34 1.207 28.38 1.189 28.75 1.247 
50 36.9 1.245 36.89 1.252 36.6 1.214 36.52 1.186 36.53 1.175 37.26 1.268 
60 45.26 1.444 45.26 1.455 44.91 1.448 44.8 1.429 44.78 1.423 45.83 1.377 
70 53.55 1.573 53.54 1.575 53.19 1.571 53.07 1.538 53.03 1.558 54.34 1.514 
80 62.05 1.444 62.04 1.454 61.74 1.432 61.59 1.428 61.55 1.425 63.07 1.377 
90 70.6 1.577 70.56 1.554 70.32 1.593 70.16 1.585 70.1 1.583 71.79 1.509 
100 78.71 1.713 78.66 1.734 78.57 1.74 78.4 1.696 78.32 1.721 80.16 1.561 
110 87.4 1.609 87.37 1.597 87.35 1.633 87.18 1.619 87.05 1.612 88.99 1.464 
120 95.94 1.63 95.91 1.606 96.12 1.602 95.93 1.613 95.81 1.672 97.77 1.488 
130 104.66 1.739 104.62 1.725 104.99 1.742 104.84 1.697 104.7 1.742 106.68 1.557 
140 113.14 2.085 113.1 2.034 113.62 2.097 113.48 2.029 113.36 2.105 115.39 1.835 
150 121.91 1.481 121.86 1.482 122.63 1.523 122.52 1.475 122.34 1.529 124.26 1.342 

4.1.1  Statistical Tests

For the statistical comparison of the performance of the algorithms the following methodology proposed by Demšar (2006) was applied (see also García and Herrera, 2008 and Derrac et al., 2011):

  • First, the Friedman test is used to test the null hypothesis that all the algorithms have the same performance.

  • If the previous test rejects the null hypothesis, the Holm test is performed as a post hoc test. This test considers a control algorithm and compares it with the remaining algorithms in order to determine whether it is significantly better than each of them.

The CONTROLTEST package (implemented in Java), available at the SCI2S web site http://sci2s.ugr.es/sicidm, was used for performing the Friedman and Holm tests. In this package the control algorithm is the one with the lowest rank computed by the Friedman test. A significance level of α = 0.05 was used for both tests.

The output of each algorithm, for the current experiment, was used as an input sample for the Friedman and Holm tests. These samples contain 100 elements, corresponding to the results (number of reversals) of an algorithm for a set of one hundred permutations. Each element of a sample was preprocessed by taking its multiplicative inverse so that the statistical tests, which compare the performance of the algorithms, could be applied: as the number of reversals decreases, the performance measure increases.

The results of the statistical tests are the following. The Friedman test rejected the null hypothesis for all permutation lengths, except for the samples of permutations of length 10; thus, these samples were not included in the Holm test. Tables 3 and 4 show the results of the Holm test, where the algorithms in bold are those that have a statistically significant difference (p-value ≤ α/i) with their respective control algorithm.

Table 3:
Results of the Holm test for sets of one hundred permutations with lengths from 20 to 50.
Length    Control Algorithm    Algorithm    Rank    P-value    α/i
  Hybrid-GA 4.52 1.9625E-6 0.01 
  GA 4.26 4.8577E-5 0.0125 
20 OBMA Parallel-GA 3.41 0.0733 0.0167 
 (Rank: 2.73) Hybrid-OBMA 3.21 0.2091 0.025 
  MA 2.86 0.7484 0.05 
  Parallel-GA 4.77 1.4166E-12 0.01 
  GA 4.68 7.8156E-12 0.0125 
30 OBMA Hybrid-GA 4.42 7.8958E-10 0.0167 
 (Rank: 2.12) MA 2.61 0.1903 0.025 
  Hybrid-OBMA 2.40 0.4543 0.05 
  Parallel-GA 5.72 3.5526E-26 0.01 
  GA 4.60 3.1936E-14 0.0125 
40 OBMA Hybrid-GA 4.52 1.6261E-13 0.0167 
 (Rank: 1.76) MA 2.35 0.1148 0.025 
  Hybrid-OBMA 2.05 0.4383 0.05 
  Parallel-GA 5.96 1.4433E-30 0.01 
  GA 4.62 2.5547E-15 0.0125 
50 OBMA Hybrid-GA 4.42 1.6261E-13 0.0167 
 (Rank: 1.66) MA 2.55 0.0174 0.025 
  Hybrid-OBMA 1.79 0.7283 0.05 
Table 4:
Results of the Holm test for sets of one hundred permutations with lengths from 60 to 150.
Length    Control Algorithm    Algorithm    Rank    P-value    α/i
  Parallel-GA 5.96 7.0047E-34 0.01 
  GA 4.56 4.7793E-17 0.0125 
60 Hybrid-OBMA Hybrid-GA 4.48 2.8813E-16 0.0167 
 (Rank: 1.42) MA 2.83 1.6431E-4 0.025 
  OBMA 1.75 0.3778 0.05 
  Parallel-GA 6.00 7.0046E-35 0.01 
  GA 4.52 5.9974E-17 0.0125 
70 Hybrid-OBMA Hybrid-GA 4.48 1.4767E-16 0.0167 
 (Rank: 1.39) MA 2.81 1.4758E-4 0.025 
  OBMA 1.80 0.2732 0.05 
  Parallel-GA 6.00 1.8592E-33 0.01 
  GA 4.51 6.9560E-16 0.0125 
80 Hybrid-OBMA Hybrid-GA 4.45 2.5547E-15 0.0167 
 (Rank: 1.49) MA 2.96 8.5392E-5 0.025 
  OBMA 1.59 0.7893 0.05 
  Parallel-GA 6.00 2.5706E-33 0.01 
  GA 4.51 8.6553E-16 0.0125 
90 Hybrid-OBMA Hybrid-GA 4.19 6.5111E-13 0.0167 
 (Rank: 1.50) MA 3.08 2.4136E-5 0.025 
  OBMA 1.72 0.5565 0.05 
  Parallel-GA 6.00 5.0268E-35 0.01 
  GA 4.21 3.9239E-14 0.0125 
100 Hybrid-OBMA Hybrid-GA 3.99 3.0474 0.0167 
 (Rank: 1.38) MA 3.44 3.6795E-8 0.025 
  OBMA 1.98 0.1088 0.05 
  Parallel-GA 6.00 3.6409E-34 0.01 
  Hybrid-GA 3.95 1.9696E-11 0.0125 
110 Hybrid-OBMA GA 3.93 2.8368E-11 0.0167 
 (Rank: 1.44) MA 3.52 2.7127E-8 0.025 
  OBMA 2.16 0.0543 0.05 
  Parallel-GA 6.00 7.8536E-25 0.01 
  MA 4.49 4.0029E-10 0.0125 
120 Hybrid-OBMA GA 2.83 0.0692 0.0167 
 (Rank: 2.15) OBMA 2.82 0.0733 0.025 
  Hybrid-GA 2.71 0.1345 0.05 
  Parallel-GA 6.00 4.5017E-25 0.01 
  MA 4.41 1.1043E-9 0.0125 
130 Hybrid-GA OBMA 3.57 1.1881E-4 0.0167 
 (Rank: 2.13) Hybrid-OBMA 2.48 0.3496 0.025 
  GA 2.41 0.4543 0.05 
  Parallel-GA 6.00 5.6861E-31 0.01 
  MA 4.44 1.3301E-13 0.0125 
140 Hybrid-GA OBMA 3.79 1.4622E-8 0.0167 
 (Rank: 1.67) Hybrid-OBMA 3.05 2.2584E-4 0.025 
  GA 2.05 0.3098 0.05 
  Parallel-GA 6.00 5.0519E-34 0.01 
  MA 4.45 1.0762E-15 0.0125 
150 Hybrid-GA OBMA 4.11 1.1676E-12 0.0167 
 (Rank: 1.45) Hybrid-OBMA 3.24 1.7186E-6 0.025 
  GA 1.75 0.4227 0.05 

From Table 3, the following can be observed: OBMA is the control algorithm for all cases (lengths 20 to 50), having the minimum ranks. Note that in all cases OBMA does not have a statistically significant difference with respect to Hybrid-OBMA.

From Table 4, the following can be observed: for permutations of length 60 to 120, Hybrid-OBMA is the control algorithm and does not have a statistically significant difference with respect to OBMA; for permutations of length 130 to 150, Hybrid-GA is the control algorithm and does not have a statistically significant difference with respect to GA.

4.2  Experiment Using Single Unsigned Permutations (Benchmarks)

For this experiment, the hardest cases of the benchmark permutations proposed in Soncco-Álvarez and Ayala-Rincón (2014) were taken. These permutations are the following: 1RPL50, 2RPL50, 1RPL100, 2RPL100, 1RPL150, 2RPL150; the numeric suffix after “RPL” stands for the length of the permutation. The comparison was performed with the algorithms SA-GA, Hybrid-GA, MA, OBMA, Hybrid-OBMA, and Parallel-GA. The population size, for all algorithms, was fixed in the same way as in Subsection 4.1. The configuration of this experiment follows.

  • For each benchmark permutation:

    • First, execute the OBMA algorithm for 200 generations and record the number of evaluations of the fitness function. This number, say k, will be used as stop criterion for the other algorithms when using as input the current benchmark permutation.

    • Next, execute all algorithms 50 times for the current permutation. In each execution, stop the algorithm when it reaches k evaluations of the fitness function.

    • Finally, calculate the following measures for the 50 outputs (number of reversals) of each algorithm: best, worst, mean, median, and standard deviation.

Table 5 presents the results of this experiment, where the rows in bold represent the algorithms with the minimum mean value. From this table, the following can be observed: for the benchmarks 1RPL50 and 2RPL50, Hybrid-OBMA and OBMA, respectively, compute the minimum values for all measures; for the benchmarks 1RPL100 and 2RPL100, Hybrid-OBMA has the minimum values for the measures worst, median, and mean; and for the benchmarks 1RPL150 and 2RPL150, Hybrid-GA has the minimum values for the measures best, median, and mean.

Table 5:
Different measures for the results (number of reversals) of the experiment with six benchmark permutations.
Bench.    Algorithm    Best    Worst    Median    Mean    Std. Dev.
 SA-GA 37 38 37.0 37.18 0.388 
 Hybrid-GA 37 39 37.0 37.28 0.497 
1RPL50 MA 37 38 37.0 37.06 0.24 
 OBMA 37 38 37.0 37.06 0.24 
 Hybrid-OBMA 37 37 37.0 37.0 0.0 
 Parallel-GA 37 39 38.0 37.8 0.452 
 SA-GA 36 39 37.0 37.2 0.756 
 Hybrid-GA 36 39 37.0 37.14 0.756 
2RPL50 MA 36 38 37.0 36.58 0.575 
 OBMA 36 37 36.0 36.28 0.454 
 Hybrid-OBMA 36 38 36.0 36.48 0.544 
 Parallel-GA 36 38 38.0 37.58 0.538 
 SA-GA 78 83 80.0 80.48 1.074 
 Hybrid-GA 79 83 81.0 80.68 0.999 
1RPL100 MA 79 82 81.0 80.46 0.908 
 OBMA 79 83 80.0 80.36 0.776 
 Hybrid-OBMA 79 82 80.0 80.2 0.728 
 Parallel-GA 80 83 82.0 81.72 0.607 
 SA-GA 77 81 79.0 78.9 0.886 
 Hybrid-GA 76 80 79.0 78.62 0.923 
2RPL100 MA 77 80 79.0 78.5 0.735 
 OBMA 77 80 79.0 78.58 0.859 
 Hybrid-OBMA 77 80 78.0 78.3 0.707 
 Parallel-GA 79 81 80.0 80.0 0.606 
 SA-GA 120 125 122.5 122.44 1.128 
 Hybrid-GA 120 126 122.0 122.12 1.409 
1RPL150 MA 121 125 123.0 123.12 1.272 
 OBMA 120 125 123.0 122.82 1.173 
 Hybrid-OBMA 120 125 123.0 122.88 0.982 
 Parallel-GA 121 126 125.0 124.5 0.995 
 SA-GA 121 127 124.0 124.12 1.256 
 Hybrid-GA 121 129 124.0 123.94 1.621 
2RPL150 MA 123 128 125.0 125.14 1.125 
 OBMA 122 127 125.0 124.9 1.199 
 Hybrid-OBMA 122 127 125.0 125.04 1.087 
 Parallel-GA 125 128 127.0 126.56 0.837 

4.2.1  Statistical Tests

The same methodology previously explained in Subsection 4.1.1 was applied for the statistical comparison. In this case, the input samples contain 50 elements that are the results of 50 executions of an algorithm for a benchmark permutation.

The results of the statistical tests are the following. The Friedman test rejected the null hypothesis that all algorithms have the same performance for all benchmarks. Table 6 shows the results of the Holm test, where the algorithms in bold are those that have a statistically significant difference (p-value < α/i) with respect to their control algorithm.
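
For readers who wish to reproduce this methodology, the following sketch (an illustration under our own assumptions, not the authors' code) runs the Friedman test on a results matrix with one row per execution and one column per algorithm, takes the best-ranked algorithm as the control, and applies the Holm comparison used in Tables 6 and 9, declaring significance when the p-value is below α/i. The post-hoc z statistic follows Demšar (2006); for brevity, ties are ranked arbitrarily instead of using averaged ranks.

    import numpy as np
    from scipy.stats import friedmanchisquare, norm

    def friedman_holm(results, names, alpha=0.05):
        # results: array of shape (runs, algorithms); lower values are better.
        runs, algs = results.shape
        # Friedman test over the per-algorithm samples of results.
        _, p_friedman = friedmanchisquare(*(results[:, j] for j in range(algs)))
        # Average rank of each algorithm over all runs (rank 1 = best result).
        avg_rank = np.argsort(np.argsort(results, axis=1), axis=1).mean(axis=0) + 1.0
        control = int(np.argmin(avg_rank))
        # z statistic and unadjusted p-value of each algorithm vs. the control.
        se = np.sqrt(algs * (algs + 1) / (6.0 * runs))
        rows = [(names[j], avg_rank[j],
                 2.0 * (1.0 - norm.cdf(abs(avg_rank[j] - avg_rank[control]) / se)))
                for j in range(algs) if j != control]
        # Holm: the i-th smallest p-value is compared against alpha / (algs - i).
        rows.sort(key=lambda row: row[2])
        holm = [(name, rank, p, alpha / (algs - i), p < alpha / (algs - i))
                for i, (name, rank, p) in enumerate(rows, start=1)]
        return p_friedman, names[control], holm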

Table 6:
Results of the Holm test for benchmarks permutations.
Bench.   Control Algorithm   Algorithm   Rank   P-value   α/i
  Parallel-GA 5.16 4.7488E-10 0.01 
  Hybrid-GA 3.62 0.0347 0.0125 
1rpl50 Hybrid-OBMA GA 3.37 0.1490 0.0167 
 (Rank: 2.83) MA 3.01 0.6305 0.025 
  OBMA 3.01 0.6305 0.05 
  Parallel-GA 5.0 1.6261E-13 0.01 
  GA 4.15 3.3134E-7 0.0125 
2rpl50 OBMA Hybrid-GA 4.0 2.5537E-6 0.0167 
 (Rank: 2.24) MA 2.93 0.0652 0.025 
  Hybrid-OBMA 2.68 0.2396 0.05 
  Parallel-GA 5.29 1.6395E-11 0.01 
  Hybrid-GA 3.47 0.0614 0.0125 
1rpl100 Hybrid-OBMA GA 3.29 0.1646 0.0167 
 (Rank: 2.77) MA 3.23 0.2189 0.025 
  OBMA 2.95 0.6305 0.05 
  Parallel-GA 5.52 3.1654E-15 0.01 
  GA 3.57 0.0075 0.0125 
2rpl100 Hybrid-OBMA Hybrid-GA 3.23 0.0777 0.0167 
 (Rank: 2.57) OBMA 3.17 0.1088 0.025 
  MA 2.94 0.3227 0.05 
  Parallel-GA 5.40 1.1287E-14 0.01 
  MA 3.66 0.0021 0.0125 
1rpl150 Hybrid-GA Hybrid-OBMA 3.45 0.012 0.0167 
 (Rank: 2.51) OBMA 3.15 0.0872 0.025 
  GA 2.83 0.3924 0.05 
  Parallel-GA 5.38 3.9194E-15 0.01 
  MA 3.65 0.0012 0.0125 
2rpl150 Hybrid-GA Hybrid-OBMA 3.6 0.0019 0.0167 
 (Rank: 2.44) OBMA 3.47 0.0059 0.025 
  GA 2.46 0.9574 0.05 

From Table 6, the following can be observed: for benchmarks 1RPL50 and 2RPL50, Hybrid-OBMA and OBMA are the control algorithms and show no statistically significant difference with respect to OBMA and Hybrid-OBMA, respectively; for benchmarks 1RPL100 and 2RPL100, Hybrid-OBMA is the control algorithm and shows no statistically significant difference with respect to OBMA; for benchmarks 1RPL150 and 2RPL150, Hybrid-GA is the control algorithm and shows no statistically significant difference with respect to GA.

4.3  Experiment Using Permutations Based on Biological Data

Permutations based on the mitochondrial genomes of several organisms, such as those used in Kececioglu and Sankoff (1993), were built according to the procedure given in Soncco-Álvarez and Ayala-Rincón (2014). These permutations are built in the following way:

  • First, two genomes A and B are taken.

  • Those genes that are not common to both A and B are deleted.

  • Then, an increasing sequence of naturals is assigned to the genes of genome B so that this sequence forms the identity permutation.

  • Finally, the permutation πA_B is built as the sequence of naturals corresponding to the genes of genome A, using the same assignment of naturals given to the genes of genome B (a minimal sketch of this construction is shown after this list).
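
A minimal sketch of this construction follows; the function name and the example gene labels are merely illustrative and do not come from the article.

    def build_permutation(genome_a, genome_b):
        # Keep only the genes common to both genomes, preserving each genome's order.
        common = set(genome_a) & set(genome_b)
        a = [g for g in genome_a if g in common]
        b = [g for g in genome_b if g in common]
        # Assign increasing naturals 1..n to the genes of genome B,
        # so that genome B itself becomes the identity permutation.
        rank = {gene: i + 1 for i, gene in enumerate(b)}
        # Read genome A through that assignment to obtain the permutation piA_B.
        return [rank[gene] for gene in a]

    # Hypothetical example:
    # build_permutation(["cox1", "nad2", "atp6", "cob"],
    #                   ["nad2", "cox1", "cob", "atp6"])  ->  [2, 1, 4, 3]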

The permutations used in this experiment are those built from the mitochondrial genomes of the organisms listed in Table 7, as done in Soncco-Álvarez and Ayala-Rincón (2014). The permutation relating Homo sapiens and Caretta caretta was not included because, after the elimination of non-common genes, the generated permutation was the identity, which does not need to be sorted. The comparison was performed with the following algorithms: SA-GA, Hybrid-GA, MA, OBMA, Hybrid-OBMA, and Parallel-GA. The population size was fixed to n log n for all algorithms, where n is the length of the input permutation. For the case of the Parallel-GA, the population size was divided by 23, which is the number of slave processes. The configuration of this experiment follows.

  • For each permutation based on biological data:

    • First, execute the OBMA algorithm for 200 generations and record the number of evaluations of the fitness function. This number of evaluations, say k, will be used as the stopping criterion for the other algorithms when the current permutation is given as input.

    • Second, execute all the algorithms 50 times for the current permutation. In each execution, stop the algorithms when they reach k evaluations of the fitness function.

    • Finally, calculate the following measures for the 50 outputs (number of reversals) of each algorithm: best, worst, mean, median, and standard deviation.

Table 7:
Number of genes of mitochondrial genomes of some organisms.
Scientific Name   Common Name   Num.   Abbrev.
Homo sapiens Human 37 Hom. 
Drosophila melanogaster Fruit Fly 37 Dro. 
Crocodylus mindorensis Philippine Crocodile 35 Cro. 
Sibon nebulatus Clouded Snake 37 Sib. 
Caretta caretta Loggerhead Sea Turtle 36 Car. 

Table 8 presents the results of this experiment, where the rows in bold represent the algorithms that do not reach the minimum values for some measure. From this table, the following results can be observed: OBMA is the only algorithm that reaches the minimum results for all biological permutations; for the permutations Hom.-Cro., Hom.-Sib., Cro.-Sib., Cro.-Car., and Sib.-Car., all algorithms have the same results.

Table 8:
Different measures for the results (number of reversals) of the experiment with biological permutations.
Perm.   Algorithm   Best   Worst   Median   Mean   Std. Dev.
 SA-GA 16 17 16.0 16.04 0.198 
 Hybrid-GA 16 16 16.0 16.0 0.0 
Hom.-Dro. MA 16 16 16.0 16.0 0.0 
 OBMA 16 16 16.0 16.0 0.0 
 Hybrid-OBMA 16 16 16.0 16.0 0.0 
 Parallel-GA 16 16 16.0 16.0 0.0 
Hom.-Cro. All algorithms 3.0 3.0 0.0 
Hom.-Sib. All algorithms 2.0 2.0 0.0 
 SA-GA 15 16 15.0 15.02 0.141 
 Hybrid-GA 15 15 15.0 15.0 0.0 
Dro.-Cro. MA 15 15 15.0 15.0 0.0 
 OBMA 15 15 15.0 15.0 0.0 
 Hybrid-OBMA 15 15 15.0 15.0 0.0 
 Parallel-GA 15 15 15.0 15.0 0.0 
 SA-GA 17 18 17.0 17.12 0.328 
 Hybrid-GA 17 18 17.0 17.04 0.198 
Dro.-Sib. MA 17 18 17.0 17.02 0.141 
 OBMA 17 17 17.0 17.0 0.0 
 Hybrid-OBMA 17 17 17.0 17.0 0.0 
 Parallel-GA 17 17 17.0 17.0 0.0 
 SA-GA 16 16 16.0 16.0 0.0 
 Hybrid-GA 16 17 16.0 16.38 0.49 
Dro.-Car. MA 16 16 16.0 16.0 0.0 
 OBMA 16 16 16.0 16.0 0.0 
 Hybrid-OBMA 16 17 16.0 16.36 0.485 
 Parallel-GA 16 16 16.0 16.0 0.0 
Cro.-Sib. All algorithms 5.0 5.0 0.0 
Cro.-Car. All algorithms 3.0 3.0 0.0 
Sib.-Car. All algorithms 2.0 2.0 0.0 

Notice that the data provided in Table 8 could be used as a valuable piece of information for constructing the phylogenetic tree of the organisms shown in Table 7, as is done in Sankoff et al. (1992) using a variant of the reversal distance.

4.3.1  Statistical Tests

The statistical comparison was done using the methodology explained in Subsection 4.1.1. In this case, the input samples contain 50 elements that are the results of 50 executions of an algorithm for a biological permutation.

The results of the statistical tests are the following. The Friedman test rejected the null hypothesis that all algorithms have the same performance only for the permutation Dro.-Car. Table 9 shows the results of the Holm test, where the algorithms in bold are those that have a statistically significant difference (p-value < α/i) with respect to the control algorithm. From this table, the following can be observed: the control algorithm is GA with rank 3.13, but note that Parallel-GA, OBMA, and MA have the same rank, so any of these algorithms could be taken as the control algorithm.

Table 9:
Results of the Holm test for the permutation Dro.-Car.
Perm.   Control Algorithm   Algorithm   Rank   P-value   α/i
  Hybrid-GA 4.27 0.0023 0.01 
  Hybrid-OBMA 4.21 0.0039 0.0125 
Dro.-Car. GA MA 3.13 1.0 0.0167 
 (Rank: 3.13) OBMA 3.13 1.0 0.025 
  Parallel-GA 3.13 1.0 0.05 

Summarizing the results of the statistical tests, the following observations are mostly consistent with Tables 3, 4, 6, and 9:

  • For permutations of length up to 40: (Hybrid-)OBMA provides good results, but not significantly better than those of the other algorithms.

  • For permutations of length in the interval [50, 110]: (Hybrid-)OBMA is significantly better than the other algorithms.

  • For permutations of length in the interval [120, 130]: Hybrid-GA is not worse than (Hybrid-)OBMA.

  • For permutations of length in the interval [140, 150]: Hybrid-GA is better than (Hybrid-)OBMA.

From these results, it can be observed that applying OBL in the stages of initializing and restarting the population is an effective way of exploring moderate search spaces, that is, those corresponding to permutations of length less than 120. On the other hand, when the search space grows, the effectiveness of applying OBL (when restarting the population) decreases. In this last scenario, using just the mutation operator for exploring the search space is shown to be more successful, as done by SA-GA and Hybrid-GA.

5  Discussion

Another interesting type of permutation, not included in the experimental section, is the Gollan permutation, which needs exactly n-1 reversals to be sorted. The Gollan permutations and their inverses are the permutations with worst-case behavior regarding the reversal distance problem (Bafna and Pevzner, 1993). Additional experiments were performed using these permutations by running the algorithms until reaching n-1 reversals. The results of this experiment showed that all algorithms needed just 1 or 2 generations to reach n-1 reversals; therefore, these kinds of permutations are easier instances to solve and are not adequate for performance comparisons.

In Soncco-Álvarez et al. (2013) and Soncco-Álvarez and Ayala-Rincón (2014), it was claimed that the Parallel-GA (based on the independent run model, according to Sudholt (2015)) had the best known results for the SUPBR problem, but the fact that this algorithm uses more resources (23 times the population of SA-GA (n log n)) clearly indicates an unfair comparison. When the same amount of resources (population size and number of evaluations) is assigned to all algorithms, as seen previously in the experiments using sets of one hundred permutations, the Parallel-GA does not compute the best results. An additional experiment was then performed (using the same resources) in order to compare the execution times of SA-GA and Parallel-GA for sets of one hundred permutations. Table 10 shows the results of this experiment, where the Parallel-GA achieves a speed-up over SA-GA for permutation lengths greater than or equal to 50. Thus, as expected, the real contribution of the Parallel-GA is speeding up SA-GA.
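
As a point of reference for reading the Speedup column of Table 10, the value is simply the ratio between the average execution times of the sequential and the parallel algorithm; for instance, for permutations of length 150:

    \[
      \text{Speedup} = \frac{T_{\text{SA-GA}}}{T_{\text{Parallel-GA}}}
                     = \frac{20523.42}{3231.33} \approx 6.35 .
    \]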

Table 10:
Average, standard deviation, and speedup of the execution time (in milliseconds) of SA-GA and Parallel-GA.
         SA-GA                 Parallel-GA
Len.   Avg.   Std. Dev.   Avg.   Std. Dev.   Speedup
10 71.02 8.688 1258.69 10.101 0.06 
20 279.31 21.104 1323.28 24.963 0.21 
30 641.78 26.525 1374.06 24.622 0.47 
40 1193.57 33.215 1424.4 45.558 0.84 
50 1912.76 43.92 1531.56 73.856 1.25 
60 2820.85 67.228 1647.03 96.263 1.71 
70 3939.69 67.111 1772.48 126.515 2.22 
80 5218.54 81.364 1880.58 158.324 2.77 
90 6706.0 92.378 1979.74 140.465 3.39 
100 8501.2 148.582 2174.99 180.419 3.91 
110 10374.58 141.598 2224.43 97.91 4.66 
120 12600.36 143.978 2470.85 118.27 5.1 
130 14896.29 187.142 2703.62 130.188 5.51 
140 17620.71 212.51 2981.15 171.148 5.91 
150 20523.42 218.488 3231.33 128.509 6.35 

In the experiment discussed in the previous paragraph, we emphasize the importance of using the same population size for SA-GA and the Parallel-GA, which implies that we are using the same sample size of the whole search space for both algorithms. Although the independent run model is not the best way of using parallel resources, the Parallel-GA still showed a clear speed-up over SA-GA. In a recent study performed by da Silveira et al. (2017), new communication models that include migration policies between processes were implemented to parallelize SA-GA; the experiments performed in that work showed that one of the new models improved accuracy and speed-up with respect to both SA-GA and the independent-run Parallel-GA.

Moreover, the convergence of the algorithms to the minimum results was computed to visualize the fact that Hybrid-GA has the best results for permutations of length greater than or equal to 130. Figures 8, 9, and 10 show the convergence graphics computed for the benchmark permutations 1RPL50, 1RPL100, and 1RPL150.

Figure 8 shows that, for the benchmark 1RPL50, OBMA and Hybrid-OBMA converge to the minimum values (number of reversals) as the number of evaluations increases. Figure 9 shows that, for the benchmark 1RPL100, OBMA and Hybrid-OBMA still converge to the minimum values as the number of evaluations increases, but this time closer to the values of SA-GA and Hybrid-GA. Figure 10 shows that, for the benchmark 1RPL150, SA-GA and Hybrid-GA converge to the minimum values, this time better than OBMA and Hybrid-OBMA. Based on this result, we can say that the convergence of OBMA (and Hybrid-OBMA) gets worse for bigger permutations (lengths ≥ 130), perhaps because the number of evaluations for 200 generations of OBMA is equivalent to approximately 1300–1400 generations of SA-GA. This means that OBMA executes more evaluations of the fitness function per generation than SA-GA. This fact is confirmed by the time complexity analysis, where OBMA executes at most (2k+1)p evaluations per generation and SA-GA executes at most p. Then, in order for SA-GA to perform the same number of evaluations as OBMA, it needs to execute 2k+1 times the number of generations of OBMA, and since k for OBMA is equal to 3, this factor is equal to 7.
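
The arithmetic behind this equivalence, using only the bounds stated above (population size p and the OBMA parameter k = 3), works out as follows:

    \[
      \frac{(2k+1)\,p}{p} = 2 \cdot 3 + 1 = 7,
      \qquad 200 \times 7 = 1400,
    \]

which is consistent with the observed equivalence of roughly 1300–1400 SA-GA generations.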

Figure 8:
Mean result of 50 executions for the benchmark 1RPL50.

Figure 9:
Mean result of 50 executions for the benchmark 1RPL100.

Figure 10:
Mean result of 50 executions for the benchmark 1RPL150.

6  Conclusion

This article proposed two new memetic algorithms for sorting unsigned permutations by reversals: OBMA and Hybrid-OBMA. The novelty of the OBMA algorithm is that it integrates the memetic algorithm with the concept of opposition-based learning in some stages where local search is applied, namely in the generation of the initial population and when the population is restarted. Besides, the novelty of Hybrid-OBMA is that it introduces a pre-processing phase into the OBMA algorithm, which is a heuristic that applies all possible reversals that eliminate two breakpoints simultaneously. The inclusion of this heuristic yielded an improvement over the results of OBMA, as observed in the experiments with sets of randomly generated permutations of a size greater than or equal to 60.

Experiments were performed using different types of generated permutations; for a fair comparison the same resources were assigned to all algorithms, that is, the same size of population and the same number of evaluations of the fitness function. The experiment using sets of one-hundred randomly generated permutations showed that OBMA and Hybrid-OBMA have the best results for permutations of length up to 120. These instances can be considered practical cases, since the largest number of mitochondrial genes that has been found is 97, which corresponds to the mitochondrial DNA of the protozoan Reclinomonas americana (see Cooper, 2000). For permutations of length greater than or equal to 130, the best results were obtained by SA-GA and Hybrid-GA.

The experiment with benchmark permutations confirmed the results observed using the sets of one hundred permutations: OBMA and Hybrid-OBMA are the best choice for these practical cases. For the benchmarks of length 50 (1RPL50 and 2RPL50), Hybrid-OBMA and OBMA, respectively, showed the best results on average. For the benchmarks of length 100 (1RPL100 and 2RPL100), Hybrid-OBMA showed the best results on average. For the benchmarks of length 150 (1RPL150 and 2RPL150), Hybrid-GA (and SA-GA) showed the best results on average.

The experiment with permutations based on biological data showed that OBMA has the best results in all cases; in 5 out of 9 cases all algorithms have the same results, and in 3 out of 9 cases SA-GA does not have the best results in some of the measures.

Based on the results of the experiments, we conclude that OBMA and Hybrid-OBMA are suitable algorithms for calculating the reversal distance of unsigned permutations of length up to 120, which can be considered cases of practical interest; for permutations of length from 60 to 120, Hybrid-OBMA is the best choice. Thus, since the length of the input permutation is known a priori, the method that best suits our needs can be chosen. Also, the results of the experiments, using the same resources for all algorithms, showed that the Parallel-GA provides better results only when the size of the population is increased, an increment that should be applied to all algorithms alike. Indeed, the Parallel-GA uses an independent-run (island) model without migration, whose real contribution is restricted to speeding up its sequential version. A natural next step would be to explore other models of parallelization, such as those with migration policies over different parallel interconnection topologies (see Sudholt, 2015), not only for the Parallel-GA but also for parallel versions of OBMA and Hybrid-OBMA.

As future work, we will include improvements in the local search stage, such as dynamically adapting its computation according to the progress of the generations; this would make the stage computationally less intensive. Also, we will explore adapting type-II opposition, a technique that uses the opposite of the fitness function, into OBMA and Hybrid-OBMA. It is also relevant to adapt the current approach to deal with other interesting metrics different from the reversal distance, for example, the translocation distance, and, furthermore, to use other interesting permutations based on biological data (such as the mitochondrial DNA of organisms) for the construction of phylogenetic trees. Finally, we will explore other interesting approaches such as Spiking Neural Networks, which have shown competitive results for combinatorial optimization problems (Zhang et al., 2014).

Acknowledgments

This research was funded by the Brazilian National Council for Scientific and Technological Development (CNPq) under the Brazilian Ministry for Scientific and Technological Development, under a Universal Grant (process number 476952/2013-1) and by the District Federal Research Support Foundation (FAPDF) under grant (process number 193.001.369/2016). During the development of this research, the first author was funded by a Ph.D. scholarship from the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES) under the Brazilian Ministry for Education, and the third author was partially funded by a CNPq high productivity research grant (process number 307009/2013-0).

Notes

1

This information is consistent with the lower bound d(π) ≥ b(π) - c(π) found by Bafna and Pevzner (1993). For the calculation of the reversal distance of signed permutations there exists a more accurate relation, d(π) = b(π) - c(π) + h(π) + f(π), found by Hannenhalli and Pevzner (1995), where h(π) ≥ 0 and f(π) ∈ {0, 1} stand for hurdles and fortresses, which are notions that indicate whether a permutation is hard to sort.
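
As a small illustration of the quantities in this note, the sketch below computes only the breakpoint count b(π) of an unsigned permutation, under the usual extension of π with 0 at the front and n+1 at the end; computing c(π) additionally requires building the breakpoint graph and is not shown.

    def breakpoints(pi):
        # b(pi): number of adjacent positions whose values are not consecutive
        # integers, after extending the permutation with 0 and n+1.
        extended = [0] + list(pi) + [len(pi) + 1]
        return sum(1 for x, y in zip(extended, extended[1:]) if abs(x - y) != 1)

    # Example: breakpoints([3, 1, 2]) == 3, since (0,3), (3,1), and (2,4) are
    # breakpoints while (1,2) is not.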

References

AlQunaieer, F., Tizhoosh, H., and Rahnamayan, S. (2010). Opposition based computing a survey. In Proceedings of the IEEE Conference on Neural Networks, pp. 1098–7576.
Auyeung, A., and Abraham, A. (2003). Estimating genome reversal distance by genetic algorithm. In Congress on Evolutionary Computation, pp. 1157–1161.
Bader, D. A., Moret, B. M. E., and Yan, M. (2001). A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. In Workshop on Algorithms and Data Structures, pp. 365–376. Lecture Notes in Computer Science, Vol. 2125.
Bafna, V., and Pevzner, P. (1993). Genome rearrangements and sorting by reversals. In Proceedings of the Foundations of Computer Science, pp. 148–157.
Bergeron, A. (2001). A very elementary presentation of the Hannenhalli-Pevzner Theory, pp. 106–117. Lecture Notes in Computer Science, Vol. 2089.
Berman, P., and Hannenhalli, S. (1996). Fast sorting by reversal. In Annual Symposium on Combinatorial Pattern Matching, pp. 168–185. Lecture Notes in Computer Science, Vol. 1075.
Berman, P., Hannenhalli, S., and Karpinski, M. (2002). 1.375-approximation algorithm for sorting by reversals. In Proceedings of the 10th Annual European Symposium on Algorithms, pp. 200–210.
Caprara, A. (1997). Sorting by reversals is difficult. In Proceedings of the First Annual International Conference on Computational Molecular Biology, pp. 75–83.
Caraffini, F., Neri, F., and Picinali, L. (2014). An analysis on separability for memetic computing automatic design. Information Sciences, 265:1–22.
Chen, X., Ong, Y.-S., Lim, M.-H., and Tan, K. C. (2011). A multi-facet survey on memetic computation. IEEE Transactions on Evolutionary Computation, 15(5):591–607.
Christie, D. A. (1998). A 3/2-approximation algorithm for sorting by reversals. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 244–252.
Cooper, G. M. (2000). Mitochondria. In The cell: A molecular approach, 2nd ed. Sunderland, MA: Sinauer Associates.
da Silveira, L. Â., Soncco-Álvarez, J. L., and Ayala-Rincón, M. (2017). Parallel genetic algorithms with sharing of individuals for sorting unsigned genomes by reversals. In Congress on Evolutionary Computation, pp. 741–748.
de Lima, T. A., and Ayala-Rincón, M. (2018). On the average number of reversals needed to sort signed permutations. Discrete Applied Mathematics, 235:59–80.
de Oca, M. A. M., Cotta, C., and Neri, F. (2012). Local search. In Handbook of memetic algorithms, pp. 29–41. Berlin: Springer.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30.
Derrac, J., García, S., Molina, D., and Herrera, F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1):3–18.
García, S., and Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9:2677–2694.
Ghaffarizadeh, A., Ahmadi, K., and Flann, N. S. (2011). Sorting unsigned permutations by reversals using multi-objective evolutionary algorithms with variable size individuals. In IEEE Congress of Evolutionary Computation, pp. 292–295.
Grusea, S., and Labarre, A. (2013). The distribution of cycles in breakpoint graphs of signed permutations. Discrete Applied Mathematics, 161(10–11):1448–1466.
Hannenhalli, S., and Pevzner, P. (1995). Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pp. 178–189.
Hao, J.-K. (2012). Memetic algorithms in discrete optimization. In Handbook of memetic algorithms, pp. 73–94. Berlin: Springer.
Kaplan, H., Shamir, R., and Tarjan, R. E. (2000). A faster and simpler algorithm for sorting signed permutations by reversals. SIAM Journal on Computing, 29(3):880–892.
Kececioglu, J., and Sankoff, D. (1993). Exact and approximation algorithms for the inversion distance between two chromosomes, pp. 87–105. Lecture Notes in Computer Science, Vol. 684.
Krasnogor, N., Aragón, A., and Pacheco, J. (2006). Memetic algorithms. Operations Research/Computer Science Interfaces Series, 225–248.
Moscato, P. (1989). On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Technical Report, Caltech Concurrent Computation Program, 158-179.
Moscato, P., and Cotta, C. (2003). A gentle introduction to memetic algorithms. International Series in Operations Research & Management Science, 105–144.
Neri, F., and Cotta, C. (2012). Memetic algorithms and memetic computing optimization: A literature review. Swarm and Evolutionary Computation, 2:1–14.
Rahnamayan, S., Tizhoosh, H. R., and Salama, M. (2008). Opposition-based differential evolution. IEEE Transactions on Evolutionary Computation, 12(1):64–79.
Salehinejad, H., Rahnamayan, S., and Tizhoosh, H. (2014). Type-II opposition-based differential evolution. In Congress on Evolutionary Computation, pp. 1768–1775.
Sankoff, D., Leduc, G., Antoine, N., Paquin, B., Lang, B. F., and Cedergren, R. (1992). Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome. Proceedings of the National Academy of Sciences of the United States of America, 89(14):6575–6579.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 379–427.
Soncco-Álvarez, J. L., and Ayala-Rincón, M. (2012). A genetic approach with a simple fitness function for sorting unsigned permutations by reversals. In 7th Colombian Computing Congress, pp. 1–6.
Soncco-Álvarez, J. L., and Ayala-Rincón, M. (2013). Sorting permutations by reversals through a hybrid genetic algorithm based on breakpoint elimination and exact solutions for signed permutations. Electronic Notes in Theoretical Computer Science, 292:119–133.
Soncco-Álvarez, J. L., and Ayala-Rincón, M. (2014). Memetic algorithm for sorting unsigned permutations by reversals. In IEEE Congress on Evolutionary Computation, pp. 2770–2777.
Soncco-Álvarez, J. L., Marchesan Almeida, G., Becker, J., and Ayala-Rincón, M. (2013). Parallelization and virtualization of genetic algorithms for sorting permutations by reversals. In World Congress on Nature and Biologically Inspired Computing, pp. 29–35.
Sudholt, D. (2015). Springer handbook of computational intelligence, pp. 929–959. Berlin Heidelberg: Springer.
Swenson, K., Rajan, V., Lin, Y., and Moret, B. (2009). Sorting signed permutations by inversions in O(n log n) time. In S. Batzoglou (Ed.), Research in computational molecular biology, pp. 386–399. Lecture Notes in Computer Science, Vol. 5541.
Tannier, E., Bergeron, A., and Sagot, M.-F. (2007). Advances on sorting by reversals. Discrete Applied Mathematics, 155(6–7):881–888.
Tizhoosh, H. (2005). Opposition-based learning: A new scheme for machine intelligence. In International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, Vol. 1, pp. 695–701.
Tizhoosh, H., and Ventresca, M. (2008). Oppositional concepts in computational intelligence. Berlin Heidelberg: Springer.
Xu, Q., Wang, L., Wang, N., Hei, X., and Zhao, L. (2014). A review of opposition-based learning from 2005 to 2012. Engineering Applications of Artificial Intelligence, 29:1–12.
Zhang, G., Rong, H., Neri, F., and Pérez-Jiménez, M. J. (2014). An optimization spiking neural P system for approximately solving combinatorial optimization problems. International Journal of Neural Systems, 24(05):1440006.