## Abstract

In genetic programming, the size of a solution is typically not specified in advance, and solutions of larger size may have a larger benefit. The flexibility often comes at the cost of the so-called bloat problem: individuals grow without providing additional benefit to the quality of solutions, and the additional elements can block the optimization process. Consequently, problems that are relatively easy to optimize cannot be handled by variable-length evolutionary algorithms. In this article, we analyze different single- and multiobjective algorithms on the sorting problem, a problem that typically lacks independent and additive fitness structures. We complement the theoretical results with comprehensive experiments to indicate the tightness of existing bounds, and to indicate bounds where theoretical results are missing.

## 1 Introduction

Evolutionary algorithms using variable-length representations have been applied in different problem domains (Falco et al., 2005; Lee and Antonsson, 2000). They are, in particular, useful if there is a trade-off between the quality of a solution and its complexity in terms of the size of its representation. This happens frequently in the area of regression where a more complex model can give a better fit to the given data.

Genetic programming (GP) (Koza, 1992) is the most prominent example of a variable-length evolutionary algorithm, as it often evolves tree-like solutions for a given problem. Its main application area lies in the field of symbolic regression. The first computational complexity results on this type of algorithm were obtained following the line of successful research on evolutionary algorithms with fixed-length representation (Auger and Doerr, 2011; Neumann and Witt, 2010, for an overview). In general, variable-length representations increase the search space significantly, and it is desirable to better understand the behavior of algorithms using such representations from a theoretical point of view.

For example, Cathabard et al. (2011) investigated nonuniform mutation rates for problems with unknown solution lengths. A simple evolutionary algorithm was used to find a bit string with an unknown number of leading 1s, and although the bit string had some predetermined maximum length, only an unknown number of initial bits was used by the fitness function. A simple tree-based genetic programming algorithm was investigated by Durrett et al. (2011). The problems were separable, with independent and additive fitness structures. Similarly, Kötzing et al. (2012) analyzed simple GP algorithms for the MAX problem. A new form of GP called geometric semantic genetic programming was investigated, with positive results for Boolean, classification, and regression domains (see, e.g., Moraglio et al., 2013; Mambrini et al., 2013).

Many evolutionary algorithms that work with a variable-length representation do not work (in their most basic variant) with a form of bloat control. One popular way to deal with the bloat problem is inspired by Occam’s Razor: in case two solutions are equal in quality, the solution of lower complexity is preferred. Another frequently taken approach to coping with the bloat problem is the use of multicriteria approaches that use sets of solutions representing the different trade-offs according to the original goal function and the complexity of a solution. Such multicriteria approaches are used even in industrial GP packages such as Datamodeler by Evolved Analytics LLC (2010). Both approaches to coping with the bloat problem have been examined for different problems in the context of genetic programming (Neumann, 2012; Wagner and Neumann, 2012; Nguyen et al., 2013).

In this article, we investigate the sorting problem, which is one of the fundamental problems in computer science.^{1} In addition, the sorting problem is the first combinatorial optimization problem for which computational complexity results have been obtained in the area of discrete evolutionary algorithms (Scharnow et al., 2004; Doerr and Happ, 2008). Scharnow et al. (2004) formulated sorting as a problem where the task is the minimization of unsortedness (or the maximization of sortedness) of a given permutation of the input elements. Several different functions have been explored in the past to measure unsortedness, and they have been studied with respect to the difficulty of being optimized by permutation-based evolutionary algorithms. Depending on the chosen measure and in contrast to the problems WORDER and WMAJORITY (see, e.g., Nguyen et al., 2013), the sorting problem cannot typically be split into subproblems that can be solved independently. Consequently, the dependencies between the subproblems can significantly impact the time needed to solve the overall optimization problem.

With our work, we continue the analyses started by Wagner and Neumann (2012), which focussed on the advantages of a parsimonious algorithm over a multiobjective one. Here, we present several analyses for a total of three single- and multiobjective algorithms using five sortedness measurements. Despite our analyses, no (or no exact) runtime bounds are given for different combinations of algorithms and problems. Because of this, we explore experimentally the open cases and questions. The intention is that this will guide further rigorous analyses (similar to Lässig and Sudholt, 2010; Briest et al., 2004; Urli et al., 2012) by exploring the important measures within a computational complexity analysis of the algorithms. We complement the theoretical results with conjectures about the expected optimization times for the variants lacking a formal proof. In our experimental investigations, we concentrate on important measures, such as the size of the largest tree during the run of the single-objective algorithms analyzed by Durrett et al. (2011) and the maximum population size of the multiobjective algorithm analyzed by Neumann (2012). In both articles, these measures have different implications for the computational complexities of the analyzed algorithms.

This article is organized as follows. We first introduce the sorting problem in Section 2. Then we present the different genetic programming algorithms, which are analyzed in Section 3. In Section 4 we study the single-objective approach, in Section 5 the parsimony approach, and in Section 6 the multiobjective approach. The experimental investigations on the behaviors of (1+1)-GP and SMO-GP follow in Section 7. We finish with some conjectures and concluding remarks in Section 8.

## 2 Preliminaries

Our goal is to investigate theoretically and experimentally the differences between bloat control mechanisms for genetic programming. In our investigations, we consider simple tree-based genetic programming already analyzed by Durrett et al. (2011); Neumann (2012); Wagner and Neumann (2012); Nguyen et al. (2013) and Neumann et al. (2011) for problems with isolated program semantics. A possible solution is represented by a syntax tree. The tree’s inner nodes are labeled by function symbols from a set *F*, and the tree’s leaves are labeled by terminals from a set *T*.

Even though many GP algorithms allow complex functions for the inner nodes, we restrict the set of functions to the single binary function “join” *J*. Effectively, we use *J*s to achieve variable-length lists by concatenating leaf nodes.

The problem that we use as the basis for our investigations is a classical problem from computational complexity analysis, namely, the sorting problem SORTING. Scharnow et al. (2004) considered SORTING as an optimization problem, where different functions measure the sortedness of a permutation of given input elements. They discovered that different fitness functions lead to problems of different difficulties.

It is important to note that in contrast to WORDER and WMAJORITY (analyzed in previous articles), the SORTING problem cannot be split into subproblems that can be solved independently. These dependencies have a significant impact on the time needed to solve the problem.

*n*elements, the task is to find a permutation of the elements of

*T*such that holds, where is the order on

*T*. Without loss of generality, we assume , meaning that for all

*i*, throughout our analyses.

The set of all permutations forms a search space that has already been investigated by Scharnow et al. (2004) for the analysis of permutation-based evolutionary algorithms. The authors of that article investigated SORTING as an optimization problem where the goal was to maximize the sortedness (or equivalently, minimize the unsortedness) of a given permutation. Here, we consider the same fitness functions as introduced by Scharnow et al. (2004):

measures the number of pairs of neighboring elements in correct order (larger values are better);

measures the number of elements that are at their correct position, which is the number of indices

*i*such that (larger values are better);measures the number of maximal sorted blocks, which is the number of indices

*i*such that plus 1 (smaller values are better);measures the length of the longest ascending subsequence within of elements (larger values are better);

measures the smallest number of pairwise exchanges in in order to sort the sequence (smaller values are better);

Given a tree *X*, we determine the permutation that it represents according to Algorithm 1. Once we have seen an element during an in-order parse, we skip its duplicates. This is necessary, as the resulting sequence of elements for which we determine its sortedness should contain each element at most once. Note also that a single target permutation can be represented by many different trees.

Note that can be computed in linear time because of the cycle structure of permutations. The sequence is sorted if and only if it has *n* cycles. Otherwise, it is always possible to increase the number of cycles by exchanging an element that is not sitting at its correct position with the element that is currently sitting there. For any given permutation consisting of *n*−*k* cycles, .^{2}

We investigate the five listed measures for variable-length evolutionary algorithms. Consequently, we might have to deal with incomplete permutations, as not all elements have to be contained in a given individual. Most measures can also be used for incomplete permutation, but we have to make sure that complete permutations always obtain a better fitness than incomplete ones, so that the sortedness measure guides the algorithm from incomplete permutations to complete ones. Therefore, we use the sortedness measures as described and use the following special fitness assignments that enforce these properties:

is the number of pairs in order, except if , and if ;

if , otherwise is the sum of the number of maximal sorted blocks

*b*, and the number of elements missing ;If then , else , where

*e*is the number of necessary exchanges, and the number of missing elements.

Note that *e* can be computed for incomplete permutations as well, as only the order on the expressed variables has to be respected. This means that the permutations and require no changes, but , as the number of missing elements differs.

For example, for a tree *X* with and , the sortedness results are , , and .

MO-INV, MO-HAM, MO-RUN, MO-LAS, and MO-EXC are variants of the described problems. They take as the second objective the complexity *C* of a syntax tree (computed by the number of leaves of the tree), for instance, MO-INV (X) = (INV (X), C(X)). Optimization algorithms can then make use of this in order to deal with the bloat problem: given two solutions with identical fitness value, the algorithms can prefer the solution of lower complexity.

## 3 Algorithms

In this article, all GP algorithms use only the HVL-Prime as the mutation operator to generate new solutions. HVL-Prime is a variant of O’Reilly’s HVL mutation operator (O’Reilly, 1995; O’Reilly and Oppacher, 1994) and it is motivated by minimality rather than by problem-specific operations. HVL-Prime produces a new tree by making changes to the original tree via three basic operators: insertion, deletion, and substitution (see Algorithm 2). In each iteration of the algorithms, *k* mutations are applied to the selected solution. For the single-operation variants of the algorithms, holds. For the multioperation variants, the number of operations performed is drawn each time from the distribution , where denotes the Poisson distribution with mean 1.

The algorithm (1+1)-GP* that we investigate first has no explicit mechanism to control bloat whatsoever. The only feature that can potentially prevent the solution’s size from becoming too large is that only strict fitness improvements are accepted. Thus, the maximum solution size is limited based on the size of the initial tree and by the number of possible improvements that can be performed.^{3}

The single-objective variant called (1+1)-GP*-single (see Algorithm 3) starts with an initial solution *X* and produces in each iteration a single offspring *Y* by applying the mutation operator HVL-Prime, given in Algorithm 2 with . This means that it is a stochastic hill-climber that explores its local neighborhood. In the case of maximization, *Y* replaces *X* if holds. Minimization problems are tackled in the analogous way.

The single-objective variant called (1+1)-GP (see Algorithm 4 for the single-mutation variant) is identical to the just described (1+1)-GP* with the exception that in the case of maximization *Y* replaces *X* if holds. Again, minimization problems are tackled in the analogous way. As a consequence of the relaxed acceptance condition, the complexity of the solution can increase as long as the fitness does not decrease. Thus, (1+1)-GP has no mechanism to prevent bloat whatsoever.

*C*is to be minimized and all fitness functions

*F*except RUN and EXC are maximized.

^{4}In the parsimony approach, we optimize the defined multiobjective fitness functions with respect to the lexicographic order, that is, is true iff

As the last algorithm, we consider the simple evolutionary multi-objective genetic programming algorithm (SMO-GP, see Algorithm 5) introduced by Neumann (2012) and motivated by the SEMO algorithm for fixed-length representations by Laumanns et al. (2004). Variants of SEMO have been frequently used in the runtime analysis of evolutionary multiobjective optimization for fixed-length representations (see Giel, 2003; Neumann and Wegener, 2005; Friedrich et al., 2010; Giel and Lehre, 2010; Neumann and Witt, 2010).

In this multiobjective variable-length algorithm, both criteria *F* and *C* are equally important. In order to compare two solutions, we consider the classical Pareto dominance relations:

A solution

*X**weakly dominates*a solution*Y*(denoted by ) iff .A solution

*X**dominates*a solution*Y*(denoted by ) iff .Two solutions

*X*and*Y*are called*incomparable*iff neither nor holds.

A solution that is not dominated by any other solution in the search space is called a Pareto optimal solution. The set of all such Pareto optimal solutions forms the Pareto optimal set, and the set of all corresponding objective vectors forms the Pareto front. In multiobjective optimization, the classical goal is to compute a Pareto optimal solution for each objective vector of the Pareto front. Or, if the Pareto front is too large, the goal then is to find a representative subset of the front—the definition of “representative” depends on the investigator’s preference.

SMO-GP is a population-based approach that starts with a single solution. During the optimization run it maintains a set of nondominated solutions. This set constantly approximates the true Pareto front, that is, the set of optimal trade-offs between fitness and complexity. In each iteration, it picks one solution uniformly at random and produces one offspring *Y* by mutation. *Y* is introduced into the population iff it is not weakly dominated by any other solution in *P*. If *Y* is added to the population, all individuals that are dominated by *Y* are discarded.

Similar to the previously introduced algorithms, SMO-GP-single uses the mutation operator HVL-Prime with . We also consider SMO-GP-multi, which differs from SMO-GP-single by choosing *k* according to .

We analyze the introduced single-objective algorithms in terms of the number of iterations of their repeat loops until they have produced an optimal solution, that is, a solution of maximal or minimal fitness value for the first time. The expected number of iterations to achieve this goal is called the expected optimization time of the algorithm. Considering multiobjective algorithms, the expected optimization time refers to the expected number of iterations of the repeat loop until the population includes for each Pareto optimal objective vector a corresponding solution.

## 4 Standard Approach without Bloat Control

We begin our analyses with the theoretical investigation of (1+1)-GP (see Algorithm 3), which has no mechanism to control bloat. The only feature that can potentially prevent the solution size from becoming too large is that only strict fitness improvements are accepted. Thus, the maximum solution size is limited based on the size of the initial solution, the increase in complexity per improvement, and the total number of fitness improvements during the run of the algorithm.

Recall that the single-objective variant called (1+1)-GP*-single starts with an initial solution *X* and produces in each iteration a single offspring *Y* by applying the mutation operator given in Algorithm 2 with . This means that it is a stochastic hill-climber that explores its local neighborhood. In the case of maximization, *Y* replaces *X* if holds. Minimization problems are tackled in the analogous way.

### 4.1 Upper Bound

In this section we analyze the performance of our (1+1)-GP* variants on one of the fitness functions introduced in Section 3.

We exploit a similarity between our variants and evolutionary algorithms to obtain an upper bound on the time needed to find an optimal solution. We use the method of *fitness-based partitions*, which was originally introduced for the analysis of elitist evolutionary algorithms (see, e. g., Wegener, 2002), where the fitness of the current search point can never decrease. Although the HVL-Prime operator is complex, we can obtain a lower bound on the probability of making an improvement considering fitness improvements that arise from the HVL-Prime suboperations insertion and substitution. In combination with fitness levels defined individually for the sortedness measures, this gives us the runtime bounds in this section.

We denote by *T*_{max} the maximum size of the tree during the run of the algorithm and show the following theorem.

The expected optimization time is for (1+1)-GP*-single and (1+1)-GP*-multi, using INV as the sortedness measure, where *n* is the number of elements that are to be sorted.

As there are at most advancing steps between fitness levels to be made, the total expected runtime is upper bounded by the sum over all times needed to make such steps.

We bound the times by investigating the case when only a particular insertion of a specific leaf at its correct position achieves an increase of the fitness.^{5} For this particular insertion, we consider the lexicographically smallest pair , , which is currently incorrect: putting *i* directly before *j* makes this pair correct. We now have to show that this does not make incorrect any other pair that was previously correct. Assume there is a pair , , that was previously correct and that has become incorrect because of the insertion of *i*. As only *i* is moved, has to hold, but we can show that this cannot be the case. *k* has to be smaller than *j*; otherwise the pair cannot become incorrect. Thus, has to hold because and and because of our assumption . was correct before the insertion, so it has to be lexicographically smaller than . Therefore *k* is before *j* in the list of expressed leaf nodes. As *i* is placed directly before *j* and therefore after *k*, cannot become incorrect.

*J*-node is at least . This, together with the probability of selecting the right element to add, which is bounded by , and the probability of adding it to the right position in the tree, which is bounded by , gives us a lower bound on the probability for doing such an improvement in (1+1)-GP*-single :

^{6}

### 4.2 Local Optima

In the following, we present several worst-case examples for HAM, RUN, LAS, and EXC that demonstrate that (1+1)-GP* can get stuck in local optima during the optimization process. This shows that evolving a solution with this GP system is much harder than working with the permutation-based EA presented by Scharnow et al. (2004), where only the sortedness measure RUN leads to an exponential optimization time.

We study worst-case solutions that are hard to improve by our algorithms. In the following, we write down such solutions by the order of the leaves in which they are visited by the in-order parse of the tree. We restrict ourselves to the case where we initialize with a tree of size linear in *n* and show that even this leads to difficulties for the previously mentioned sortedness measures. Note, that a tree of size linear in *n* is necessary to represent a complete permutation of the given input elements.

Let be the initial solution. Using the sortedness measures RUN and LAS, the expected optimization time of (1+1)-GP*-single is infinite and that of (1+1)-GP*-multi is .

We consider (1+1)-GP*-single first. It is clear that with a single HVL-Prime application, it is possible to remove only one of the leftmost *n*s. To improve the sortedness based on RUN or LAS, all leftmost leaves have to be removed at once. Clearly, (1+1)-GP*-single cannot do this, which results in an infinite runtime.

Similarly, (1+1)-GP*-multi can only improve the sortedness if it removes the leftmost leaves. Hence, in order to successfully improve the sortedness, at least suboperations have to be performed, assuming that we delete one of the leftmost *n*s each time. As the number of suboperations per mutation is distributed as , the Poisson random variable has to take a value of at least *n*. Thus, the probability for such a fitness-improving step is , with the expected waiting time for such a step being .

Let be the initial solution. Using the sortedness measures HAM and EXC, the expected optimization time of (1+1)-GP*-single is infinite and of (1+1)-GP*-multi is .

We use similar ideas as in the previous proof. Again, it is not possible for (1+1)-GP*-single to increase the sortedness in a single step, as all leftmost leaf nodes need to be deleted in order for the rightmost *n* to become expressed. In addition, a leaf labeled 1 has to be inserted at the beginning, or alternatively, one of the leaves labeled *n* has to be replaced by a leaf labeled 1. This results in a minimum number of suboperations that have to be performed by one HVL-Prime application, leading to the lower bound of for (1+1)-GP*-multi.

## 5 Parsimony Approach

In this section we consider simple variable-length evolutionary algorithms using the parsimony approach. The single-objective variant called (1+1)-GP is identical to the previously investigated (1+1)-GP* with the exception that in the case of maximization, *Y* replaces *X* if .

Wagner and Neumann (2012) showed that the optimization time of (1+1)-GP-single on MO-EXC, MO-RUN, and MO-HAM is infinite when initialized with specific solutions.

In the following, we add to these results by proving polynomial runtime bounds for the other functions. The idea behind the proof of the expected polynomial optimization time on MO-LAS is as follows. Given a tree *T* with its tree size of *T*_{init} and its sortedness LAS. For such a tree, we always have at least one of the following two ways to create a new tree that is accepted. First, we can improve the sortedness by extending the longest ascending sequence. Or, second, we can reduce the size of the tree, if the tree has more than *k* leaves. If the latter is the case, we can trim the number of leaves down to *k*, thus eliminating blocking elements and duplicates, and then we can build up the sought permutation. Thus, we can now deal with trees such as (see Section 4.2) that have previously been problematic.

The expected optimization time of (1+1)-GP-single on MO-LAS is .

We consider two phases. First, we show that we arrive at a tree with fitness *k* and *k* leaves after steps. Then we analyze the time needed to get from there to the optimal solution.

*1. Phase*. Initially, let LAS be the fitness of the current tree

*T*with

*s*leaves. Then, the fitness distance to the desired tree size is . As the probability for HVL-Prime to perform a deletion is , the probability to reduce the size via a deletion in a single mutation step is at least where the term comes from the fact that we need to select one of the redundant elements. Note that

*d*cannot increase: for

*d*to increase,

*k*would have to decrease, which is impossible, as the primary objective is the maximization of the LAS value. Alternatively,

*d*could increase if

*s*increases. However, the tree size can only increase if the last accepted step increased the sortedness as well. In a single step, if

*s*increases by 1, then

*k*had to increase by 1 as well, which leaves the distance unchanged.

*d*, and the fitness levels with we can bound the expected runtime for this first phase:

*2. Phase.* Next, we investigate the time needed in the second phase to arrive at the optimum. Therefore, we again apply the described fitness-based partitions method. We define the fitness levels with . As there are at most advancing steps between fitness levels to be made, the total expected runtime is upper bounded by the sum over all expected times needed to make such steps.

*any*of the

*n*−

*k*unblocked elements that are missing in the sequence into its correct position. The probability for a single of such an insertion to happen is at least Consequently, the expected runtime of the second phase can then be bounded from above by

Hence, the expected optimization time of the algorithm is .

The expected optimization time of (1+1)-GP-single on MO-INV is upper bounded by .

^{1}and

^{4}. First, after steps, we arrive at a nonredundant tree. Next, as we can have at most

*n*

^{2}fitness-improving insertions, the maximum tree size

*T*

_{max}is bounded by after the initial trimming phase. Consequently, the probability for a fitness-improving mutation is bounded by . Thus, we can now bound the overall optimization time by

Achieving a similar bound for the multimutation variant is not as easy, since the insertion of a missing element (i.e., a fitness improvement) may be accompanied by the insertion of many elements that are already present. Because of the Poisson-distributed number of operations performed by HVL-Prime within (1+1)-GP-multi, the algorithm’s *typical* local behavior is difficult to predict.

Therefore, we take an alternative approach, looking at a polynomial-sized sequence of steps . Let *T*_{init} be a tree with size . The failure probability for inserting at most in a single HVL-Prime operation is . Furthermore, given any initial tree, we can have at most *n* improvements of the sortedness when the measurements LAS and EXC are used. Now, we compute a bound of the tree size. Looking at *n* mutations that increase the fitness, the failure probability for adding at most leaves in *t* time steps is exponentially small: . Thus, the tree size does not exceed within time steps, with high probability.

Let be a constant. The optimization time of (1+1)-GP-multi on MO-LAS is , with probability .

We split the proof into two parts: first, we bound the total time needed for deletions during a run, and second, we investigate the time needed to perform the necessary insertions to find the optimal solution.

First, given a solution where *k _{m}* elements have to be removed in order to arrive at a nonredundant tree after the

*m*th fitness-increasing insertion. In the following, let

*i*be the number of redundant elements in the tree, and let

*j*be the number of nonredundant elements in the tree.

*Stage 1, .*As the probability for a single operation is , the probability for the deletion of a single redundant element at any time is lower bounded by

Then, the expected time to delete *k _{m}* elements is upper bounded by 6

*ek*. Furthermore, as we know that we can delete at most

_{m}*T*

_{max}leaves over a full optimization run, . Thus, we can bound the expected time needed for

*all*deletions (when ) by .

^{7}(with ) we get that

*Stage 2, .* To bound the number of steps, we apply the technique of multiplicative drift with tail bounds (see Definition 1 and Theorem ^{1} in Doerr and Goldberg, 2010).

In our situation, is a feasible -drift function on the number of redundant elements (with implicit constant ). For the optimal solutions (”no redundant elements left”) holds as required, holds for all nonoptimal solutions, and . Thus, and . Consequently, we get that the time needed for all deletions (when ) during a run exceeds with probability at most . As these deletion phases take place at most *n* times, the resulting overall deletion time does not exceed with probability .

Next, we consider the time necessary to perform the insertions of the missing elements once the insertion was unblocked. We again apply the multiplicative drift with tail bounds. Note that the situation is very similar: instead of reducing the number of redundant elements, we are now reducing the number of missing elements.

Let *j* be the number of elements currently missing. As the probability for a single operation is , the probability for a single insertion of a missing element to happen at the required position is lower bounded by . With , we get and . Consequently, by applying Theorem ^{1} from Doerr and Goldberg (2010), we get that the time needed for all insertions during a run exceeds with probability at most . Thus, the resulting overall time needed for all insertions does not exceed with probability .

## 6 Multiobjective Approach

In the following, we consider the simple evolutionary multi-objective genetic programming algorithm (SMO-GP, see Algorithm 5) introduced by Neumann (2012), which modifies the original SEMO algorithm of Laumanns et al. (2004) to work with fixed-length representations.

Let us recall that SMO-GP is a population-based approach that is initialized with a single solution. During the run, it keeps in each iteration the set of nondominated solutions obtained so far. This set of solutions constantly approximates the true Pareto front, namely, the set of optimal trade-offs between fitness and complexity. In each iteration, it picks one solution uniformly at random and produces one offspring *Y* by mutation. *Y* is introduced into the population iff it is not weakly dominated by any other solution in *P*. If *Y* is added to the population, all individuals that are dominated by *Y* are discarded.

In the following, we analyze the expected number of iterations before the set of nondominated solutions becomes the true Pareto front. We call this the *expected optimization time* of SMO-GP algorithms.

For arbitrary optimization problems, the following lemma bounds the expected time needed for the populations to include the empty solution (i.e., the empty tree):

Let be the size of the initial solution and *k* be the number of different fitness values of a problem *F*. Then the expected time until the population of SMO-GP-single and SMO-GP-multi applied to MO-F contains the empty solution is .

The expected optimization time of SMO-GP-single and SMO-GP-multi is on MO-INV, and on MO-LAS.

First, as INV has different fitness values, using Lemma ^{7}, the empty solution is produced after an expected number of steps. First, note that each Pareto optimal solution with complexity has an INV value of , if .^{8}

Second, we bound the time needed to discover the whole Pareto front once the empty solution is introduced into the population. Assume that the population contains all Pareto optimal solutions with complexities , . Then, a population that includes all Pareto optimal solutions with complexities , can be achieved by producing a solution *Y* that is Pareto optimal and that has complexity . *Y* can be obtained from a Pareto optimal solution *X* with by inserting an element that increases the INV value by . This operation produces from a solution of complexity a solution of complexity , as one leaf node and one inner node are added.

Based on this idea we can bound the expected optimization time once we can bound the probability for such steps to happen. With probability at least it is possible to choose *X*, as the population size is upper bounded by . Next, a single mutation operation happens with probability at least , and the inserting operation of HVL is chosen with probability . The probability to select one of the missing elements is at least . However, the correct position for such a randomly chosen element has to be chosen in order to produce a solution that is Pareto optimal and of complexity . This probability is at least , as the number of leaf nodes is bound by *n*, and the probability to insert as the correct child of the newly introduced inner node is at least . Thus, the total probability of such a generation can be bounded by .

*n*Pareto optimal improvements are possible once the empty solution has been introduced into the population, the expected time until all Pareto optimal solutions have been generated is bounded by

Similarly, we can prove an upper bound for MO-LAS. First, note that each Pareto optimal solution with LAS value *i* represents a perfectly sorted permutation of *i* elements. Next, after an expected number of steps the empty solution is produced, as only *n* different LAS values are possible. As before, we assume that the population already contains all solutions that are Pareto optimal and of complexities , . Then, the minimally larger population that includes all Pareto optimal solutions with complexities , , can be achieved by inserting any of the missing *n*−*i* elements into its correct position in the Pareto optimal individual *X* with .

## 7 Complementary Experimental Analyses

Tables 1 and 2 summarize our theoretical findings, list existing bounds, and show open problems. As can be observed from the tables, all bounds consider tree sizes of some kind: either the size *T*_{max} of the largest tree or the size of the initial solution *T*_{init}. In particular, the runtime of (1+1)-GP*, F(X) depends on the maximum tree size *T*_{max}, since the expected time to get to the optimal solution grows larger and larger as the tree grows in size. The runtimes of several MO-F(X) variants depend on the initial tree size *T*_{init}, as often the first step of the proof involves deconstructing the original solutions until a tree of size zero is found. Furthermore, it is quite striking that not too many bounds for the multioperation GP algorithms are known so far. It is also not known whether the bounds are tight or not. As the maximum tree size for (1+1)-GP and the population size for SMO-GP play a relevant role in the theoretical analyses, we focus our attention on these.

. | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | |
---|---|---|---|

F(X) . | single . | multi . | single/multi . |

INV | |||

LAS | |||

HAM | ? | ||

EXC | |||

RUN |

. | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | |
---|---|---|---|

F(X) . | single . | multi . | single/multi . |

INV | |||

LAS | |||

HAM | ? | ||

EXC | |||

RUN |

The question mark indicates combinations for which we do not know any bounds. Asterisks indicate the bounds presented in this article. *T*_{max} denotes the size of the largest tree at any stage during the evolution of the algorithm.

. | (1+1)-GP, MO-F(X) . | SMO-GP, MO-F(X) . | |
---|---|---|---|

F(X) . | single . | multi . | single/multi . |

INV | ? | ||

LAS | |||

HAM | ? | ||

EXC | ? | ||

RUN | ? |

. | (1+1)-GP, MO-F(X) . | SMO-GP, MO-F(X) . | |
---|---|---|---|

F(X) . | single . | multi . | single/multi . |

INV | ? | ||

LAS | |||

HAM | ? | ||

EXC | ? | ||

RUN | ? |

indicates a bound that holds with probability . Question marks indicate combinations for which we do not know any bounds. Asterisks indicate bounds presented in this article. *T*_{init} denotes the size of the initial tree.

In this section we carry out experimental investigations about the runtime of different variable-length algorithms over the presented fitness functions. The purpose of this analysis is threefold:

to complement the theoretical results with conjectures about the expected optimization times for the variants lacking a formal proof;

to assess the impact on the runtime of two collected measures, namely, the maximum tree size

*T*_{max}and, for SMO-GP, the maximum population size*P*_{max}encountered during an optimization run; andto give useful insight for guiding further rigorous theoretical analysis.

### 7.1 Experimental Setup

In our experimental investigations, we considered all the GP algorithms: (1+1)-GP on F(X), (1+1)-GP on MO-F(X), (1+1)-GP* on F(X), and SMO-GP on MO-F(X). Each GP algorithm was run in its single-mutation and multimutation variants, and we investigated problems of sizes . For the initialization of the individuals, we considered the schemes init (empty tree) and init (tree with *n* leaves constructed by applying *n* insertion mutations at random positions on an initially empty tree). In total, our experiments spanned ten problems: INV, HAM, RUN, LAS, and EXC in their single and multiobjective variants.

We ran the experiments on Intel Xeon E5430 CPUs (2.66GHz) on Debian GNU/Linux 7 with Java SE RE 1.7. We limited the computation budget to a maximum runtime of 3 hours or evaluations each, whichever was reached first. Furthermore, we repeated each experiment 200 times, resulting in a standard error of the mean (the standard deviation of the sampling distribution) of . As a curiosity, the whole set of experiments took about 30 CPU-years to complete.

The complete source code of the framework is available on BitBucket (Mercurial, at https://bitbucket.org/tunnuz/gpframework), on GitHub (Git, https://github.com/tunnuz/gpframework), and on Google Code (Subversion at http://code.google.com/p/gpframework).

### 7.2 Experimental Analysis of the (1+1)-GP Variants

We now analyze the experimental results of the (1+1)-GP variants with respect to the maximum tree size obtained during execution and the required optimization time.

#### 7.2.1 Tree Size

As mentioned, the known theoretical bounds for the (1+1)-GP variants depend on *T*_{max}, the size of the largest tree encountered during the run of the algorithm. It is important to observe that the maximum solution size is not a parameter that is set in advance but rather a measure that emerges from the nature of the employed fitness function and mutation operators. In addition, the optimization can involve a degree of randomness, which makes *T*_{max} (and thus bloating) extremely difficult to predict. For this reason, we investigate the maximum solution size experimentally to detect when bloat occurs within the analyzed GP algorithms. As statistics, we employ the median (the second quartile) as a measure of central tendency and the interquartile range (, the distance between the first and the third quartiles) as a measure of variance.

Table 3 reports results for , and the results for the other input sizes are comparable. The missing data (—) represent experiments for specific input sizes where the algorithms did not find an optimal solution within the time or evaluations bound for more than of the repetitions.^{9} For the sake of clarity we recall that (1+1)-GP*, F(X) accepts a new solution only if the fitness is strictly better than the previous one, while (1+1)-GP, F(X) always accepts a solution of the same value. (1+1)-GP, MO-F(X) accepts a solution of the same fitness only if the complexity is lower.

k . | F(X) . | n . | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | (1+1)-GP,MO-F(X) . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | init . | init . | init . | init . | init . | init . | ||||||

. | . | . | m
. | . | m
. | . | m
. | . | m
. | . | m
. | . | m
. | . |

INV | 40 | 307 | 46 | 327 | 33.5 | 528 | 185.5 | 528 | 202.5 | 95 | 4 | 97 | 5.5 | |

80 | 821 | 79 | 849 | 105 | 1259 | 472 | 1269 | 473 | 189 | 8 | 191 | 6 | ||

160 | — | — | — | — | 2645 | 612 | 2688 | 627.5 | 375 | 10 | 381 | 14 | ||

LAS | 40 | 79 | 0 | — | — | 525 | 212 | 592 | 265.5 | 79 | 0 | 79 | 0 | |

80 | 159 | 0 | — | — | 1352 | 508.5 | 1401 | 526.5 | 159 | 0 | 159 | 0 | ||

160 | 319 | 0 | — | — | 2670 | 527.5 | — | — | 319 | 0 | 319 | 0 | ||

HAM | 40 | 79 | 0 | — | — | 1665 | 1042 | 1672 | 723.5 | 79 | 0 | 79 | 0 | |

80 | 159 | 0 | — | — | — | — | — | — | 159 | 0 | 159 | 0 | ||

160 | 319 | 0 | — | — | — | — | — | — | 319 | 0 | 319 | 0 | ||

EXC | 40 | 81 | 0 | — | — | 1573 | 908 | — | — | 81 | 2 | — | — | |

80 | 161 | 0 | — | — | — | — | — | — | 161 | 0.5 | — | — | ||

160 | 321 | 2 | — | — | — | — | — | — | 321 | 0 | — | — | ||

RUN | 40 | 79 | 0 | — | — | — | — | — | — | 79 | 0 | — | — | |

80 | 159 | 0 | — | — | — | — | — | — | 159 | 0 | — | — | ||

160 | 319 | 0 | — | — | — | — | — | — | 319 | 0 | — | — | ||

INV | 40 | 249 | 33 | 259 | 34 | 512 | 183 | 543 | 199.5 | 107 | 8 | 112 | 10 | |

80 | 611 | 48 | 627 | 58 | 1245 | 490 | 1308 | 435.5 | 213 | 12.5 | 219 | 14 | ||

160 | — | — | — | — | 2793 | 733 | 2821 | 700 | 419 | 18 | 437 | 22 | ||

LAS | 40 | 95 | 10 | — | — | 555 | 276.5 | 560 | 261.5 | 79 | 2 | 79 | 2 | |

80 | 187 | 10.5 | — | — | 1334 | 592 | 1382 | 420.5 | 159 | 2 | 159 | 2 | ||

160 | — | — | — | — | 2893 | 698 | 2789 | 498 | 319 | 2 | 319 | 2 | ||

HAM | 40 | 87 | 6 | — | — | 1767 | 926 | 1833 | 1042 | 79 | 2 | 79 | 2 | |

80 | 177 | 8 | — | — | — | — | — | — | 159 | 0 | 159 | 2 | ||

160 | 353 | 11.5 | — | — | — | — | — | — | 319 | 2 | 319 | 2 | ||

EXC | 40 | 93 | 6 | — | — | 1852 | 1042 | 1964 | 964.5 | 81 | 0 | 83 | 2 | |

80 | — | — | — | — | — | — | — | — | 161 | 2 | — | — | ||

160 | — | — | — | — | — | — | — | — | — | — | — | — | ||

RUN | 40 | — | — | — | — | — | — | — | — | — | — | — | — | |

80 | — | — | — | — | — | — | — | — | — | — | — | — | ||

160 | — | — | — | — | — | — | — | — | — | — | — | — |

k . | F(X) . | n . | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | (1+1)-GP,MO-F(X) . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | init . | init . | init . | init . | init . | init . | ||||||

. | . | . | m
. | . | m
. | . | m
. | . | m
. | . | m
. | . | m
. | . |

INV | 40 | 307 | 46 | 327 | 33.5 | 528 | 185.5 | 528 | 202.5 | 95 | 4 | 97 | 5.5 | |

80 | 821 | 79 | 849 | 105 | 1259 | 472 | 1269 | 473 | 189 | 8 | 191 | 6 | ||

160 | — | — | — | — | 2645 | 612 | 2688 | 627.5 | 375 | 10 | 381 | 14 | ||

LAS | 40 | 79 | 0 | — | — | 525 | 212 | 592 | 265.5 | 79 | 0 | 79 | 0 | |

80 | 159 | 0 | — | — | 1352 | 508.5 | 1401 | 526.5 | 159 | 0 | 159 | 0 | ||

160 | 319 | 0 | — | — | 2670 | 527.5 | — | — | 319 | 0 | 319 | 0 | ||

HAM | 40 | 79 | 0 | — | — | 1665 | 1042 | 1672 | 723.5 | 79 | 0 | 79 | 0 | |

80 | 159 | 0 | — | — | — | — | — | — | 159 | 0 | 159 | 0 | ||

160 | 319 | 0 | — | — | — | — | — | — | 319 | 0 | 319 | 0 | ||

EXC | 40 | 81 | 0 | — | — | 1573 | 908 | — | — | 81 | 2 | — | — | |

80 | 161 | 0 | — | — | — | — | — | — | 161 | 0.5 | — | — | ||

160 | 321 | 2 | — | — | — | — | — | — | 321 | 0 | — | — | ||

RUN | 40 | 79 | 0 | — | — | — | — | — | — | 79 | 0 | — | — | |

80 | 159 | 0 | — | — | — | — | — | — | 159 | 0 | — | — | ||

160 | 319 | 0 | — | — | — | — | — | — | 319 | 0 | — | — | ||

INV | 40 | 249 | 33 | 259 | 34 | 512 | 183 | 543 | 199.5 | 107 | 8 | 112 | 10 | |

80 | 611 | 48 | 627 | 58 | 1245 | 490 | 1308 | 435.5 | 213 | 12.5 | 219 | 14 | ||

160 | — | — | — | — | 2793 | 733 | 2821 | 700 | 419 | 18 | 437 | 22 | ||

LAS | 40 | 95 | 10 | — | — | 555 | 276.5 | 560 | 261.5 | 79 | 2 | 79 | 2 | |

80 | 187 | 10.5 | — | — | 1334 | 592 | 1382 | 420.5 | 159 | 2 | 159 | 2 | ||

160 | — | — | — | — | 2893 | 698 | 2789 | 498 | 319 | 2 | 319 | 2 | ||

HAM | 40 | 87 | 6 | — | — | 1767 | 926 | 1833 | 1042 | 79 | 2 | 79 | 2 | |

80 | 177 | 8 | — | — | — | — | — | — | 159 | 0 | 159 | 2 | ||

160 | 353 | 11.5 | — | — | — | — | — | — | 319 | 2 | 319 | 2 | ||

EXC | 40 | 93 | 6 | — | — | 1852 | 1042 | 1964 | 964.5 | 81 | 0 | 83 | 2 | |

80 | — | — | — | — | — | — | — | — | 161 | 2 | — | — | ||

160 | — | — | — | — | — | — | — | — | — | — | — | — | ||

RUN | 40 | — | — | — | — | — | — | — | — | — | — | — | — | |

80 | — | — | — | — | — | — | — | — | — | — | — | — | ||

160 | — | — | — | — | — | — | — | — | — | — | — | — |

We first analyze *T*_{max} for the single-operation variant, where a single mutation operator is applied at each step (upper half of Table 3). Here (1+1)-GP* and (1+1)-GP, MO-F(X) share similar tree sizes of about (sometimes , which is a minimum for the optimal solution on all fitness values but INV, where (1+1)-GP, MO-F(X) obtains a tree size of about on both initialization schemes and (1+1)-GP* shows a tree size close to . On the other hand, (1+1)-GP, F(X) appears cursed by bloating in all fitness functions, with tree sizes above . Nonetheless, unlike (1+1)-GP*, (1+1)-GP, F(X) seems independent on the employed initialization scheme and can reach optimal solutions for INV and LAS even with init where (1+1)-GP* fails. As for the interquartile range, (1+1)-GP, MO-F(X) appears to be the most stable algorithm with an *iqr* of zero on all fitness functions but INV. Overall, the best algorithm with respect to tree size, interquartile range, and robustness with respect to initialization schemes is (1+1)-GP with parsimony (MO-F(X) variant).

As for the multioperation variant, that is, where (1) applications of each mutation operator are executed at each step, tree sizes increase in every algorithmic variant on INV. On the other fitness functions, the negative impact of multiple operations appears especially on the F(X) variants, while (1+1)-GP, MO-F(X) is less susceptible to this parameter. Overall, the interquartile range in the tree size increases along with it. Again, with respect to tree size, interquartile range, and initialization scheme independence, the best algorithm is (1+1)-GP with parsimony (MO-F(X) variant).

#### 7.2.2 Average Case Optimization Time

Figures 1 and 2 indicate the asymptotic behavior of the investigated measures. They show

the

*distributions*of values, represented as box plots;the

*failure rate*, that is, the fraction of repetitions that did not make it to the end because of the imposed timeout or evaluations budget, represented in red; andtwo blue lines representing for each input size the medians of the distributions divided by some polynomial, whose interpretation gives an indication of the asymptotic behavior of the measure.

In order to deduce the asymptotic behavior of a measure, one must look at the polynomial line that is closest to constant (i.e., the most horizontal one). A horizontal line means that barring a multiplicative factor, the measure behaves like the corresponding polynomial, at least for the analyzed input sizes. The figures exclude the input sizes where a failure rate above did not allow computing a reliable median (and thus obtaining a reliable estimate on the asymptotic behavior).

Figures 1 and 2 show the distributions of the required number of evaluations to reach the first optimal solution for the (1+1)-GP variants, respectively, in the init and init initialization schemes.

By analyzing the results it can be noted that overall the init initialization scheme is beneficial for (1+1)-GP*, F(X) both in single- and multioperation modes, allowing it to optimize every fitness function in single-operation mode and to reach some optima for all fitness functions except RUN for multioperation mode. To the contrary, (1+1)-GP does not seem to be influenced significantly by a particular choice of initial individuals. The performances of the two initialization schemes are identical for INV across every algorithmic variant but consistently worse for init in all other fitness values, at least in terms of failure rate. In general, initializing the population with full trees appears to be an obstacle to optimization. Also, multioperation when applied with the init scheme appears to be detrimental.

The theoretical bounds are confirmed by the experiments, suggesting that they might be tight.

### 7.3 Experimental Analysis of the SMO-GP Variants

We now analyze the experimental results of the SMO-GP variants. We focus on the maximum tree size and population size during execution, and on the expected optimization time.

Table 4 shows two measurements. We list the maximum tree sizes and maximum population sizes that were observed up to the two different but connected events: (1) until the individual with optimal fitness is found, and (2) until the entire true Pareto front is represented by the population.

. | . | F(X) . | n
. | Maximum Tree Size . | Maximum Population Size . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | to X_{opt}
. | to . | to X_{opt}
. | to . | ||||

. | . | . | . | m
. | . | m
. | . | m
. | . | m
. | . |

SMO-GP, with | init | INV | 80 | 169 | 4 | 169 | 4 | 85 | 1 | 85 | 1 |

LAS | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 159 | 2 | 159 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

init | INV | 80 | 173 | 6 | 173 | 6 | 86 | 2 | 86 | 2 | |

LAS | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 159 | 2 | 159 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

SMO-GP, with | init | INV | 80 | 183 | 8 | 183 | 8 | 89 | 2 | 89 | 2 |

LAS | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 161 | 2 | 161 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 161 | 2 | 161 | 2 | 81 | 0 | 81 | 0 | ||

init | INV | 80 | 185 | 10 | 185 | 10 | 89 | 3 | 89 | 3 | |

LAS | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 161 | 0 | 161 | 0 | 81.5 | 1 | 81.5 | 1 | ||

RUN | 80 | 161 | 2 | 161 | 2 | 81 | 0 | 81 | 0 |

. | . | F(X) . | n
. | Maximum Tree Size . | Maximum Population Size . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | to X_{opt}
. | to . | to X_{opt}
. | to . | ||||

. | . | . | . | m
. | . | m
. | . | m
. | . | m
. | . |

SMO-GP, with | init | INV | 80 | 169 | 4 | 169 | 4 | 85 | 1 | 85 | 1 |

LAS | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 159 | 2 | 159 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

init | INV | 80 | 173 | 6 | 173 | 6 | 86 | 2 | 86 | 2 | |

LAS | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 0 | 159 | 0 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 159 | 2 | 159 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

SMO-GP, with | init | INV | 80 | 183 | 8 | 183 | 8 | 89 | 2 | 89 | 2 |

LAS | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 161 | 2 | 161 | 2 | 81 | 1 | 81 | 1 | ||

RUN | 80 | 161 | 2 | 161 | 2 | 81 | 0 | 81 | 0 | ||

init | INV | 80 | 185 | 10 | 185 | 10 | 89 | 3 | 89 | 3 | |

LAS | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

HAM | 80 | 159 | 2 | 159 | 2 | 81 | 0 | 81 | 0 | ||

EXC | 80 | 161 | 0 | 161 | 0 | 81.5 | 1 | 81.5 | 1 | ||

RUN | 80 | 161 | 2 | 161 | 2 | 81 | 0 | 81 | 0 |

#### 7.3.1 Tree Size

With respect to maximum tree size we can note that with both init and init initialization schemes and both single- and multioperation variants the tree size is always very close to the theoretical minimum of except for INV, in which there is an increase of about in single operation mode and about 30%–37% in multioperation mode. The interquartile range in these data is minimal, often zero in single operation mode and up to in multioperation mode. It is worth observing that the maximum attained tree size is quite independent of the initialization scheme.

#### 7.3.2 Population Size

While *T*_{max} already appears as a factor in the computational complexity bounds for the (1+1)-GP variants, the impact of *P*_{max} on SMO-GP is not yet completely clear. It is reasonable to presume that for large populations, for instance, exponential in *n*, the expected optimization time grows because of the lower probability of selecting the correct individual to improve. Unfortunately it is not clear how often such a large population occurs, since this depends on factors such as the number of different objectives and the fitness levels for each of these objectives.

For the sorting problem, four out of five of the considered sortedness measures yield a linear number of trade-offs, hence population individuals, between fitness value and complexity. Only one of the fitness functions, namely INV, can potentially generate a quadratic number of trade-offs. However, our experiments showed that even in the case of INV the maximum population size is mostly *about n* and always linear in

*n*(see Figure 3, where the population size has been divided by and

*n*). As for the maximum population size, there is no evident correlation between the choice of a particular initialization scheme and the maximum population size.

#### 7.3.3 Average Case Optimization Time

Figure 4 shows the distribution of the expected optimization time for SMO-GP. For multiobjective algorithms the expected optimization time is the number of evaluations to reach the true Pareto front. However, since for our experiments these two measures almost always coincided, we droped the latter and promoted the comparison between (1+1)-GP variants and SMO-GP variants.

As can be seen from the plots, the theoretical bounds on HAM, EXC, and RUN are always verified, suggesting they are tight. As for INV and LAS, the polynomial lines show a strong indication toward a runtime in , mostly close to . Overall, when *n* is large, the single operation mode seems to yield better factors for the polynomials and a lower failure rate than the multioperation mode.

## 8 Summary

Existing computational complexity analyses of simple genetic programming has resulted in many insights into the inner workings. Through our theoretical and experimental investigations, we contribute to the understanding of the algorithms.

We discussed two methods for dealing with bloat that frequently occurs when using such a representation. In order to point out the differences between these two approaches, we examined different measures of sortedness that have been analyzed for evolutionary algorithms with fixed-length representations. Interestingly, our analysis for the parsimony approach shows that variable-length representations might have difficulties when dealing with simple measures of sortedness because of the presence of local optima. Contrary to this, our runtime analysis for simple multiobjective algorithms shows that they compute the whole Pareto front for all examined sortedness measures in expected polynomial time. In order to complement the theoretical results, we carried out comprehensive experimental investigations.

Crucial parameters in the theoretical analyses are the size of the largest solution encountered during the run of the algorithm, as well as the population size when dealing with multiobjective approaches. In addition, just a few runtime bounds for the multioperation variants are known so far, and the tightness of all bounds is unclear.

Our empirical investigations allow us to conjecture average case complexities where our theoretical analyses left gaps (see Tables 5 and 6):

**(1+1)-GP, F(X)**: When no bloat control is applied, the algorithm fails regularly to solve RUN. INV and LAS appear to be solvable in , while EXC and HAM are solvable in .**(1+1)-GP*, F(X)**: This situation changes quite dramatically for the worse when introducing the minimal bloat control mechanism of accepting new solutions only if they are of better fitness. INV is solved in (as theory predicted), assuming a maximum tree size (see Table 3). All other sortedness measures are unsuccessful when the initial tree already has*n*leaves. When initializing with the empty tree, the single-mutation variant achieves a runtime of on LAS, EXC, and RUN, and a runtime of on HAM.**(1+1)-GP, MO-F(X)**: Here, the combination of applying just a single mutation at a time and initializing with the empty tree is the most successful one. When initialized with trees with , the algorithm has some chance to get stuck in a local optimum on MO-HAM but still achieves an upper bound of in the average case.**SMO-GP, MO-F(X)**: All problems are solved in , except for MO-HAM, which is solved on average in . Regarding the missing proofs, it should now be easy to show the for MO-INV, assuming that the maximum population size , as supported by Figure 3.

F(X) . | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | ||
---|---|---|---|---|

. | single . | multi . | single . | multi . |

INV | ||||

LAS | ||||

HAM | ||||

EXC | ||||

RUN |

F(X) . | (1+1)-GP*, F(X) . | (1+1)-GP, F(X) . | ||
---|---|---|---|---|

. | single . | multi . | single . | multi . |

INV | ||||

LAS | ||||

HAM | ||||

EXC | ||||

RUN |

indicates case conjectures, *T*_{init} denotes the size of the initial tree, and *T*_{max} denotes the size of the largest tree encountered during the optimization.

F(X) . | (1+1)-GP, MO-F(X) . | SMO-GP, MO-F(X) . | |
---|---|---|---|

. | single . | multi . | single/multi . |

INV | , | , | |

LAS | |||

HAM | , | ||

EXC | |||

RUN |

F(X) . | (1+1)-GP, MO-F(X) . | SMO-GP, MO-F(X) . | |
---|---|---|---|

. | single . | multi . | single/multi . |

INV | , | , | |

LAS | |||

HAM | , | ||

EXC | |||

RUN |

indicates average case conjectures.

marks a conjecture based on the idea that a single *exchange* operation can be simulated with HVL-Prime in time . *T*_{init} denotes the size of the initial tree, and *T*_{max} denotes the size of the largest tree encountered during the optimization.

Note that our results are based on an initial tree size *T*_{init}, which is always linear in *n*. Consequently, the term always dominates the *T*_{init} term suggested by theoretical results. In spite of this, it is easy to prove that the *T*_{init} term can become relevant when arbitrarily large initial trees are used.

To continue this avenue of research, it would be interesting to theoretically prove the conjectured bounds, and to investigate how the maximum tree sizes and population sizes can be bounded in different scenarios.

In general, in order to narrow the gap between theory and application, the investigated problems need to resemble real-world problems more closely. One direction that we might take is the analysis of variable-length algorithms when they are used for symbolic regression, which is one of the most relevant use cases for genetic programming.

## References

## Notes

^{2}

A cycle can be determined by starting at a position in the permutation and then following the positions by using the values at the positions as indices for the next position. We recommend that the reader write down a random permutation underneath the sorted sequence.

^{3}

The naming of our GP variants follows the conventions often used in the computational complexity analysis of evolutionary algorithms: An asterisk indicates that a strict fitness improvement over the old solution is required in order for the new solution to replace the current solution.

^{4}

The notions can be easily adjusted to other minimization/maximization problems.

^{5}

For example, the tree with the sequence of leaves (when parsed in order) can only be improved (in a single HVL-Prime step) by inserting a leaf labeled 1 at the leftmost position.

^{6}

For example, for the new element to be inserted as the leftmost node of the tree, insertion has to be chosen, then the old leftmost node has to be chosen, and then the new node has to be placed as the left sibling of the old leftmost node, not as its right sibling.

^{7}

Let random variables be independent random variables taking on values 0 or 1. Further, assume that . Then, if we let and be the expectation of *X*, then the following bound holds: .

^{8}

For the sake of readability, the special cases for and are omitted in the following.

^{9}

For the computation of the median, at least 50% of the independent runs need to be successful.