## Abstract

We describe a novel hyper-heuristic system that continuously learns over time to solve a combinatorial optimisation problem. The system continuously generates new heuristics and samples problems from its environment; and representative problems and heuristics are incorporated into a self-sustaining network of interacting entities inspired by methods in artificial immune systems. The network is plastic in both its structure and content, leading to the following properties: it exploits existing knowledge captured in the network to rapidly produce solutions; it can adapt to new problems with widely differing characteristics; and it is capable of generalising over the problem space. The system is tested on a large corpus of 3,968 new instances of 1D bin-packing problems as well as on 1,370 existing problems from the literature; it shows excellent performance in terms of the quality of solutions obtained across the datasets and in adapting to dynamically changing sets of problem instances compared to previous approaches. As the network self-adapts to sustain a minimal repertoire of both problems and heuristics that form a representative map of the problem space, the system is further shown to be computationally efficient and therefore scalable.

## 1 Introduction

The past two decades have seen significant advances in meta-heuristic optimisation techniques that are able to quickly find optimal or near-optimal solutions to problem instances in many combinatorial optimisation domains. The techniques employed vary widely: typical meta-heuristic algorithms (e.g., evolutionary algorithms, particle swarm optimisation) operate by searching a space of potential problem solutions. Hyper-heuristic algorithms, on the other hand, operate by searching a space of heuristics that are used either to perturb existing solutions or to construct completely new solutions. Despite the many successful applications of both approaches, they typically operate in the same manner: an algorithm is tuned to work well on a (possibly large) set of representative problems, and each time a new problem instance needs to be solved, the algorithm conducts a search of either the solution space or the heuristic space to locate good solutions. Although this often leads to acceptable solutions, such approaches have a notable weakness: if the nature of the problems to be solved changes over time, the algorithm needs to be periodically retuned. Furthermore, such approaches are likely to be inefficient, failing to exploit previously learned knowledge in the search for a solution.

In contrast, in the field of machine-learning, several contemporary learning systems employ methods that use prior knowledge when learning behaviours in new but similar tasks, leading to a recent proposal from Silver et al. (2013) that *it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime*. They suggest that algorithms should be capable of learning a variety of tasks over an extended period of time such that the knowledge of the tasks is retained and can be used to improve learning in the future. They name such systems *lifelong machine learning*, or *LML* systems, in accord with earlier proposals by Thrun and Pratt (1997). Silver et al. (2013) identify three essential components of an LML system: it should be able to retain and/or consolidate knowledge, that is, incorporate a long-term memory; it should selectively transfer prior knowledge when learning new tasks; and it should adopt a systems approach that ensures the effective and efficient interaction of the elements of the system. In terms of the memory of the LML, they further specify that the system should be computationally efficient when storing learned knowledge in long-term memory and that ideally, retention should occur online.

Silver et al. (2013) propose a framework for a generic LML that encompasses supervised, unsupervised, and reinforcement learning techniques, with a view to developing test applications in the robotics and agents domains. In contrast, we turn to biology for inspiration in building an LML for optimisation purposes, and in particular to the natural immune system, noting that it has properties that fulfil the three requirements for an LML system listed above. It exhibits memory that enables it to respond rapidly when faced with pathogens it has previously been exposed to; it can selectively adapt prior knowledge via clonal selection mechanisms that can rapidly adapt existing antibodies to new variants of previous pathogens and finally, it embodies a systemic approach by maintaining a repertoire of antibodies that collectively cover the space of potential pathogens.

Using this analogy, we describe an LML system for optimisation that combines inspiration from immunology with a hyper-heuristic approach to optimisation, using 1D bin-packing as an example domain. The system continuously generates new knowledge in the form of novel deterministic heuristics that produce solutions to problems; these are integrated into a network of interacting heuristics and problems: the problems incorporated in the network provide a minimal representative map of the problem space; and the heuristics generalise over the problem space, each occupying its own niche. Memory is encapsulated in the network and is exploited to rapidly find solutions to new problems. The network is plastic both in its contents and in its topology, enabling it to continuously adapt as the problems in its environment change. We show that the system not only produces effective solutions, but also responds efficiently to changing problems in terms of the response time required to obtain an effective solution.

## 2 Previous Related Work

We describe an *LML* for bin-packing that combines inspiration from both hyper-heuristics and artificial immune systems with the concept of LML set out by Silver et al. (2013). We briefly review relevant previous work in each of these domains that informs our approach, focusing on the aspects of each field that particularly relate to learning.

### 2.1 LML Systems

Systems that learn over extended periods of time are common in the machine learning literature. Silver et al. (2013) describe several examples that cover supervised, unsupervised, and reinforcement learning methods. Ruvolo and Eaton (2013) propose an efficient lifelong learning algorithm dubbed ELLA that focuses on multitask learning in machine-learning applications relating to prediction and recognition tasks; ELLA is able to transfer previously learned knowledge to a new task and integrate new knowledge through the use of shared basis vectors in a manner that has been proven to be computationally efficient. Other examples are found in the multiagent systems (Kira and Schultz, 2006) and robotics literature (Carlson et al., 2010). The potential benefits to be gleaned by incorporating concepts taken from the meta-learning community into hyper-heuristic approaches are explored in Pappa et al. (2013). However, there appears to be little literature in the optimisation field. An exception is work originated by Louis and McDonnell (2004) on case-injected genetic algorithms—CIGAR algorithms. Conjecturing that a system that combined a robust search algorithm with an associative memory could learn by experience to solve problems, they combine a genetic algorithm with a case-based memory of past problem solutions. Solutions in the memory are used to seed the population of the genetic algorithm; cases are chosen on the basis that similar problems will have similar solutions, and therefore the memory will contain building blocks that will speed up the evolution of good solutions. Using a set of design and optimisation problems, Louis and McDonnell (2004) demonstrate that their system learns to take less time to provide quality solutions to a new problem as it gains experience from solving other similar problems.
The system differs from other case-based reasoning systems in that it does not use prior knowledge based on a comparison to previously encountered problems, but simply injects solutions that are deemed similar to those already in the GA population.

The CIGAR methodology can be considered an LML technique as it builds up a case history over time. It differs from the system proposed in this article in its use of genetic algorithms rather than a hyper-heuristic approach as the optimisation technique, and in its assumption that similar problems will have similar solutions; this limits the generality of the system to domains in which a similarity metric between solutions can be easily defined. In addition, the memory in a CIGAR algorithm does not adapt over time, for example by enabling forgetting, but simply increases in size as time passes.

### 2.2 Hyper-heuristics

A hyper-heuristic is an automated method for selecting or generating heuristics to solve hard computational search problems, motivated by a desire to raise the level of generality at which search methodologies can operate (Burke et al., 2003). As opposed to typical meta-heuristic methodologies, which tend to be customised to a particular problem or a narrow class of problems, the goal of hyper-heuristics is to develop algorithms that work well across many instances from a problem class and do not require parameters to be tuned before use. Although this may lead to some trade-offs in quality compared to specifically designed meta-heuristic approaches, hyper-heuristics should be fast and exhibit good performance across a wide range of problems (Ross, 2005).

A recent categorisation of hyper-heuristics by Burke et al. (2010a) separated hyper-heuristics into two categories—heuristic *selection*, that is, choosing between existing heuristics, and heuristic *generation*, that is, creating new heuristics from components of existing ones. The same authors further classify hyper-heuristics according to the type of learning used to inform the search process. Thus, according to their definition, in an *on-line* hyper-heuristic, learning takes place while the algorithm is solving an instance of a problem. In contrast, *off-line* hyper-heuristics learn from a representative set of training instances, resulting in a model that generalises to unseen problem instances. However, both categorisations suffer from the same weakness: learning ends at some point. In online systems, every time a new instance is solved, the learning process starts from scratch. Although off-line systems do generalise from a set of problems, they need to be periodically retrained if the nature of the problems changes.

With respect to heuristic generation, genetic programming has commonly been used in an off-line learning procedure to generate a set of novel heuristics that work well on a training set and are expected to generalise to unseen instances (e.g., Burke et al., 2010b, 2009, 2012). However, research on *disposable* heuristics has also been conducted (Bader-El-Den and Poli, 2008) that evolves novel heuristics for solving a single instance of a problem; this type of approach is in direct conflict with the appeal from Silver et al. (2013) to develop LML systems. Previous studies in the domain of bin packing (Burke et al., 2007b; Terashima-Marín et al., 2010) have strengthened the notion that while it is possible to create heuristics that generalise well across a range of problem instances, there is a trade-off between the generality of a heuristic and the solution quality it provides. The approach taken here is to generate sets of complementary heuristics that collectively generalise well across the complete set of problem instances while being individually tailored to niche areas of the problem space.

Finally, it is worth mentioning an approach from Sim and Hart (2013) in which heuristic generation is combined with a greedy selection algorithm in a learning system inspired by island models in evolution, to learn a set of heuristics that collaborate to solve representative sets of problem instances. This work differs from much previous hyper-heuristic work in trying to learn an appropriate set of heuristics rather than learn to select from a large but fixed set of heuristics, but suffers from the same criticisms that have been levelled at other off-line hyper-heuristic methods in that the learning phase must be repeated if the nature of the problem instances changes.

### 2.3 Immune Networks

As described in Section 1, the immune system exhibits many of the properties and functions desired in LML systems. Many mechanisms have been proposed as to how such functions are achieved, with no clear consensus emerging from the immunology community. Of most relevance to this work is the idiotypic network model proposed by Jerne (1974) as a possible mechanism for explaining long-term memory. Challenging existing thinking at the time, Jerne proposed that antibodies produced by the immune system interact with each other, even in the absence of pathogens, to form a self-sustaining network of collaborating cells that collectively embodies a memory of previous responses. Bersini (1999) proposed that the engineering community might benefit from developing algorithms inspired by the double plasticity of the network view of the immune system: *parametric* plasticity provides an adaptive mechanism that adjusts parameters while executing a task to improve performance, whereas *structural* plasticity enables new elements to be incorporated into the network and other elements to be removed, thereby enabling the network to adapt to a time-varying environment—properties that result in an LML system.

Idiotypic network theory has been translated into a number of computational algorithms in the machine learning, optimisation, and engineering domains (see Timmis, 2007, for an overview). However, very few of these applications really address problems in the kind of complex, dynamic environments envisaged by Bersini (1999). Nasraoui et al. (2003) describe a system based on an idiotypic network for tracking evolving clusters in noisy datastreams, finding it to be both capable of learning and scalable. However, the majority of work in relevant dynamic environments lies in the robotics domain.

Idiotypic networks were first used in mobile robotics in Watanabe et al. (1998), where parametric plasticity was exploited to enable a robot to adapt its behaviours in order to fulfil a task, depending on environmental conditions. An antibody in the network is described by a condition-action tuple: the condition matches environmental data, and the action specifies an atomic behaviour of the robot that should be executed if the antibody has a high enough concentration. Connections between antibodies either further stimulate or suppress antibodies, altering their concentration and therefore their probability of selection. Early work relied on hand-coded antibodies within the network, and focused on learning connections. This was significantly improved by Whitbrook et al. (2007, 2008, 2010), who used an evolutionary algorithm in a separate learning phase to produce antibodies that are used to seed the network. Even taking this into account, the network does not have structural plasticity—the initial learning phase happens only once, and hence the network cannot adapt to significant environmental changes.

Although immune-inspired algorithms are common in the field of optimisation, the majority are applied to static optimisation problems (e.g., Kromer et al., 2012). However, some research exists relating to dynamic optimisation in which the fitness function of a problem changes over time. Nanas and de Roeck (2007) give a comprehensive overview of immune-inspired research in this area, with Trojanowski and Wierzchon (2009) providing a detailed analytical comparison on a series of benchmarks. For example, in Gaspar and Collard (2000) and in de França et al. (2005), idiotypic network inspiration results in a dynamic allocation of population size; these algorithms can thus be said to exhibit structural plasticity, in that the network itself selects which recruited nodes will survive in the population, essentially removing similar nodes to preserve diversity. Idiotypic effects also implicitly provide memory in these algorithms, by sustaining nodes within the network.

### 2.4 Immune Networks and Hyper-heuristics

An initial attempt at combining hyper-heuristics with inspiration from immune network theory to evolve a collaborative set of heuristics for solving optimisation problems was described by Sim et al. (2013). To the best of our knowledge, this was the first example of an immune-inspired hyper-heuristic optimisation system and was tested using 1,370 1D bin-packing problems. This system used methods from single-node genetic programming (SNGP; Jackson, 2012a, 2012b) to generate a stream of novel heuristics. If one heuristic *H*_{a} could solve a problem from a set of problems of interest using fewer bins than another heuristic *H*_{b}, then *H*_{a} received a positive stimulation, proportional to the difference in bins used. Heuristics with zero stimulation were removed from the system, resulting in a network of heuristics in which a direct measure of interaction could be calculated between every pair of heuristics. Although the system provided promising results, it had a number of drawbacks. Firstly, the algorithm required each heuristic in the system to be evaluated against all of the problems in the set of interest, which can be computationally costly for large problem sets. Secondly, the algorithm required the use of a greedy procedure to remove heuristics that were subsumed by other heuristics in the system in order to limit the network size. However, this system inspired the work presented in this article, which both addresses limitations of previous work and extends it to create a continuous-learning optimisation system that is capable of learning over long periods of time in dynamic environments. Note that the term dynamic does not refer here to changing problem instances, but to dynamically changing environments that are composed of varying sets of problem instances.

## 3 Problem Definition: 1D Bin-Packing—Benchmarks and Heuristics

The objective of the one-dimensional bin-packing problem (1D-BPP) is to find the minimum number of bins, *b*, of fixed capacity *c*, required to accommodate a set of *n* items with weights *ω*_{j} falling in the range [*ω*_{min}, *ω*_{max}], while enforcing the constraint that the sum of the weights in any bin does not exceed the bin capacity *c*. The lower and upper bounds on *b* (*b*_{l} and *b*_{u}, respectively) are given by Equation (1):

*b*_{l} = ⌈(1/*c*) ∑_{j=1}^{n} *ω*_{j}⌉,  *b*_{u} = *n*.  (1)

Any heuristic that does not return empty bins will produce, for a given problem instance *p*, a solution using *b*_{p} bins, where *b*_{l} ≤ *b*_{p} ≤ *b*_{u}.
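These bounds follow directly from an instance's weights and capacity. A minimal sketch in Python (the instance values below are purely illustrative):

```python
import math

def bin_bounds(weights, capacity):
    """Lower bound b_l = ceil(sum(weights)/capacity); upper bound b_u = n."""
    b_l = math.ceil(sum(weights) / capacity)
    b_u = len(weights)  # one item per bin in the worst case
    return b_l, b_u

# Illustrative instance: 6 items, capacity 10; sum = 30 -> b_l = 3, b_u = 6
b_l, b_u = bin_bounds([5, 5, 4, 6, 3, 7], 10)
```

The lower bound is tight for many of the benchmark instances discussed below, which is why several datasets report optima at *b*_{l}.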

Studies relating to bin-packing generally use instances from one or all of the datasets given in Table 1. In each of these sets, a number of problem instances are generated from a set of parameters, also shown in the table.

| Dataset | Capacity (*c*) | *n* | Weight range | Average weight | Number of problem instances |
|---|---|---|---|---|---|
| *ds*_{1} | 100, 120, 150 | 50, 100, 200, 500 | [1, 100], [20, 100], [30, 100] | — | 720 |
| *ds*_{3} | 100,000 | 200 | [20,000, 30,000] | — | 10 |
| FalU | 150 | 120, 250, 500, 1000 | [20, 100] | — | 80 |
| FalT | 1 | 60, 120, 249, 501 | [0.25, 0.5] | — | 80 |
| *ds*_{2} | 1,000 | 50, 100, 200, 500 | — | 20, 50, 90 | 480 |

Average weight is used for dataset *ds*_{2}.

Datasets *ds*_{1}, *ds*_{2}, and *ds*_{3} were introduced by Scholl et al. (1997). All have optimal solutions that differ from the lower bound given by Equation (1). However, all are known and have been solved since their introduction (Schwerin and Wäscher, 1997). The datasets *ds*_{1} and *ds*_{3} were created by generating *n* items with weights randomly sampled from a uniform distribution between the bounds given in Table 1. Dataset *ds*_{2} was created by randomly generating weights from a uniform distribution around the average weights given in Table 1.

All of the instances from FalU and FalT, introduced by Falkenauer (1996), have optimal solutions at the lower bound except for one, for which such a solution has been proven not to exist (Gent, 1998). FalU was created by generating *n* items with weights randomly sampled from a uniform distribution between the bounds given in Table 1. The instances in FalT were generated such that the optimal solution has exactly three items in each bin with no free space.

In the remainder of the article, the complete set of problems described in this section obtained from the benchmark literature is referred to as problem set *A*. A suffix, for example *A*_{ds1}, denotes problems from *ds*_{1} (refer to Table 1) in the benchmark set *A*. A number of deterministic heuristics are commonly used to solve these problems, and are particularly prevalent in the hyper-heuristic literature. The heuristics are described in Table 2. Note that none of the heuristics listed in the table were able to find optimal solutions to any problems from two of the datasets. A number of other heuristics were examined, including the best-fit algorithm (Garey and Johnson, 1979) and the sum-of-squares algorithm (Csirik et al., 1999), but these were excluded from this study as they provided no further improvement when evaluated on the problem sets used.

| Heuristic | Acronym | Summary |
|---|---|---|
| First fit descending | FFD | Sorts the items into descending order of weight, then packs each item into the first bin that will accommodate it. If no bin is available, a new bin is opened. |
| Djang and Finch (Djang and Finch, 1998) | DJD | Packs items into a bin until it is at least one-third full. The set of up to three items that best fills the remaining space is then found, with preference given to sets of lowest cardinality. The bin is then closed and the procedure repeats with a new bin. |
| DJD more tuples (Ross et al., 2002) | DJT | Works as for DJD but considers sets of up to five items once the bin is more than one-third full. |
| Adaptive DJD (Sim et al., 2012) | ADJD | Packs items into a bin until the free space is less than or equal to three times the average size of the remaining items. It then operates as for DJD. |
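To make the style of heuristic in Table 2 concrete, a minimal sketch of first fit descending (the simplest of the four) follows; this is an illustrative implementation, not the code used in the experiments:

```python
def first_fit_descending(weights, capacity):
    """Pack items in non-increasing order of weight into the first bin that fits."""
    bins = []  # each bin is a list of item weights
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            bins.append([w])  # no open bin can accommodate w: open a new bin
    return bins

# e.g., packs [5, 5, 4, 6, 3, 7] with capacity 10 into 3 full bins
solution = first_fit_descending([5, 5, 4, 6, 3, 7], 10)
```

The other heuristics in the table differ mainly in how they select the *set* of items placed in the current bin, but share this constructive, deterministic structure.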

### 3.1 Additional Novel Problems

In order to evaluate the system on a larger set of problems, a generator was implemented that facilitates generation of problem instances with characteristics similar to those described previously. The generator (Sim, 2013) attempts to generate three new problem instances from the parameters of each of the problems in set *A*, but tries to enforce an additional constraint that the total free space summed across all bins is zero, thus increasing the complexity of the problem.^{1} Initially, 4,110 new problems were generated; however, not all of the generated instances respected the free-space constraint: 142 instances in which the total free space was greater than one bin were removed. Of the remaining 3,968 new instances, 3,178 had optimal solutions in which all bins were filled to capacity. The remaining 790 had optimal solutions at the lower bound given by Equation (1), where the free space summed across all bins was less than the capacity of one bin. The problem instances can be downloaded from the internet (Sim, 2013), along with a known optimal solution for each.

In the remainder of the article, this new problem set is denoted problem set *B*. A problem described as being from *B*_{ds1}, for example, denotes a novel instance generated from parameters derived by sampling the corresponding problem instances from problem set *A* (Scholl et al., 1997).

## 4 An LML Hyper-heuristic

The proposed LML hyper-heuristic system is composed of three main parts: a stream of problem instances, a heuristic generator, and an artificial immune system (AIS), as illustrated in Figure 1. The system is dubbed NELLI, NEtwork for Life Long learnIng.

NELLI is designed to run continuously; problem instances can be added to or removed from the system at any point. A heuristic generator, akin to gene libraries in the natural immune system, provides a continual source of potential heuristics. The AIS itself consists of a network of interacting problems and heuristics (akin to immune cells in the natural immune system) that interact with each other based on an affinity metric.

The immune network sustains a minimal repertoire of heuristics *and* a minimal repertoire of problems that provide a representative map of the problem space to which the system has been exposed over its lifetime. From a problem perspective, the network does not contain representatives of *all* problems from the problem stream shown in Figure 1, but a representative set that is sufficient to map the problem space. From a heuristic perspective, only heuristics that provide a unique contribution in that they produce a better result on at least one problem than any other heuristic are retained.

This is represented conceptually in Figure 2. Figure 2(a) shows a set of problems that the system is currently exposed to. Figure 2(b) shows a set of heuristics that collectively cover those problems. The problems P1 and P2 are each solved equally well by two or more heuristics. H2 is subsumed in that it cannot solve *any* problem better than another heuristic. In Figure 2(c), H2 is removed as it does not have a niche in solving problems; problems P1 and P2 are removed as they do not have a niche in describing the problem space.^{2} A competitive exclusion effect is observed between heuristics (and also between problems) that results in efficient coverage of the problem space. A key aspect of this compression is that it significantly decreases the computation time of the method (discussed in more detail in Section 5.2.2). The mechanism by which this is achieved is described in Section 4.1. Finally, metadynamic processes continuously generate novel heuristics and adapt the network structure. Thus, the system has the following features:

- It rapidly produces solutions to new problem instances that are similar in structure to problem instances that the system has previously been exposed to.
- It responds by generating new heuristics to provide solutions to new problems that differ from those previously seen.

In the next sections, we describe the key components of the LML system.

### 4.1 The Artificial Immune System

The AIS component is responsible for constructing a network of interacting heuristics and problems, and for governing the dynamic processes that enable heuristics to be incorporated into or rejected from the current network. Pseudocode describing the network dynamics is given in Algorithm 1; the following variables are used in the algorithm definition and in the remainder of the article.

- The set of all possible problems from the class of 1D-BPP of interest.
- A subset containing a specific set of problems, for example, problem set *A* or *B*.
- The current environment, that is, the set of problems we are currently interested in solving.
- The set of all problems to which the system has been exposed during its lifetime.
- The immune network, composed of a set of problems and a set of heuristics.
- The set of problems currently sustained in the immune network.
- The set of heuristics currently sustained in the immune network.

The algorithm captures the three essential concepts of an immune network as proposed by Varela et al. (1988)—structure, dynamics, and metadynamics. The term *structure* refers to the interactions between components of the network, in this case, problems and heuristics, as described in Section 4.2 and by Steps 4 and 5 of Algorithm 1. *Dynamics* refers to the variations in time of the concentration and affinities between components of the network, and crucially, the dynamics describes how the network adapts to itself and the environment (Steps 6--8 in Algorithm 1). Finally, the network *metadynamics* refers to a unique property of the immune system, that is, the ability to continuously produce and recruit novel components, in this case, heuristics and problems, as described in Section 4.4 and Step 8. These elements are discussed in detail in the next sections.

### 4.2 Network Structure

The network sustains a set of interacting heuristics and problems. Problems are *directly* stimulated by heuristics, and vice versa. Heuristics are *indirectly* stimulated by other heuristics.

A heuristic *h* can be stimulated by one or more problems. The total stimulation of a heuristic is the sum of its affinity with each problem currently in the network. A heuristic *h* has a nonzero affinity with a problem if and only if it provides a solution that uses fewer bins than any other heuristic currently in the network. If this is the case, then the value of the affinity is equal to the improvement in the number of bins used by *h* compared to the next-best heuristic. If a heuristic provides the best solution for a problem *p* but one or more other heuristics give an equal result, then the affinity between problem *p* and the heuristic *h* is zero. Likewise, if a heuristic *h* uses more bins than another heuristic on the problem, the affinity between problem *p* and the heuristic *h* is zero.


Note that heuristics are directly stimulated by problems. A heuristic only survives if it is able to solve at least one problem better than any other heuristic in the system. This provides competition between heuristics that forces each heuristic to find an individual niche to ensure survival. Thus, although no quantitative value is calculated for heuristic-heuristic interactions, we consider this an indirect interaction arising from the method of calculating the problem-heuristic interactions.
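The affinity and survival rules of this section can be sketched as follows. The function names are illustrative assumptions, and each heuristic is represented simply as a callable that returns the number of bins it uses on a problem:

```python
def affinity(h, p, heuristics):
    """Nonzero only if h is strictly best on p; value = bins saved vs next best."""
    best_other = min(g(p) for g in heuristics if g is not h)
    return max(0, best_other - h(p))  # ties and worse results both give zero

def surviving(heuristics, problems):
    """Keep only heuristics with nonzero total stimulation (a unique niche)."""
    return [h for h in heuristics
            if sum(affinity(h, p, heuristics) for p in problems) > 0]
```

A heuristic that is strictly best nowhere, like H2 in Figure 2, receives zero stimulation from every problem and is therefore excluded by `surviving`, reproducing the competitive exclusion effect described above.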

### 4.3 Network Dynamics

In each iteration, the environment optionally changes (step 2 of Algorithm 1). This can range from adding new problem instances to the current environment to completely replacing the current environment with a new set of problem instances. In step 3, one new heuristic is generated and is made available to the network. The affinity metric described encourages diversity in the network, in sustaining heuristics that cover different parts of the problem space. In a practical application, however, it is reasonable to assume that in addition to maintaining diversity, important goals of the system should be to find (1) the set of heuristics that most efficiently covers the problem space, and (2) the set that collectively minimises the total number of bins used to solve all problems the network is exposed to.^{3} While the latter is addressed by sustaining any heuristic with nonzero affinity, the former goal requires some attention.

Previous AIS models relating to idiotypic networks generally make use of an equation first defined by Farmer et al. (1986) to govern the dynamics of addition and removal of nodes from a network. In machine-learning applications such as data clustering, this was quickly found to lead to population explosion (Timmis et al., 2000), later addressed by using resource-limiting mechanisms (Timmis and Neal, 2001). In previous robotics applications, the situation is avoided completely by using a network of fixed size and focusing only on evolving connections. In more theoretical models (Hart, 2006), these criteria are not relevant, as the goal is simply to show that a network can be sustained. In our case, simply sustaining all heuristics that contribute to covering the problem space is likely to lead to a population explosion in the same manner observed in data-mining applications, as no pressure exists on the system to encourage efficiency.
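Under these constraints, the per-iteration dynamics reduce to a simple loop. The following outline is an illustrative sketch only: the names are assumptions rather than the authors' pseudocode, and `total_stimulation` stands in for the affinity computation of Section 4.2:

```python
def run(problem_stream, generate_heuristic, total_stimulation, iterations):
    """Illustrative outer loop: change environment, inject a heuristic, prune."""
    problems, heuristics = [], []
    for _ in range(iterations):
        # Step 2: the environment may change (here: take the next problem set,
        # or keep the current one if the stream is exhausted)
        problems = next(problem_stream, problems)
        # Step 3: one newly generated heuristic is offered to the network
        heuristics.append(generate_heuristic())
        # Steps 6-8: entities with zero total stimulation are not sustained
        heuristics = [h for h in heuristics
                      if total_stimulation(h, problems, heuristics) > 0]
    return problems, heuristics
```

In the full system the sustained problem set is pruned analogously, so the network retains only a representative map of the problems seen so far.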

### 4.4 Network Metadynamics

The proposed LML system requires a continuous source of novel heuristics to be generated. Burke et al. (2009) describe the use of genetic programming (GP) to generate new heuristics within a hyper-heuristic framework; this has been applied specifically to bin-packing (Burke et al., 2006, 2007a). Although achieving some success, the approaches suffer from the usual afflictions of GP, in that efforts must be made to control unnecessary bloat. Sim and Hart (2013) proposed the use of single node genetic programming (SNGP; Jackson, 2012a, 2012b) as an alternative method of generating new bin-packing heuristics.

SNGP differs from the conventional GP model introduced by Koza (1992) in a number of key respects:

- Each individual node may be the starting point for evaluation, not only the topmost node.

- Nodes may have any number of parent nodes (including none and duplicates), allowing for network structures other than trees to be formed.

- No crossover is used, only mutation, which is employed as a hill climber: the mutation is undone if no improvement is achieved.

The key benefit of this method is that the heuristics are of fixed maximum size (in terms of the number of nodes). Sim and Hart (2013) showed that SNGP could successfully evolve new bin-packing heuristics that outperformed existing ones from the literature. In this study, we only use the initialisation procedure of the SNGP method to produce new heuristics, and do not apply evolutionary operators to improve the generated heuristics. The justification for this is as follows: the role of the heuristic generator is to provide a continuous source of novel material for potential integration into the network of heuristics. The network dynamics will eradicate poor heuristics, and furthermore, given the relatively small number of terminal and function nodes outlined in Table 3, heuristics of reasonable quality are likely to be generated at random. Finally, it is more efficient to improve heuristics via an evolutionary operator only once they become established in the network, thereby proving their potential.
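A minimal sketch of initialisation-only heuristic generation in this spirit, using the node set of Table 3. The flat-list encoding with back-references (which permits shared nodes and multiple evaluation roots) is our own illustrative representation, not the exact SNGP data structure.

```python
import random

FUNCTIONS = {"/": 2, ">": 2, "IGTZ": 3, "<": 2, "X": 2}   # name -> arity
TERMINALS = ["B1", "B2", "B2A", "B3A", "B5A", "C", "FS", "INT", "W1"]

def random_heuristic(max_nodes=8, rng=random):
    """Generate a random SNGP-style structure of fixed maximum size.
    Nodes are stored in a flat list; each function node's operands are
    indices of earlier nodes, so any node can serve as an evaluation
    root and nodes may be shared (a graph rather than a tree)."""
    nodes = [(rng.choice(TERMINALS),)]          # start with one terminal
    while len(nodes) < max_nodes:
        if rng.random() < 0.5:
            nodes.append((rng.choice(TERMINALS),))
        else:
            name = rng.choice(list(FUNCTIONS))
            args = tuple(rng.randrange(len(nodes))
                         for _ in range(FUNCTIONS[name]))
            nodes.append((name,) + args)
    return nodes
```

Because operands only reference earlier positions, every generated structure is acyclic and bounded by `max_nodes`, matching the fixed-maximum-size property noted above.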

| Node | Description |
|---|---|
| **Function nodes** | |
| / | Protected divide: returns −1 if the denominator is 0, otherwise the result of dividing the first operand by the second |
| > | Returns 1 if the first operand is greater than the second, or −1 otherwise |
| IGTZ | Evaluates the first operand; if it is greater than zero, the result of evaluating the second operand is returned, otherwise the result of evaluating the third operand |
| < | Returns 1 if the first operand is less than the second, or −1 otherwise |
| X | Returns the product of two operands |
| **Terminal nodes** | |
| B1 | Packs the single largest item into the current bin, returning 1 if successful or −1 otherwise |
| B2 | Packs the largest combination of exactly two items into the current bin, returning 1 if successful or −1 otherwise |
| B2A | Packs the largest combination of up to two items into the current bin, giving preference to sets of lower cardinality; returns 1 if successful or −1 otherwise |
| B3A | As for B2A but considers sets of up to three items |
| B5A | As for B2A but considers sets of up to five items |
| C | Returns the bin capacity |
| FS | Returns the free space in the current bin |
| INT | Returns a random integer value |
| W1 | Packs the smallest item into the current bin, returning 1 if successful, or −1 otherwise |


Figure 3 shows an example of a hand-crafted heuristic represented in the SNGP format^{4}; this is in fact the deterministic heuristic DJD. A complete automatically initialised SNGP structure is depicted in Figure 4. A fixed set of terminal and function nodes, defined in Table 3, is available to the generator, which combines nodes according to the process outlined in Algorithm 2. The nodes selected for use in this study were derived by examining the heuristics outlined in Table 2. The simplest of these heuristics, FFD, packs each item in turn, taken in descending order of item size, into the first bin with sufficient free space to accommodate it. FFD can be represented by the single node B1. The other heuristics used for comparison can all be represented as tree structures similar to that depicted for DJD in Figure 3. Further justification for the choice of nodes and details of SNGP can be found in Sim and Hart (2013).
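The semantics of the function nodes in Table 3 can be illustrated with a small interpreter. Terminals are stubbed here as callables supplied by the caller, whereas in the real system they perform packing actions on the current bin; the tuple encoding is our own.

```python
def evaluate(node, env):
    """Evaluate an expression tree built from the function nodes of
    Table 3. Terminals are looked up in `env`, a mapping from terminal
    name to a zero-argument callable (in the real system these perform
    packing actions and report success/failure as 1/-1)."""
    op, *args = node
    if not args:                                  # terminal node
        return env[op]()
    if op == "/":                                 # protected divide
        d = evaluate(args[1], env)
        return -1 if d == 0 else evaluate(args[0], env) / d
    if op == ">":
        return 1 if evaluate(args[0], env) > evaluate(args[1], env) else -1
    if op == "<":
        return 1 if evaluate(args[0], env) < evaluate(args[1], env) else -1
    if op == "X":
        return evaluate(args[0], env) * evaluate(args[1], env)
    if op == "IGTZ":                              # if-greater-than-zero
        branch = args[1] if evaluate(args[0], env) > 0 else args[2]
        return evaluate(branch, env)
    raise ValueError(f"unknown node {op!r}")
```

For example, with capacity 10 and 4 units of free space, `("/", ("FS",), ("C",))` evaluates the bin's fill ratio, and IGTZ selects between branches on the sign of its first operand.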

### 4.5 Comparison to Previous Work

A brief comparison of NELLI to the system described in Sim et al. (2013) was conducted to highlight specific differences and improvements.

The two algorithms both utilise SNGP to provide a stream of novel heuristics as input to the system. However, this is the only similarity. The main difference between the two algorithms lies in the structure and composition of the self-sustaining network. In the previous work, the network consisted only of interacting heuristics, in which a direct measure of affinity *a* was calculated between each pair of heuristics. This measure was asymmetric, that is, $a(h_i, h_j) \neq a(h_j, h_i)$. In contrast, the network sustained by NELLI consists of interacting problems and heuristics. An explicit measure of interaction, which is symmetric, is calculated between problems and heuristics. Heuristics interact only indirectly, through an implicit effect that excludes heuristics that do not occupy a specific niche within the problem space. The method by which the concentration of both heuristics and problems is calculated also directly results in unnecessary heuristics and problems being removed, minimising the size of the network and removing the need for the greedy calculation performed in Sim et al. (2013) that was required to remove redundant heuristics.

As a result of these improvements, NELLI brings significant advantages. In addition to maintaining a set of heuristics that collaborates to cover the problem space, it also maintains a minimal set of problems that is representative of the problem space, thereby providing a map of the space. The minimal network sustained brings considerable efficiencies in computational cost: the heuristics in the NELLI network need only be evaluated against the minimal set of sustained problems rather than the complete set of problems of interest, whereas in Sim et al. (2013) heuristics had to be evaluated against every problem in the environment at each iteration. This results in a system that is both efficient and scalable.

## 5 Experiments and Results

Experiments were conducted to test the following features of the system.

- The utility of the hyper-heuristic system compared to single deterministic heuristics, similar hyper-heuristic approaches that use collectives of heuristics, and the best known solutions for each of the problems.

- The elasticity and responsiveness of the network in terms of its ability to quickly adapt when presented with new unseen problem instances.

- The ability to continually learn while retaining memory of previously encountered problem instances.

- The efficiency and scalability of the system in maintaining knowledge using a minimal repertoire of network components.

Experiments were conducted using the model described by Algorithm 1 using data drawn from the two datasets described in Section 3, problem sets *A* and *B*. Unless specifically stated, the default parameters used for all experiments were as shown in Table 4. These parameters were set following an initial period of empirical investigation.

| Parameter | Description | Value |
|---|---|---|
| *n*_{p} | Number of problems added each iteration | 30 |
| *n*_{h} | Number of new heuristics added each iteration | 1 |
| *c*_{init} | Initial concentration of added heuristics/problems | 200 |
| | Variation in concentration based on stimulation level | 50 |
| | Maximum concentration level | 1000 |

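The defaults in Table 4 can be collected into a small configuration object. The field names `c_step` and `c_max` are our own, as the original symbols for the last two parameters did not survive extraction.

```python
from dataclasses import dataclass

@dataclass
class NelliParams:
    """Default parameters from Table 4 (names for the concentration
    step and cap are illustrative, not the paper's symbols)."""
    n_p: int = 30        # problems added each iteration
    n_h: int = 1         # new heuristics added each iteration
    c_init: int = 200    # initial concentration of added heuristics/problems
    c_step: int = 50     # variation in concentration per stimulation level
    c_max: int = 1000    # maximum concentration level
```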

### 5.1 Utility of System in Comparison to Previous Approaches

Before analysing the behaviour of NELLI as an LML system, the system is benchmarked on static problem sets to obtain an indication of the quality of results it provides. Comparisons to the benchmark human-designed deterministic heuristics are provided. In terms of comparison to other hyper-heuristic approaches, we provide comparisons to the precursor of NELLI described in Sim et al. (2013) and also to another system described in Sim and Hart (2013), in which an island model of cooperative coevolution was used to find a collaborative set of heuristics. As no other authors have used the same extensive set of problems as in this article, direct comparisons to other hyper-heuristic approaches from the literature are difficult. The most comprehensive study available was carried out by Ross et al. (2003), who evaluated their hyper-heuristic on a subset of 890 problem instances drawn from the benchmark datasets.^{5} That study used a genetic algorithm to evolve a mapping between a (partial) problem state and the best deterministic heuristic to use; that is, it focused on heuristic selection and is an off-line approach (i.e., it requires a training phase). We also provide a comparison to a recent study by Burke et al. (2012) that evaluated a hyper-heuristic on 90 instances from the benchmark sets. Finally, all results are compared to the best known solutions from the literature on each problem in order to obtain an absolute measure of quality.

#### 5.1.1 Problem Set *A*

Previous methods for obtaining a collaborative set of heuristics for solving bin-packing problems (Sim and Hart, 2013; Sim et al., 2013) involved a training phase, in which an algorithm was trained on a set of problems and performance was evaluated on a separate testset. Although NELLI does not have a training phase, for consistency and in order to directly compare results, we adopt the same procedure.

The experimental procedure is as follows:

1. Problem set *A* (1,370 problems) is split into two equal-sized sets, adding every second problem to the testset.^{6} NELLI is run for 500 iterations using the training set as the environment.

2. The resulting network is then presented with all 685 problems in the testset, and the number of problems solved and bins utilised is recorded. No further heuristics are added to the system at this stage. For comparison, Table 5 reports the performance of the four deterministic heuristics on the testset.
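The split in step 1 is straightforward to express; a minimal sketch:

```python
def split_every_second(problems):
    """Split a problem list into training and test sets by sending
    every second problem to the testset, as described above."""
    train = problems[0::2]
    test = problems[1::2]
    return train, test
```

Applied to the 1,370 problems of set *A*, this yields 685 problems in each half.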

| Heuristic | Problems solved | Extra bins |
|---|---|---|
| FFD | 393 | 1,088 |
| DJD | 356 | 1,216 |
| DJT | 430 | 451 |
| ADJD | 336 | 679 |


Table 6 directly compares the result obtained by NELLI to previously published work. A further experiment was run using NELLI in which the environment was set to the full set of 1,370 problems in *A* rather than a reduced set of 685 problems. To obtain a comparison to previous work, the algorithm described in Sim and Hart (2013), which utilises an island model to find a set of collaborating heuristics, was run using the complete set of 1,370 problems. These results are given in Tables 7, 8, and 9, and confirm that the two systems produce solutions of identical quality on a static dataset. However, as we illustrate in the remainder of the article, NELLI has a number of advantages over previously proposed approaches. Specifically, the system is shown to be scalable; it significantly reduces computation time compared to previous approaches; and it adapts efficiently to unseen problems and rapidly changing environments while maintaining a memory of previously encountered problems.

| | Problems solved: Min | Max | Mean | SD | Extra bins: Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| AIS I (Sim et al., 2013) | 554 | 559 | 556 | 1.4 | 159 | 165 | 162 | 1.4 |
| Island (Sim and Hart, 2013) | 552 | 559 | 557 | 1.4 | 159 | 164 | 162 | 1.4 |
| NELLI | 559 | 559 | 559 | 0 | 159 | 159 | 159 | 0 |

| | Problems solved: Min | Max | Mean | SD | Extra bins: Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| Island (Sim and Hart, 2013) | 1,120 | 1,126 | 1,125 | 1.1 | 308 | 316 | 308 | 1.4 |
| NELLI | 1,125 | 1,126 | 1,126 | 0.3 | 308 | 309 | 308 | 0.3 |


| | Min | Max | Mean | SD |
|---|---|---|---|---|
| Heuristics retained (NELLI) | 6 | 8 | 7.1 | 0.7 |
| Problems retained (NELLI) | 26 | 57 | 36.9 | 6.4 |


Further analysis is given in Table 10, which shows the number of problem instances solved using the specified number of bins more than the known optimum for the set of 1,370 problems. NELLI clearly outperforms the individual human-designed deterministic heuristics—many of these perform particularly poorly on certain problem instances. On the other hand, the evolved set of cooperative heuristics retained by NELLI solves 97% of problem instances using no more than one extra bin.

Number of problems solved requiring the given number of extra bins:

| Heuristic | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FFD | 788 | 267 | 78 | 83 | 39 | 16 | 18 | 9 | 18 | 4 | 50 |
| DJD | 716 | 281 | 119 | 58 | 48 | 36 | 10 | 16 | 23 | 3 | 60 |
| DJT | 863 | 331 | 90 | 26 | 30 | 15 | 11 | 2 | 1 | 1 | 0 |
| ADJD | 686 | 368 | 153 | 76 | 38 | 22 | 12 | 9 | 1 | 5 | 0 |
| NELLI | 1,126 | 202 | 26 | 12 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |


##### 5.1.1.1 Comparison to Other Hyper-heuristic Approaches

Ross et al. (2003) used an evolutionary algorithm to learn a mapping between the state of a partially solved problem and the heuristic that should be applied at any given time, selecting from the deterministic heuristics described in Table 2. This off-line approach requires a training phase using a subset of the data. They applied their method to 890 problems from problem set *A*. Using a training set consisting of a subset of 667 problems, they were able to solve 78.8% of the 223 problems in the unseen testset optimally and 95.4% to within one bin of optimal. In comparison, NELLI solves 83.4% of the unseen testset optimally and 96.9% to within one bin of optimal.

Burke et al. (2012) use genetic programming to produce a hyper-heuristic that generates a new heuristic for solving each of 90 of the problem instances from set *A*. They report excellent results—a success rate of 93% in finding the best known solutions. However, their approach generates 90 individual heuristics; *each* heuristic is generated following 50,000 iterations of the hyper-heuristic. That is, 4.5 million iterations in total. Applied to the same 90 problems, NELLI solves 53% optimally, 92% within one bin of optimal, and 100% within two bins: although these results cannot compete directly with Burke et al. (2012), they are obtained using only two heuristics and at most 1,080 heuristic-problem calculations. The results are in line with the defined goal of hyper-heuristics outlined in Section 2.2, that is, that hyper-heuristics should be fast and exhibit good performance across a wide range of problems. As shown in the next section, NELLI has additional advantages, in being adaptive and retaining memory.

#### 5.1.2 Problem Set *B*

The experimental procedure defined above was repeated using the new and larger problem set *B* in order to ascertain the system's performance on this new set of problems and to provide a baseline for further experimentation.

The system was executed 30 times with each run conducted over 100,000 iterations using the full set of problems as the environment and the default parameters as specified in Table 4. A summary of the results is given in Table 11, which also contrasts the results against those achieved using four human-designed deterministic heuristics. These results are analysed further in Table 12, which gives the number of problems solved using the specified number of bins greater than the known optimal by each of four deterministic heuristics and NELLI.

| Heuristic | Total bins | Extra bins vs. optimal | Problems solved optimally |
|---|---|---|---|
| Optimal | 320,445 | 0 | 3,968 |
| FFD | 327,563 | 7,118 | 491 |
| DJD | 330,447 | 10,002 | 920 |
| DJT | 325,743 | 5,298 | 1,158 |
| ADJD | 323,566 | 3,121 | 1,279 |
| NELLI | 322,820 | 2,375 | 1,983 |


Number of problems solved requiring the given number of extra bins:

| Heuristic | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FFD | 491 | 2364 | 442 | 208 | 196 | 51 | 22 | 34 | 68 | 19 | 73 |
| DJD | 920 | 1552 | 468 | 248 | 191 | 100 | 92 | 66 | 57 | 34 | 240 |
| DJT | 1158 | 1936 | 414 | 141 | 85 | 76 | 52 | 35 | 9 | 2 | 60 |
| ADJD | 1279 | 2398 | 209 | 38 | 33 | 8 | 2 | 1 | 0 | 0 | 0 |
| NELLI | 1983 | 1708 | 201 | 44 | 27 | 5 | 0 | 0 | 0 | 0 | 0 |


Table 12 also demonstrates the relative complexity of the problem instances in *B* when contrasted to the standard benchmarks in *A*, with respect to the standard set of deterministic heuristics. For example, on problem set *A*, FFD was shown to solve 56% of the 1,370 problem instances using the known optimal number of bins. In contrast, on problem set *B*, it only manages to solve 12% optimally. NELLI solves 82% of the problems in *A* optimally, compared to only 50% of the problem instances in *B*.

Note that the final evaluation of each of the 30 runs gave exactly the same result in terms of the number of bins required to pack each of the problems in *B* (although the heuristics and problems sustained in each run differed). One of the runs was selected at random and the results obtained by the final set of heuristics for each instance in *B* were retained for use in the remaining experiments as a benchmark for the problem set.

### 5.2 Parameter Tuning

A brief investigation of the impact of three of the main system parameters is conducted to determine their influence and justify the default settings.

#### 5.2.1 Concentration *c*_{init}


The effect of varying the initial concentration of problems and heuristics is illustrated in Figure 5, which shows the results obtained when NELLI was run 30 times for each of a range of values of *c*_{init}. The system was halted after 100,000 iterations. Each box plot summarises the 30 runs conducted. The vertical axis shows the number of bins more than the *best* result that NELLI achieved on problem set *B*, as described previously and presented in Tables 11 and 12. Up to a point, increasing the initial concentration improves performance: the increased *initial* concentration increases the time period for which both heuristics and problem instances can be sustained without stimulation, thus increasing the probability of eventually finding a heuristic-problem pairing that is mutually stimulatory. Beyond that point, however, the effect is reversed; newly introduced heuristics dominate due to their larger concentration, potentially suppressing previously established heuristics.
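One plausible reading of the concentration mechanism (initial value `c_init`, a fixed step per stimulation outcome, and a hard cap) is sketched below; the paper defines the exact update rule elsewhere, so treat this as illustrative.

```python
def update_concentration(c, stimulated, step=50, c_max=1000):
    """Illustrative concentration update: stimulated entities gain
    concentration, unstimulated ones decay, and values are clamped to
    [0, c_max]. An entity whose concentration reaches 0 is removed
    from the network. (Sketch only, not the paper's exact rule.)"""
    c = c + step if stimulated else c - step
    return max(0, min(c, c_max))
```

Under this reading, a larger `c_init` lets an unstimulated entity survive more decay steps before removal, which is the effect discussed above.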

#### 5.2.2 Number of Problems Added per Iteration *n*_{p}


The parameter *n*_{p}, describing the number of problems presented to the system each iteration, is key in that it has a significant impact on the number of calculations made at each iteration of the algorithm. At each iteration, the number of new calculations *C* that needs to be performed is given by

$$C = n_p\,|\mathcal{H}| + n_h\,|\mathcal{P}|$$

where $\mathcal{H}$ and $\mathcal{P}$ denote the sets of heuristics and problems currently in the system. The first term is required to determine the result of applying all heuristics in the system to the new problems just introduced. The second term determines the results of any new heuristics introduced in this iteration on all problems currently in the system.
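The count can be checked numerically; with the default `n_p = 30`, `n_h = 1` and, say, 7 heuristics and 37 problems in the system (values plausible given Table 9 but chosen here for illustration), each iteration requires 247 new evaluations.

```python
def calculations_per_iteration(n_p, n_h, num_heuristics, num_problems):
    """New problem-heuristic evaluations needed in one iteration:
    every heuristic in the system meets the n_p new problems, and each
    of the n_h new heuristics meets every problem in the system."""
    return n_p * num_heuristics + n_h * num_problems
```

Doubling `n_p` roughly doubles the dominant first term, which is why larger values of *n*_{p} inflate the total computational cost discussed next.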

To understand the influence of *n*_{p}, the model was executed 30 times for each of six different values. At each iteration, the cumulative number of calculations undertaken was recorded, and the model was allowed to run until the results converged to the *best* known result for the system on problem set *B*. Figure 6(a) summarises the results obtained over 30 runs for each parameter setting. The figure shows that increasing *n*_{p} (that is, the number of problem instances presented each iteration) has an adverse effect, increasing the overall number of calculations required to achieve the same result. The default value of 30 appears to be a reasonable choice. Figure 6(b) shows a single run of the algorithm truncated to 20,000 calculations.

#### 5.2.3 Number of Heuristics Added per Iteration *n*_{h}


Figure 7 shows the effect that varying *n*_{h} has on the system. For each plot, the system was executed for 50,000 iterations using the default parameter settings, with the exception of *n*_{h}, which was fixed for the duration of each plot as shown.

When adding a single heuristic each iteration, a smooth increase in performance is observed over time, and the system converges to the best known result, despite a slow start. Adding a larger number of heuristics per iteration improves the initial performance due to an increased probability of finding good solutions. However, over a longer time scale, performance is hindered, causing undesirable fluctuation in the collective capability of the network. In the worst case, at the largest value of *n*_{h} tested, the system fails to converge to the best result.

As *n*_{h} increases, it becomes more difficult for individual heuristics to find niche areas of the problem space due to increased competition; newly introduced heuristics are unlikely to gain any stimulation, owing to the decreased probability of a heuristic solving a problem better than any other, resulting in very short lifetimes for each heuristic and thus more unstable behaviour in the system. From a computational perspective, increasing both *n*_{p} and *n*_{h} also significantly increases the number of calculations required at each iteration. This further justifies the default choice of a single heuristic per iteration.

### 5.3 Efficiency and Scalability

To determine the scalability of NELLI with respect to the number of problems in the environment, an experiment was conducted in which the size of the environment was varied. In each case, the problems in the environment were randomly selected from problem set *B*. All other parameters were set to the default values, and the system was run for 50,000 iterations over 30 runs. Table 13 shows the mean number of problems and heuristics retained following 50,000 iterations of the system. The table also shows the fraction of the problems in the environment retained in the network, and the ratio of heuristics retained to problems retained as the size of the environment increases, to indicate how the system scales.

| Environment size | 100 | 200 | 500 | 1,000 | 2,000 | 3,968 |
|---|---|---|---|---|---|---|
| Mean heuristics retained | 5.40 | 6.87 | 9.90 | 12.40 | 16.83 | 21.57 |
| Mean problems retained | 18.73 | 23.30 | 33.45 | 41.50 | 47.40 | 59.52 |
| Problems retained (% of environment) | 18.73 | 11.65 | 6.69 | 4.15 | 2.37 | 1.50 |
| Heuristics retained / problems retained | 0.29 | 0.29 | 0.30 | 0.30 | 0.35 | 0.36 |


As expected, as the size of the environment increases, the number of retained problems and heuristics increases. Note, however, that the fraction of problems retained in relation to the environment *decreases*. The problems in the environment represent a sample from the larger problem space. As the environment grows, more of that space is sampled, and thus the system is better able to learn a general representation of it, hence decreasing the fraction of problems required to represent it. This is also reflected in the sublinear increase in the number of heuristics required as the environment grows, again confirming the ability of the system to find heuristics that generalise over the environment. The ratio of heuristics to problems retained remains almost constant, indicating the scalability of the system. Figure 8 shows a typical run for two environment sizes.^{7}
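The fraction-retained row of Table 13 can be reproduced from the retained problem counts, assuming environment sizes of 100, 200, 500, 1,000, 2,000, and 3,968; these sizes are our reconstruction, chosen because they are consistent with every reported ratio.

```python
# Retained problem counts from Table 13 and the assumed environment sizes
env_sizes = [100, 200, 500, 1000, 2000, 3968]       # assumed sizes
problems_retained = [18.73, 23.30, 33.45, 41.50, 47.40, 59.52]

# Fraction of the environment retained in the network, as a percentage
pct_retained = [round(100 * p / e, 2)
                for p, e in zip(problems_retained, env_sizes)]
```

The computed percentages decrease monotonically, which is the generalisation effect described above.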

With respect to efficiency, we return to the earlier comment that NELLI is computationally more efficient than its precursors. In the system described in Sim et al. (2013), the complete set of problems in the environment must be evaluated at each iteration [i.e., in Equation (4), the final term would range over the whole environment]. In contrast, using NELLI, only the sustained subset of problems is evaluated. As is clearly shown in Table 13, this holds across a range of environment sizes: in an environment containing 3,968 problems, only around 60 of these are sustained, hence dramatically reducing computational complexity. Note that to obtain a solution to a new problem instance, it is necessary to apply a greedy procedure in which the performance of each of the deterministic heuristics in the system must be evaluated on the instance. Given that the system retains only a small number of heuristics (at most around 22 in these experiments), this does not appear to be a limiting factor.

### 5.4 Continuous Learning Capabilities

To demonstrate that NELLI functions effectively as a continuous learning system, it must be tested in a dynamically changing problem environment, showing that it is responsive to new problems and exhibits the plasticity required for the network to adapt.

#### 5.4.1 Memory and Plasticity: Response to New Problems from a Similar Dataset

Consider the case in which the problem space is the full set of 3,968 novel problem instances in *B*. Initially, the environment consists of a set of problems drawn randomly from this space. Every 1,000 iterations, the environment is replaced with a new random set of problems from *B*. Experiments are performed for a range of environment sizes; at each iteration, the sizes of the sustained problem and heuristic sets are recorded. Additionally, in order to demonstrate that the system has memory, the performance of the system against every problem in the full space is tracked at each iteration. Particularly during early iterations, many of these problems will not have been presented to the network; therefore, by measuring the hypothetical response against the full space, it is possible to gauge whether the system is generalising from seen instances and retaining that information.

The results are illustrated in Figure 9, plotted both at every iteration (left-hand column) and averaged over each 1,000-iteration window during which a given set of problems is present. Several trends are clear.

- The network is clearly plastic, both in terms of the number of problems and the number of heuristics that are sustained in the network.

- NELLI can generalise over the problem space; even in the early iterations we see good performance across its entirety when only a small fraction of it has been presented to the system.

- NELLI continuously learns; the performance measured against all problems in the space improves over time, and the rate of learning can be increased by increasing the size of the environment, the set of problems currently visible to the network.

- NELLI sustains a useful network over time; performance never deteriorates in our experiments, provided that the parameters are set correctly; the system therefore exhibits memory.

- Increasing the number of problems in the environment causes more difficulty at the start but has the effect of increasing the rate of learning overall. This is illustrated further in Figure 10, which summarises the results over 30 runs.

#### 5.4.2 Memory and Plasticity: Response to New Problems from Different Datasets

In order to demonstrate the system's learning and memory capabilities when faced with an environment in which problem characteristics vary over time, experiments are conducted using problems from *ds*_{1} and *ds*_{2}. These datasets, generated from parameters defined by Scholl et al. (1997), are well known to have radically different properties. It is therefore unlikely that a heuristic that performs well on *ds*_{1} will generalise to *ds*_{2}.

In the following experiments, the environment is toggled alternately between *ds*_{1} and *ds*_{2} every 500 iterations. Two experiments were performed:

1. The system was restarted every 500 iterations to obtain a benchmark response for the current set of problems presented (equivalent to a system with no memory).

2. The problems in the environment were replaced every 500 iterations, but the heuristics present were retained (in order to test whether the system retains a useful memory).
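The toggling schedule in these experiments is simple to express; a sketch:

```python
def environment_at(iteration, ds1, ds2, period=500):
    """Return the dataset forming the environment at a given iteration
    when toggling alternately between two datasets every `period`
    iterations, as in the memory/plasticity experiments."""
    return ds1 if (iteration // period) % 2 == 0 else ds2
```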

In each of the two scenarios, that is, with and without memory, we calculate at every iteration the number of extra bins required to solve problems with respect to the best known solution, using the heuristics present in the network. The results are given in Figure 11, which shows a single typical run; the blocks alternate every 500 iterations to highlight the dataset being considered. Figures 11(c) and 11(d) show the same information as Figures 11(a) and 11(b), but on a smaller scale. Figures 11(e) and 11(f) average the results over each 500-iteration cycle. The right-hand column is of most interest, as it shows the metric evaluated over the environment, that is, the set of problems we are currently interested in solving. The left-hand column shows the same metric evaluated over the set of problems sustained by the network as being representative of the problem space, and illustrates how the network is capable of generalising from the problems in the environment.

We observe the following with regard to Figure 11:

- NELLI, with its implicit memory, always outperforms the system with no memory. Because the network is retained, the system does not have to adapt to a new environment from scratch.

- Adaptation still occurs in the system with memory, demonstrating the plasticity of the network.

- The memory of a dataset is sustained across cycles in which no items from that dataset are presented. This is apparent in the improving performance on both datasets over time.

- One of the datasets is clearly much easier than the other: within three presentations of samples from it, NELLI reaches optimal performance (i.e., 0 bins more than the best known) and sustains this performance.^{8}

- Comparing the figures in the right-hand column to those on the left, which show the same metric evaluated over the problems sustained by the network, we see that performance on the sustained set mirrors performance on the environment; that is, an improvement on one correlates with an improvement on the other, confirming the generalisation capabilities of the network.

## 6 Conclusions and Future Work

We have described a continuous learning system (inspired by previous work in the artificial immune system field) that is capable of learning to solve a combinatorial optimisation problem, improves its performance over time, and adapts to changing environments. The system fuses methods from SNGP, which is used to generate novel heuristics, with ideas from immune-network theory, resulting in a self-sustaining interacting network of problems and heuristics; this network is capable of adapting over time as new knowledge is presented or if the environment changes. When compared to existing approaches (Sim and Hart, 2013; Sim et al., 2013; Ross et al., 2003; Burke et al., 2006) that attempt to find sets of collaborative heuristics, the system performs equally well on static datasets. However, it is shown to have significant advantages: it can deal with dynamic data, it provides a representative map of the problem space, and it is computationally efficient. Comparisons to the known optimal results on the suite of 5,338 instances tested also show the promise of the system. The test suite included 3,968 problems which were generated in order to provide a harder test than posed by existing benchmark problems; these problems are shown to be considerably more difficult than the standard benchmarks and are available as a resource for use by other researchers.

NELLI meets the requirements defined by Silver et al. (2013) for a lifelong machine learning system: it incorporates a long-term memory; it can selectively transfer prior knowledge when learning new tasks; and it adopts a systems approach that ensures the effective and efficient interaction of the elements of the system. Further, as specified by Silver et al. (2013), it is computationally efficient when storing learned knowledge in long-term memory and retains its knowledge online. The system is shown experimentally to be scalable in terms of the number of heuristics and problems it sustains as the size of the environment increases.

Although the system is tested using 1D bin-packing as an example domain, we believe it will generalise easily to other combinatorial optimisation problems. The underlying principle behind NELLI is that heuristics that are successful are sustained by the network. We generate constructive heuristics using SNGP to combine nodes that are explicitly designed to place one or more items into a solution. There is no requirement to limit NELLI to these types of heuristics or to generate those heuristics using SNGP. Heuristics could be hand-crafted or automatically generated using other methods. Future research could consider additional heuristic generation techniques such as in Burke et al. (2012). Similarly, the heuristics used do not have to be limited to deterministic constructive heuristics but could include improvement heuristics and stochastic methods. The main requirements of the system are that a number of heuristics can be used to solve problems from across the domain, and that potential heuristics can easily be represented (and therefore generated) using, for example, the SNGP format. Recent examples of using GP to evolve novel heuristics in the timetabling (Bader-El-Den et al., 2009) and 2D stock-cutting (Burke et al., 2010b) domains suggest that this is likely to be the case and provide promising avenues for future development.
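The pluggable-heuristic principle above can be illustrated with two standard hand-crafted heuristics. Here, `next_fit` and `best_fit_decreasing` are textbook methods used only as stand-ins for SNGP-generated or other heuristics; the network itself is modelled simply as a list of callables.

```python
# Any callable mapping (items, capacity) -> bins used can join the network;
# next_fit and best_fit_decreasing are textbook heuristics used as stand-ins.

def next_fit(items, capacity):
    """Open a new bin whenever the current item does not fit."""
    bins, load = 0, capacity  # force a first bin on the first item
    for item in items:
        if load + item > capacity:
            bins, load = bins + 1, 0
        load += item
    return bins

def best_fit_decreasing(items, capacity):
    """Place largest items first, each into the fullest feasible bin."""
    loads = []
    for item in sorted(items, reverse=True):
        feasible = [i for i, l in enumerate(loads) if l + item <= capacity]
        if feasible:
            best = max(feasible, key=lambda i: loads[i])
            loads[best] += item
        else:
            loads.append(item)
    return len(loads)

network = [next_fit, best_fit_decreasing]
items, capacity = [7, 5, 6, 4, 2, 3, 8], 10
print(min(h(items, capacity) for h in network))  # best heuristic uses 4 bins
```

Swapping in improvement or stochastic heuristics only requires that they expose the same interface, which is the sense in which the framework is not tied to SNGP-generated constructive heuristics.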

## Acknowledgments

The authors are grateful to Prof. Peter Ross for comments and suggestions on draft versions of this paper which greatly improved it.


## Notes

^{2}

Although these problems have been removed from the network, they can still be solved by the system, as heuristics H1 and H3 remain in the network.

^{3}

In fact, exactly the same goals were identified in generic form by Bersini (1999).

^{4}

In this case this is also a standard GP tree.

^{5}

The authors do not include the 480 problem instances that prove hard for the variations of DJD used.

^{6}

This ensures an even split of problem instances for each parameter setting between the training and test sets.

^{7}

As both heuristics and problems are continually added with sufficient concentration to allow them to survive for at least three iterations, at any iteration there will potentially be at most three heuristics and 90 problem instances that give no added benefit to the system. Table 13 shows only the heuristics and problems remaining after the run finishes, when any unstimulated problems and heuristics have been removed; hence the discrepancy between the mean of 11.65% in Table 13 and 60% in Figure 8.

^{8}

Note that experiments showed that the order in which the two datasets are presented does not have any impact on the results.