Abstract

We describe a novel hyper-heuristic system that continuously learns over time to solve a combinatorial optimisation problem. The system continuously generates new heuristics and samples problems from its environment; and representative problems and heuristics are incorporated into a self-sustaining network of interacting entities inspired by methods in artificial immune systems. The network is plastic in both its structure and content, leading to the following properties: it exploits existing knowledge captured in the network to rapidly produce solutions; it can adapt to new problems with widely differing characteristics; and it is capable of generalising over the problem space. The system is tested on a large corpus of 3,968 new instances of 1D bin-packing problems as well as on 1,370 existing problems from the literature; it shows excellent performance in terms of the quality of solutions obtained across the datasets and in adapting to dynamically changing sets of problem instances compared to previous approaches. As the network self-adapts to sustain a minimal repertoire of both problems and heuristics that form a representative map of the problem space, the system is further shown to be computationally efficient and therefore scalable.

1  Introduction

The past two decades have seen significant advances in meta-heuristic optimisation techniques that are able to quickly find optimal or near-optimal solutions to problem instances in many combinatorial optimisation domains. Techniques employed vary widely: typical meta-heuristic algorithms (e.g., evolutionary algorithms, particle swarm optimisation) operate by searching a space of potential problem solutions. Hyper-heuristic algorithms, on the other hand, operate by searching a space of heuristics that are used to either perturb existing solutions or construct completely new solutions. Despite the many successful applications of both approaches, they typically operate in the same manner: an algorithm is tuned to work well on a (possibly large) set of representative problems, and each time a new problem instance needs to be solved, the algorithm conducts a search of either the solution space or the heuristic space to locate good solutions. Although this often leads to acceptable solutions, such approaches have a notable weakness: if the nature of the problems to be solved changes over time, then the algorithm needs to be periodically retuned. Furthermore, such approaches are likely to be inefficient, failing to exploit previously learned knowledge in the search for a solution.

In contrast, in the field of machine-learning, several contemporary learning systems employ methods that use prior knowledge when learning behaviours in new but similar tasks, leading to a recent proposal from Silver et al. (2013) that it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime. They suggest that algorithms should be capable of learning a variety of tasks over an extended period of time such that the knowledge of the tasks is retained and can be used to improve learning in the future. They name such systems lifelong machine learning, or LML systems, in accord with earlier proposals by Thrun and Pratt (1997). Silver et al. (2013) identify three essential components of an LML system: it should be able to retain and/or consolidate knowledge, that is, incorporate a long-term memory; it should selectively transfer prior knowledge when learning new tasks; and it should adopt a systems approach that ensures the effective and efficient interaction of the elements of the system. In terms of the memory of the LML, they further specify that the system should be computationally efficient when storing learned knowledge in long-term memory and that ideally, retention should occur online.

Silver et al. (2013) propose a framework for a generic LML that encompasses supervised, unsupervised, and reinforcement learning techniques, with a view to developing test applications in the robotics and agents domains. In contrast, we turn to biology for inspiration in building an LML for optimisation purposes, and in particular to the natural immune system, noting that it has properties that fulfil the three requirements for an LML system listed above. It exhibits memory that enables it to respond rapidly when faced with pathogens it has previously been exposed to; it can selectively adapt prior knowledge via clonal selection mechanisms that rapidly adapt existing antibodies to new variants of previous pathogens; and finally, it embodies a systemic approach by maintaining a repertoire of antibodies that collectively cover the space of potential pathogens.

Using this analogy, we describe an LML system for optimisation that combines inspiration from immunology with a hyper-heuristic approach to optimisation, using 1D bin-packing as an example domain. The system continuously generates new knowledge in the form of novel deterministic heuristics that produce solutions to problems; these are integrated into a network of interacting heuristics and problems: the problems incorporated in the network provide a minimal representative map of the problem space; and the heuristics generalise over the problem space, each occupying its own niche. Memory is encapsulated in the network and is exploited to rapidly find solutions to new problems. The network is plastic both in its contents and in its topology, enabling it to continuously adapt as the problems in its environment change. We show that the system not only produces effective solutions, but also responds efficiently to changing problems in terms of the response time required to obtain an effective solution.

2  Previous Related Work

We describe an LML for bin-packing that combines inspiration from both hyper-heuristics and artificial immune systems with the concept of LML set out by Silver et al. (2013). We briefly review relevant previous work in each of these domains that informs our approach, focusing on the aspects of each field that particularly relate to learning.

2.1  LML Systems

Systems that learn over extended periods of time are common in the machine learning literature. Silver et al. (2013) describe several examples that cover supervised, unsupervised, and reinforcement learning methods. Ruvolo and Eaton (2013) propose an efficient lifelong learning algorithm dubbed ELLA that focuses on multitask learning in machine-learning applications relating to prediction and recognition tasks; ELLA is able to transfer previously learned knowledge to a new task and integrate new knowledge through the use of shared basis vectors in a manner that has been proven to be computationally efficient. Other examples are found in the multiagent systems (Kira and Schultz, 2006) and robotics (Carlson et al., 2010) literature. The potential benefits to be gleaned by incorporating concepts taken from the meta-learning community into hyper-heuristic approaches are explored in Pappa et al. (2013). However, there appears to be little such literature in the optimisation field. An exception is work originated by Louis and McDonnell (2004) on case-injected genetic algorithms—CIGAR algorithms. Conjecturing that a system that combined a robust search algorithm with an associative memory could learn by experience to solve problems, they combine a genetic algorithm with a case-based memory of past problem solutions. Solutions in the memory are used to seed the population of the genetic algorithm; cases are chosen on the basis that similar problems will have similar solutions, and therefore the memory will contain building blocks that will speed up the evolution of good solutions. Using a set of design and optimisation problems, Louis and McDonnell (2004) demonstrate that their system learns to take less time to provide quality solutions to a new problem as it gains experience from solving other similar problems. The system differs from other case-based reasoning systems in that it does not use prior knowledge based on a comparison to previously encountered problems, but simply injects solutions that are deemed similar to those already in the GA population.

The CIGAR methodology can be considered to be an LML technique as it builds up a case history over time. It differs from the system proposed in this article in its use of genetic algorithms rather than a hyper-heuristic approach as the optimisation technique, and in its assumption that similar problems will have similar solutions; this limits the generality of the system to domains where a similarity metric between solutions can be easily defined. In addition, the memory in a CIGAR algorithm does not adapt over time, for example, by enabling forgetting, but simply increases in size as time passes.

2.2  Hyper-heuristics

A hyper-heuristic is an automated method for selecting or generating heuristics to solve hard computational search problems, motivated by a desire to raise the level of generality at which search methodologies can operate (Burke et al., 2003). As opposed to typical meta-heuristic methodologies, which tend to be customised to a particular problem or a narrow class of problems, the goal of hyper-heuristics is to develop algorithms that work well across many instances from a problem class and do not require parameters to be tuned before use. Although this may lead to some trade-offs in quality compared to specifically designed meta-heuristic approaches, hyper-heuristics should be fast and exhibit good performance across a wide range of problems (Ross, 2005).

A recent categorisation of hyper-heuristics by Burke et al. (2010a) separated hyper-heuristics into two categories—heuristic selection, that is, choosing between existing heuristics, and heuristic generation, that is, creating new heuristics from components of existing ones. The same authors further classify hyper-heuristics according to the type of learning used to inform the search process. Thus, according to their definition, in an on-line hyper-heuristic, learning takes place while the algorithm is solving an instance of a problem. In contrast, off-line hyper-heuristics learn from a representative set of training instances, resulting in a model that generalises to unseen problem instances. However, both categorisations suffer from the same weakness: learning ends at some point. In online systems, every time a new instance is solved, the learning process starts from scratch. Although off-line systems do generalise from a set of problems, they need to be periodically retrained if the nature of the problems changes.

With respect to heuristic generation, genetic programming has commonly been used in an off-line learning procedure to generate a set of novel heuristics that work well on a training set and are expected to generalise to unseen instances (e.g., Burke et al., 2010b, 2009, 2012). However, research on disposable heuristics has also been conducted (Bader-El-Den and Poli, 2008) that evolves novel heuristics for solving a single instance of a problem; this type of approach is in direct conflict with the appeal from Silver et al. (2013) to develop LML systems. Previous studies in the domain of bin packing (Burke et al., 2007b; Terashima-Marín et al., 2010) have strengthened the notion that while it is possible to create heuristics that generalise well across a range of problem instances, there is a trade-off between the generality of a heuristic and the solution quality it provides. The approach taken here is to generate sets of complementary heuristics that collectively generalise well across the complete set of problem instances while being individually tailored to niche areas of the problem space.

Finally, it is worth mentioning an approach from Sim and Hart (2013) in which heuristic generation is combined with a greedy selection algorithm in a learning system inspired by island models in evolution, to learn a set of heuristics that collaborate to solve representative sets of problem instances. This work differs from much previous hyper-heuristic work in trying to learn an appropriate set of heuristics rather than learn to select from a large but fixed set of heuristics, but suffers from the same criticisms that have been levelled at other off-line hyper-heuristic methods in that the learning phase must be repeated if the nature of the problem instances changes.

2.3  Immune Networks

As described in Section 1, the immune system exhibits many of the properties and functions desired in LML systems. Many mechanisms have been proposed as to how such functions are achieved, with no clear consensus emerging from the immunology community. Of most relevance to this work is the idiotypic network model proposed by Jerne (1974) as a possible mechanism for explaining long-term memory. Challenging existing thinking at the time, Jerne proposed that antibodies produced by the immune system interact with each other, even in the absence of pathogens, to form a self-sustaining network of collaborating cells that collectively embodies a memory of previous responses. Bersini (1999) proposed that the engineering community might benefit from developing algorithms inspired by the double plasticity of the network view of the immune system: parametric plasticity provides an adaptive mechanism that adjusts parameters while executing a task to improve performance, whereas structural plasticity enables new elements to be incorporated into the network and other elements to be removed, thereby enabling the network to adapt to a time-varying environment—properties that result in an LML system.

Idiotypic network theory has been translated into a number of computational algorithms in the machine learning, optimisation, and engineering domains (see Timmis, 2007, for an overview). However, very few of these applications really address problems in the kind of complex, dynamic environments envisaged by Bersini (1999). Nasraoui et al. (2003) describe a system based on an idiotypic network for tracking evolving clusters in noisy datastreams, finding it to be both capable of learning and scalable. However, the majority of work in relevant dynamic environments lies in the robotics domain.

Idiotypic networks were first used in mobile robotics in Watanabe et al. (1998), where parametric plasticity was exploited to enable a robot to adapt its behaviours in order to fulfil a task, depending on environmental conditions. An antibody in the network was described by a condition-action tuple: the condition matches environmental data, and the action specifies an atomic behaviour of the robot that should be executed if the antibody has a high enough concentration. Connections between antibodies either further stimulate or suppress antibodies, altering their concentration and therefore their probability of selection. Early work relied on hand-coded antibodies within the network, and focused on learning connections. This was significantly improved by Whitbrook et al. (2007, 2008, 2010), who used an evolutionary algorithm in a separate learning phase to produce antibodies that are used to seed the network. Even taking this into account, the network does not have structural plasticity—the initial learning phase happens only once and hence the network cannot adapt to significant environmental changes.

Although immune-inspired algorithms are common in the field of optimisation, the majority are applied to static optimisation problems (e.g., Kromer et al., 2012). However, some research exists relating to dynamic optimisation in which the fitness function of a problem changes over time. Nanas and de Roeck (2007) give a comprehensive overview of immune-inspired research in this area, with Trojanowski and Wierzchon (2009) providing a detailed analytical comparison on a series of benchmarks. For example, in Gaspar and Collard (2000) and in de França et al. (2005), idiotypic network inspiration results in a dynamic allocation of population size; these algorithms can thus be said to exhibit structural plasticity, in that the network itself selects which recruited nodes will survive in the population, essentially removing similar nodes to preserve diversity. Idiotypic effects also implicitly provide memory in these algorithms, by sustaining nodes within the network.

2.4  Immune Networks and Hyper-heuristics

An initial attempt at combining hyper-heuristics with inspiration from immune network theory to evolve a collaborative set of heuristics for solving optimisation problems was described by Sim et al. (2013). To the best of our knowledge, this was the first example of an immune-inspired hyper-heuristic optimisation system; it was tested using 1,370 1D bin-packing problems. This system used methods from single-node genetic programming (SNGP; Jackson, 2012a, 2012b) to generate a stream of novel heuristics. If one heuristic Ha could solve a problem from a set of problems of interest using fewer bins than another heuristic Hb, then Ha received a positive stimulation, proportional to the difference in bins used. Heuristics with zero stimulation were removed from the system, resulting in a network of heuristics in which a direct measure of interaction could be calculated between every pair of heuristics. Although the system provided promising results, it had a number of drawbacks. Firstly, the algorithm required each heuristic in the system to be evaluated against all of the problems in the set of interest, which can be computationally costly for large problem sets. Secondly, the algorithm required the use of a greedy procedure to remove heuristics that were subsumed by other heuristics in the system in order to limit the network size. However, this system inspired the work presented in this article, which both addresses limitations of previous work and extends it to create a continuous-learning optimisation system that is capable of learning over long periods of time in dynamic environments. Note that the term dynamic does not refer here to changing problem instances, but to dynamically changing environments that are composed of varying sets of problem instances.

3  Problem Definition: 1D Bin-Packing—Benchmarks and Heuristics

The objective of the 1D bin-packing problem (1D-BPP) is to find a packing which minimises the number of containers, b, of fixed capacity c required to accommodate a set of n items with weights w1, …, wn falling within a given range, while enforcing the constraint that the sum of the weights in any bin does not exceed the bin capacity c. The lower and upper bounds on b (bl and bu, respectively) are given by Equation (1). Any heuristic that does not return empty bins will produce, for a given problem instance p, a solution using bp bins, where bl ≤ bp ≤ bu.

$$b_l = \left\lceil \frac{1}{c}\sum_{j=1}^{n} w_j \right\rceil, \qquad b_u = n \qquad\qquad (1)$$
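As a concrete illustration of Equation (1), the bounds can be computed directly from an instance's weights and capacity; in the minimal sketch below, the function name and the toy instance are ours.

```python
import math

def bin_bounds(weights, capacity):
    """Bounds on the number of bins for a 1D-BPP instance (Equation (1)):
    the lower bound is the total weight divided by the capacity, rounded up;
    the upper bound is one bin per item."""
    lower = math.ceil(sum(weights) / capacity)
    upper = len(weights)
    return lower, upper

# Toy instance: capacity 100, six items.
print(bin_bounds([60, 45, 40, 30, 15, 10], 100))  # -> (2, 6)
```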

Studies relating to bin-packing generally use instances from one or all of the datasets given in Table 1. In each of these sets, a number of problem instances are generated from a set of parameters, also shown in the table.

Table 1:
Benchmark bin-packing problems.

Dataset         Capacity (c)     n                      Item weights                       Number of problems
ds1             100, 120, 150    50, 100, 200, 500      [1, 100], [20, 100], [30, 100]     —
ds2             1,000            50, 100, 200, 500      average weight 20, 50, 90          —
ds3             100,000          200                    [20,000, 30,000]                   10
Falkenauer U    150              120, 250, 500, 1000    [20, 100]                          —
Falkenauer T    —                60, 120, 249, 501      [0.25, 0.5]                        —

Average weight is used for dataset ds2.

Datasets ds1, ds2, and ds3 were introduced by Scholl et al. (1997). All have optimal solutions that differ from the lower bound given by Equation (1); however, all are known and have been solved since their introduction (Schwerin and Wäscher, 1997). Each dataset was created by generating n items with weights randomly sampled from a uniform distribution according to the corresponding parameters given in Table 1 (for ds2, items were generated around the average weights shown).

All of the instances from the two datasets introduced by Falkenauer (1996) have optimal solutions at the lower bound except for one instance, for which such a solution has been proven not to exist (Gent, 1998). The first (uniform) set was created by generating n items with weights randomly sampled from a uniform distribution between the bounds given in Table 1. The instances in the second (triplet) set were generated such that the optimal solution has exactly three items in each bin with no free space.

In the remainder of the article, the complete set of problems described in this section obtained from the benchmark literature is referred to as problem set A. A suffix, for example A-ds1, denotes problems from ds1 (refer to Table 1) in benchmark set A. A number of deterministic heuristics are commonly used to solve these problems, and are particularly prevalent in the hyper-heuristic literature. The heuristics are described in Table 2. Note that none of the heuristics listed in the table was able to find optimal solutions to any problems from two of the datasets. A number of other heuristics were examined, including the best fit algorithm (Garey and Johnson, 1979) and the sum of squares algorithm (Csirik et al., 1999), but these were excluded from this study as they provided no further improvement when evaluated on the problem sets used.

Table 2:
Common deterministic heuristics from the literature for solving 1D bin-packing problems.

Heuristic                                 Acronym   Summary
First fit descending                      FFD       Takes items in descending order of size and packs each into the first bin that will accommodate it. If no bin is available, then a new bin is opened.
Djang and Finch (Djang and Finch, 1998)   DJD       Packs items into a bin until it is at least one-third full. The set of up to three items which best fills the remaining space is then found, with preference given to sets with the lowest cardinality. The bin is then closed and the procedure repeats using a new bin.
DJD more tuples (Ross et al., 2002)       DJT       Works as for DJD but considers sets of up to five items after the bin is filled more than one-third full.
Adaptive DJD (Sim et al., 2012)           ADJD      Packs items into a bin until the free space is less than or equal to three times the average size of the remaining items. It then operates as for DJD.
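As a point of reference, the simplest of these heuristics, FFD, can be sketched in a few lines; this is a minimal illustration rather than the implementation used in the system described here, and the names are ours.

```python
def first_fit_descending(weights, capacity):
    """First fit descending: take items in descending order of size and place
    each one into the first open bin with enough free space, opening a new bin
    whenever no existing bin can accommodate the item."""
    bins = []  # each bin is a list of item weights
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            bins.append([w])
    return bins

# The toy instance used earlier: FFD packs it into 2 bins, matching the
# lower bound from Equation (1).
print(len(first_fit_descending([60, 45, 40, 30, 15, 10], 100)))  # -> 2
```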

3.1  Additional Novel Problems

In order to evaluate the system on a larger set of problems, a generator was implemented that facilitates generation of problem instances with similar characteristics to those described previously. The generator (Sim, 2013) attempts to generate three new problem instances from parameters obtained from each of the problems in set A, but tries to enforce an additional constraint that the total free space summed across all bins is zero, thus increasing the complexity of the problem.1 Initially, 4,110 new problems were generated; however, not all of the generated instances respected the free space constraint. One hundred and forty-two instances in which the total free space was greater than one bin were removed. Of the remaining 3,968 new instances, 3,178 had optimal solutions where all bins were filled to capacity. The remaining 790 had optimal solutions at the lower bound given by Equation (1), where the free space summed across all bins was less than the capacity of one bin. The problem instances can be downloaded from the internet (Sim, 2013) along with a known optimal solution for each.
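Purely as an illustration of how instances with zero total free space can be constructed (a hypothetical approach, not necessarily that used by the generator of Sim, 2013), each bin's contents can be built directly so that an optimal packing with completely full bins is known by construction:

```python
import random

def generate_full_bin_instance(num_bins, capacity, min_weight, rng=random):
    """Hypothetical generator sketch (not the published method): produce an
    instance whose known optimal packing fills every one of num_bins bins
    exactly to capacity, by splitting each bin into random item weights of at
    least min_weight."""
    weights = []
    for _ in range(num_bins):
        remaining = capacity
        while remaining > 0:
            if remaining < 2 * min_weight:
                w = remaining  # close this bin exactly
            else:
                w = rng.randint(min_weight, remaining - min_weight)
            weights.append(w)
            remaining -= w
    rng.shuffle(weights)
    return weights

items = generate_full_bin_instance(num_bins=10, capacity=150, min_weight=20)
assert sum(items) == 10 * 150  # zero free space in the optimal packing
```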

In the remainder of the article, this new problem set is denoted problem set B. A problem described as being from B-ds1, for example, denotes a novel instance generated from parameters derived by sampling the corresponding ds1 problem instances from problem set A (Scholl et al., 1997).

4  An LML Hyper-heuristic

The LML hyper-heuristic system proposed is composed of three main parts: a stream of problem instances, a heuristic generator, and an artificial immune system (AIS), as illustrated in Figure 1. The system is dubbed NELLI, NEtwork for Life Long learnIng.

Figure 1:

A conceptual view of the system: problems are continuously added/removed from the system. The generator continuously injects new heuristics. The dynamics and metadynamics of the system result in a self-sustaining network of heuristics and problems. Solid lines show direct interactions, and dashed lines represent indirect interactions (see Section 4.2).


NELLI is designed to run continuously; problem instances can be added or removed from the system at any point. A heuristic generator akin to gene libraries in the natural immune system provides a continual source of potential heuristics. The AIS itself consists of a network of interacting problems and heuristics (akin to immune cells in the natural immune system) that interact with each other based on an affinity metric.

The immune network sustains a minimal repertoire of heuristics and a minimal repertoire of problems that provide a representative map of the problem space to which the system has been exposed over its lifetime. From a problem perspective, the network does not contain representatives of all problems from the problem stream shown in Figure 1, but a representative set that is sufficient to map the problem space. From a heuristic perspective, only heuristics that provide a unique contribution in that they produce a better result on at least one problem than any other heuristic are retained.

This is represented conceptually in Figure 2. Figure 2(a) shows a set of problems that the system is currently exposed to. Figure 2(b) shows a set of heuristics that collectively cover these problems. The problems P1 and P2 are solved equally well by two or more heuristics. H2 is subsumed in that it cannot solve any problem better than another heuristic. In Figure 2(c), H2 is removed as it does not have a niche in solving problems; problems P1 and P2 are removed as they do not have a niche in describing the problem space.2 A competitive exclusion effect is observed between heuristics (and also between problems) that results in efficient coverage of the problem space. A key aspect of the compression is that it significantly decreases the computation time of the method (discussed in more detail in Section 5.2.2). The mechanism by which this is achieved is described later in Section 4.1. Finally, metadynamic processes continuously generate novel heuristics and adapt the network structure. Thus, the system has the following features.

  • It rapidly produces solutions to new problem instances that are similar in structure to previous problem instances that the system has been exposed to.

  • It responds by generating new heuristics to provide solutions to new problems that differ from those previously seen.

Figure 2:

(a) The problems that the system is currently exposed to. (b) The generated heuristics that cover these problems. The problems P1 and P2 shown are equally solved by two or more heuristics and therefore are not required to map the problem space. The shaded heuristic is redundant, as it does not have a niche. (c) The resulting network that sustains the minimal set of problems and heuristics required to describe the space.

In the next sections, we describe the key components of the LML system.

4.1  The Artificial Immune System

The AIS component is responsible for constructing a network of interacting heuristics and problems, and for governing the dynamic processes that enable heuristics to be incorporated or rejected from the current network. Pseudocode describing the network dynamics is given in Algorithm 1: the following variables are used in the algorithm definition and in the remainder of the article.

  •  The set of all possible problems from the class of 1D-BPP of interest.

  •  A subset of this space that contains a specific set of problems, for example, problem set A or B.

  •  The current environment, that is, the set of problems we are currently interested in solving.

  •  The set of all problems to which the system has been exposed during its lifetime.

  •  The immune network, composed of a set of problems and a set of heuristics.

  •  The set of problems currently sustained in the immune network.

  •  The set of heuristics currently sustained in the immune network.

Algorithm 1: Pseudocode for the network dynamics.

The algorithm captures the three essential concepts of an immune network as proposed by Varela et al. (1988)—structure, dynamics, and metadynamics. The term structure refers to the interactions between components of the network, in this case, problems and heuristics, as described in Section 4.2 and by Steps 4 and 5 of Algorithm 1. Dynamics refers to the variations in time of the concentration and affinities between components of the network, and crucially, the dynamics describe how the network adapts to itself and the environment (Steps 6-8 in Algorithm 1). Finally, the network metadynamics refers to a unique property of the immune system, that is, the ability to continuously produce and recruit novel components, in this case, heuristics and problems, as described in Section 4.4 and Step 8. These elements are discussed in detail in the next sections.

4.2  Network Structure

The network sustains a set of interacting heuristics and problems. Problems are directly stimulated by heuristics, and vice versa. Heuristics are indirectly stimulated by other heuristics.

A heuristic h can be stimulated by one or more problems. The total stimulation of a heuristic is the sum of its affinity with each problem currently in the network. A heuristic h has a nonzero affinity with a problem p if and only if it provides a solution that uses fewer bins than any other heuristic currently in the network. If this is the case, then the value of the affinity is equal to the improvement in the number of bins used by h compared to the next-best heuristic. If a heuristic provides the best solution for a problem p but one or more other heuristics give an equal result, then the affinity between problem p and the heuristic h is zero. If a heuristic h uses more bins than another heuristic on the problem, then the affinity between problem p and the heuristic h is also zero.

This is expressed mathematically by Equation (2), in which H' is the set of heuristics currently in the system excluding the heuristic h under consideration, PN is the set of problems currently sustained in the network, and bp(h) denotes the number of bins used by heuristic h to solve problem p.

$$s(h) = \sum_{p \in P_N} \mathrm{aff}(h,p), \qquad
\mathrm{aff}(h,p) =
\begin{cases}
\min_{h' \in H'} b_p(h') - b_p(h) & \text{if } b_p(h) < \min_{h' \in H'} b_p(h'),\\
0 & \text{otherwise}
\end{cases}
\qquad (2)$$

Note that heuristics are directly stimulated by problems. A heuristic only survives if it is able to solve at least one problem better than any other heuristic in the system. This provides competition between heuristics that forces a heuristic to find an individual niche to ensure survival. Thus, although no quantitative value is calculated for heuristic-heuristic interactions, we consider this an indirect interaction arising from the method of calculating the problem-heuristic interactions.

As the affinity between a problem and a heuristic is symmetrical, the stimulation of a problem is simply the affinity between the problem and the heuristic that best solves it. A problem for which the best solution is provided by more than one heuristic receives zero stimulation. Thus, unlike heuristics, a problem can only be stimulated by one heuristic. This is expressed mathematically in Equation (3), in which HN is the set of heuristics currently sustained in the network; note that at most one term in the sum will be nonzero.

$$s(p) = \sum_{h \in H_N} \mathrm{aff}(h,p) \qquad\qquad (3)$$
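To make Equations (2) and (3) concrete, the following sketch computes the stimulation of every heuristic and problem from a table of bin counts; the data structures and names are ours, and it assumes at least two heuristics are present.

```python
def compute_stimulation(bins):
    """bins[h][p] is the number of bins heuristic h uses on problem p.
    Following Equations (2) and (3): a heuristic gains affinity with a problem
    only if it is the unique best solver, by the number of bins it saves over
    the next-best heuristic; a problem's stimulation is the affinity with that
    single uniquely best heuristic (ties give zero)."""
    heuristics = list(bins)
    problems = list(next(iter(bins.values())))
    h_stim = {h: 0 for h in heuristics}
    p_stim = {p: 0 for p in problems}
    for p in problems:
        results = sorted(bins[h][p] for h in heuristics)
        best, second = results[0], results[1]
        if best < second:  # a unique best solver exists
            winner = min(heuristics, key=lambda h: bins[h][p])
            h_stim[winner] += second - best  # Equation (2)
            p_stim[p] = second - best        # Equation (3)
    return h_stim, p_stim

# Two heuristics, two problems: h1 is uniquely best on p1 by one bin,
# and the heuristics tie on p2.
bins = {"h1": {"p1": 10, "p2": 7}, "h2": {"p1": 11, "p2": 7}}
print(compute_stimulation(bins))  # -> ({'h1': 1, 'h2': 0}, {'p1': 1, 'p2': 0})
```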

4.3  Network Dynamics

In each iteration, the environment optionally changes (step 2 of Algorithm 1). This can range from adding new problem instances to the current environment to completely replacing the current environment with a new set of problem instances. In step 3, one new heuristic is generated and is made available to the network. The affinity metric described encourages diversity in the network by sustaining heuristics that cover different parts of the problem space. In a practical application, however, it is reasonable to assume that in addition to maintaining diversity, important goals of the system should be to find (1) the set of heuristics that most efficiently covers the problem space, and (2) the set that collectively minimises the total number of bins used to solve all problems the network is exposed to.3 While the latter is addressed by sustaining any heuristic with nonzero affinity, the former goal requires some attention.

Previous AIS models relating to idiotypic networks generally make use of an equation first defined by Farmer et al. (1986) to govern the dynamics of addition and removal of nodes from a network. In machine-learning applications, such as data-clustering, this was quickly found to lead to population explosion (Timmis et al., 2000), later addressed by using resource-limiting mechanisms (Timmis and Neal, 2001). In previous robotics applications, the situation is avoided completely by using a network of fixed size and focusing only on evolving connections. In more theoretical models (Hart, 2006) the criteria are not relevant, as the goal is simply to show that a network can be sustained. In our case, simply sustaining all heuristics that contribute to covering the problem space is likely to lead to a population explosion in the same manner as observed in data-mining applications, as no pressure exists on the system to encourage efficiency.
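As a plausible reading of the concentration dynamics implied by the parameters in Table 4 and the discussion in Section 5.2.1 (entities enter at cinit, are capped at cmax, gain concentration when stimulated, decay otherwise, and are removed at zero), a minimal sketch is given below. The name delta_c and the data structures are ours; this is an assumption-based illustration, not the published Algorithm 1.

```python
def update_concentrations(network, stimulation, delta_c=50, c_max=1000):
    """Plausible sketch of the network dynamics: entities (heuristics or
    problems) with nonzero stimulation gain concentration up to c_max,
    unstimulated entities decay, and entities whose concentration falls to
    zero are removed from the network."""
    survivors = {}
    for entity, concentration in network.items():
        if stimulation.get(entity, 0) > 0:
            concentration = min(c_max, concentration + delta_c)
        else:
            concentration -= delta_c
        if concentration > 0:
            survivors[entity] = concentration
    return survivors

# Example: a newly added heuristic starts at cinit = 200 and, if it never
# finds a niche, decays and disappears after four unstimulated iterations.
net = {"h_new": 200}
for _ in range(4):
    net = update_concentrations(net, stimulation={})
print(net)  # -> {}
```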

4.4  Network Metadynamics

The proposed LML system requires a continuous source of novel heuristics to be generated. Burke et al. (2009) describe the use of genetic programming (GP) to generate new heuristics within a hyper-heuristic framework; this has been applied specifically to bin-packing (Burke et al., 2006, 2007a). Although achieving some success, the approaches suffer from the usual afflictions of GP, in that efforts must be made to control unnecessary bloat. Sim and Hart (2013) proposed the use of single node genetic programming (SNGP; Jackson, 2012a, 2012b) as an alternative method of generating new bin-packing heuristics.

SNGP differs from the conventional GP model introduced by Koza (1992) in a number of key respects:

  • Each individual node may be the starting point for evaluation, not only the topmost node.

  • Nodes may have any number of parent nodes (including none and duplicates), allowing for network structures other than trees to be formed.

  • No crossover is used, only mutation, which is employed as a hill climber where the mutation is undone if no improvement is achieved.

The key benefit of this method is that the heuristics are of fixed maximum size (in terms of the number of nodes). Sim and Hart (2013) showed that SNGP could successfully evolve new bin-packing heuristics that outperformed existing ones from the literature. In this study, we only use the initialisation procedure of the SNGP method to produce new heuristics, and do not apply evolutionary operators to improve the generated heuristics. The justification for this is as follows: the role of the heuristic generator is to provide a continuous source of novel material for potential integration into the network of heuristics. The network dynamics will eradicate poor heuristics, and furthermore, given the relatively small number of terminal and function nodes outlined in Table 3, heuristics of reasonable quality are likely to be generated at random. Finally, it is more efficient to improve heuristics via an evolutionary operator only once they become established in the network, thereby proving their potential.

Table 3:
Nodes used.

Function nodes
  /     Protected divide: returns −1 if the denominator is 0, otherwise the result of dividing the first operand by the second
  >     Returns 1 if the first operand is greater than the second, or −1 otherwise
  IGTZ  Evaluates the first operand. If it evaluates as greater than zero, the result of evaluating the second operand is returned; otherwise the result of evaluating the third operand is returned
  <     Returns 1 if the first operand is less than the second, or −1 otherwise
  X     Returns the product of two operands
Terminal nodes
  B1    Packs the single largest item into the current bin, returning 1 if successful or −1 otherwise
  B2    Packs the largest combination of exactly two items into the current bin, returning 1 if successful or −1 otherwise
  B2A   Packs the largest combination of up to two items into the current bin, giving preference to sets of lower cardinality. Returns 1 if successful or −1 otherwise
  B3A   As for B2A but considers sets of up to three items
  B5A   As for B2A but considers sets of up to five items
  C     Returns the bin capacity
  FS    Returns the free space in the current bin
  INT   Returns a random integer value
  W1    Packs the smallest item into the current bin, returning 1 if successful, else −1

Figure 3 shows an example of a hand-crafted heuristic represented in the SNGP format4—this is in fact the deterministic heuristic DJD. A complete automatically initialised SNGP structure is depicted in Figure 4. A fixed set of terminal and function nodes, defined in Table 3, is available to the generator, which combines nodes according to the process outlined in Algorithm 2. The nodes selected for use in this study were derived by examining the heuristics outlined in Table 2. The simplest of these heuristics, FFD, packs each item in turn, taken in descending order of item size, into the first bin with free space that will accommodate it. FFD can be represented by the single node B1. The other heuristics used for comparison can all be represented as tree structures similar to that depicted for DJD in Figure 3. Further justification for the choice of nodes and details of SNGP can be found in Sim and Hart (2013).
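As an illustration of how candidate heuristics can be assembled from this node set, the sketch below builds a random expression from the function and terminal nodes of Table 3. It is a deliberately simplified, tree-based stand-in for the SNGP initialisation procedure (SNGP proper maintains a flat population of nodes and permits non-tree structures); the depth limit and probabilities are our own choices.

```python
import random

FUNCTIONS = {"/": 2, ">": 2, "<": 2, "X": 2, "IGTZ": 3}  # node -> arity (Table 3)
TERMINALS = ["B1", "B2", "B2A", "B3A", "B5A", "C", "FS", "INT", "W1"]

def random_heuristic(max_depth=3, rng=random):
    """Simplified sketch of heuristic generation from the Table 3 node set:
    recursively pick either a terminal node or a function node with randomly
    generated arguments, up to a fixed maximum depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    f = rng.choice(list(FUNCTIONS))
    return [f] + [random_heuristic(max_depth - 1, rng) for _ in range(FUNCTIONS[f])]

print(random_heuristic())  # e.g. ['IGTZ', 'B3A', ['X', 'FS', 'C'], 'W1']
```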

Figure 3:

DJD heuristic expressed as a tree.


Figure 4:

A randomly initialised SNGP structure.


4.5  Comparison to Previous Work

A brief comparison of NELLI to the system described in Sim et al. (2013) was conducted to highlight specific differences and improvements.

The two algorithms both utilise SNGP to provide a stream of novel heuristics as input to the system. However, this is the only similarity. The main difference between the two algorithms lies in the structure and composition of the self-sustaining network. In the previous work, the network consisted only of interacting heuristics, in which a direct measure of affinity was calculated between each pair of heuristics; this measure was asymmetric. In contrast, the network sustained by NELLI consists of interacting problems and heuristics. An explicit measure of interaction, which is symmetric, is calculated between problems and heuristics. However, heuristics only interact indirectly, through an implicit effect that excludes heuristics that do not occupy a specific niche within the problem space. The method by which the concentration of both heuristics and problems is calculated also directly results in unnecessary heuristics and problems being removed, minimising the size of the network and removing the need for the greedy calculation performed in Sim et al. (2013) that was required to remove redundant heuristics.

As a result of these improvements, NELLI brings significant advantages. In addition to maintaining a set of heuristics that collaborates to cover the problem space, it also maintains a minimal set of problems that is representative of the problem space, thereby providing a map of the space. The minimal network sustained brings considerable efficiencies in computational cost; the heuristics in the NELLI network only need to be evaluated against the minimal set of problems sustained rather than the complete set of problems of interest, whereas in Sim et al. (2013), heuristics needed to be evaluated against every problem of interest at each iteration. This results in a system that is both efficient and scalable.

5  Experiments and Results

Experiments were conducted to test the following features of the system.

  • The utility of the hyper-heuristic system compared to single deterministic heuristics, similar hyper-heuristic approaches that use collectives of heuristics, and the best known solutions for each of the problems.

  • The elasticity and responsiveness of the network in terms of its ability to quickly adapt when presented with new unseen problem instances.

  • The ability to continually learn while retaining memory of previously encountered problem instances.

  • The efficiency and scalability of the system in maintaining knowledge using a minimal repertoire of network components.

Experiments were conducted using the model described by Algorithm 1, with data drawn from the two datasets described in Section 3, problem sets A and B. Unless specifically stated, the default parameters used for all experiments were as shown in Table 4. These parameters were set following an initial period of empirical investigation.

Table 4:
Default parameter settings for experiments.

Parameter   Description                                                      Value
np          Number of problems added each iteration from the environment    30
nh          Number of new heuristics added each iteration                   1
cinit       Initial concentration of added heuristics/problems              200
—           Variation in concentration based on stimulation level           50
cmax        Maximum concentration level                                     1000

5.1  Utility of System in Comparison to Previous Approaches

Before analysing the behaviour of NELLI as an LML system, the system is benchmarked on static problem sets to obtain an indication of the quality of results it provides. Comparisons to the benchmark human-designed deterministic heuristics are provided. In terms of comparison to other hyper-heuristic approaches, we provide comparisons to the precursor of NELLI described in Sim et al. (2013) and also to another system described in Sim and Hart (2013), in which an island model of cooperative coevolution was used to find a collaborative set of heuristics. As no other authors have used the same extensive set of problems as in this article, direct comparisons to other hyper-heuristic approaches from the literature are difficult. The most comprehensive study available is carried out by Ross et al. (2003), who evaluate their hyper-heuristic on a subset of 890 of the benchmark problem instances.5 This study used a genetic algorithm to evolve a mapping between a (partial) problem state and the best deterministic heuristic to use; that is, it focused on heuristic selection and is an off-line hyper-heuristic (i.e., it requires a training phase). We also provide a comparison to a recent study by Burke et al. (2012) that evaluated a hyper-heuristic on 90 instances taken from two of the benchmark datasets. Finally, all results are compared to the best known solutions from the literature on each problem in order to obtain an absolute measure of quality.

5.1.1  Problem Set A

Previous methods for obtaining a collaborative set of heuristics for solving bin-packing problems (Sim and Hart, 2013; Sim et al., 2013) involved a training phase, in which an algorithm was trained on a set of problems and performance was evaluated on a separate testset. Although NELLI does not have a training phase, for consistency and in order to directly compare results, we adopt the same procedure.

  • Problem set A (1,370 problems) is split into two equal-sized sets (adding every second problem to the testset).6

  • NELLI is run for 500 iterations using the training set as the environment.

  • The resulting network obtained at the end of the previous step is then presented with all 685 problems in the testset, and the number of problems solved and bins utilised is recorded; no further heuristics are added to the system during this step. For comparison, results obtained by the single deterministic heuristics on the testset are shown in Table 5.

Table 5:
Results from single heuristics obtained on a static dataset of 685 problems taken from problem set A, compared to the best known solutions from the literature.

Heuristic   Problems solved   Extra bins
FFD         393               1,088
DJD         356               1,216
DJT         430               451
ADJD        336               679

Table 6 directly compares the result obtained by NELLI to previously published work. A further experiment was run using NELLI in which the environment was set to the full set of 1,370 problems in A rather than the reduced set of 685 problems. To obtain a comparison to previous work, the algorithm described in Sim and Hart (2013), which utilises an island model to find a set of collaborating heuristics, was run using the complete set of 1,370 problems. These results are given in Tables 7, 8, and 9, and confirm that the two systems produce solutions of identical quality on a static dataset. However, as we illustrate in the remainder of the article, NELLI has a number of advantages over previously proposed approaches. Specifically, the system is shown to be scalable; it significantly reduces computation time compared to previous approaches; and it adapts efficiently to unseen problems and rapidly changing environments while maintaining a memory of previously encountered problems.

Table 6:
Results from collaborative methods obtained on a static dataset of 685 problems taken from problem set A, compared to the best known solutions from the literature.

                              Problems solved            Extra bins
                              Min    Max    Mean   SD    Min    Max    Mean   SD
AIS I (Sim et al., 2013)      554    559    556    1.4   159    165    162    1.4
Island (Sim and Hart, 2013)   552    559    557    1.4   159    164    162    1.4
NELLI                         559    559    559    0     159    159    159    0
Table 7:
Results obtained using single heuristics on the complete set of 1,370 problems in problem set A, compared to the best known solutions in the literature.

Heuristic   Problems solved   Extra bins
FFD         788               2,142
DJD         716               2,409
DJT         863               881
ADJD        686               1,352
Table 8:
Results obtained using collaborative methods on the complete set of 1,370 problems in problem set A, compared to the best known solutions in the literature.

                              Problems solved               Extra bins
                              Min      Max      Mean    SD    Min    Max    Mean   SD
Island (Sim and Hart, 2013)   1,120    1,126    1,125   1.1   308    316    308    1.4
NELLI                         1,125    1,126    1,126   0.3   308    309    308    0.3
Table 9:
Based on the results of Tables 7 and 8, the efficiency of NELLI is demonstrated in sustaining the network using a minimal repertoire of heuristics and problem instances.

                              Min   Max   Mean   SD
Heuristics retained (NELLI)   —     —     7.1    0.7
Problems retained (NELLI)     26    57    36.9   6.4

Further analysis is given in Table 10, which shows the number of problem instances solved using the specified number of bins more than the known optimum for the set of 1,370 problems. NELLI clearly outperforms the individual human-designed deterministic heuristics—many of these perform particularly poorly on certain problem instances. On the other hand, the evolved set of cooperative heuristics retained by NELLI solves 97% of problem instances using no more than one extra bin.

Table 10:
Extra bins required by NELLI and four deterministic heuristics compared to the best known solutions from the literature on the complete set of 1,370 benchmark problem instances.

            Number of problems solved requiring 0, 1, 2, … extra bins
Heuristic
FFD     788     267   78    83   39   16   18   18   50
DJD     716     281   119   58   48   36   10   16   23   60
DJT     863     331   90    26   30   15   11
ADJD    686     368   153   76   38   22   12
NELLI   1,126   202   26    12
5.1.1.1  Comparison to Other Hyper-heuristic Approaches

Ross et al. (2003) used an evolutionary algorithm to learn a mapping between the state of a partially solved problem and the heuristic that should be applied at any given time, selecting from the deterministic heuristics described in Table 2. This off-line approach requires a training phase using a subset of the data. They applied their method to 890 problems from problem set A. Using a training set consisting of a subset of 667 problems, they were able to solve 78.8% of the 223 problems in the unseen testset optimally and 95.4% to within one bin of optimal. In comparison, NELLI solves 83.4% of the unseen testset optimally and 96.9% to within one bin of optimal.

Burke et al. (2012) use genetic programming to produce a hyper-heuristic that generates a new heuristic for solving each of 90 of the problem instances from set A. They report excellent results—a success rate of 93% in finding the best known solutions. However, their approach generates 90 individual heuristics; each heuristic is generated following 50,000 iterations of the hyper-heuristic. That is, 4.5 million iterations in total. Applied to the same 90 problems, NELLI solves 53% optimally, 92% within one bin of optimal, and 100% within two bins: although these results cannot compete directly with Burke et al. (2012), they are obtained using only two heuristics and at most 1,080 heuristic-problem calculations. The results are in line with the defined goal of hyper-heuristics outlined in Section 2.2, that is, that hyper-heuristics should be fast and exhibit good performance across a wide range of problems. As shown in the next section, NELLI has additional advantages, in being adaptive and retaining memory.

5.1.2  Problem Set B

The experimental procedure defined above was repeated using the new and larger problem set B in order to ascertain the system's performance on this new set of problems and to provide a baseline for further experimentation.

The system was executed 30 times with each run conducted over 100,000 iterations using the full set of problems as the environment and the default parameters as specified in Table 4. A summary of the results is given in Table 11, which also contrasts the results against those achieved using four human-designed deterministic heuristics. These results are analysed further in Table 12, which gives the number of problems solved using the specified number of bins greater than the known optimal by each of four deterministic heuristics and NELLI.

Table 11:
Number of bins required on the full set of 3,968 problem instances.

Heuristic   Total bins   Extra bins over optimal   Problems solved optimally
Optimal     320,445      0                         3,968
FFD         327,563      7,118                     491
DJD         330,447      10,002                    920
DJT         325,743      5,298                     1,158
ADJD        323,566      3,121                     1,279
NELLI       322,820      2,375                     1,983
Table 12:
Extra bins required by NELLI and four deterministic heuristics on the new set of 3,968 problem instances when compared to the known optimal values.

            Number of problems solved requiring 0, 1, 2, … extra bins
Heuristic
FFD     491     2364   442   208   196   51    22   34   68   19   73
DJD     920     1552   468   248   191   100   92   66   57   34   240
DJT     1158    1936   414   141   85    76    52   35   60
ADJD    1279    2398   209   38    33
NELLI   1983    1708   201   44    27

Table 12 also demonstrates the relative complexity of the problem instances in B when contrasted to the standard benchmarks in A, with respect to the standard set of deterministic heuristics. For example, on problem set A, FFD was shown to solve 56% of the 1,370 problem instances using the known optimal number of bins. In contrast, on problem set B, it only manages to solve 12% optimally. NELLI solves 82% of the problems in A optimally, compared to only 50% of the problem instances in B.

Note that the final evaluation of each of the 30 runs gave exactly the same result in terms of the number of bins required to pack each of the problems in B (although the heuristics and problems sustained in each run differed). One of the runs was selected at random and the results obtained by the final set of heuristics for each instance in B were retained for use in the remaining experiments as a benchmark for the problem set.

5.2  Parameter Tuning

A brief investigation of the impact of three of the main system parameters is conducted to determine their influence and justify the default settings.

5.2.1  Concentration cinit

The effect of varying the initial concentration of problems and heuristics is illustrated in Figure 5, which shows the results obtained when NELLI was run 30 times for each of a range of values of cinit. The system was halted after 100,000 iterations. Each box plot summarises the 30 runs conducted. The vertical axis shows the number of bins more than the best result that NELLI achieved on problem set B, as described previously and presented in Tables 11 and 12. For smaller values of cinit, increasing the initial concentration improves performance—the increased initial concentration increases the time period that both heuristics and problem instances can be sustained without stimulation, thus increasing the probability of eventually finding a heuristic-problem pairing that is mutually stimulatory. However, as cinit approaches cmax, the effect is reversed; newly introduced heuristics dominate due to their larger concentration, potentially suppressing previously established heuristics.

Figure 5:

The effect of varying the initial concentration cinit. The concentration cinit on the x axis is plotted as a fraction of cmax.

5.2.2  Number of Problems Added per Iteration np

The parameter np, the number of problems presented to the system at each iteration, is key because it has a significant impact on the number of calculations that must be made at each iteration of the algorithm. At each iteration, the number of new calculations C that needs to be performed is given by

C = np·|H| + nh·|P|,   (4)

where |H| denotes the number of heuristics already in the system and |P| the number of problems currently held in the network.

The first term is required to determine the result of applying all heuristics in the system to the new problems just introduced. The second term determines the results of any new heuristics introduced in this iteration on all problems currently in the system.
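As an illustration of Equation (4), the following minimal sketch (using our own variable names and placeholder network sizes) counts the heuristic-problem evaluations performed in a single iteration.

    def calculations_per_iteration(n_p, n_h, num_heuristics, num_problems):
        """Evaluations needed in one iteration, following Equation (4):
        every heuristic already in the system is applied to the n_p new
        problems, and every new heuristic is applied to all problems
        currently held in the network."""
        return n_p * num_heuristics + n_h * num_problems

    # With the default of 30 problems added per iteration, one new heuristic,
    # and a (hypothetical) network holding 20 heuristics and 60 problems:
    print(calculations_per_iteration(n_p=30, n_h=1,
                                     num_heuristics=20, num_problems=60))  # 660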

To understand the influence of np, the model was executed 30 times for each of six different values of np. At each iteration, the cumulative number of calculations undertaken was recorded. The model was allowed to run until the results obtained converged to the best result known for the system on problem set B. Figure 6(a) summarises the results obtained over 30 runs for each parameter setting. The figure shows that increasing np (that is, the number of problem instances presented each iteration) has an adverse effect, increasing the overall number of calculations required to achieve the same result. The default value of 30 appears to be a reasonable choice. Figure 6(b) shows a single run of the algorithm truncated to 20,000 calculations.

Figure 6:

(a) Number of problem instances added per iteration versus heuristic problems solved to reach the best known result. (b) Cumulative number of problems solved versus the number of bins more than best.


5.2.3  Number of Heuristics Added per Iteration nh

Figure 7 shows the effect that varying nh has on the system. For each plot, the system was executed for 50,000 iterations using default parameter settings, with the exception of nh, which was fixed for the duration of each plot as shown.

Figure 7:

Effect of varying nh. Results are averaged over every 1,000 iterations.


When adding a single heuristic each iteration, a smooth increase in performance is observed over time, and the system converges to the best known result, despite a slow start. Adding a larger number of heuristics per iteration improves the initial performance due to an increased probability of finding good solutions. However, over a longer time scale, performance is hindered, causing undesirable fluctuation in the collective capability of the network. In the worst case, with the largest value of nh tested, the system fails to converge to the best result.

As nh increases, it becomes more difficult for individual heuristics to find niche areas of the problem space due to increased competition; newly introduced heuristics are unlikely to gain any stimulation due to the decreased probability of solving a problem better than any other heuristic, resulting in very short lifetimes for each heuristic and thus more unstable behaviour in the system. From a computational perspective, increasing both np and nh also significantly increases the number of calculations required at each iteration. This further justifies the choice of nh = 1 as the default value.

5.3  Efficiency and Scalability

To determine the scalability of NELLI, in terms of the number of problems and heuristics retained with respect to the size of the environment, an experiment was conducted in which the size of the environment was varied over six values, ranging from 100 problems to the full set of 3,968. In each case, the problems in the environment were randomly selected from problem set B. All other parameters were set to the default values, and the system was run for 50,000 iterations over 30 runs. Table 13 shows the mean number of problems and heuristics retained following 50,000 iterations of the system. The table also shows the fraction of the problems in the environment retained in the network (as a percentage), and the ratio of retained heuristics to retained problems, as the size of the environment increases, to indicate how the system scales.

Table 13:
The number of heuristics and problems retained in the network as the size of the environment increases. All figures obtained over 30 runs and 50,000 iterations.
Size of environment                           100    200    500   1,000  2,000  3,968
Mean heuristics retained                      5.40   6.87   9.90  12.40  16.83  21.57
Mean problems retained                       18.73  23.30  33.45  41.50  47.40  59.52
Problems retained as % of environment size   18.73  11.65   6.69   4.15   2.37   1.50
Ratio of heuristics to problems retained      0.29   0.29   0.30   0.30   0.35   0.36

As expected, as the size of the environment increases, the number of retained problems and heuristics increases. Note, however, that the fraction of problems retained in relation to the environment decreases. The problems in the environment represent a sample of problems from the larger problem space. As the environment grows, more of the problem space is sampled, and thus the system is better able to learn a general representation of it, hence decreasing the ratio of problems required to represent it. This is also reflected in the sublinear increase in the number of heuristics required as the environment grows, again confirming the ability of the system to find heuristics that generalise over the environment. The ratio of heuristics to problems retained remains almost constant, indicating the scalability of the system. Figure 8 shows a typical run from an experiment for two of the environment sizes.7
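As a quick consistency check, the ratios reported in Table 13 follow directly from the retained counts; the snippet below recomputes them, using environment sizes reconstructed from the reported percentages.

    # Mean counts retained after 50,000 iterations (Table 13), paired with
    # environment sizes reconstructed from the reported percentages.
    env_sizes = [100, 200, 500, 1000, 2000, 3968]
    problems_retained = [18.73, 23.30, 33.45, 41.50, 47.40, 59.52]
    heuristics_retained = [5.40, 6.87, 9.90, 12.40, 16.83, 21.57]

    for e, p, h in zip(env_sizes, problems_retained, heuristics_retained):
        pct_of_env = 100 * p / e  # fraction of the environment retained
        h_per_p = h / p           # heuristics per retained problem
        print(f"environment {e}: {pct_of_env:.2f}% retained, H/P = {h_per_p:.2f}")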

Figure 8:

Typical runs for two of the environment sizes ((a) and (b)).


With respect to efficiency, we return to the earlier comment that NELLI is computationally more efficient than its precursors. In the system described in Sim et al. (2013), the complete set of problems in the environment must be evaluated at each iteration (i.e., in Equation (4), the number of problems held in the network would be replaced by the size of the whole environment). In contrast, using NELLI, only the sustained subset of problems is evaluated. As is clearly shown in Table 13, this holds across a range of environment sizes: in an environment containing 3,968 problems, only around 1.5% of the problems are sustained, hence dramatically reducing computational complexity. Note that to obtain a solution to a new problem instance, it is necessary to apply a greedy procedure in which the performance of each of the deterministic heuristics in the system must be evaluated on the instance. Given that even for the largest environment the system retains only around 22 heuristics on average, this does not appear to be a limiting factor.
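A minimal sketch of this greedy procedure is given below; apply_heuristic and the heuristic objects themselves are placeholders for components defined elsewhere in the system, and the packing representation is assumed for illustration.

    def solve_new_instance(instance, retained_heuristics, apply_heuristic):
        """Greedy selection over the sustained heuristics: apply every
        retained heuristic to the new instance and keep the packing that
        uses the fewest bins. apply_heuristic(h, instance) is assumed to
        return a list of bins (each bin a list of item sizes)."""
        best_packing = None
        for h in retained_heuristics:
            packing = apply_heuristic(h, instance)
            if best_packing is None or len(packing) < len(best_packing):
                best_packing = packing
        return best_packing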

5.4  Continuous Learning Capabilities

In order to demonstrate that NELLI functions effectively as a continuous learning system, it must be tested in a dynamically changing problem environment, showing that it is responsive to new problems and exhibits the plasticity required for the network to adapt.

5.4.1  Memory and Plasticity: Response to New Problems from a Similar Dataset

Consider the case in which the underlying set of problems is B, the set of 3,968 novel problem instances. Initially, the environment consists of a set of problems drawn randomly from B. Every 1,000 iterations, the environment is replaced with a new random set of problems from B. Experiments are performed in which the environment contains 100, 500, or 1,000 problems; at each iteration, the number of problems and heuristics sustained in the network is recorded. Additionally, in order to demonstrate that the system has memory, the performance of the system against every problem in B is tracked at each iteration. Particularly during early iterations, many of the problems in B will not have been presented to the network; therefore, by measuring the hypothetical response against B, it is possible to gauge whether the system is generalising from seen instances and retaining that information. Over time, as the environment is repeatedly resampled, an increasing fraction of B will have been presented to the system.
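The following sketch outlines this experimental protocol. The routines nelli_step (one iteration of the system) and evaluate_on (the total bins used by the network's current heuristics on a problem set) are placeholders for components defined elsewhere; the names are ours, not part of NELLI.

    import random

    def run_dynamic_experiment(B, env_size, iterations, nelli_step, evaluate_on):
        """Sketch of the Section 5.4.1 protocol: every 1,000 iterations the
        environment is replaced with a fresh random sample from B, and the
        hypothetical response of the current network against all of B is
        recorded."""
        history = []
        environment = random.sample(B, env_size)
        for t in range(iterations):
            if t > 0 and t % 1000 == 0:
                environment = random.sample(B, env_size)  # resample from B
            nelli_step(environment)                       # one system iteration
            history.append(evaluate_on(B))                # response against all of B
        return history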

The results are illustrated in Figure 9. The results are plotted both at every iteration (left-hand column) and averaged over each 1,000-iteration window during which a given environment is present (right-hand column). Several trends are clear.

  • The network is clearly plastic both in terms of the number of problems and the number of heuristics that are sustained in the network.

  • NELLI can generalise over B; even in the early iterations we see good performance across the entirety of B when only a small fraction of it has been presented to the system.

  • NELLI continuously learns; the performance measured against all problems in B improves over time, and the rate of learning can be increased by increasing the size of the environment, the set of problems currently visible to the network.

  • NELLI sustains a useful network over time; performance never deteriorates in our experiments, provided that the parameters are set correctly; the system therefore exhibits memory.

  • Increasing the number of problems in the environment causes more difficulty at the start but increases the rate of learning overall. This is illustrated further in Figure 10, which summarises the results over 30 runs.

Figure 9:

The environment is changed every 1,000 iterations to a random 100, 500, or 1,000 problems from B.


Figure 10:

Number of iterations to reach the best result for different sizes of the environment.


5.4.2  Memory and Plasticity: Response to New Problems from Different Datasets

In order to demonstrate the system's learning and memory capabilities when faced with an environment in which problem characteristics vary over time, experiments are conducted using problems from ds1 and ds2. These datasets, generated from parameters defined by Scholl et al. (1997), are well known to have radically different properties. It is therefore unlikely that a heuristic that performs well on ds1 will generalise to ds2.

In the following experiments, the environment is toggled between ds1 and ds2 every 500 iterations. Two experiments were performed.

  • The system was restarted every 500 iterations to obtain a benchmark response for the current set of problems presented (equivalent to a system with no memory).

  • The problems in the environment were replaced every 500 iterations, but the heuristics present were retained (in order to test whether the system retains a useful memory).

In each of the two scenarios, that is, with and without memory, we calculate at every iteration the number of extra bins required, with respect to the best known solution, to solve the problems using the heuristics present in the network. The results are given in Figure 11, which shows the results over a single typical run. In Figure 11, the blocks alternate every 500 iterations to highlight the dataset being considered. All figures are obtained from the same typical run. Figures 11(c) and 11(d) show the same information as Figures 11(a) and 11(b), but on a smaller scale. Figures 11(e) and 11(f) average the results over each 500-iteration cycle. The right-hand column is of most interest, as it shows the metric evaluated over the environment, that is, the set of problems we are currently interested in solving. The left-hand column shows the same metric evaluated over the set of problems sustained by the network as being representative of the problem space, and is included to illustrate how the network is capable of generalising from the problems it sustains to those in the environment.
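The toggling protocol described above can be sketched as follows; all callables (nelli_step, reset_network, replace_problems) are placeholders for routines defined elsewhere, and the sketch is illustrative rather than the exact implementation.

    def toggled_run(ds1, ds2, cycles, nelli_step, reset_network, replace_problems,
                    with_memory=True):
        """Sketch of the Section 5.4.2 protocol: the environment alternates
        between ds1 and ds2 every 500 iterations. With memory, only the
        problems in the environment are swapped and the heuristic network is
        retained; without memory, the whole system is restarted at each switch."""
        for cycle in range(cycles):
            dataset = ds1 if cycle % 2 == 0 else ds2
            if with_memory:
                replace_problems(dataset)  # keep the retained heuristics
            else:
                reset_network(dataset)     # benchmark: no memory carried over
            for _ in range(500):
                nelli_step()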

Figure 11:

Alternating the environment between ds1 and ds2. Utility is measured against both the environment and the set of problems sustained in the network.


We observe the following with regard to Figure 11.

  • NELLI—with its implicit memory—always outperforms the system with no memory. Due to the retained network, the system does not have to adapt from scratch to a new environment.

  • Adaptation still occurs in the system with memory, demonstrating the plasticity of the network.

  • The memory of a dataset is sustained across cycles in which no items from that dataset are presented. This is apparent in the increasing performance on both datasets over time.

  • One of the two datasets is clearly much easier than the other: within three presentations of samples from it, NELLI reaches optimal performance (i.e., 0 bins more than the best known result) and sustains this performance.8

Comparing the figures in the right-hand column to those on the left, which show the same metric evaluated over the problems sustained in the network, we see that performance on the sustained set mirrors that on the environment; an improvement on one correlates with an improvement on the other, confirming the generalisation capabilities of the network.

6  Conclusions and Future Work

We have described a continuous learning system (inspired by previous work in the artificial immune system field) that is capable of learning to solve a combinatorial optimisation problem, improves its performance over time, and adapts to changing environments. The system fuses methods from SNGP, which is used to generate novel heuristics, with ideas from immune-network theory, resulting in a self-sustaining interacting network of problems and heuristics; this network is capable of adapting over time as new knowledge is presented or if the environment changes. When compared to existing approaches (Sim and Hart, 2013; Sim et al., 2013; Ross et al., 2003; Burke et al., 2006) that attempt to find sets of collaborative heuristics, the system performs equally well on static datasets. However, it is shown to have significant advantages in its ability to deal with dynamic data, to provide a representative map of the problem space, and in its computational efficiency. Comparisons to the known optimal results on the suite of 5,338 instances tested also show the promise of the system. The test suite included 3,968 problems that were generated in order to provide a harder test than posed by existing benchmark problems; these problems are shown to be considerably more difficult than the standard benchmarks and are available as a resource for use by other researchers.

NELLI meets the requirements defined by Silver et al. (2013) for a lifelong machine learning system: it incorporates a long-term memory; it can selectively transfer prior knowledge when learning new tasks; and it adopts a systems approach that ensures the effective and efficient interaction of the elements of the system. Further, as specified by Silver et al. (2013), it is computationally efficient when storing learned knowledge in long-term memory and retains its knowledge online. The system is shown experimentally to be scalable in terms of the number of heuristics and problems it sustains as the size of the environment increases.

Although the system is tested using 1D bin-packing as an example domain, we believe it will generalise easily to other combinatorial optimisation problems. The underlying principle behind NELLI is that heuristics that are successful are sustained by the network. We generate constructive heuristics using SNGP to combine nodes that are explicitly designed to place one or more items into a solution. There is no requirement to limit NELLI to these types of heuristics or to generate those heuristics using SNGP. Heuristics could be hand-crafted or automatically generated using other methods. Future research could consider additional heuristic generation techniques such as in Burke et al. (2012). Similarly, the heuristics used do not have to be limited to deterministic constructive heuristics but could include improvement heuristics and stochastic methods. The main requirements of the system are that a number of heuristics can be used to solve problems from across the domain, and that potential heuristics can easily be represented (and therefore generated) using, for example, the SNGP format. Recent examples of using GP to evolve novel heuristics in the timetabling (Bader-El-Den et al., 2009) and 2D stock-cutting (Burke et al., 2010b) domains suggest that this is likely to be the case and provide promising avenues for future development.

Acknowledgments

The authors are grateful to Prof. Peter Ross for comments and suggestions on draft versions of this paper which greatly improved it.

References

Bader-El-Den, M., and Poli, R. (2008). Generating SAT local-search heuristics using a GP hyper-heuristic framework. In N. Monmarché, E.-G. Talbi, P. Collet, M. Schoenauer, and E. Lutton (Eds.), Artificial evolution. Lecture notes in computer science, Vol. 4926 (pp. 37–49). Berlin: Springer.
Bader-El-Den, M., Poli, R., and Fatima, S. (2009). Evolving timetabling heuristics using a grammar-based genetic programming hyper-heuristic framework. Memetic Computing, 1(3):205–219.
Bersini, H. (1999). The endogenous double plasticity of the immune network and the inspiration to be drawn for engineering artifacts. In D. Dasgupta (Ed.), Artificial immune systems and their applications (pp. 22–44). Berlin: Springer.
Burke, E., Hyde, M., and Kendall, G. (2006). Evolving bin packing heuristics with genetic programming. In Parallel Problem Solving from Nature, PPSN IX. Lecture notes in computer science, Vol. 4193 (pp. 860–869). Berlin: Springer.
Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P., and Schulenburg, S. (2003). Hyper-heuristics: An emerging direction in modern search technology. In Handbook of metaheuristics, Chap. 16 (pp. 457–474). Dordrecht, The Netherlands: Kluwer.
Burke, E. K., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., and Woodward, J. R. (2010a). A classification of hyper-heuristic approaches. In M. Gendreau and J.-Y. Potvin (Eds.), Handbook of metaheuristics (pp. 449–468). Berlin: Springer.
Burke, E. K., Hyde, M., Kendall, G., and Woodward, J. (2007a). Scalability of evolved online bin packing heuristics. In IEEE Congress on Evolutionary Computation (CEC 2007), pp. 2530–2537.
Burke, E. K., Hyde, M., Kendall, G., and Woodward, J. (2010b). A genetic programming hyper-heuristic approach for evolving 2-D strip packing heuristics. IEEE Transactions on Evolutionary Computation, 14(6):942–958.
Burke, E. K., Hyde, M. R., Kendall, G., Ochoa, G., Özcan, E., and Woodward, J. R. (2009). Exploring hyper-heuristic methodologies with genetic programming. In J. Kacprzyk and L. C. Jain (Eds.), Computational intelligence (pp. 177–201). Berlin: Springer.
Burke, E. K., Hyde, M. R., Kendall, G., and Woodward, J. (2007b). Automatic heuristic generation with genetic programming: Evolving a jack-of-all-trades or a master of one. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO '07, pp. 1559–1565.
Burke, E. K., Hyde, M. R., Kendall, G., and Woodward, J. (2012). Automating the packing heuristic design process with genetic programming. Evolutionary Computation, 20(1):63–89.
Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E. R., and Mitchell, T. M. (2010). Toward an architecture for never-ending language learning. In AAAI Conference on Artificial Intelligence, pp. 1306–1313.
Csirik, J., Johnson, D. S., Kenyon, C., Shor, P. W., and Weber, R. R. (1999). A self organizing bin packing heuristic. In Selected Papers from the International Workshop on Algorithm Engineering and Experimentation, ALENEX '99, pp. 246–265.
Djang, P. A., and Finch, P. R. (1998). Solving one dimensional bin packing problems. Available at http://www.zianet.com/pdjang/binpack/paper.zip
Falkenauer, E. (1996). A hybrid grouping genetic algorithm for bin packing. Journal of Heuristics, 2(1):5–30.
Farmer, J. D., Packard, N. H., and Perelson, A. S. (1986). The immune system, adaptation, and machine learning. Physica D: Nonlinear Phenomena, 2(1–3):187–204.
de França, F. O., Von Zuben, F. J., and de Castro, L. N. (2005). An artificial immune network for multimodal function optimization on dynamic environments. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2005, pp. 289–296.
Garey, M. R., and Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. San Francisco: Freeman.
Gaspar, A., and Collard, P. (2000). Two models of immunization for time dependent optimization. In Proceedings of the 2000 IEEE International Conference on Systems, Man, and Cybernetics, pp. 113–118.
Gent, I. P. (1998). Heuristic solution of open bin packing problems. Journal of Heuristics, 3(4):299–304.
Hart, E. (2006). Analysis of a growth model for idiotypic networks. In H. Bersini and J. Carneiro (Eds.), Artificial immune systems. Lecture notes in computer science, Vol. 4163 (pp. 66–80). Berlin: Springer.
Jackson, D. (2012a). A new, node-focused model for genetic programming. In A. Moraglio, S. Silva, K. Krawiec, P. Machado, and C. Cotta (Eds.), Genetic programming. Lecture notes in computer science, Vol. 7244 (pp. 49–60). Berlin: Springer.
Jackson, D. (2012b). Single node genetic programming on problems with side effects. In C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone (Eds.), Parallel problem solving from nature, PPSN XII. Lecture notes in computer science, Vol. 7491 (pp. 327–336). Berlin: Springer.
Jerne, N. K. (1974). Towards a network theory of the immune system. Annales d'Immunologie, 125(1–2):373–389.
Kira, Z., and Schultz, A. (2006). Continuous and embedded learning for multi-agent systems. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3184–3190.
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
Kromer, P., Platos, J., and Snasel, V. (2012). Practical results of artificial immune systems for combinatorial optimization problems. In Proceedings of the Fourth World Conference on Nature and Biologically Inspired Computing, NaBIC 2012, pp. 194–199.
Louis, S., and McDonnell, J. (2004). Learning with case-injected genetic algorithms. IEEE Transactions on Evolutionary Computation, 8(4):316–328.
Nanas, N., and de Roeck, A. (2007). Multimodal dynamic optimization: From evolutionary algorithms to artificial immune systems. In Proceedings of the 6th International Conference on Artificial Immune Systems, ICARIS 2007, pp. 13–24. Berlin: Springer-Verlag.
Nasraoui, O., Uribe, C., Coronel, C., and Gonzalez, F. (2003). Tecno-streams: Tracking evolving clusters in noisy data streams with a scalable immune system learning model. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM 2003, pp. 235–242.
Pappa, G., Ochoa, G., Hyde, M., Freitas, A., Woodward, J., and Swan, J. (2013). Contrasting meta-learning and hyper-heuristic research: The role of evolutionary algorithms. Genetic Programming and Evolvable Machines, 15(1):1–33.
Ross, P. (2005). Hyper-heuristics. In E. K. Burke and G. Kendall (Eds.), Search methodologies: Introductory tutorials in optimization and decision support techniques (pp. 529–556). Berlin: Springer-Verlag.
Ross, P., Marín-Blázquez, J., Schulenburg, S., and Hart, E. (2003). Learning a procedure that can solve hard bin-packing problems: A new GA-based approach to hyper-heuristics. In E. Cantú-Paz et al. (Eds.), Genetic and evolutionary computation, GECCO 2003. Lecture notes in computer science, Vol. 2724 (pp. 1295–1306). Berlin: Springer.
Ross, P., Schulenburg, S., Marín-Blázquez, J. G., and Hart, E. (2002). Hyper-heuristics: Learning to combine simple heuristics in bin-packing problems. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '02, pp. 942–948.
Ruvolo, P., and Eaton, E. (2013). ELLA: An efficient lifelong learning algorithm. Journal of Machine Learning Research, 28(1):507–515.
Scholl, A., Klein, R., and Jürgens, C. (1997). Bison: A fast hybrid procedure for exactly solving the one-dimensional bin packing problem. Computers & Operations Research, 24(7):627–645.
Schwerin, P., and Wäscher, G. (1997). The bin-packing problem: A problem generator and some numerical experiments with FFD packing and MTP. International Transactions in Operational Research, 4(5–6):377–389.
Silver, D., Yang, Q., and Li, L. (2013). Lifelong machine learning systems: Beyond learning algorithms. Paper presented at the AAAI Spring Symposium Series, Stanford University, Stanford, CA.
Sim, K. (2013). Bin-packing generator. http://www.soc.napier.ac.uk/cs378/bpp/
Sim, K., and Hart, E. (2013). Generating single and multiple cooperative heuristics for the one dimensional bin packing problem using a single node genetic programming island model. In Proceedings of GECCO 2013, pp. 1549–1556.
Sim, K., Hart, E., and Paechter, B. (2012). A hyper-heuristic classifier for one dimensional bin packing problems: Improving classification accuracy by attribute evolution. In C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone (Eds.), Parallel problem solving from nature, PPSN XII. Lecture notes in computer science, Vol. 7492 (pp. 348–357). Berlin: Springer.
Sim, K., Hart, E., and Paechter, B. (2013). Learning to solve bin packing problems with an immune inspired hyper-heuristic. In Proceedings of the 12th European Conference on ALife, ECAL 2013, pp. 856–863.
Terashima-Marín, H., Ross, P., Farías-Zárate, C., López-Camacho, E., and Valenzuela-Rendón, M. (2010). Generalized hyper-heuristics for solving 2D regular and irregular packing problems. Annals of Operations Research, 179(1):369–392.
Thrun, S., and Pratt, L. (1997). Learning to learn. Dordrecht, The Netherlands: Kluwer.
Timmis, J. (2007). Artificial immune systems—Today and tomorrow. Natural Computing, 6(1):1–18.
Timmis, J., and Neal, M. (2001). A resource limited artificial immune system for data analysis. Knowledge-Based Systems, 14(3–4):121–130.
Timmis, J., Neal, M., and Hunt, J. (2000). An artificial immune system for data analysis. Biosystems, 55(1–3):143–150.
Trojanowski, K., and Wierzchon, S. T. (2009). Immune-based algorithms for dynamic optimization. Information Sciences, 179(10):1495–1515.
Varela, F., Coutinho, A., Dupire, B., and Vaz, N. N. (1988). Cognitive networks: Immune, neural and otherwise. In A. Perelson (Ed.), Theoretical immunology, Part 2 (pp. 359–375). Reading, MA: Addison Wesley.
Watanabe, Y., Ishiguro, A., and Uchikawa, Y. (1998). Decentralized behavior arbitration mechanism for autonomous mobile robot using immune network. In D. Dasgupta (Ed.), Artificial immune systems and their applications (pp. 187–209). Berlin: Springer.
Whitbrook, A., Aickelin, U., and Garibaldi, J. (2007). Idiotypic immune networks in mobile-robot control. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(6):1581–1598.
Whitbrook, A., Aickelin, U., and Garibaldi, J. (2008). An idiotypic immune network as a short-term learning architecture for mobile robots. In P. Bentley, D. Lee, and S. Jung (Eds.), Artificial immune systems. Lecture notes in computer science, Vol. 5132 (pp. 266–278). Berlin: Springer.
Whitbrook, A. M., Aickelin, U., and Garibaldi, J. M. (2010). Two-timescale learning using idiotypic behaviour mediation for a navigating mobile robot. Applied Soft Computing, 10(3):876–887.

Notes

1

The problem instances generated prove harder for the benchmark deterministic heuristics to solve optimally, as can be seen by comparing Tables 10 and 12, discussed later in this article.

2

Although these problems have been removed from the network, they can still be solved by the system, as heuristics H1 and H3 remain in the network.

3

In fact, exactly the same goals were identified in generic form by Bersini (1999).

4

In this case this is also a standard GP tree.

5

The authors do not include the 480 problem instances that prove hard for the variations of DJD used.

6

This ensures an even split of problem instances for each parameter setting between the training and test sets.

7

As both heuristics and problems are continually added with sufficient concentration to allow them to survive for at least three iterations, at any iteration there will potentially be at most three heuristics and 90 problem instances present that give no added benefit to the system. Table 13 reports the retained problems and heuristics only after the run finishes, when any unstimulated problems and heuristics have been removed; hence the discrepancy between the mean of 11.65% reported in Table 13 and the roughly 60% visible in Figure 8.

8

Note that experiments showed that the order in which the two datasets are presented does not have any impact on the results.