Abstract

A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.

1  Introduction

Traditionally, learning classifier systems (LCS; Holland, 1976) use a ternary encoding to generalise over the environmental inputs and to associate appropriate actions. A number of representations have previously been presented beyond this scheme, however, including real numbers (Wilson, 2000), fuzzy logic (Valenzuela-Rendón, 1991), and artificial neural networks (Bull, 2002b). Temporally dynamic representation schemes within LCS represent a potentially important approach since such temporal behaviour is viewed as a significant aspect of artificial life, biological systems, and cognition in general (Ashby, 1952).

In this paper, we explore examples of a general dynamical system representation within XCSF (Wilson, 2001)—termed dynamical genetic programming (DGP; Bull, 2009). Traditional tree-based genetic programming (GP; Koza, 1992) has been used within LCS both to calculate the action (Ahluwalia and Bull, 1999) and to represent the condition (e.g., Lanzi and Perrucci, 1999). DGP uses a graph-based representation, each node of which is constantly updated in parallel, and evolved using an open-ended, self-adaptive scheme. We show that XCSF is able to solve a number of computational tasks using this temporally dynamic knowledge representation scheme.

2  Related Work

2.1  Genetic Programming in Learning Classifier Systems

A significant benefit of symbolic representations is the expressive power to represent complex relationships between the sensory inputs (Mellor, 2005). LISP S-expressions composed of a set of Boolean functions (i.e., AND, OR, and NOT) have been used to represent symbolic classifier conditions in LCS to solve Boolean multiplexer and woods problems (Lanzi and Perrucci, 1999), and to extract useful knowledge in a data mining assay (Lanzi, 2001). An analysis of the populations (Lanzi et al., 2008) has subsequently shown an increasing prevalence of subexpressions through the course of evolution as the system constructs the required building blocks to find solutions. However, when logical disjunctions are involved, optimality is unattainable because the symbolic conditions highly overlap, resulting in classifiers sharing their fitness with other classifiers and thereby lowering the fitness values (Lanzi, 2007). Ioannides and Browne (2008) later extended this approach to further include arithmetic functions (i.e., PLUS, MINUS, MULTIPLY, DIVIDE, and POWEROF) as well as domain specific functions (i.e., VALUEAT and ADDROF) to solve a number of multiplexer problems.

In addition, Lanzi (2003) based classifier conditions on stack-based genetic programming (Perkis, 1994) and solved the 6-bit and 11-bit Multiplexer as well as Woods1 problems. Here, the conditions are linear sequences of tokens, expressed in Reverse Polish Notation, where each token represents either a variable, a constant, or a function. The function set used was composed of Boolean operators (i.e., AND, OR, NOT and EOR) and arithmetic operators (i.e., +, −, >, =).

Ahluwalia and Bull (1999) presented a simple form of LCS which used numerical S-expressions for feature extraction in classification tasks. Here each rule's condition was a binary string indicating whether or not a rule matched for a given feature and the actions were S-expressions which performed a function on the input feature value. More recently, Wilson (2008) has explored the use of a form of gene expression programming (GEP; Ferreira, 2006) within LCS. Here the expressions are composed of arithmetic functions and applied to regression tasks. The conditions are represented as expression trees which are evaluated by assigning the environmental inputs to the tree's terminals, evaluating the tree, and then comparing the result with a predetermined threshold. Whenever the threshold value is exceeded, the rule becomes eligible for use as the output.

Forsyth (1981) with his BEAGLE system was the first to use a purely evolution-based form of LCS (Pittsburgh style, Smith, 1983) to evolve LISP S-expressions for classification tasks. Landau et al. (2001) used a Pittsburgh-LCS in which the rules are represented as directed graphs where the genotypes are tokens of a stack-based language, whose execution builds the labeled graph. Bit strings are used to represent the language tokens and are applied to non-Markov problems. The genotype is translated into a sequence of tokens and then interpreted similarly to a program in a stack-based language with instructions to create the graph's nodes, connections, and labels. Subsequently, the unused conditions and actions in the stack are added to the structure which is then popped from the stack. Tokens are used to specify the matching conditions and executable actions as well as instructions to construct the graph, and to manipulate the stack. The bit strings were later replaced with integer tokens and again applied to non-Markov problems (Landau et al., 2005).

2.2  Graph-Based Genetic Programming

Most relevant to the form of GP used herein is the relatively small amount of prior work on graph-based representations. Neural programming (NP; Teller and Veloso, 1996) uses a directed graph of connected nodes, each performing an arbitrary function. Potentially selectable functions include READ, WRITE, and IF-THEN-ELSE, along with standard arithmetic and zero-arity functions. Additionally, complex user defined functions may be used. Significantly, recursive connections are permitted and each node is executed with synchronous parallelism for some number of cycles before an output node's value is taken.

Poli (e.g., Pujol and Poli, 1998) presented a similar scheme wherein the graph is placed over a two-dimensional grid and executes its nodes synchronously in parallel. Connections are directed upward and are only permitted between nodes situated on adjacent rows; however, by including identity functions, connections between nonadjacent layers are possible and thus any parallel distributed program may be represented.

Teller and Veloso (1997) also presented parallel algorithm discovery and orchestration (PADO), which uses an arbitrary directed graph of nodes and an indexed memory. Each node in the graph consists of an action and a branch-decision component, with multiple outgoing branches permitting the various potential flows of control. A stack is used, from which each program's inputs are drawn and onto which the results are pushed. The potentially selectable actions are similar to NP and include arithmetic operators, negation, minimum and maximum, and the ability to read from and write to the indexed memory, along with nondeterministic and deterministic branching instructions. The graphs are executed chronologically for a fixed amount of time with each node selecting the next to take control. The output nodes are then averaged, giving additional weighting to the more recent states.

Other examples of graph-based GP typically contain sequentially updating nodes, for example, finite state machines (e.g., Fogel et al., 1965), Cartesian GP (Miller, 1999), genetic network programming (Hirasawa et al., 2001), linear-graph GP (Kantschik and Banzhaf, 2002), and graph structured program evolution (Shirakawa et al., 2007). Schmidt and Lipson (2007) have recently demonstrated a number of benefits from graph encodings over traditional trees, such as reduced bloat and increased computational efficiency.

We have recently introduced the use of graph-based Boolean logic networks within LCS (Bull and Preen, 2009; Preen and Bull, 2009). In this paper we extend that work to the continuous-valued domain through arithmetic operators and to the most recent form of LCS, Wilson's XCSF.

3  XCSF Overview

On each step of the XCSF learning cycle (Wilson, 2001), a match set [M] is generated from the population set [P], composed of all of the classifiers whose condition matches the current environmental input. In the event that [M] is empty, covering is used to produce classifiers that match the current environment state with random actions.

Subsequently, a system prediction is made for each action in [M], based upon the fitness-weighted average of all of the predictions of the classifiers proposing the action. If there are no classifiers in [M] advocating one of the potential system actions, covering is invoked to generate classifiers that both match the current environment state and advocate the relevant action. An action is then selected using the system predictions, typically by alternating exploring (by either roulette wheel or random selection) and exploiting (the best action). In multistep problems, a biased selection strategy is often employed wherein exploration is conducted with probability p_explr, otherwise exploitation occurs (Lanzi, 1999). An action set [A] is then built composed of all the classifiers in [M] advocating the selected action. Next, the action is executed in the environment and feedback is received in the form of a payoff, P.

In a single-step problem, [A] is updated using the current reward. The genetic algorithm (GA; Holland, 1975) is then run in [A] if the average time since the last GA invocation exceeds the threshold θ_GA. When the GA is run, two parent classifiers are chosen (typically by either roulette wheel or tournament selection) based on fitness. Offspring are then produced from the parents, usually by use of crossover and mutation. The offspring then have their payoff, error, and fitness set to the average of their parents’ values. If subsumption is enabled and the offspring are subsumed by either parent, they are not included in [P]; instead, the parent's numerosity is incremented. In a multistep problem, the previous action set [A]_{-1} is updated using a Q-learning (Watkins, 1989) type algorithm and the GA may be run as described above on [A]_{-1} as opposed to [A] for single-step problems. The sequence then loops until it is terminated.

Each classifier also maintains a vector of weights, where there are as many weights as there are inputs from the environment, plus one extra weight corresponding to a constant input, x0, which is set to a fixed value that is uniform across all classifiers in the population. Each classifier computes its prediction (cl.p) as the product of the environmental input (s_t), augmented with x0, and the classifier weight vector (w):
$$cl.p(s_t) = cl.w_0 \cdot x_0 + \sum_{i>0} cl.w_i \cdot s_t(i) \tag{1}$$
Each of the input weights is initially set to zero, and subsequently adapted to accurately reflect the prediction using a modified delta rule (Mitchell, 1997). The delta rule is modified such that the correction at each step is proportional to the difference between the current and correct prediction, controlled by a correction rate, η. The modified delta rule for the reinforcement update is thus:
$$\Delta w_i = \frac{\eta}{|s_t|^2} \left( P - cl.p(s_t) \right) s_t(i) \tag{2}$$
where η is the correction rate and |s_t|^2 is the squared norm of the input vector s_t. The Δw_i values are used to update the weights of the classifier cl with:
$$cl.w_i \leftarrow cl.w_i + \Delta w_i \tag{3}$$
Subsequently, the prediction error is updated with:
$$cl.\varepsilon \leftarrow cl.\varepsilon + \beta \left( |P - cl.p(s_t)| - cl.\varepsilon \right) \tag{4}$$
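As a concrete illustration, the following is a minimal C sketch of the computed prediction and the updates of Eqs. (1)-(4); the fixed input count, the parameter values, and the names (Classifier, ETA, BETA) are illustrative assumptions rather than the original implementation:

```c
/* Minimal sketch of XCSF's computed prediction and reinforcement update
   (Eqs. 1-4). N_INPUTS, ETA, BETA, X0, and the Classifier struct are
   illustrative assumptions, not the original implementation. */
#include <math.h>

#define N_INPUTS 6
#define X0       1.0   /* constant augmenting input x0 */
#define ETA      0.2   /* correction rate (eta) */
#define BETA     0.2   /* error learning rate (beta) */

typedef struct {
    double w[N_INPUTS + 1]; /* w[0] pairs with x0 */
    double error;           /* prediction error estimate (epsilon) */
} Classifier;

/* Eq. 1: the prediction is the weight vector dotted with the input. */
double predict(const Classifier *cl, const double *s) {
    double p = cl->w[0] * X0;
    for (int i = 0; i < N_INPUTS; i++)
        p += cl->w[i + 1] * s[i];
    return p;
}

/* Eqs. 2-4: normalised delta-rule weight update, then error update. */
void reinforce(Classifier *cl, const double *s, double payoff) {
    double p = predict(cl, s);
    double norm2 = X0 * X0;             /* squared norm of augmented input */
    for (int i = 0; i < N_INPUTS; i++)
        norm2 += s[i] * s[i];
    double delta = ETA * (payoff - p) / norm2;
    cl->w[0] += delta * X0;
    for (int i = 0; i < N_INPUTS; i++)
        cl->w[i + 1] += delta * s[i];   /* Eq. 3: w_i <- w_i + delta_w_i */
    cl->error += BETA * (fabs(payoff - p) - cl->error);  /* Eq. 4 */
}
```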

Giani et al. (1995) provide the first example of an LCS where the prediction is computed for each environment state, that is, the prediction can vary over the condition's domain. There, neural networks were used to compute the prediction values within a Pittsburgh LCS based on a Q-learning strategy. This enables a more accurate, piecewise linear approximation of the payoff (or function), as opposed to the standard piecewise constant approximation, and can also be applied to binary problems such as the Boolean multiplexer and maze environments, resulting in faster convergence to optimality. By computing the prediction, greater systemwide generalisations can also be formed, including within different payoff levels, potentially resulting in a more compact and general rule base. See Wilson (2002) for further details.

4  Dynamical Arithmetic Networks

The standard arithmetic operators (shown in Table 1) have become the default operators within genetic programming for regression tasks (Koza, 1992). These functions comprise the basic operational toolset within mathematics for transforming two numbers into a single output; because of this, most forms of genetic programming, whether tree-based (e.g., Koza) or graph-based (e.g., Miller and Thomson, 2000), use two fixed connections (i.e., K=2) to each node, which act as inputs to be transformed by the receiving node's operator. In dynamical systems, K=2 has been identified as the critical regime, with higher connectivity resulting in increasing chaos (e.g., Kauffman, 1993). Significantly, arithmetic operators are unbounded, unlike fuzzy logic for example.

Table 1: Selectable arithmetic operators.

ID | Function | Logic
0  | >        | if (x > y) return 1.0; else return 0.0
1  | ×        | x × y
2  | +        | x + y
3  | −        | x − y
4  | /        | x / y

Therefore, to incorporate an arithmetic dynamical genetic programming scheme within XCSF (hereinafter aDGP-XCSF; see, e.g., Figure 1), here K=2 and each node performs one of the potentially selectable operations (from Table 1), with the resulting node state capped within [−10,000.0, 10,000.0]; this is necessary since the dynamical behaviour of the network could otherwise drive states to positive or negative infinity. Finally, the introduction of constants can be achieved similarly to traditional genetic programming through the use of ephemeral random constants; however, following Miller and Thomson (2000) and Clegg et al. (2007), here we start with a single selectable constant of value 1.0.
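The following is a minimal C sketch of one synchronous update cycle of such a network, using the node encoding described in Section 5 (an operator code plus K=2 connection indices, with negative indices denoting external inputs); the handling of NaN arising from 0/0 is our own assumed policy:

```c
/* Minimal sketch of one synchronous cycle of an aDGP network. A node's two
   connections (K = 2) reference other nodes when non-negative, and external
   inputs when negative (index c maps to ext[-(c + 1)]). State capping at
   +/-10,000 follows the text; returning 0 for NaN (from 0/0) is assumed. */
#define CAP 10000.0

enum Op { GT, MUL, ADD, SUB, DIV };

static double apply(enum Op op, double x, double y) {
    switch (op) {
    case GT:  return x > y ? 1.0 : 0.0;
    case MUL: return x * y;
    case ADD: return x + y;
    case SUB: return x - y;
    default:  return x / y;   /* DIV; capping bounds any blow-up */
    }
}

static double cap(double v) {
    if (v != v) return 0.0;          /* NaN from 0/0: assumed policy */
    if (v >  CAP) return  CAP;
    if (v < -CAP) return -CAP;
    return v;
}

typedef struct {
    enum Op op;
    int in[2];   /* connection indices; < 0 => external input */
} Node;

/* Every node reads the previous states and writes its next state. */
void cycle(const Node *net, int n_nodes, const double *ext,
           const double *state, double *next) {
    for (int i = 0; i < n_nodes; i++) {
        double v[2];
        for (int k = 0; k < 2; k++) {
            int c = net[i].in[k];
            v[k] = (c < 0) ? ext[-(c + 1)] : state[c];
        }
        next[i] = cap(apply(net[i].op, v[0], v[1]));
    }
}
```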

Figure 1: Example aDGP network and node encoding.

Figure 2 illustrates the fraction of nodes changing state over time within a synchronous 13 node network (where the results are an average of 100 randomly constructed networks).

Figure 2: Fraction of nodes changing state on a 13 node synchronous aDGP network (average of 100 networks; K=2).

5  Arithmetic DGP in XCSF

To use dynamical arithmetic genetic networks as the rules within XCSF, the following scheme is adopted. The population of classifiers is initialised entirely at random. Each randomly created network initially consists of Ninit nodes, with each node maintaining two randomly assigned connections; each connection is assigned to an external input (i.e., an input variable or constant) with 20% uniform probability and to another node within the network with the remaining 80% probability, thus ensuring a consistent distribution as the number of nodes increases. In addition, each node is randomly assigned one of the aforementioned operators.

Node states are initialised at random for the first step of a trial but thereafter are not reset between matching cycles. Matching consists of synchronously executing each rule for T cycles based on the current input. An extra matching node is also required to enable a network to (potentially) match only specific sets of inputs: if a given network has a value of less than 0.5 on the match node, regardless of the state of its outputs, the rule does not join the match set, [M]. During exploitation, the single rule with the highest prediction multiplied by accuracy is chosen as the system output. During exploration, a single rule is chosen under a prediction-proportionate scheme. Once a rule has been chosen, an action set, [A], is constructed, composed of all other matching rules whose output node states lie within ε₀ of the chosen network's output node. Parameters are then updated as usual and the GA is executed in [A] during exploration. When covering is necessitated, a randomly constructed network is created and then executed for T cycles to determine the status of the match node. This procedure is repeated until a network is created that matches the environment state.
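A sketch of this matching test, reusing cycle() and the Node type from the sketch in Section 4 (the match-node index and the double-buffered state arrays are illustrative):

```c
/* Sketch of the matching test: run a network for its T cycles on the
   current input and inspect the dedicated match node (threshold 0.5),
   reusing cycle() and Node from the sketch above. */
int matches(const Node *net, int n_nodes, int match_node,
            const double *ext, double *state, double *scratch, int T) {
    for (int t = 0; t < T; t++) {
        cycle(net, n_nodes, ext, state, scratch);
        for (int i = 0; i < n_nodes; i++)
            state[i] = scratch[i];   /* commit the synchronous update */
    }
    return state[match_node] >= 0.5; /* below 0.5 => does not join [M] */
}
```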

Following Preen and Bull (2009), each rule has its own mutation rate, μ. Mutation alone is used here (no crossover) and is applied to each node's function and connectivity map at rate μ. A node's function is represented by an integer which references the appropriate operation to execute upon its received inputs (see Table 1 for the arithmetic operators used). Further, each node's connectivity is represented as a list of two integers, with positive integers referencing inputs to be received from other nodes in the network and negative integers referencing external inputs. Each integer in the list is subjected to mutation on reproduction at the self-adapting rate for that rule. Hence, within the representation, evolution can select different operators for each node within a given network rule, along with its connectivity map. Specifically, each rule stores its mutation rate as a real number, initially seeded uniformly at random in the range [0, 1]. This parameter is passed to its offspring. The offspring first applies the rate to itself using a Gaussian distribution, that is, μ ← μ · e^{N(0,1)}, before mutating the rest of the rule at the resulting rate. This is similar to the approach used in evolution strategies (ES; Schwefel, 1981), where the mutation rate is a locally evolving entity in itself, that is, it adapts during the search process. Self-adaptive mutation not only reduces the number of hand-tunable parameters of the evolutionary algorithm, it has also been shown to improve performance.
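The following C sketch illustrates this self-adaptive mutation of the node genes, reusing the Node type and input-index convention from the earlier sketch; the clamp bounds on μ and the assumption that mutated connections are re-drawn with the same 20/80 external/internal split are ours:

```c
/* Sketch of the self-adaptive mutation of a rule's node genes, reusing the
   Node type from the earlier sketch. The clamp bounds on mu and the 20/80
   re-draw of connections are assumptions. */
#include <stdlib.h>
#include <math.h>

static double uniform01(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }

static double gauss(void) {  /* Box-Muller standard normal sample */
    double u1 = uniform01(), u2 = uniform01();
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

void mutate_rule(Node *net, int n_nodes, int n_ops, int n_ext, double *mu) {
    *mu *= exp(gauss());                 /* self-adapt: mu <- mu * e^N(0,1) */
    if (*mu < 0.0001) *mu = 0.0001;      /* assumed bounds on the rate */
    if (*mu > 1.0)    *mu = 1.0;
    for (int i = 0; i < n_nodes; i++) {
        if (uniform01() < *mu)           /* mutate the function gene */
            net[i].op = (enum Op)(rand() % n_ops);
        for (int k = 0; k < 2; k++)
            if (uniform01() < *mu)       /* mutate a connection gene */
                net[i].in[k] = (uniform01() < 0.2)
                    ? -(rand() % n_ext) - 1  /* external input */
                    : rand() % n_nodes;      /* another node */
    }
}
```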

Due to the possible need for a different number of nodes within the rules for a given task, the DGP scheme is also of variable length. Within our system, once the functions and connections have been mutated, a new randomly connected node is added, or the last added node is removed, with the same probability μ; the latter occurs only if the network currently consists of more than the initial number of nodes. Subsequently, the parameter T (i.e., the number of execution cycles) undergoes mutation, as sketched below. Each rule maintains its own T value, which is initially seeded randomly between 1 and 50. Thereafter, offspring potentially increment or decrement T by 1 with probability μ, with T remaining bounded between 1 and 50. Thus, DGP is temporally dynamic in both the search process and the representation scheme. Traditional GP can be seen to rely primarily upon recombination to search the space of possible tree sizes, although the standard mutation operator effectively increases or decreases tree size also. Whenever an offspring classifier is created and no changes occur to its network under mutation, the parent's numerosity is increased and its mutation rate is set to that of the offspring.
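A sketch of the variable-length and temporal mutations, reusing uniform01() from the sketch above; reading the addition and removal as two branches of a single rate-μ event, and eliding the new node's random wiring, are our assumptions:

```c
/* Sketch of the variable-length and temporal mutations of a rule, reusing
   uniform01() from the sketch above. The new node's random wiring and the
   removal bookkeeping are elided; T drifts by +/-1 within [1, 50]. */
void mutate_size_and_T(int *n_nodes, int n_init, int *T, double mu) {
    if (uniform01() < mu) {
        if (uniform01() < 0.5)
            (*n_nodes)++;        /* add a node (wiring it up is elided) */
        else if (*n_nodes > n_init)
            (*n_nodes)--;        /* drop the last added node */
    }
    if (uniform01() < mu) {      /* temporal mutation of T */
        *T += (uniform01() < 0.5) ? 1 : -1;
        if (*T < 1)  *T = 1;
        if (*T > 50) *T = 50;
    }
}
```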

Furthermore, since XCSF computes the predicted value of a state-action pairing, each classifier maintains a vector of weights, with as many weights as there are inputs from the environment, plus one extra corresponding to the constant input x0, which is uniform across all classifiers in the population. Each of the input weights is initially set to zero, and subsequently adapted to accurately reflect the prediction using the modified delta rule described in Section 3, where the correction at each step is proportional to the difference between the current and correct prediction and is controlled by the correction rate, η. Following Wilson (2007), an extra weight is also included which receives the classifier's current action as input. In addition, here each offspring's weight vector is reset upon reproduction. Figure 3 shows an illustration of a rule generated whilst solving the sextic polynomial f(x) = x^6 − 2x^4 + x^2. The rule has an error of 0 and an accuracy of 1 with an experience of 685, showing that it is a highly accurate rule. Its fitness is only 0.118 since fitness is shared among classifiers in the same niche.

Figure 3: Example aDGP-XCSF sextic polynomial rule.

6  Experimentation

6.1  Reinforcement Learning

We begin experimentation using two well-known reinforcement learning problems, the real-multiplexer and the frog problem. The 6-bit real multiplexer problem provides a continuous-input discrete-output task as a first step to understanding the capabilities of aDGP-XCSF; it demonstrates the handling of a multivariate problem and enables the comparison with prior work. The frog problem provides a fully continuous-input and output reinforcement learning task and enables the exploration of the applicability of aDGP-XCSF to continuous reinforcement learning.

6.1.1  6-Bit Real Multiplexer

The Boolean multiplexer problem consists of binary strings of length l = x + 2^x, where the first x bits index into the remaining 2^x bits, returning the value of the indexed bit. The real multiplexer problem (Wilson, 2000) is an extension of the Boolean multiplexer where the binary strings are replaced with randomly generated real-valued vectors in the range [0,1]. Each value in the vector is then interpreted as 0 if greater than a threshold value θ, else 1. Similar to Wilson (2000), here θ = 0.5. In this experiment only, the output node is discretised to 0 or 1 depending on its state being greater or less than 0.5. The actions of each network are then used to construct a prediction array as in the standard XCS approach. Each node state is also restricted to the range [−1,1]. Training alternates between explore and exploit trials. In each case, a random example is generated. Under explore trials, an action is chosen at random from within the matching set of rules [M]. Rules are updated and the GA may fire. Under exploit trials, the GA is not used.
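As a concrete illustration, a minimal C sketch of the 6-bit real-multiplexer oracle under this thresholding scheme (the address-bit ordering is our assumption):

```c
/* Sketch of the 6-bit real-multiplexer oracle: threshold each input at
   theta = 0.5 (values greater than theta read as 0, per the text), then
   use the 2 address bits to index the 4 data bits. Bit order is assumed. */
int real_mux6(const double *x /* 6 values in [0, 1] */) {
    const double theta = 0.5;
    int b[6];
    for (int i = 0; i < 6; i++)
        b[i] = (x[i] > theta) ? 0 : 1;
    int addr = (b[0] << 1) | b[1];   /* address bits (assumed ordering) */
    return b[2 + addr];              /* value of the indexed data bit */
}
```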

From Figure 4a, it can be seen that optimal performance is achieved after approximately 100,000 trials. The learning speed is slower than XCSR (Wilson, 2000); however, here 100% performance is ultimately achieved, whereas with XCSR “performance reaches its maximum [at] approximately 98%” (Wilson, 2000). It is important to note that pure evolution such as that used here will, in general, require longer learning times than those able to exploit a statistical update procedure. However, here the representation brings benefits in terms of inherent memory (see, e.g., Preen and Bull, 2009) along with improved signal to symbol transformation and a number of other benefits, which are shown later.

Figure 4: aDGP-XCSF 6-bit real-multiplexer problem performance.

Figure 4a also shows that prior to reaching optimality, the number of macro-classifiers increases from 500 to 1,000 and the average mutation rate declines from 45% to 3%; after 100,000 trials both values remain stable. Figure 4b shows that the average number of nodes increases marginally from 22 to 22.5, while the average value of T remains relatively stable throughout experimentation (30 to 28.5).

6.1.2  Continuous-Action Frog Problem

The frog problem (Wilson, 2004, 2007) is a single-step problem with a nonlinear continuous-valued payoff function in a continuous one-dimensional space. A frog is given the learning task of jumping to catch a fly that is at a distance d from the frog, where 0 ≤ d ≤ 1. The frog receives a sensory input, x(d) = 1 − d, before jumping a chosen distance, a, and receiving a reward based on its new distance from the fly, as given by:
$$P(d,a) = \begin{cases} 1 - (d - a) & \text{if } a \le d \\ 1 - (a - d) & \text{otherwise} \end{cases} \tag{5}$$
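A minimal C sketch of this payoff, assuming the piecewise form reconstructed above (i.e., the reward is simply the sensory input at the new distance |d − a|):

```c
/* Sketch of the frog payoff (Eq. 5): the reward equals the sensory input
   x(d') = 1 - d' at the frog's new distance d' = |d - a| from the fly. */
double frog_payoff(double d, double a) {   /* d, a in [0, 1] */
    double new_dist = (a <= d) ? (d - a) : (a - d);
    return 1.0 - new_dist;                 /* maximal (1.0) when a = d */
}
```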

In the continuous-action case, the frog may select any continuous number in the range [0,1] and thus the optimal achievable performance is 100%. Parameters are then updated and the GA executed as usual in [A]. Exploitation functions by selecting from [M] the single rule with the lowest error divided by fitness. The parameters used here are P = 2000 and x0 = 1, with the remaining learning parameters as used by Wilson (2004, 2007) and Tran et al. (2007). Only one output node is required and thus Ninit = 3.

Wilson (2007) presented a form of XCSF where the action was computed directly as a linear combination of the input state and a vector of action weights, and conducted experimentation on the continuous-action frog problem, selecting the classifier with the highest prediction for exploitation. Tran et al. (2007) subsequently extended this by adapting the action weights to the problem through the use of an ES. In addition to the action weights, a vector of standard deviations is maintained for use as the mutation step size by the ES. During exploration, the ES is applied to each member of [A] to evolve the action weights and standard deviations, where each rule functions as a single parent producing an offspring via mutation; the offspring is then evaluated on the current environment state and its fitness updated and compared with the parent; if the offspring has a higher fitness, it replaces the parent, otherwise it is discarded. Moreover, the exploration action selection policy was modified from purely random to selecting the action with the highest prediction. After reinforcement updates and running the ES, the GA is invoked using a combination of mixed crossover and mutation. They reported greater than 99% performance after an average of 30,000 trials (P = 2,000), which was superior to the performance reported by Wilson (2007). More recently, Ramirez Ruiz et al. (2008) applied a fuzzy-LCS with continuous vector actions, where the GA only evolved the action parts of the fuzzy systems, to the continuous-action frog problem, and achieved a lower error than Q-learning (discretized over 100 elements in x and a) after 500,000 trials (P = 200).

Figure 5 shows the performance of aDGP-XCSF on the continuous-valued frog problem. As can be seen, aDGP-XCSF attains greater than 99% performance after approximately 8,000 trials. This is an improvement on previously reported results.

Figure 5: aDGP-XCSF continuous-action frog problem performance.

6.2  Regression/Function Approximation

To adapt aDGP-XCSF for regression tasks, several modifications are necessary. A trial now consists of an input from a dataset of real numbers, followed by the construction of [M], receiving the correct answer from the dataset, updating all classifiers in [M], and then running the GA in [M]. Performance is measured as the absolute error between the answer and the action from the single network with the lowest error divided by fitness.

Following Stalph and Butz (2010), who found that increasing the number of classifiers reproduced per GA invocation can increase the learning speed of XCSF in regression tasks, here a new parameter, λ, is introduced to control the number of offspring created from the two parents chosen through roulette wheel on each GA invocation, with λ = 2 being equal to traditional LCS. As can be seen in the following experiments, this was necessary to increase the amount of search performed, because the number of possible phenotypes represented by aDGP is extremely large. Furthermore, following neural-XCS for regression (Bull and O'Hara, 2002), MAM updating of inexperienced classifiers is disabled. The dataset used in the following experiments consists of 50 equally spaced real-valued numbers in the range [−1,1], and the parameters used are P = 2000 and Ninit = 2.

6.2.1  Sextic Polynomial

Figure 6 shows the performance of aDGP-XCSF on the sextic polynomial (from Koza, 1994):
$$f(x) = x^6 - 2x^4 + x^2 \tag{6}$$
Figure 6: aDGP-XCSF sextic polynomial performance.

The average absolute error using an unmodified GA (i.e., λ = 2) remains above 0.1 after 500,000 trials, with only one in 10 experiments achieving an error below ε₀ (not shown). In contrast, with increased local search, the average error is reduced below ε₀ after approximately 210,000 trials with 100 offspring created each GA invocation (λ = 100; also not shown), and after approximately 125,000 trials with 250 offspring (λ = 250; see Figure 6a). In addition, the average time (in trials) taken to reach an error below ε₀ with λ = 100 (M = 103,890, SD = 80,883, N = 10) is significantly greater than with λ = 250 (M = 33,520, SD = 33,620, N = 10) using a two-sample t-test assuming unequal variances, t(12) = 2.54, p = .026, showing that aDGP-XCSF benefits from increased local search. As might be expected, the time to ε₀ is slower than standard XCSF, which does not use pure evolution and requires around 7,600 trials to achieve an error below the target threshold. Figure 6a shows that the average number of macro-classifiers with λ = 250 initially declines from 2,000 to 1,450 over the first 50,000 trials before increasing and converging to around 1,800 after approximately 250,000 trials. Over the first 125,000 trials, during which the average error falls below ε₀, the average mutation rate (also Figure 6a) declines rapidly from 50% to 6% and the average number of nodes in the networks (Figure 6b) grows from 2 to 22. Furthermore, from Figure 6b, it can be seen that the average value of T remains stable around 20 throughout experimentation.

6.2.2  Quintic Polynomial

Figure 7 shows the performance of aDGP-XCSF on the quintic polynomial (from Koza, 1994):
$$f(x) = x^5 - 2x^3 + x \tag{7}$$
Figure 7: aDGP-XCSF quintic polynomial performance.

The quintic polynomial provides slightly less potentially exploitable regularity and modularity than the sextic considered previously (Koza, 1994). Figure 7a shows that the average absolute error of aDGP-XCSF is consistently below ε₀ after 160,000 trials. Again, this is slower than standard XCSF, which reaches the target error after approximately 1,200 trials. Figure 7a shows an initial decline in the average number of macro-classifiers over the first 50,000 trials from 2,000 to 1,400 before converging to around 1,850, similar to the sextic problem. The average mutation rate (Figure 7a) declines from 50% to 8% over the 160,000 trials during which a system error below ε₀ is achieved, stabilising around 5% thereafter. The average number of nodes in the networks (Figure 7b) grows from 2 to 20 after 160,000 trials and continues to grow throughout the experiment, reaching 37 after 500,000 trials. The average value of T remains around 25 throughout (see also Figure 7b).

6.2.3  Two-Composite Polynomial

Combining multiple polynomials to create a composite function can increase the complexity of the regression task. Here we combine the aforementioned sextic and quintic polynomials to form a two-composite function defined by:
$$f(x) = \begin{cases} x^5 - 2x^3 + x & \text{if } x < 0 \\ x^6 - 2x^4 + x^2 & \text{if } x \ge 0 \end{cases} \tag{8}$$

Figure 8 shows the performance of aDGP-XCSF on the two-composite polynomial, while Figure 9 shows the performance of tree-based GP (P = 10,000; MAX_LEN = 1,000; DEPTH = 5; CROSSOVER = 0.9; MUTATION PER NODE = 0.05) on the same problem. From Figure 8a, it can be seen that the average absolute error of aDGP-XCSF is consistently zero after approximately 200,000 trials. In contrast, tree-based GP attains a minimum average absolute error of 0.02, twice ε₀ (i.e., sum of errors, 1.0, divided by dataset size, 50), after 500 generations (Figure 9a). Being generous to tree-based GP and assuming the average [M] set size is equivalent to the entire population size (in reality it is closer to half), 125,000 trials correspond to 500 generations (both composed of 250 million evaluations); the average absolute error after 125,000 trials of aDGP-XCSF (M = 0.006, SD = 0.0062, N = 10) is significantly less than that of tree-GP after 500 generations (M = 0.02, SD = 0.019, N = 10) using a two-sample t-test assuming unequal variances, t(11) = −2.25, p = .0456. Standard XCSF reaches the target threshold after approximately 5,400 trials.

Figure 8: aDGP-XCSF two-composite polynomial performance.

Figure 9: Tree-GP two-composite polynomial performance.

The average number of macro-classifiers used by aDGP-XCSF converges to around 1,850 after 250,000 trials while the average mutation rate declines to around 9% (Figure 8a). The average value of T utilised by the aDGP-XCSF networks remains around 30 throughout experimentation (Figure 8b). As might be expected (e.g., Schmidt and Lipson, 2007), the average number of nodes in the networks used by graph-based aDGP-XCSF (16 after 125,000 trials; see Figure 8b) is far smaller than the number used by tree-based GP (35,000 after 500 generations; see Figure 9b).

6.2.4  Four-Composite Polynomial

Next, aDGP-XCSF is applied to the four-composite function defined by:
$$f(x) = \begin{cases} x^6 - 2x^4 + x^2 & \text{if } -1 \le x < -0.5 \\ x^5 - 2x^3 + x & \text{if } -0.5 \le x < 0 \\ x^6 - 2x^4 + x^2 & \text{if } 0 \le x < 0.5 \\ x^5 - 2x^3 + x & \text{if } 0.5 \le x \le 1 \end{cases} \tag{9}$$

Figure 10 shows the performance of aDGP-XCSF on the four-composite polynomial with a lower λ, while Figure 11 shows the performance of tree-based GP. A lower λ value is used here because the four-composite problem requires more niching, and larger values cause too much of each niche to be replaced through GA activity, which results in performance spikes (not shown). From Figure 10a, it can be seen that the average absolute error of aDGP-XCSF is consistently below ε₀ after approximately 80,000 trials. In contrast, tree-based GP attains a minimum average absolute error of 0.02, twice ε₀ (Figure 11a). Again, being generous to tree-based GP and assuming the average [M] set size is equivalent to the entire population size, 125,000 trials correspond to 500 generations; the average absolute error after 125,000 trials of aDGP-XCSF (M = 0.004, SD = 0.0027, N = 10) is significantly less than that of tree-GP after 500 generations (M = 0.0197, SD = 0.0173, N = 10) using a two-sample t-test assuming unequal variances, t(9) = −2.82, p = .02. Standard XCSF reaches the target error after approximately 2,100 trials.

Figure 10: aDGP-XCSF four-composite polynomial performance.

Figure 11: Tree-GP four-composite polynomial performance.

The average number of macro-classifiers (Figure 10a) rapidly decreases from 2,000 to 1,450 over the first 10,000 trials before steadily converging to around 1,750. The average mutation rate (also Figure 10a) declines from 50% to 13% over the first 80,000 trials while solutions are learned, and then declines at a slower rate to around 6.5% after 500,000 trials. The average value of T utilised by the aDGP-XCSF networks remains around 26 throughout experimentation (Figure 10b). Similar to the two-composite function, the average number of nodes in the networks used by graph-based aDGP-XCSF (12 after 125,000 trials; see Figure 10b) is far smaller than the number used by tree-based GP (23,000 after 500 generations; see Figure 11b).

Figure 12 shows the matching classifiers on the four-composite polynomial problem where each classifier has an error less than 10% of ε₀ (i.e., the 15 lowest-error matching rules in the population). In addition, the composite function is plotted above, showing that XCSF correctly partitions the input space into four separate niches with distinct matching classifiers.

Figure 12: Matching classifiers with error under 10% of ε₀ on the four-composite polynomial problem.

7  Look-Ahead Learning

7.1  Anticipatory LCS

Samuel (1959) showed that by generating an internal model of the environment, the system can make predictions about the expected consequences of various sequences of action, that is, it can look ahead. To incorporate future state predictions into LCS, Holland (1990) proposed that, in addition to a condition and action, each classifier also calculates the effect of performing the proposed action. Riolo (1990) extended this to perform such learning without external reinforcement, calculating the next-state rather than the next-reward, that is, latent learning. Stolzmann (1998) presented a heuristic-driven LCS (ACS) which uses the explicit next-state rule structure to build anticipatory models of the environment where the accuracy of the rules’ predictions are factored into their utility. Through anticipating the consequences of actions with the evolving model, system behaviour can adapt faster. Similarly, YACS (Gerard and Sigaud, 2001) performs the same anticipatory learning; however, it modifies the condition and effect separately with the goal of easing over-specialised conditions. Zatuchna and Bagnall (2005) incorporated memory within an ACS-like approach and found faster convergence to optimality in non-Markov mazes when compared with LCS using explicit memory. Holley et al. (2004) extended ACS to use incomplete information contained in the classifier list as a basis for an abstract world model in which to interact or dream. They found that the abstract thread (or dream direction) can be used to cycle well-known states, resulting in fewer interactions with the environment to develop a confident model in a simple maze environment. With the goal of discovering new regularities, Gerard et al. (2005) proposed a version which included don't know symbols wherein a classifier may anticipate a few attributes only. Since a single classifier describes only a partial view of the next situation, the anticipating unit is composed of the entire LCS instead of a single rule. ACS was later extended to incorporate a GA for generalisation (ACS2) which resulted in improved performance (Butz and Stolzmann, 2002).

LCSs which use rule linkage over succeeding time steps (e.g., Tomlinson, 2001) may also implicitly build predictions of future states when the condition of a linked rule represents the next state. Bull (2002a) cast the internal model building task as a single-step task within ZCS (Wilson, 1994), where reward is given only if a rule predicts the expected outcome of taking its action under the condition matched. Bull and Hurst (2003) explored a ZCS where each rule is embodied as a neural network with separate output nodes for each condition, action, and anticipation. In addition, O'Hara and Bull (2005) encompassed two neural networks within each XCS classifier (Wilson, 1995) to solve a number of discrete Markov mazes; one network was used to calculate the current matching condition and action, and a second (trained via backpropagation) to produce a description of the anticipated next state. More recently, Bull et al. (2007) found competitive performance using a simple array of perceptrons to provide the anticipation mappings. To date, all work on anticipatory LCS has only considered discrete-valued problems.

7.2  Multistep-Ahead Prediction Neural Networks

In addition to the common single-step-ahead prediction task, neural networks have been used to construct H-steps-ahead predictions. The various approaches to designing a multistep-ahead prediction (MSP) can be broadly categorised as either iterative or direct. The iterative approach is the oldest technique for MSP and involves iterating a one-step-ahead predictor H times, with the output fed back as input to produce the next step's prediction; that is, estimated values are used as inputs instead of actual observations, so a propagation of error is inherent, which may result in low performance. This is particularly significant on long-horizon tasks because the models are tuned with one-step criteria and consequently do not take the temporal behaviour into account appropriately. Typically, recurrent neural networks are used in iterative approaches (e.g., Williams and Zipser, 1989). In order to correct the propagation of error during training in dynamical supervised learning tasks, the network output is frequently replaced with the corresponding desired response (i.e., the target signal), wherever one is available, for the subsequent computation of the dynamic behaviour of the network, that is, teacher forcing (Williams and Zipser, 1989).

Direct approaches include the use of multiple prediction models (i.e., H networks) and multiple-input multiple-output (MIMO) models (i.e., a vector of outputs). Estimating H prediction models entails much greater functional complexity than an iterated approach. In addition, direct models learned independently induce a conditional independence between the estimators, preventing the technique from considering complex dependencies between the variables and consequently biasing the prediction accuracy (Ben Taieb et al., 2010). Multivariate response prediction uses H output nodes per network, “with the goal of preserving, among the predicted values, the stochastic dependency characterising the time series” (Ben Taieb et al., 2010, p. 1950), that is, the relationship between future values is captured. However, MIMO approaches can suffer from too tight a coupling among the outputs (Huang and Lian, 2000) and require considerable training time (Selvaraj et al., 1995).

8  Arithmetic DGP-XCSF Look Ahead Learning

The DGP computation of the first step-ahead prediction remains as before, that is, each network is processed for T cycles before sampling the match and output nodes. Further predictions are computed by iteratively sampling the output nodes every subsequent W cycles, that is, each matching network is processed for a total of T + W(H − 1) cycles. To maintain uniform processing of all networks, after the classifiers have been updated and the GA (potentially) run, each matching network's nodes are reset to the final states after computing the first step prediction (i.e., the state of the network after T cycles).
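A sketch in C of this sampling scheme, reusing cycle() and the Node type from the Section 4 sketch; the buffer handling and the MAX_NODES bound are illustrative:

```c
/* Sketch of the H-step sampling: T cycles for the first prediction, then
   the output node is sampled every further W cycles (T + W(H-1) in total),
   and the post-T state is restored so all rules are processed uniformly. */
#define MAX_NODES 64

void predict_h_steps(const Node *net, int n_nodes, int out_node,
                     const double *ext, double *state, double *scratch,
                     int T, int W, int H, double *preds /* length H */) {
    double after_T[MAX_NODES];
    for (int t = 0; t < T; t++) {
        cycle(net, n_nodes, ext, state, scratch);
        for (int i = 0; i < n_nodes; i++) state[i] = scratch[i];
    }
    for (int i = 0; i < n_nodes; i++) after_T[i] = state[i];
    preds[0] = state[out_node];                 /* first step-ahead value */
    for (int h = 1; h < H; h++) {
        for (int t = 0; t < W; t++) {
            cycle(net, n_nodes, ext, state, scratch);
            for (int i = 0; i < n_nodes; i++) state[i] = scratch[i];
        }
        preds[h] = state[out_node];             /* (h+1)th step-ahead value */
    }
    for (int i = 0; i < n_nodes; i++) state[i] = after_T[i]; /* reset */
}
```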

To perform an H-steps-ahead online forecast, here each classifier maintains an H × H matrix representing the last H of its H-step predictions. In addition, the system as a whole stores the previous H match sets. Upon receiving the current time step environmental input, P(t0), each of those match sets performs a single update similar to traditional XCSF, where each classifier in [M]_{-i} is updated based on the absolute error between the current input and the classifier's ith-step forecast (see Figure 13 for an example where H = 4). In this way, the one-step-ahead forecasts made at t−1 are updated, the two-step-ahead forecasts made at t−2 are updated, and so on. In addition, the previous time-delayed inputs are embedded to create a fixed-length memory buffer, presenting the inputs simultaneously.
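A minimal C sketch of the per-classifier bookkeeping this implies; the ring-buffer layout of the H × H forecast matrix and the Widrow-Hoff-style error update are our assumptions:

```c
/* Sketch of the per-classifier multistep bookkeeping: an H x H ring buffer
   of past H-step forecasts, scored against the actual value once it
   arrives. The layout and the Widrow-Hoff-style error update are assumed. */
#include <math.h>

#define H 4

typedef struct {
    double forecast[H][H]; /* row (t mod H): the H forecasts made at time t */
    double error;          /* running absolute-error estimate */
} MSPClassifier;

/* Update with the actual value for a forecast made 'steps_ahead' steps ago
   at absolute time 'made_at'; beta is the usual XCSF learning rate. */
void update_msp(MSPClassifier *cl, int made_at, int steps_ahead,
                double actual, double beta) {
    double f = cl->forecast[made_at % H][steps_ahead - 1];
    cl->error += beta * (fabs(actual - f) - cl->error);
}
```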

Figure 13: Multistep-ahead error updating scheme.

We compare this approach with an XCSF with H weight vectors to perform direct learning of future steps. The condition is composed of interval pairs; when the GA is invoked, each offspring's interval is mutated by adding μ · Random(), where Random() returns a real-valued random number in the range [−1,1] and μ follows the same self-adaptive mutation procedure used within DGP; no macro-classifiers are used. To provide the supervised update, the past H state vectors are also maintained by the system.
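A minimal C sketch of this interval mutation, reusing uniform01() from the earlier sketch; the μ-scaled perturbation and the bound-swap repair are our assumptions:

```c
/* Sketch of the interval-predicate mutation used in the XCSF comparison,
   reusing uniform01() from the earlier sketch. Each bound is perturbed by
   mu * r, r uniform in [-1, 1]; swapping crossed bounds is an assumption. */
void mutate_interval(double *lower, double *upper, double mu) {
    *lower += mu * (2.0 * uniform01() - 1.0);
    *upper += mu * (2.0 * uniform01() - 1.0);
    if (*lower > *upper) {          /* repair crossed bounds */
        double tmp = *lower;
        *lower = *upper;
        *upper = tmp;
    }
}
```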

9  Experimentation

The foremost trend-following indicator is the moving average. The use of moving averages for financial time-series prediction is said to have originated with anti-aircraft gunners, who used moving averages to aim their guns at enemy aircraft during World War II and later applied the technique to prices (Elder, 1993). Their use in financial time-series forecasting was later popularised by R. Donchian (a Merrill Lynch employee) and J. M. Hurst (an engineer). The simple moving average (SMA) is the average price over the most recent specified period. Formally, this can be represented as:
$$\mathrm{SMA} = \frac{1}{N} \sum_{i=1}^{N} P_i \tag{10}$$
where P_i is the ith price being averaged and N is the number of days in the moving average. Moreover, the SMA can be applied to price proxies such as the typical price (i.e., (P_high + P_low + P_close)/3) as well as to other mathematical technical indicators. The problem with an SMA is that whilst it affords excellent price smoothing, it is a severely lagging indicator, with the specified period length (N) proportional to the time lag in its signal. Furthermore, each data value counts twice in the calculation: once when the new information is added to the average, and again when the value is removed to make way for new information. This double counting of data values can result in the average rising or falling despite the most recent data moving in the contrary direction.
In order to reduce this lag and eliminate the double counting of data, the exponential moving average (EMA) was developed. The EMA achieves this by giving greater emphasis to the most recent data values, making the average more responsive to newer prices and less responsive to older prices over the specified period. The calculation for an EMA is:
$$\mathrm{EMA}_{today} = \mathrm{EMA}_{yesterday} + K \left( P_{today} - \mathrm{EMA}_{yesterday} \right) \tag{11}$$
where K = 2/(N + 1), N is the number of days in the EMA, P_today is today's price, and EMA_yesterday is the EMA of yesterday.
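For concreteness, minimal C sketches of the typical price and the two averages (Eqs. 10 and 11); the function names are ours:

```c
/* Typical price proxy: (high + low + close) / 3. */
double typical_price(double high, double low, double close) {
    return (high + low + close) / 3.0;
}

/* Eq. 10: simple moving average over the most recent n_days prices. */
double sma(const double *price, int n_days) {
    double sum = 0.0;
    for (int i = 0; i < n_days; i++)
        sum += price[i];
    return sum / n_days;
}

/* Eq. 11: one online EMA step, with K = 2 / (N + 1). */
double ema_step(double ema_yesterday, double p_today, int n_days) {
    double k = 2.0 / (n_days + 1.0);
    return ema_yesterday + k * (p_today - ema_yesterday);
}
```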

Beyond the SMA and EMA, there have been numerous research studies within the field of digital signal processing seeking to reduce the lag and improve the smoothness of the averages, for example, the adaptive moving average (AMA; Kaufman, 1995). The central premise of AMA is that in fast trending markets, a smaller N should be used to calculate the average to maintain a low lag, and in slower moving markets, a larger value should be used to maintain smoothness.

The target time series for our experiments is the single most liquid financial instrument in the world, trading approximately $1 trillion a day, the Euro/U.S. dollar currency pair (EURUSD). Since financial time series are widely acknowledged as being extremely noisy, instead of forecasting the future values of the series directly, here we use an EMA as a simple smooth price proxy with which to predict the trend. In addition, the EMA is calculated using the typical price instead of the closing price since currency markets are open 24 hr per day and there is therefore no psychological importance to the closing price. Further, different brokers around the world calculate the closing price slightly differently depending on the local time shift.

Figure 14 illustrates the typical price and the 50-day EMA of EURUSD used for experimentation. Initial learning is conducted by looping over the training set (i.e., the first 179 days, in sequence) for 500 iterations. Thereafter, a single pass of the adjoining (98-day) test set is conducted (including classifier updates and GA invocation) to evaluate the performance on unseen data in an online manner. Table 2 gives summary statistics of the training and testing periods. In each of the following experiments, the results are an average of 10 runs.

Figure 14: Typical price (solid) and 50-day EMA (dashed) of EURUSD.

Table 2: Summary descriptive statistics.

Statistic            Training Set    Test Set
Mean                 1.32059902      1.399067279
Standard error       0.001389333     0.002176917
Median               1.31706851      1.399351445
Standard deviation   0.018588011     0.021550375
Sample variance      0.000345514     0.000464419
Kurtosis             −0.744922569    −0.824531903
Skewness             0.21951874      0.071625332
Range                0.07788101      0.08572053
Minimum              1.28736279      1.35558296
Maximum              1.3652438       1.44130349
Sum                  236.3872246     137.1085933
Count                179             98

9.1  Two-Step-Ahead Prediction

In each of the following experiments, P = 10,000 and Ninit = 20, and three ephemeral random constants in the range [−0.001, +0.001] are used. For XCSF, x0 = 1.0.

From Figure 15a, it can be seen that after 500 iterations of the training set, aDGP-XCSF achieves a low average absolute error over the two-step-ahead prediction. With additional embedded inputs, aDGP-XCSF learns faster and finds more accurate solutions (Figure 15b); this is confirmed by the average errors after a single evaluation of the test set.

Figure 15: Average two-step-ahead prediction error of aDGP-XCSF.

In comparison, the average absolute errors after 500 training iterations of aDGP-XCSF, both with and without the additional embedded inputs, are significantly less than those of XCSF (Figure 16a). Adding extra memory to XCSF (see Figure 16b) resulted in no statistical difference after 500 training iterations; however, over the test set, the error of XCSF with the additional memory is significantly lower. Comparing XCSF and aDGP-XCSF over the test set, both aDGP-XCSF configurations achieve lower errors than XCSF, and these differences are statistically significant.

Figure 16: Average two-step-ahead prediction error of XCSF.

9.2  Five-Step-Ahead Prediction

To perform the five-steps-ahead prediction, identical parameters are used, except H = 5. From Figure 17a, it can be seen that after 500 iterations of the training set, aDGP-XCSF achieves a low average absolute error over the five steps. Again, with additional embedded inputs, aDGP-XCSF performance is improved (Figure 17b); however, over the test set, there is no statistically significant difference between the two configurations.

Figure 17: Average five-step-ahead prediction error of aDGP-XCSF.

The average absolute errors after 500 training iterations of aDGP-XCSF, both with and without the additional embedded inputs, are significantly less than those of XCSF (Figure 18a). Similar to the two-step-ahead prediction, adding extra memory to XCSF (Figure 18b) resulted in no statistical difference after 500 training iterations; however, over the test set, the error of XCSF with the additional memory is significantly lower. Comparing XCSF and aDGP-XCSF over the test set, both aDGP-XCSF configurations achieve lower errors than XCSF, and these differences are statistically significant.

Figure 18: Average five-step-ahead prediction error of XCSF.

10  Conclusions

This paper has explored dynamical genetic programming (DGP), a temporally dynamic graph-based representation, within the XCSF LCS. The DGP syntax presented consists of each node receiving two inputs from an unrestricted topology and then performing an arbitrary function. The representation is evolved under a self-adaptive and open-ended scheme, allowing the topology to grow to any size to meet the demands of the problem space. The collective mechanics of dynamical arithmetic networks have been shown to be exploitable in solving continuous-valued input-output reinforcement learning problems, with performance on the frog problem superior to that reported previously.

The GA was modified to produce an increased number of offspring per invocation, beyond the traditional two. This modification was found to provide increased local search that significantly benefited DGP, whose representation encompasses an extremely large number of possible phenotypes. XCSF, utilising rules composed of arithmetic genetic networks, was then shown capable of optimal performance for symbolic regression on a number of polynomial functions, and of significantly superior performance with more compact solutions on a number of composite polynomial functions when compared with a benchmark tree-based genetic programming scheme.

Finally, it has been shown possible to exploit the collective emergent behaviour of ensembles of dynamical arithmetic networks to perform a multistep-ahead prediction of the EURUSD 50-day EMA, where the average error across the predictions was significantly less than XCSF with interval predicates.

Traditional approaches of iterating a one-step prediction typically result in a low-performance model for multistep-ahead prediction. Direct approaches suffer from the inability to capture the stochastic dependency between the future predictions, and multiple-input multiple-output models suffer from too tight a coupling between the outputs. Here we have shown that by iterating each matching network, sampling the outputs every W cycles, and providing online reinforcement, dynamical genetic networks can provide a multistep-ahead prediction. The symbolic nature of the representation enables the modelling of relationships between the sensory inputs, and here between the antecedent inputs. In addition, interval conditions assume that patterns will reoccur over the exact same state space and cannot generalise outside of the trained space, whereas symbolic conditions can learn to match based on the shape of the temporal dynamics.

It should be noted that with the increased capability resulting from more complex representations, the time to convergence can be dramatically slower than standard XCSF. The performance found by DGP, however, is generally similar to MLP-based neural-XCSF, yet with an improved signal to symbol transformation. Future work will continue to explore and identify problems wherein the additional expressiveness of DGP is most beneficial, such as those requiring memory and temporal prediction.

References

Ahluwalia, M., and Bull, L. (1999). A genetic programming classifier system. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 11–18.

Ashby, W. R. (1952). Design for a brain. New York: Wiley.

Ben Taieb, S., Sorjamaa, A., and Bontempi, G. (2010). Multiple-output modeling for multi-step-ahead time series forecasting. Neurocomputing, 73:1950–1957.

Bull, L. (2002a). Lookahead and latent learning in ZCS. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 897–904.

Bull, L. (2002b). On using constructivism in neural classifier systems. In J. J. Merelo, P. Adamidis, and H.-G. Beyer (Eds.), Parallel problem solving from nature: PPSN VII. Lecture notes in computer science, Vol. 2439 (pp. 558–567). Berlin: Springer-Verlag.

Bull, L. (2009). On dynamical genetic programming: Simple Boolean networks in learning classifier systems. International Journal of Parallel, Emergent and Distributed Systems, 24:421–442.

Bull, L., and Hurst, J. (2003). A neural learning classifier system with self-adaptive constructivism. In Proceedings of the IEEE Congress on Evolutionary Computation, CEC ’03, Vol. 2, pp. 991–997.

Bull, L., and O'Hara, T. (2002). Accuracy-based neuro and neuro-fuzzy classifier systems. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 905–911.

Bull, L., O'Hara, T., and Lanzi, P. L. (2007). Anticipation mappings for learning classifier systems. In Proceedings of the IEEE Congress on Evolutionary Computation, CEC ’07, pp. 2133–2140.

Bull, L., and Preen, R. J. (2009). On dynamical genetic programming: Random Boolean networks in learning classifier systems. In Proceedings of the 12th European Conference on Genetic Programming, EuroGP ’09, pp. 37–48.

Butz, M. V., and Stolzmann, W. (2002). An algorithmic description of ACS2. In Revised Papers from the 4th International Workshop on Advances in Learning Classifier Systems, IWLCS ’01, pp. 211–230.

Clegg, J., Walker, J. A., and Miller, J. F. (2007). A new crossover technique for Cartesian genetic programming. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07, pp. 1580–1587.

Elder, A. (1993). Trading for a living: Psychology, trading tactics, money management. New York: Wiley.

Ferreira, C. (2006). Gene expression programming: Mathematical modeling by an artificial intelligence. Studies in Computational Intelligence. Berlin: Springer-Verlag.

Fogel, L. J., Owens, A. J., and Walsh, M. J. (1965). Artificial intelligence through a simulation of evolution. In Biophysics and Cybernetic Systems: Proceedings of the 2nd Cybernetic Sciences Symposium, pp. 131–155.

Forsyth, R. (1981). BEAGLE: A Darwinian approach to pattern recognition. Kybernetes, 10:159–166.

Gerard, P., Meyer, J. A., and Sigaud, O. (2005). Combining latent learning with dynamic programming in the modular anticipatory classifier system. European Journal of Operational Research, 160:614–637.

Gerard, P., and Sigaud, O. (2001). YACS: Combining dynamic programming with generalization in classifier systems. In Revised Papers from the Third International Workshop on Advances in Learning Classifier Systems, IWLCS ’00, pp. 52–69.

Giani, A., Baiardi, F., and Starita, A. (1995). PANIC: A parallel evolutionary rule based system. In Proceedings of the Fourth Annual Conference on Evolutionary Programming, pp. 753–771.

Hirasawa, K., Okubo, M., Katagiri, H., Hu, J., and Murata, J. (2001). Comparison between genetic network programming (GNP) and genetic programming (GP). In Proceedings of the IEEE Congress on Evolutionary Computation, 2001, Vol. 2, pp. 1276–1282.

Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

Holland, J. H. (1976). Adaptation. In R. Rosen and F. M. Snell (Eds.), Progress in theoretical biology, Vol. 4 (pp. 263–293). New York: Academic Press.

Holland, J. H. (1990). Concerning the emergence of tag-mediated lookahead in classifier systems. Physica D, 42:188–201.

Holley, J., Pipe, A. G., and Carse, B. (2004). Oneiric processing utilising the anticipatory classifier system. In X. Yao, E. Burke, J. A. Lozano, J. Smith, J. J. Merelo-Guervos, J. A. Bullinaria, J. Rowe, P. Tino, A. Kaban, and H. P. Schwefel (Eds.), Parallel problem solving from nature: PPSN VIII. Lecture notes in computer science, Vol. 3242 (pp. 1103–1112). Berlin: Springer-Verlag.

Huang, S.-J., and Lian, R.-J. (2000). A combination of fuzzy logic and neural network controller for multiple-input multiple-output systems. International Journal of Systems Science, 31:343–357.

Ioannides, C., and Browne, W. (2008). Investigating scaling of an abstracted LCS utilising ternary and S-expression alphabets. In J. Bacardit, E. Bernado-Mansilla, M. V. Butz, T. Kovacs, X. Llora, and K. Takadama (Eds.), Learning classifier systems (pp. 46–56). Berlin: Springer-Verlag.

Kantschik, W., and Banzhaf, W. (2002). Linear-graph GP—A new GP structure. In Proceedings of the 5th European Conference on Genetic Programming, EuroGP ’02, pp. 83–92.

Kauffman, S. A. (1993). The origins of order: Self-organization and selection in evolution. Oxford, UK: Oxford University Press.

Kaufman, P. J. (1995). Smarter trading: Improving performance in changing markets. New York: McGraw-Hill.

Koza, J. R. (1992). Genetic programming. Cambridge, MA: MIT Press.

Koza, J. R. (1994). Genetic programming II. Cambridge, MA: MIT Press.

Landau, S., Picault, S., and Drogoul, A. (2001). ATNoSFERES: A model for evolutive agent behaviors. In Proceedings of the AISB’01 Symposium on Adaptive Agents and Multi-Agent Systems.

Landau, S., Sigaud, O., and Schoenauer, M. (2005). ATNoSFERES revisited. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, GECCO ’05, pp. 1867–1874.

Lanzi, P. L. (1999). An analysis of generalization in the XCS classifier system. Evolutionary Computation, 7:125–149.

Lanzi, P. L. (2001). Mining interesting knowledge from data with the XCS classifier system. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’01
, pp.
958
965
.
Lanzi
,
P. L
. (
2003
).
XCS with stack-based genetic programming
. In
Proceedings of the IEEE Congress on Evolutionary Computation, CEC ’03
, Vol.
2
, pp.
1186
1191
.
Lanzi
,
P. L
. (
2007
).
An analysis of generalization in XCS with symbolic conditions
. In
Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2007
, pp.
2149
2156
.
Lanzi
,
P. L.
, and
Perrucci
,
A
. (
1999
).
Extending the representation of classifier conditions, Part II: From messy coding to S-expressions
. In
Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’99
, pp.
345
352
.
Lanzi
,
P. L.
,
Rocca
,
S.
,
Sastry
,
K.
, and
Solari
,
S.
(
2008
).
Analysis of population evolution in classifier systems using symbolic representations
. In
J. Bacardit, E. Bernad-Mansilla, M. V. Butz, T. Kovacs, X. Llor, and K. Takadama
(Eds.),
Learning classifier systems
. Lecture notes in computer science, Vol.
4998
(pp.
22
45
).
Berlin
:
Springer-Verlag
.
Mellor
,
D
. (
2005
).
A first order logic classifier system
. In
Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, GECCO ’05
, pp.
1819
1826
.
Miller
,
J. F
. (
1999
).
An empirical study of the efficiency of learning Boolean functions using a Cartesian genetic programming approach
. In
Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’99
, pp.
1135
1142
.
Miller
,
J. F.
, and
Thomson
,
P.
(
2000
).
Cartesian genetic programming
. In
Proceedings of the Third European Conference on Genetic Programming, EuroGP 2000. Lecture notes in computer science
, Vol.
1802
(pp.
121
132
).
Berlin
:
Springer-Verlag
.
Mitchell
,
T
. (
1997
).
Machine learning
.
New York
:
McGraw Hill
.
O'Hara
,
T.
, and
Bull
,
L
. (
2005
).
Building anticipations in an accuracy-based learning classifier system by use of an artificial neural network
. In
The IEEE Congress on Evolutionary Computation
, Vol.
3
, pp.
2046
2052
.
Perkis
,
T
. (
1994
).
Stack-based genetic programming
. In
Evolutionary Computation, Proceedings of the First IEEE Conference on Computational Intelligence
, pp.
148
153
.
Preen
,
R. J.
, and
Bull
,
L
. (
2009
).
Discrete dynamical genetic programming in XCS
. In
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO ’09
, pp.
1299
1306
.
Pujol
,
J. C. F.
, and
Poli
,
R
. (
1998
).
Efficient evolution of asymmetric recurrent neural networks using a PDGP-inspired two-dimensional representation
. In
Proceedings of the First European Workshop on Genetic Programming
, pp.
130
141
.
Ramirez Ruiz
,
J. A.
,
Valenzuela-Rendón
,
M.
, and
Terashima-Marín
,
H.
(
2008
).
QFCS: A fuzzy LCS in continuous multi-step environments with continuous vector actions
. In
G. Rudolph, T. Jansen, S. M. Lucas, C. Poloni, and N. Beume (Eds.)
,
Parallel problem solving from nature: PPSN X
, pp.
286
295
.
Riolo
,
R. L
. (
1990
).
Lookahead planning and latent learning in a classifier system
. In
From Animals to Animats, Proceedings of the First International Conference on Simulation of Adaptive Behavior
, pp.
316
326
.
Samuel
,
A.
(
1959
).
Some studies in machine learning using the game of checkers
.
IBM Journal
,
3
:
210
229
.
Schmidt
,
M.
, and
Lipson
,
H
. (
2007
).
Comparison of tree and graph encodings as function of problem complexity
. In
Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07
, pp.
1674
1679
.
Schwefel
,
H.-P
. (
1981
).
Numerical optimization of computer models
.
New York
:
Wiley
.
Selvaraj
,
R.
,
Deshpande
,
P. B.
,
Tambe
,
S. S.
, and
Kulkarni
,
B. D.
(
1995
).
Neural networks for the identification of MSF desalination plants
.
Desalination
,
101
:
185
193
.
Shirakawa
,
S.
,
Ogino
,
S.
, and
Nagao
,
T
. (
2007
).
Graph structured program evolution
. In
Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07
, pp.
1686
1693
.
Smith
,
S. F.
(
1983
).
Flexible learning of problem solving heuristics through adaptive search
.
PhD thesis, University of Pittsburgh
.
Stalph
,
P. O.
, and
Butz
,
M. V.
(
2010
).
How fitness estimates interact with reproduction rates: Towards variable offspring set sizes in XCSF
. In
J. Bacardit, W. Browne, J. Drugowitsch, E. Bernado-Mansilla, and M. Butz
(Eds.),
Learning classifier systems
. Lecture notes in computer science, Vol.
6471
(pp.
47
56
).
Berlin
:
Springer-Verlag
.
Stolzmann
,
W.
(
1998
).
Anticipatory classifier systems
. In
J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. Fogel, M. Grazon, D. E. Goldberg, H. Iba, and R. Riolo (Eds.)
,
Proceedings of the Third Annual Conference on Genetic Programming
, pp.
658
664
.
Teller
,
A.
, and
Veloso
,
M.
(
1996
).
Neural programming and an internal reinforcement policy
. In
J. R. Koza (Ed.)
,
Late Breaking Papers at the Genetic Programming 1996 Conference
, pp.
186
192
.
Teller
,
A.
, and
Veloso
,
M.
(
1997
).
PADO: A new learning architecture for object recognition
. In
K. Ikeuchi and M. Veloso
(Eds.),
Symbolic visual learning
(pp.
77
112
).
Oxford, UK
:
Oxford University Press
.
Tomlinson
,
A.
(
2001
).
CXCS: Triggered linkage. (Tech. Rep. UWELCSG01-003) University of the West of England
.
Tran
,
H. T.
,
Sanza
,
C.
,
Duthen
,
Y.
, and
Nguyen
,
T. D
. (
2007
).
XCSF with computed continuous action
. In
Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07
, pp.
1861
1869
.
Valenzuela-Rendón
,
M
. (
1991
).
The fuzzy classifier system: A classifier system for continuously varying variables
. In
Proceedings of the Fourth International Conference on Genetic Algorithms
, pp.
346
353
.
Watkins
,
C. J. C. H.
(
1989
).
Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, UK
.
Williams
,
R. J.
, and
Zipser
,
D.
(
1989
).
A learning algorithm for continually running fully recurrent neural networks
.
Neural Computation
,
1
:
270
280
.
Wilson
,
S. W.
(
1994
).
ZCS: A zeroth level classifier system
.
Evolutionary Computation
,
2
:
1
18
.
Wilson
,
S. W.
(
1995
).
Classifier fitness based on accuracy
.
Evolutionary Computation
,
3
:
149
175
.
Wilson
,
S. W.
(
2000
).
Get real! XCS with continuous-valued inputs
. In
Learning classifier systems, From foundations to applications
(pp.
209
222
),
Berlin
:
Springer-Verlag
.
Wilson
,
S. W
. (
2001
).
Function approximation with a classifier system
. In
Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’01
, pp.
974
981
.
Wilson
,
S. W.
(
2002
).
Classifiers that approximate functions
.
Natural Computing
,
1
:
211
234
.
Wilson
,
S. W.
(
2004
).
Classifier systems for continuous payoff environments
. In
Genetic and Evolutionary Computation, GECCO 2004. Lecture notes in computer science
, Vol.
3103
(pp.
824
835
).
Berlin
:
Springer-Verlag
.
Wilson
,
S. W
. (
2007
).
Three architectures for continuous action
. In
Proceedings of the 2003–2005 International Conference on Learning Classifier Systems, IWLCS’03-05
, pp.
239
257
.
Wilson
,
S. W.
(
2008
).
Classifier conditions using gene expression programming
. In
J. Bacardit, E. Bernado-Mansilla, M. V. Butz, T. Kovacs, X. Llora, and K. Takadama
(Eds.),
Learning classifier systems
(pp.
206
217
).
Berlin
:
Springer-Verlag
.
Zatuchna
,
Z. V.
, and
Bagnall
,
A. J
. (
2005
).
AgentP classifier system: Self-adjusting vs. gradual approach
. In
Proceedings of the IEEE Congress on Evolutionary Computation
, Vol.
3
, pp.
1196
1203
.