## Abstract

It has been shown previously that the control of a robot arm can be efficiently learned using the XCSF learning classifier system, which is a nonlinear regression system based on evolutionary computation. So far, however, the predictive knowledge about how actual motor activity changes the state of the arm system has not been exploited. In this paper, we utilize the forward velocity kinematics knowledge of XCSF to alleviate the negative effect of noisy sensors for successful learning and control. We incorporate Kalman filtering for estimating successive arm positions, iteratively combining sensory readings with XCSF-based predictions of hand position changes over time. The filtered arm position is used to improve both trajectory planning and further learning of the forward velocity kinematics. We test the approach on a simulated kinematic robot arm model. The results show that the combination can improve learning and control performance significantly. However, it also shows that variance estimates of XCSF prediction may be underestimated, in which case self-delusional spiraling effects can hinder effective learning. Thus, we introduce a heuristic parameter, which can be motivated by theory, and which limits the influence of XCSF's predictions on its own further learning input. As a result, we obtain drastic improvements in noise tolerance, allowing the system to cope with more than 10 times higher noise levels.

## 1 Introduction

Controlling a robot arm such that goals can be reached efficiently by the robot's hand (end-effector) requires an understanding of how motor commands affect the end-effector position. Thus, the relationship between changes in limb angles to changes in end-effector positions (the so-called velocity kinematics) has to be known. Assuming that the robot controller has no prior knowledge of the arm's geometry, it has to learn these mappings using simultaneous observations of limb angles and the end-effector position. Stereo-vision sensors provide a 3D measurement of the position of the robot's hand, but with a relatively high degree of uncertainty. As we will see, the noise introduced by end-effector position measurements may severely disrupt learning at moderate noise levels. Thus, any successful real-world learning approach needs to provide strategies to cope with this noise.

It is known that in a noise-free scenario, the velocity kinematics can be learned very successfully using XCSF, an evolutionary system of a population of linear regressors. The aim of this paper is to improve the noise robustness of an XCSF robot controller using Kalman filtering. The key idea is that the estimation of the state change, which is required in the Kalman prediction step, can be obtained using the velocity kinematics mappings stored in XCSF.

### 1.1 Robot Arm Control with XCSF

Stalph and Butz (2012) have developed a system that successfully learns to control robot arms with up to seven degrees of freedom, using an evolutionary function approximation system called XCSF. XCSF can be described as a derivative (Wilson, 2002) of the XCS learning classifier system (Wilson, 1995). It approximates nonlinear functions by a population of locally-weighted linear local approximators. Each approximator is usually referred to as a classifier whose condition determines its activity and thus its local influence given a current input. The classifier prediction is a linear approximation of the encountered input-output value combinations.

It was shown recently (Butz et al., 2008b; Stalph et al., 2010) that XCSF shares many similarities with the locally-weighted projection regression (LWPR) algorithm (Vijayakumar et al., 2005), which is well-known in the neuro-robotics literature (Peters and Schaal, 2008; Sigaud and Peters, 2010; Sigaud et al., 2011). With this interpretation in mind, XCSF classifiers may be considered as neural structures that specify local receptive fields.

### 1.2 Exploiting Model Knowledge to Improve Noise Robustness

XCSF learns the forward velocity kinematics (mapping changes in joint angles to changes in hand locations given the current arm configuration) from which the inverse velocity kinematics (mapping desired changes in hand location space to changes in joint angles) is constructed in order to generate targeted movement commands. The forward knowledge itself, however, has not been exploited in any way so far.

An obvious approach to increase noise robustness is to exploit temporal dependencies: since the observations are made along trajectories of the robot arm, subsequent samples are highly correlated. There is thus some potential for significant gains in the signal to noise ratio by exploiting these correlations by means of filtering. It is necessary to use adaptive filters, because fixed filters with a constant transfer function tend to introduce temporal delay, which introduces systematic biases in learning.

We propose to use a well-established approach, known as Kalman filtering, which assumes knowledge of the system's dynamics. A Kalman filter operates recursively on a stream of noisy input data to produce a statistically optimal estimate of the underlying system state. Each iteration can be split into a prediction step and an update step. In the prediction step, the system's next state is predicted from its presumed state in the previous time step and known control inputs. In the update step, the new system state estimate is obtained by fusing information obtained from sensors with the predicted state.

The crucial contribution of this paper is to use forward kinematic velocity estimates provided by XCSF as control input for the Kalman filter. This requires an estimate of the prediction variance, as investigated in Drugowitsch and Barry (2007, 2008) and Loiacono et al. (2008). When implementing the Kalman filter without any precautions, we show that the system may learn from over-filtered sensory information and thus may get stuck in a self-delusional loop, where XCSF over-believes its own predictions. By effectively limiting the estimated prediction confidence, however, we can show that the resulting XCSF system can deal with much larger measurement noise, still learning its bidirectional forward prediction and inverse control structures effectively.

### 1.3 Paper Organization

We first give an overview of XCSF and how it is applied to learn arm control. Next we introduce the basics of Kalman filtering. Then, we specify the modification of the interaction cycle necessary to exploit temporal dependencies of samples in the arm control scenario by means of Kalman filtering. We scrutinize the performance of XCSF on two simulated arm models with 2 DOF and 7 DOF, respectively.

A key finding, which was already reported at the GECCO 2012 conference (Kneissler et al., 2012) in a shorter version of this paper, is that XCSF with Kalman filtering can solve problems at a higher magnitude of sensory noise, but only after introduction of an additional threshold parameter. Here we additionally report the results of follow-up experiments that confirmed the self-delusional effect as the reason for failure without thresholding and find consistency with theoretical considerations, which provides a formula to calculate the (so-far manually optimized) threshold value as a function of the noise level. Finally, we investigate the possible sources for the failure of the baseline XCSF-based arm control under noisy conditions separately and find that the evolutionary algorithm of XCSF is the most vulnerable spot of the system. Summary and future work considerations conclude the paper.

## 2 Robot Arm Control Using XCSF

### 2.1 Prior Work

Learning classifier systems (LCSs) were originally introduced by John H. Holland (Holland and Reitman, 1978). The accuracy-based XCS learning classifier system was introduced by Stewart W. Wilson (Wilson, 1995). XCS was successfully applied in binary classification tasks (Wilson, 1995; Butz, 2006), data mining problems (Bernadó-Mansilla and Garrell-Guiu, 2003; Wilson, 2000), and function approximation problems (Wilson, 2001, 2002; Lanzi et al., 2007; Butz, Lanzi, et al., 2008), among others (Bull, 2004). In the real-valued function approximation case, XCS is termed XCSF, essentially constituting an iteratively learning regression system.

We focus on the XCSF system (Wilson, 2002; Butz, Lanzi, et al., 2008) and its potential for learning the forward velocity kinematics of a robot arm for inverse control (Butz and Herbort, 2008; Butz and Stalph, 2011; Stalph and Butz, 2012). The considered XCSF setup is based on the available implementation (Stalph and Butz, 2009) and the detailed descriptions available in the literature (Butz and Herbort, 2008; Butz, Lanzi, et al., 2008; Stalph and Butz, 2012).

### 2.2 Learning Forward Velocity Kinematics

In general, XCSF learns to approximate nonlinear multidimensional functions using piecewise linear models. It evolves a population of classifiers, where each classifier covers a particular subspace of the input space, which may be termed the receptive field of a classifier. Moreover, each classifier learns a linear model for approximating the function surface in its subspace. The linear model of a classifier is typically adapted by means of recursive least squares (Butz, Lanzi, et al., 2008; Lanzi et al., 2006). The receptive field, that is, the condition part of a classifier, is evolved over time by a steady-state genetic algorithm (GA). We use general Gaussian kernel-based receptive fields, yielding ellipsoidal regions of influence for a classifier. While each single classifier approximates a subset, the whole population approximates the complete function surface. Due to evolutionary optimization, XCSF tends to implicitly minimize the absolute difference between this surface and the encountered function (however, there is actually no such explicit goal encoded algorithmically in XCSF).

XCSF can be used to learn the velocity kinematics of a robot arm. A robot arm may be characterized by its configuration space and the task space of the end-effector. While the task space is usually encoded in a Cartesian coordinate system, the configuration space may be encoded by joint angles. Due to the arm kinematics, a configuration uniquely determines the corresponding end-effector location in task space . This forward kinematics mapping can be expressed as a typically nonlinear function ** x**=

*f*(

**q**). The forward velocity kinematics is obtained by taking the derivative , where is the Jacobian matrix of

*f*.

To control the robot arm, a goal direction needs to be translated into necessary control commands by applying the inverse velocity kinematics. Here lies the strength of XCSF: Due to the inherent ambiguity of the inverse kinematics, it is much easier to learn the forward kinematics instead. Since XCSF models the target function locally linear, the local Jacobian **J**(**q**), given the specified configuration **q**, can be extracted from the population of XCSF classifiers at almost no computational cost.

It remains to invert **J**(**q**), resolving redundancy by introducing additional constraints. Here, we invert the Jacobian such that free DOFs are used to avoid extreme angular values. However, it was shown elsewhere (Butz et al., 2009; Stalph and Butz, 2012) that other redundancy resolution techniques can equally well be applied, such as applying additional movement constraints or computing the Moore-Penrose pseudo-inverse (Ben-Israel and Greville, 2003).

To sum up, XCSF learns the mapping from configuration space velocities to task space velocities , depending on the current configuration **q**. XCSF is well-suited for this task, since the algorithm is able to cluster a context space while learning a function that operates on a different space, but depends on the context. In our task, the current configuration is the context and XCSF clusters are the configuration space with hyper-ellipsoidal RFs (Butz, Lanzi, et al., 2008). In turn, each classifier approximates the mapping by estimating the local Jacobian matrix *J*(** q**) using linear recursive least squares approximations. Given a particular configuration, the classifiers whose conditions overlap with the current configuration are considered; their linear Jacobians are combined in a weighted manner; finally, the resulting Jacobian matrix is inverted for generating a control signal.

## 3 Exploiting Temporal Dependencies

### 3.1 Kalman Filtering

Filtering can be seen as a procedure for transforming a noisy time series to a smoothed time series closer to the sequence of ”true” values (*x _{n}*), in our case the sequence of end-effector locations that are traversed in task space. Kalman filtering does that by elegantly combining the following two sources of information:

The filtered value from the previous time step, corrected by the predicted change between steps with mean and variance (formed from the predictions of the individual classifiers, as we will describe in a later section), and

The sensor readings for the current time step.

At each time step *n*, the Kalman filter maintains an estimate of the end-effector location that is given by its mean and its variance *V _{n}*. Assuming knowledge of this estimate at time

*n*−1, it is updated in the light of the predicted change and the current sensory reading in two steps.

*V*converges quickly toward a value close to 0.

_{n}### 3.2 Kalman Filtering with XCSF

The overall flow of information in an XCSF-based arm control learner without filtering is shown in Figure 1(a). It is consistent with previous experiments, as conducted, for example, by Butz and Herbort (2008); Butz et al. (2009); and Stalph and Butz (2012). XCSF interacts with a controller module, which reads sensory information from and executes motor commands in the associated arm model. Motor noise and sensor noise may be added in this process. Given the current joint angle state of the arm *q*_{n-1}, XCSF generates a set of matching classifiers. Using this set and a given desired direction of the hand toward goal *g*, XCSF generates a control command by means of its locally linear, inverse velocity kinematics model. In our kinematic arm simulations, motor noise is added to this motor command, which is then sent to the arm model. In return, the next angular state *q _{n}* is perceived. Additionally, the consequent location of the end-effector

*x*is perceived and noise is added, yielding . The information is thus complete to learn from the described interaction using

_{n}*q*

_{n-1}as the context signal for matching, and as well as for the prediction and resulting error and fitness updates.

The described original XCSF setup for learning and goal-directed behavior control is modified as shown in Figure 1(b). The added Kalman filtering process essentially filters the successive perceptions of end-effector locations by means of XCSF's forward velocity kinematics model. Previously, the change in end-effector state was directly determined by the noisy location signal, that is, . Now the change in the end-effector location is calculated by filtering the location feedback with the state prediction generated by the locally linear forward velocity kinematics model of XCSF, that is, , additionally considering the variance estimate of the local model. In consequence, XCSF can exploit its predictive knowledge controlling the arm based on the filtered internal hand location estimate and learning from the filtered location changes in each iteration *n*. Thus, we expect to reap the benefits of the filtering process for both learning and movement planning.

### 3.3 Error and Confidence Estimation

XCSF traditionally estimates the mean absolute deviation of its prediction. However, a variance estimate is necessary to combine the prediction with other, for example, sensory information. Thus, XCSF is modified by estimating the standard deviation of a classifier prediction, rather than the mean absolute deviation. This was proposed and investigated elsewhere for the XCSF system in function approximation problems (Loiacono et al., 2008), where it was shown that the performance of XCSF generally does not suffer from this system change.

*k*samples that a classifier has learned from, the variance of the error of a single classifier in XCSF may be estimated by using the sequence of values and respectively generated predictions Note that specifies the prediction that was generated when the sample was encountered. It is thus based on the knowledge of the classifier available until that point in time. A disadvantage of this approach is that the estimate tends to be pessimistic, in that the true is likely to be smaller because the improvement of the linear model due to later samples is not taken into account. The advantage, however, is that it is not necessary to calculate the prediction using all previous samples in every time step. In consequence, it is not required to keep a history of the previous samples. Rather, the variance estimate can be updated iteratively online:

### 3.4 Combining Stochastic Estimates across Classifiers

XCSF predicts function values as weighted linear combinations of predictions of the subset of classifiers that match the current context. In contrast to traditional XCSF, where these weights are proportional to the classifiers’ fitness, we use the inverses of the variance estimates as weights, as proposed by Drugowitsch and Barry (2007). The joint error variance for its prediction, as used in the Kalman filter, is then computed as the weighted sum, such that highly accurate classifiers contribute more to the final prediction than inaccurate classifiers.

Application of the Kalman filter requires us to estimate the mean and variance of the change in task space, , given a particular executed control . While we have such an estimate for each classifier separately, we here describe how to combine these local estimates to provide a single, global estimate.

First, let us recapitulate a well-known methodology for extracting a higher fidelity signal by linear combination of several low-fidelity input signals. Let us therefore assume that we are given *n* random variables (the estimate of each classifier *i*) that all share the same mean and have variances . We further assume that the variances are known to us, but the mean is not, and shall therefore be estimated by a linear combination , with some suitable weights *w _{i}*, with the hope that gives us a better (i.e., lower variance) representation of the true value .

*n*: We have chosen to use the more conservative approach of Equation (12) in our simulations.

## 4 Results

To evaluate the approach and scrutinize the dependency on the threshold , which controls the maximum influence of XCSF's predictive knowledge as detailed above, we tested performance on an exemplary 2-DOF robot arm system operating in a plane, as well as an anthropomorphic 7-DOF robot arm in three dimensions. All the reported results were carried out on a kinematic arm simulation.

### 4.1 XCSF Settings

In all experiments reported in this paper, we used the XCSF setup that was reported in Butz, Lanzi, et al. (2008). In particular, we use XCSF with hyper-ellipsoidal conditions, that is, general Gaussian kernels, which were first introduced in Butz et al. (2006). Moreover, each classifier learns a linear prediction by means of recursive least squares approximation. In particular, the parameters of XCSF were set as follows: *N*=2, 000, , , , , , , . The GA application threshold is set to and the target error is set to . The mutation probability for each element of the condition is set to one over the number of alleles, for example for the two-dimensional case and for the 7-DOF arm. If mutated, center values are mutated within the receptive field bounds, the stretches are decreased or increased in size, at a maximum, doubling or halving their current size. The initial radius of receptive fields is taken uniformly randomly from [0.01, 1]. Uniform crossover, GA subsumption, and tournament selection with are applied. Condensation (Wilson, 1998) commences after 80% of the learning iterations.

### 4.2 Experimental Setup

The planar 2-DOF arm with limb lengths *l*_{1}=1.4, *l*_{2}=0.9 and angular ranges was used to explore the potential of the filtering approach. The maximum rotation velocity per joint is restricted to 0.01 radians per iteration. The kinematic specifications of the 7-DOF anthropomorphic arm are illustrated in Figure 2. The arm has a total length of 100 cm. Rotation axes are drawn as dashed lines; the two rotary joints are depicted with a circle. Joint angles are restricted to , , , , , , . Similar to a human arm, the shoulder has 3 DOFs, the elbow allows for 2 DOFs, and the wrist offers another 2 DOFs.

During learning, the involved controller generates goal locations uniformly randomly within the workspace of the arm. From the beginning, XCSF utilizes its internal model to strive for the given goal. A goal is assumed to be reached if the arm end-effector reaches a distance smaller than 5% of its arm length to the goal. If the goal was reached or 300 iterations passed without reaching the goal, a new goal location was generated. In this way, well-balanced exploration was achieved, as proposed elsewhere (Stalph and Butz, 2012). Moreover, 0.05 standard deviations in radians of angular motor noise were added to the motor commands, as also done elsewhere (Stalph and Butz, 2012). This is particularly necessary during the early stage to bootstrap XCSF's learning progress.

### 4.3 Quality Criteria

To track learning performance, offline testing phases were inserted. In an offline testing phase, XCSF-learning was switched off, but filtering was performed as usual. Each testing phase contained 100 episodes with randomly chosen start/goal-pairs (the same in all testing phases, disjoint to the start/goal-pairs used in learning). The Kalman filter was set to its initial state (infinite variance) at the beginning of each episode. When the distance of the hand position to the current goal was less than 0.05, the episode was considered successful. As during learning, if the goal was not reached within 300 time steps, the episode was aborted. During this testing period, we evaluated the following two measures of the system's path planning capability:

**Quality.**The percentage of successful episodes was multiplied with the average path efficiency in order to obtain a performance measure in the range , where 0 indicates complete failure and 100 stands for perfect planning. The path efficiency of an episode is defined as the straight distance from start point to end point of the trajectory divided by its actual length.**Angular Error.**The average angle (in radians) between the direction toward the target and the actually performed change in hand position (averaged over all time steps of all episodes). Since the measure includes the effect of motor noise, the theoretically optimal value of 0 is, in practice, never obtained.

### 4.4 Learning Curves

In Figure 3, we give the time course of XCSF performance for typical cases. Figure 3(a) is the baseline case without filtering. At a noise level of 0.01, learning of the baseline system already fails completely. With Kalman filtering (Figure 3(b)) and a suitably chosen (see Section 4.5), XCSF still learns well, even when the noise level is increased by a factor of 5.

Performances of different setups are subsequently compared by averaging the quality measures in the offline phases between 300k and 400k learning iterations (the final phase of an XCSF run is a condensation phase, starting at iteration 400k, during which redundant classifiers are eliminated).

### 4.5 Effect of Threshold Parameter θ_{β}

#### 4.5.1 2-DOF Arm

We systematically measured the effect of the proposed threshold on performance for different levels of position sensor noise. In the plots of Figure 4, the quality measure (averaged over 20 runs) is shown as a function of for different levels of sensor noise . We find that if the noise is almost negligible, values of close to 1 are advantageous. For higher levels of sensor noise, however, there is a clearly distinguishable optimum (triangles on the *x*-axis, determined by a quadratic fit to the data points in the shaded area), which moves gradually to smaller values as increases. The height of the hill gradually decreases, and beyond a noise level of 0.2, the performance finally completely collapses for all choices of .

In effect, these results show that XCSF benefits from Kalman filtering with an appropriate threshold, but may also suffer from delusional cycles, in which it starts to believe too strongly in its own, inaccurate predictions (small values in Figure 4). The higher the actual noise of the sensors, the higher the need to prevent the system from believing too strongly in its own predictions.

The results show that the introduction of a threshold in Kalman filtering is helpful for a wide range of sensor noise levels. More than 20% SD of the actual arm length noise must be added to the location sensor to deduce a change in performance.

#### 4.5.2 7-DOF Arm

While it is comparatively simple to reach high precision for a two-joint planar arm, the 7D task generally shows lower baseline performance, since the task generally scales exponentially in the dimension of the input space. Nonetheless, the XCSF learning classifier system is able to learn a suitable representation sufficient for accurate control.

The result of an exhaustive search for optimal threshold parameters in Figure 5 is qualitatively very comparable to the 2-DOF case: For very small sensor noise in Figure 5(a) through Figure 5(c), the filtering approach cannot improve performance, which is confirmed by an optimal . With increasing sensor noise, the hill of optimal shifts to the left until performance collapses for noise levels beyond 0.1 in Figure 5(k) through Figure 5(l).

### 4.6 Noise Robustness

Finally, in Figure 6, we have assembled the summary of all experiments so far in a pair of graphs. They show the quality measure for different noise levels for the baseline system as well as the Kalman filtering systems with and without threshold (optimized according to Figures 4 and 5). It can be seen that pure Kalman filtering reduces performance in the regime of moderate sensor noise levels. Only for higher noise levels does it show a small benefit for the 2-DOF arm, but only when its performance has already become unacceptably bad. The introduction of a filtering threshold completely changes the picture. For small , the baseline is reproduced and the performance stays almost unaffected for a much wider range (until 0.02 instead of 0.005 for 2 DOF, until 0.01 instead of 0.002 for 7 DOF). Taking 60% as the acceptance limit for the quality, we see that our proposed system can cope with higher noise levels compared to the baseline system (10 times higher for 2 DOF, four times higher for 7 DOF).

In Figure 7, we compare in similar fashion the angular error for baseline and thresholded Kalman filtering for 2-DOF and 7-DOF arms. Note that these data, which were already published in Kneissler et al. (2012), were obtained using slightly different XCSF settings (learning steps, population size, and other parameters) and using a faster offline measurement procedure (only the first 30 steps of a period where evaluated for averaging the angular error). The solid and dashed curves were obtained by fitting a tanh function to the angular error data points of the baseline and thresholded Kalman filtering system, respectively (in the domain). The insets show the resulting gain factor in noise robustness plotted over the acceptance level of the angular error. It can be seen that noise tolerance is up to 20 times higher in case of the 2-DOF arm. For the 7-DOF arm, the maximal factor of approximately 10 is reached at the less challenging end (high error levels accepted). When requiring higher performance, the noise tolerance factor decreases to five.

### 4.7 Optimal Thresholds θ_{β}

So far, we have treated the threshold as a heuristically introduced, free parameter and have optimized it by exhaustive search. Now, we derive a formula to get an estimate for as a function of based on reasonable assumptions.

Let us start by assuming that irrespective of the noise level, XCSF eventually is able to establish a good distribution of classifiers, such that the estimation error becomes uniformly distributed over the input space. Furthermore, let us assume that in the limit of infinitely many samples observed, each of the estimators converges to its optimum (as if it would have seen noise-free samples). This implies that for each noise level, the root mean squared prediction error is limited uniformly by the same final error .

*V*, and thus also the mixing constant , will then converge as well, say against . We can calculate these steady-state solutions by using Equation (2) and setting and in Equations (4) and (5): This system leads to a quadratic equation for with a single positive solution:

_{n}Since the average error in the early phases can be expected to be , the corresponding Kalman mixing factor should always be larger than . It is thus reasonable to choose the threshold in order to avoid self-delusions without degradation of the system's learning capability.

### 4.8 Analysis of the Self-Delusion Effect

To investigate the effect of self-delusional loop that was introduced by feeding the XCSF predictions back to the Kalman filter, we allowed each individual of the XCSF population to maintain three independent predictors; one was trained using the clean position samples (before addition of sensor noise), the second was trained using noisy samples (before Kalman filtering) and the third was obtained from the output of the thresholded Kalman filtering. In Figure 9, we compare three sets of simulations that differ in the way that the Kalman filter was used: in setup CC, the Kalman filter was fed with output from the predictors trained on clean samples; setup FF used the filtered predictors; in setup FN the prediction of hand position change came from the filtered predictors while the error variance prediction came from the noisy predictors of the matched XCSF subpopulation. A number of observations can be made.

**Effect of Self-Delusion Loop**We suspected in Section 3.2 that the failure of a system with unbounded Kalman gain is due to the feeding of information from predictors that have been trained on filtered samples back into the filter. This is confirmed by observing that the performance of system FF drops to the level of the baseline (or even below for the 7-DOF arm) when the threshold parameter approaches 0. In the setups CC, it stabilizes to a performance above the baseline for .**Role of Error Variance Estimation in Self-Delusion Spiraling.**The curves FN, in which only the position prediction comes from predictors that have been trained on filtered samples, but not the variance estimates, agrees with those of CC and FF for high , and remains constant for small . For low levels of sensor noise, its performance is not too far from the maximum of curve FF. Thus, partially breaking the loop by avoiding variance feedback effectively prevents delusional-spiraling and leads to a Kalman filtering system that performs well without thresholding (and thus without the need for optimizing the ad hoc parameter ). Unfortunately, the performance of the FN system quickly drops with increasing and is scarcely above baseline in Figure 9(c, d).**Influence of Thresholding.**On the right side of the optimum, from the curves for the setups, FF are quite close to the curves for CC. Only toward lower does the performance drop for FF, while it remains constant or further increases for CC. This can be taken as an indication that the thresholding does not interfere strongly with the Kalman filtering itself. Furthermore, there appears to exist a critical value for below which the negative effect of self-delusion starts getting noticeable and eats up the gains that could still be achieved by stronger filtering.

### 4.9 Investigation of Contributions of Different Parts of the System

The positive effect of filtering in the arm-controlling scenario is a combination of three separate sources:

**Planning.**The desired movement direction can be determined more accurately using the filtered hand position instead of the noisy position signal from the sensors.**Inverse Kinematics.**The Jacobian used to translate desired movement into desired joint angle changes is more accurate when extracted from predictors trained on filtered samples.**Genetic Algorithm.**The error variance of predictors trained on noisy samples is contaminated with the sensor noise. In the filtering system, a predictor's error variance is thus closer to the noise-independent part of the prediction error (which measures to non-linearity of the target function for well-trained estimators). Therefore, its fitness estimate (derived from estimated error variance) is much better suited to guide the evolutionary algorithm.

In order to investigate the individual contributions of the three effects and their dependencies, we trained two predictors for each classifier: one trained on clean samples, the other trained on noisy samples (before/after addition of sensor noise). This was done in a modified baseline system (no filtering) in which we separately controlled which of the two sets of predictors was used at the three places described above. The result is given in Figure 10; it shows all eight possible combinations, labeled CCCNNN (standing for planning, inverse kinematics, and genetic algorithm, as above). It shows, consistently for both arm setups, that the component that is most severely affected by noise is the genetic algorithm. The inverse kinematics is more critical than the planning in the 2-DOF case and the converse is true for the 7-DOF arm.

To verify how far we can overcome the effect of noise by filtering, for the three described sources, we made similar changes to a system with thresholded Kalman filtering, allowing a third set of predictors that were trained on filtered samples. The input to the Kalman filter was always taken from the predictors that were fed with filtered samples. The threshold was set to optimal values according to Figure 8(a).

In Figure 11, we give the performance as a function of the sensor noise level . As can be seen, the genetic algorithm is affected the earliest of all three (i.e., note the steep drop of curve CCN in the range ) while planning and inverse kinematics appear to be deteriorated only at much higher noise levels (around ). Compared to the baseline curve NNN, we see that the effect on the evolutionary algorithm is the only reason for failure under moderately noisy conditions. Only later does the CCN curve depart from NNN, which occurs when a negative effect on planning and inverse kinematics becomes noticeable (at ).

Curiously, the deterioration due to noise in the genetic algorithm appears to become stable at , and even extreme noise levels (in the order of magnitude of the total arm length) do not cause further performance loss. This is different for the effect of noise on planning and inverse kinematics: performance gradually drops to 0 when noise is applied to these components of the system. So while the evolutionary subsystem is already affected by low noise levels (compared to the rest of the system), it appears not to fail completely, irrespective of how much noise is present.

The dotted lines show that filtering drastically improves performance for all three sources. With the exception of curve CCF (genetic algorithm) in the 7-DOF case (Figure 10(b)), quality stays stable up to or even beyond a noise level of 0.1.

## 5 Summary and Conclusion

The goal of this work was to improve the performance of XCSF in noisy arm control scenarios. To do so, we have introduced information content-based weighting of classifier predictions and an online estimation of the prediction error variance. To incorporate Kalman filtering, the evolving forward velocity kinematics model of XCSF was used to generate state predictions and thus filter the incoming sensory information. The performance results suggest that XCSF indeed runs into a self-delusional spiral, in which it may overfilter the incoming sensory information. With the setting of an appropriate threshold to prevent this effect, it was shown that XCSF can cope much better with sensor noise (up to 20 times higher noise levels, depending on circumstances and performance criterion).

In further investigations, the self-delusional feedback loop was confirmed as the source for the complete failure of Kalman filtering without thresholding. We identified the evolutionary algorithm as the limiting factor for performance degradation due to sensory noise. The other components of the system were found to be inherently much more noise-tolerant. By the means of thresholded Kalman-filtering, the noise-resistance of the evolutionary algorithm becomes comparable to the other components of the system. The dependency of the threshold parameter on the noise level could be derived theoretically, allowing this setting to be tuned automatically.

This finding encourages us to further explore the relation to Bayesian information processing with XCSF. We aim to derive a fitness criterion for the predictors that is inherently less deteriorated by sensor noise, promising an overall improvement of fitness guidance in XCSF.

We expect that the presented approach is also applicable to other prediction scenarios involving noisy sensors and possibly actuators. The setup, however, will generally be the same and the message of this paper appears to apply in a general sense: given that XCSF is predicting sensory information changes over time, its learning performance, its predictive capabilities, and possibly (dependent on the setup), its inverse control capabilities, can be improved by exploiting the predictive knowledge of XCSF, filtering the incoming sensory information online.

## Acknowledgments

The authors acknowledge funding from the Emmy Noether program (German Research Foundation, DFG, BU1335/3-1) and thank the Cognitive Modeling team.