Abstract

Image pattern classification is a challenging task due to the large search space of pixel data. Supervised and subsymbolic approaches have proven accurate in learning a problem’s classes. However, in the complex image recognition domain, there is a need to investigate learning techniques that allow humans to interpret the learned rules in order to gain insight into the problem. Learning classifier systems (LCSs) are a machine learning technique that has been minimally explored for image classification. This work has developed the feature pattern classification system (FPCS) framework by adopting Haar-like features from the image recognition domain for feature extraction. The FPCS integrates Haar-like features with XCS, which is an accuracy-based LCS. A major contribution of this work is that the developed framework is capable of producing human-interpretable rules. The FPCS achieved approximately 91% accuracy on the unseen test set of the MNIST dataset. In addition, the FPCS is capable of autonomously adjusting the rotation angle in unaligned images. This rotation adjustment raised the accuracy of the FPCS to 95%. Although the performance is competitive with equivalent approaches, this was not as accurate as subsymbolic approaches on this dataset. However, the interpretability of the rules produced by the FPCS enabled us to identify the distribution of the learned angles, a normal distribution centered on the upright orientation, which would have been very difficult to obtain with subsymbolic approaches. The analyzable nature of the FPCS is anticipated to be beneficial in domains such as speed sign recognition, where the underlying reasoning and confidence of recognition need to be human interpretable.

1  Introduction

Images provide a rich source of information for artificial agents, from object recognition to salient patterns. Historically, computer vision has been considered one of the hard applications for machine learning, primarily due to the challenges posed by the high dimensionality of image data (Osuna et al., 1997). While many computer vision problems can be largely solved with modern supervised (as in “with ground truth data labels available”), off-line (as in “the entire dataset is available at once”) learning algorithms by generalizing over sufficiently large training sets, visual learning through reinforcement remains a challenging task. Furthermore, it may be necessary to examine the reasoning behind the decisions made and the confidence in those decisions, for example, in speed sign recognition for autonomous vehicles.

Learning classifier systems (LCSs) have their roots in cognitive systems and are capable of dealing with simultaneous-response classification problems where they can learn general rules for complex multi-class problems. Reinforcement learning is particularly useful in dynamic and unknown scenarios where obtaining correct examples that represent all the situations that the agent may encounter is impractical (Sutton and Barto, 1998). LCSs combine genetic evolutionary operators and reinforcement learning to evolve a population of decision rules. The result is a system that enables agents to successfully learn to operate within unknown, and possibly dynamic, environments (Orriols-Puig et al., 2009; Butz et al., 2004).

This work adopts LCSs for the pattern recognition domain (Kukenys et al., 2011a, 2011b), since the capabilities of LCSs have been minimally explored in that domain. LCSs were applied to handwritten letter classification by Frey and Slate (1991). For almost two decades (from 1991 to date), little work was performed using LCSs in this area, since the reported results were not optimal given the technology available at that time. However, advancements in various aspects of machine learning and vision processing techniques are anticipated to lead to significant improvements in the performance of LCSs.

A major issue when adopting LCSs in the image domain is creation of conditions for rules. Traditional systems use pixel-level information to create their conditions, for example Frey and Slate (1991) used 16 numerical attributes representing primitive statistical features of the pixel distribution. Conditions that are based on pixel-level values are low-level and may not provide informative features. Moreover, such conditions do not scale well as the size of the images grows.

The first contribution of this paper is adapting LCSs to the image recognition domain. This is achieved through the development of a framework called the feature pattern classification system (FPCS). To the best of our knowledge, this is the first time that LCSs have been assimilated into the image recognition domain. In order to adjust LCSs for vision pattern recognition tasks, various modules that need amendments were identified and modified. The second contribution investigates how Haar-like features can be utilized to produce conditions in LCSs. Identifying important features in computer vision and pattern classification applications is extremely difficult due to the sparseness of patterns compared with the total number (or type) of features. We demonstrate how Haar-like features were adjusted to work with LCSs. In addition, we apply FPCS to online, dynamic situations where new classes of the problem may be introduced into the system. We demonstrate that FPCS is capable of adapting in such dynamic domains. The last contribution of this work utilizes the flexibility of the framework to include image manipulation in an analyzable format. We attempt to improve the classification rate by enabling the FPCS to autonomously adjust the rotation angle of images. The human-interpretable rules enabled the analysis of the results to simply identify the distribution of the learned angles.

The MNIST dataset (LeCun et al., 1998) was chosen as a benchmark for testing FPCS. It contains images of the real world and thus provides a realistic test problem. It is a competitive benchmark, and has been widely used for evaluating the performance of various methods on the handwritten-digit recognition problem. It serves as a standard for comparison between performance of different pattern recognition methods. We compare the result of the FPCS on the MNIST dataset against state of the art techniques published in the literature even if their aim was different, that is, pure accuracy of classification rather than online regime and human-interpretable rules.

The rest of this paper is structured as follows. Section 2 reviews different classification techniques for the pattern recognition domain. Section 3 describes the various components of the FPCS. It provides the details of LCSs and how they were integrated with Haar-like features. Moreover, it explains how LCSs have been adjusted for the image domain. Section 4 provides the details of the benchmark dataset, and performance results of the FPCS in both off-line and online scenarios. Section 5 demonstrates how the FPCS can automatically adjust to rotation angle in unaligned images. Section 6 compares the results of the FPCS to other classification methods on the same dataset. Section 7 provides the discussion and the future work, and finally Section 8 concludes the paper.

2  Related Work

Various machine learning techniques have been used in the pattern classification domain. This section briefly introduces relevant techniques and identifies their advantages and disadvantages in the pattern recognition domain.

Random forest classifiers (RFs), or randomized trees, were introduced in the machine learning field by Amit and Geman (1997) and further developed by Breiman (2001). They have been applied to object recognition and classification tasks (Bosch et al., 2007a; Moosmann et al., 2008). RFs are defined as a collection of tree-like structures where each tree represents a classifier. Each tree is determined by the values of a random vector that is sampled independently and identically distributed for all trees. RFs offer a probabilistic output and are capable of sharing features similar to multi-class classifiers. These properties, in addition to the robustness (with respect to noise) of this classification method, led to their application in various supervised classification tasks (Maree et al., 2005; Deselaers et al., 2007). RFs are known to have issues such as lack of generalization and overfitting, although in terms of performance they are comparable to support vector machines (SVMs) in multi-class problems (Bosch et al., 2007b).

SVMs, or kernel methods, are a modular framework that can be used in different tasks by adjusting their kernel functions and base algorithm (Schölkopf and Smola, 2002). They have been extensively applied to the pattern classification domain. One of the main advantages of SVMs is that nonlinear decision boundaries can be learned using the so-called kernel trick (Maji et al., 2008). However, the nonlinear property adds to the complexity of the runtime. In contrast, linear kernel SVMs offer fast training and classification, and require a substantially smaller amount of memory than methods using nonlinear kernels; this is due to the compact representation of the decision function (Maji et al., 2008). Thus, linear kernel SVMs have become a popular method that has been applied to several online applications (Zhang et al., 2006). SVMs have also been applied to object recognition (Grauman and Darrell, 2005; Lazebnik et al., 2006), and SVM results on the Caltech and Pascal VOC datasets are among the best known (Varma and Ray, 2007; Bosch et al., 2007b). A major drawback when using SVMs is that they can only classify data vectors of fixed length. Therefore, they do not suit classification tasks dealing with variable-length data. In addition, SVMs have mainly been used in supervised domains, although a number of researchers have recently tried to adopt semi-supervised and unsupervised techniques in SVMs (Zhao et al., 2009).

A large number of applications of neural networks (NNs) to pattern recognition have been developed in the past few years. These applications utilize and extend different types of neural-network architecture including multi-layer perceptron (MLP), radial basis function (RBF), self-organizing map (SOM), shared weight neural networks (LeCun et al., 1990) and probabilistic neural networks (PNNs; Musavi et al., 1994; Romero et al., 1997; Specht, 1990). Among these structures, PNNs have become a popular method for classification in various domains due to their ease of training and sound statistical foundation in Bayesian estimation theory (Mao et al., 2000). However, PNNs have a major issue with respect to determining the size of the network and the locations of pattern layer neurons. In PNNs, the pattern layer includes all training samples, of which many may be redundant. The issue of including the redundant samples leads to large network structures. Large network structures are computationally expensive since the computation required for classifying an unknown pattern is proportional to the size of the network (Mao et al., 2000). Moreover, large network structures tend to provide poor generalization in the case of unseen data (Nigrin, 1993). A number of studies have tried to address this issue, for example, Mao et al. (2000) proposed a mechanism that restricts the network size and utilizes a genetic algorithm to find the smoothing parameters.

Deep belief networks (DBNs) of restricted Boltzmann machines (RBMs) have recently been used in pattern classification. DBNs follow a hierarchical structure in which layers at the bottom of the hierarchy extract simple features and feed them to the higher layers, which then are able to detect complex features. There have been various approaches to learning deep networks (Ranzato et al., 2007; Hinton et al., 2006) and they can benefit from advances in both supervised and unsupervised learning. DBNs have been successfully used to learn high-level structures in a wide variety of domains, including handwritten digits (Larochelle et al., 2007). Although DBNs have successfully been used in controlled environments, utilizing them in realistic situations remains difficult due to high dimensionality of images. Lee et al. (2009) proposed a DBN that uses an unsupervised learning method. They adopted the approach suggested by LeCun et al. (1989) to learn features that are common across all locations in an image. The claim is that their model is capable of handling large images using only a small number of feature detectors.

SVMs and RFs do not suit our task since they are generally designed for supervised scenarios. In addition, LCSs allow for cooperation between rules without the fixed-length data restriction posed by SVMs (Bull et al., 2007). NNs, especially DBNs, have achieved good performance results for image classification in an unsupervised manner. However, rules produced by NNs cannot easily be interpreted by humans. Moreover, LCSs remove redundancy, which is a known issue in PNNs.

The computational overhead of the evolutionary computation component of LCSs results in slower initial off-line training (Huy et al., 2009). However, LCSs can be configured as online, reinforcement learning systems that can adapt to changes in the problem domain relatively quickly. These features make LCSs a suitable choice for expanding the capability of techniques applied to image recognition.

3  Feature Pattern Classification System

This section introduces various components of the FPCS including the LCS and its parameters. It also describes how Haar-like features were utilized for creating classifier conditions. In addition, it explains how various components of LCSs, including covering, crossover, mutation, and condition matching, were adjusted to function with Haar-like features.

3.1  Learning Classifier System Concept

An LCS represents an agent interacting with an unknown environment via a set of sensors for input and a set of effectors for actions. After observing the current state of the environment, the agent performs an action, and the environment provides a reward (Lanzi et al., 2007).

This work utilizes the XCS formulation of LCSs, which was proposed by Wilson (1995). XCS uses accuracy-based fitness to learn the problem by forming a complete mapping of states and actions to rewards. In addition, XCS evolves more general classifiers subject to an accuracy criterion. Thus, XCS is a suitable choice for the pattern recognition domain.

XCS has two modes of operation, explore and exploit. The former refers to the training period, where the system explores the environment and learns through exploring examples; the latter refers to the situations where the system selects the best rules describing the problem. The following provides more details about these modes.

In the explore mode, the system attempts to obtain information about the environment and describe it by creating decision rules. During the explore mode, the system executes the following actions:

  1. Observes the current state of the environment, s.

  2. Selects classifiers from the classifier population that have conditions matching the state s, to form the match set [M].

  3. Performs covering: for every action ai in the set of all possible actions, if ai is not represented in [M], a random classifier is generated that matches s and advocates ai, and is added to the population.

  4. Forms a system prediction array P, with an entry P(ai) for every action ai. P(ai) is a fitness-weighted average of the payoff predictions of all classifiers in [M] advocating ai, and provides the system’s best estimate of the payoff for that action (see the formula below and the code sketch after this list).

$$P(a_i) = \frac{\sum_{cl \in [M] \,:\, cl.a = a_i} cl.p \cdot cl.F}{\sum_{cl \in [M] \,:\, cl.a = a_i} cl.F}$$
  5. Selects an action ai to explore (probabilistically or randomly) and selects all the classifiers in [M] that advocated ai to form the action set [A].

  6. Performs the action ai, recording the reward from the environment, r, and uses r to update the predictions of all classifiers in [A].

  7. When appropriate, runs a genetic algorithm (GA) to introduce new classifiers to the population. The GA has two operators: crossover and mutation. In XCS, two parent classifiers are selected from [A] and two offspring are produced by applying crossover on their conditions, such that both offspring match the currently observed state. For mutation, randomly selected features of a classifier condition are mutated to maintain the diversity of the population.
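
The fitness-weighted prediction array and the explore/exploit action choice (steps 4 and 5 above) can be illustrated as follows. This is a minimal sketch, not the reference XCS implementation; it assumes hypothetical classifier objects exposing action, prediction, and fitness attributes.

```python
import random
from collections import defaultdict

def prediction_array(match_set, actions):
    """Fitness-weighted average of the payoff predictions advocating each action.
    Classifiers are assumed to expose action, prediction, and fitness attributes."""
    weighted_sum = defaultdict(float)
    fitness_sum = defaultdict(float)
    for cl in match_set:
        weighted_sum[cl.action] += cl.prediction * cl.fitness
        fitness_sum[cl.action] += cl.fitness
    return {a: weighted_sum[a] / fitness_sum[a]
            for a in actions if fitness_sum[a] > 0}

def select_action(prediction, explore=True):
    """Explore: pick an action at random; exploit: pick the best predicted action."""
    if explore:
        return random.choice(list(prediction))
    return max(prediction, key=prediction.get)
```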

Each classifier has a field called numerosity. New classifiers are produced during the explore mode and their numerosity values are set to 1. When a new classifier is produced, the entire population is checked to see whether the new classifier has the same condition and action as an existing classifier. If that is the case, then the new classifier is not added to the population and the numerosity value of the existing classifier is increased by one.
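
As a small illustration of how numerosity is maintained, the following sketch (with hypothetical condition, action, and numerosity attributes) adds a newly created classifier to the population, or increments the numerosity of an existing macroclassifier with an identical condition and action.

```python
def insert_classifier(population, new_cl):
    """Add a new classifier to the population, or increase the numerosity of an
    existing classifier that has the same condition and action."""
    for cl in population:
        if cl.condition == new_cl.condition and cl.action == new_cl.action:
            cl.numerosity += 1
            return
    new_cl.numerosity = 1
    population.append(new_cl)
```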

XCS may also execute subsumption after the GA generates new classifiers. The subsumption mechanism is in charge of aggregating overly specific classifiers into matching, genotypically more general rules that can subsume them. In addition, subsumption has a delete method that removes classifiers with low fitness values.

In contrast, in the exploit mode, the system does not attempt to create new rules or learn alternative rules. The system exploits its best current prediction and adapts the associated rules based on the environmental interaction.

The environment is assumed to have the Markov property, meaning that performing the same action in the same state will result in the same reward. LCSs have been shown to be robust to small amounts of noise and are often more robust than most machine learning techniques with increasing amounts of noise (Butz, 2006). The generalization property in LCSs allows a single rule to cover more than one state provided that the action-reward mapping is similar (more information about the theoretical aspects of LCSs can be found in Drugowitsch, 2008).

3.2  Image Pattern Classification Approaches

Each classifier has a condition that when satisfied results in the classifier being added to the match set. In order to create conditions when dealing with images, two approaches may be considered. These approaches are described in this section.

3.2.1  Naïve Pixel-Based Conditions

To learn compact and general models, LCSs utilize generalized condition rules in the individual classifiers. In simple ternary encoding schemes, generalization of conditions is achieved using a special don’t care symbol (#). Consider simple 3 × 3 binary images with black (0) and white (1) pixels, where every image can be encoded as a string of nine bits. To learn to distinguish images where the center pixel is white from images where the center pixel is black, two classifiers would be sufficient (see Figure 1).

Figure 1:

Simple pattern classification problem of distinguishing patterns based on the color of the center pixel from examples (left) can be solved with two classifiers (middle) using don’t care encoding (depicted as grey pixels, right).


Note that the two classifiers in Figure 1 are maximally accurate and general, and cover the entire problem domain. However, as soon as the problem becomes a little more complicated, generalizing at the pixel level becomes difficult. Consider learning to recognize images that have a horizontal line of three white pixels on any of the rows. Three classifiers are needed to model the positive class:
111###### : 1
###111### : 1
######111 : 1
and yet another 27 rules (that do not allow for three consecutive 1s in one row) are needed to fully cover the negative class. While learning such a problem is still fully possible, when useful image pattern sizes (hundreds of pixels) and more pixel states (e.g., grayscale values) are considered, there are typically thousands of example instances of every pattern class, and in turn those commonly represent only a very sparse sampling of the underlying problem, for example, all the possible images representing the object of interest. At the pixel level, the different images depicting the same object will often be so different that generalization with a don’t care pixel is not effective, and the LCSs are forced to keep one classifier for every example instance they have seen, resulting in poor pattern recognition performance.
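
The ternary matching described above is simple to state in code. The following minimal sketch assumes 3 × 3 binary images flattened row by row into nine-character strings; the rule strings correspond to the three positive-class classifiers listed above.

```python
def matches(condition: str, image_bits: str) -> bool:
    """A ternary condition matches when every non-# position agrees with the image."""
    return all(c == '#' or c == b for c, b in zip(condition, image_bits))

# The three positive-class rules for "a row of three white pixels" (3x3 image,
# rows concatenated left-to-right, top-to-bottom):
positive_rules = ["111######", "###111###", "######111"]
image = "010111000"          # white middle row
print(any(matches(r, image) for r in positive_rules))   # True
```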

Significant image differences at the pixel level are a well known problem in computer vision, and it is commonly tackled using some form of feature extraction. In the next section we show how Haar-like features can be used to enable LCS applications for image classification.

3.2.2  Haar-like Feature Conditions

Haar-like features are well established in image classification systems (Viola and Jones, 2001). They encode the difference between the average pixel intensities of rectangular regions of an image (see Figure 2) by utilizing a so-called integral image, where each pixel is replaced by a sum of all pixels to the left and above:
$$II(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$$
The integral image can be computed with a single pass over the image I. In order to form a condition ci, one type of Haar-like feature is selected randomly from the available types presented in Figure 2. The maximum height and width for the feature are then calculated so that the feature fits entirely within the image at the chosen position.
Figure 2:

Haar-like features. The feature values are computed by subtracting sums of pixel intensities in neighboring rectangular regions A, B, C, and D. Note that when applied to an image, the position and scale of the feature is important.


The scale is created by producing random values for the feature width and height within these maximum bounds. The system calculates the value of all possible features with the chosen width and height, and selects the feature with the maximum contrast. The value of a feature f at a given location and scale can be computed with just a few lookup calls into the integral image II. By applying a threshold t and comparison direction d to the outputs of the Haar-like features, binary decision rules can be formed that detect the presence (or absence) of contrast in neighboring regions of the image. We thus propose the following conditions for use in the LCS decision rules:
$$c_i(I) = \text{true} \iff d \cdot f_{x, y, w, h}(II) \ge d \cdot t, \qquad d \in \{+1, -1\}$$
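
The integral image, the rectangle sums it enables, and the thresholded feature conditions can be sketched as follows. This is an illustrative implementation under our own naming (integral_image, rect_sum, two_rect_feature, and so on), showing only the two-rectangle vertical-split feature; the other feature types differ only in which rectangle sums are combined. The final function anticipates the multi-feature conditions introduced below.

```python
import numpy as np

def integral_image(img):
    """Zero-padded integral image: ii[y, x] holds the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the rectangle with top-left corner (x, y), width w and height h,
    obtained with four lookups in the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle feature: difference between the left and right halves of a
    (2w x h) window; other feature types combine rectangle sums differently."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

def condition_matches(ii, x, y, w, h, threshold, direction):
    """direction = +1: feature value must reach the threshold; -1: stay below it."""
    value = two_rect_feature(ii, x, y, w, h)
    return value >= threshold if direction > 0 else value <= threshold

def classifier_matches(feature_conditions, ii):
    """Messy multi-feature condition: a conjunction of single-feature conditions,
    each given as a dict with x, y, w, h, threshold, and direction entries."""
    return all(condition_matches(ii, fc['x'], fc['y'], fc['w'], fc['h'],
                                 fc['threshold'], fc['direction'])
               for fc in feature_conditions)
```
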
A single Haar-like feature is insufficient for describing a complex pattern. Therefore, we utilize a messy encoding (Lanzi and Perrucci, 1999), which is not common in LCS research. Messy encoding supports the construction of multi-feature conditions by joining several feature conditions using a logical operator. This allows for the creation of complex decision rule conditions that are capable of representing various patterns:
$$C = c_1 \wedge c_2 \wedge \dots \wedge c_n$$

The classifier conditions must allow generalization while being accurate, meaning that a classifier must be as general as possible but not over-general. These properties allow a classifier to offer maximally general learning. We argue that the suggested Haar-like multi-feature conditions exhibit that property.

  • Generalization. In the symbolic encoding, generalization is gained with the don’t care symbol (#). Haar-like features achieve this by ignoring the image information outside of the feature positions and by thresholding the feature values. An extreme case of generalization (all #) can be achieved by setting a threshold on a feature such that every feasible image pattern will match.

  • Accuracy/Specificity. Every condition can be made more specific by adding more features to it. Essential for ensuring this property is the type-zero Haar-like feature introduced here that simply returns the sum of the pixel intensities within a rectangular region, which effectively enables very precise thresholding of individual pixel values if needed. An extreme case of specificity, when no generalization is possible, is a set of type-zero single pixel features that completely describe a single unique image.

In practice, LCS learning attempts to select a good trade-off between the two extremes, as it has evolutionary pressures (Butz, 2006) for both accuracy and generalization, and the Haar-like multi-feature conditions provide sufficient flexibility for the search along this front, as the experimental results suggest.

3.3  Adjusting LCSs for Image Patterns

In order to use LCSs with images, several adjustments must be made to the standard XCS implementation. These changes include modifying the covering, crossover, mutation, and condition matching components of the LCSs. The following describes the details of these changes.

3.3.1  Covering

When performing covering, a random number of features is generated for the condition, randomly selecting feature type, position, scale, and direction, but setting each threshold to the current value of the corresponding feature, ensuring that the condition matches the currently observed state.
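
A covering step consistent with this description might look like the following sketch. The feature record layout and the eval_feature helper are hypothetical, and the bounds ignore the per-type width and height multipliers discussed in Section 3.2.2.

```python
import random

def cover(observed_ii, image_size, eval_feature, max_features=8):
    """Build a condition that matches the currently observed image: random feature
    type, position, scale, and direction, with each threshold set to the feature's
    value on that image so the new classifier is guaranteed to match."""
    features = []
    for _ in range(random.randint(1, max_features)):
        ftype = random.randint(0, 5)                  # six Haar-like feature types
        x = random.randint(0, image_size - 2)
        y = random.randint(0, image_size - 2)
        w = random.randint(1, image_size - 1 - x)
        h = random.randint(1, image_size - 1 - y)
        direction = random.choice([+1, -1])
        value = eval_feature(observed_ii, ftype, x, y, w, h)
        features.append({'type': ftype, 'x': x, 'y': y, 'w': w, 'h': h,
                         'direction': direction, 'threshold': value})
    return features
```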

3.3.2  Crossover

During uniform crossover of two classifiers, individual feature conditions are moved between the classifiers with equal probability. Since all the features in both classifiers had to match the observed state in order to be selected to the action set, the resulting children will also match the current instance from the environment.

3.3.3  Mutation

Every property of every feature was allowed to mutate randomly, except for the thresholds, where after mutation each threshold value was adjusted to match the observed state if needed. The idea here is that eventually classifier rules would emerge with thresholds for feature values that cover related groups of problem instances.
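
The following sketch illustrates this threshold re-anchoring during mutation. The feature dictionary layout, the eval_feature helper, the perturbation step, and the mutation rate mu are all hypothetical, and only a few of the feature properties are shown being perturbed.

```python
import copy
import random

def mutate(classifier, observed_ii, eval_feature, mu=0.05, image_size=28):
    """Randomly perturb some feature properties, then re-anchor each threshold so
    that the mutated classifier still matches the currently observed image."""
    child = copy.deepcopy(classifier)
    for f in child['features']:
        if random.random() < mu:
            f['x'] = min(max(f['x'] + random.randint(-2, 2), 0), image_size - 2)
        if random.random() < mu:
            f['y'] = min(max(f['y'] + random.randint(-2, 2), 0), image_size - 2)
        if random.random() < mu:
            f['direction'] *= -1
        # Thresholds are not mutated directly; they are adjusted afterwards so the
        # (possibly moved) feature still satisfies its condition on this image.
        value = eval_feature(observed_ii, f['type'], f['x'], f['y'], f['w'], f['h'])
        if f['direction'] > 0 and value < f['threshold']:
            f['threshold'] = value
        elif f['direction'] < 0 and value > f['threshold']:
            f['threshold'] = value
    return child
```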

3.3.4  Condition Validation

In the cases where mutation moved the feature window to an infeasible region in the image, the offspring was subsumed by the parent classifier by increasing the numerosity of the latter.

4  Evaluating FPCS

This section provides the details of the benchmark dataset and experiments. Several experiments have been executed to investigate whether the FPCS approach is feasible and maintains the benefits of LCSs. These benefits include human-interpretable rules, where the value of the rules can be identified by assessing the statistical parameters associated with each classifier, for example, experience, fitness, prediction.

4.1  Dataset

We selected the problem of handwritten digit classification to test our proposed FPCS with Haar-like features. The MNIST dataset has been widely used by the research community (LeCun et al., 1998; Lee et al., 2009; Bernard et al., 2009; Jarrett et al., 2009). It contains two different sets of data for training and testing. The training dataset includes 60,000 example images of all ten handwritten digits (0–9), collected from a group of individuals. The test set contains 10,000 examples written by different individuals. The examples are presented as 28 × 28 pixel grayscale images (pixel intensity values ranging from 0 to 255), centered around the pixel intensity center of mass. The proposed system does not utilize human-constructed preprocessing of the training data, where preprocessing of the image is known to improve results (LeCun et al., 1998). In many papers, this problem has been considered sufficiently complex and has been studied thoroughly (Bernard et al., 2009; Larochelle et al., 2007; Lee et al., 2009; Ciresan et al., 2011).

4.2  Implementation Details

We have used an implementation of XCS based on the XCSJava project developed by Butz (2006). The code was adjusted to work with image patterns. The adjustments were mainly performed on the following components.

4.2.1  Features

Six types of Haar-like features were designed: a single rectangle sum (type-zero feature that is not known to be used in other Haar-related approaches), a two rectangle difference (horizontal, vertical), a three rectangle difference (horizontal, vertical), and a four rectangle difference feature (see Figure 2).

4.2.2  Messy Encoding

Figure 3 shows a histogram of the number of features in classifiers when a larger feature limit was used in a separate trial, indicating that most classifiers had eight or fewer features, with the majority having between three and six. Thus we allowed each classifier to have a random number of up to eight features, with each feature condition having to exceed its threshold when applied to the image for the classifier condition to match. During mutation, the feature list in the classifier was allowed to shrink or grow (adding a new feature based on the currently observed example) with a small probability.

Figure 3:

Distribution of the number of features in classifiers under messy encoding.


4.2.3  XCS Inherited Parameters

The system uses the following parameter values, as defined in XCS (Butz, 2006): fitness fall-off rate ; prediction error threshold ; fitness exponent ; learning rate ; threshold for GA application in the action set ; experience threshold for classifier deletion ; fraction of mean fitness for deletion ; classifier experience threshold for subsumption ; crossover probability ; mutation probability (significantly higher than typical XCS applications due to a large variability in the Haar-like features); tournament selection fraction .

4.2.4  Population and Generations

The population size was limited to 60,000 classifiers, and the experiments were run for different numbers of generations, specified separately for each experiment. One generation represents one instance of the problem (one state message).1 While the population size may seem large for the problem at hand, it is common for LCSs. Figure 4 shows the performance of a partial evaluation of the population after a trial run. It demonstrates that the fittest classifiers are responsible for most of the performance.

Figure 4:

Classification performance using a part of the population sorted by fitness.


All the experiments were repeated 30 times, and the reported numbers are averages, with standard deviations where applicable.

4.3  Measuring FPCS Accuracy

This experiment was designed to demonstrate the recognition performance of the FPCS in an off-line scenario. The FPCS was executed using the training data of the MNIST dataset for 4,000,000 generations, requiring 15–20 hours for each of the 30 runs. Figure 5 shows the behavior of the classification performance on the training data and the relative population size. As Figure 5 shows, the system reaches 80% accuracy relatively quickly (after 500,000 generations) but requires more time to reach 90% accuracy. During this period, the system combines more specific rules into more general and accurate rules (micro classifiers vs. macro classifiers). As can be interpreted from the diagram, the population of rules reaches 42,000 (70% of the 60,000 limit) at its peak but then drops to 33,000 (55% of the 60,000 limit) at the end of the experiment. This is due to macro classifiers with numerosity greater than one being formed by the system. Once the rules are learned, the system can classify in real time. We applied the learned rules of this experiment to the MNIST test data. The FPCS achieved an overall classification rate of approximately 91% on the unseen test set in less than a minute.

Figure 5:

Performance of the LCSs with error bars on 30 repetitions as measured internally on the training set, and average population size (number of unique classifiers compared with 60,000 limit).


4.4  Human-Interpretable Rules

Figure 6 shows example classifiers learned in the previous experiment as feature images. It must be noted that some classifier conditions are intuitively interpretable and target the regions of high contrast where the curves of handwritten digits will consistently pass through, while others are harder to interpret yet are useful to the system due to their collaborative nature.

Figure 6:

Example digit images and learned matching Haar-like feature classifiers. While some features intuitively cover regions where lines in the image would pass, others are harder to interpret.


Figure 7 shows an example of the rules learned by the FPCS. Here Exp represents the experience of the classifiers; the experience refers to the number of times that a rule has been used. N is the numerosity, F is the fitness, and P is the classifier’s prediction. The rule condition includes HAAR types (Haar-like features described in Figure 2) followed by their position, scale, and direction. The first rule (r1) has been used 21 times and generated 19 times, so it represents a relatively generic and experienced rule. The high fitness value in combination with high experience and numerosity suggests that this rule is a general and accurate rule. The next rule, r2, is less experienced and generic than r1 but still very accurate (F = 0.8101). The high number of features can be the reason for the low experience, as all thresholds must be reached for the rule to match. Rule r3 is an experienced and relatively generic rule but not very accurate. One can infer that the condition of this rule may be over-general and therefore may occasionally match different digit classes. The last rule, r4, presents an extremely experienced and generic rule due to high experience coupled with high prediction.

Figure 7:

A sample of the learned rules by FPCS. It shows the different values associated with each rule. Here Exp is the experience, N is the numerosity, F is the fitness, and P is the classifier’s prediction (note that the range for P is 0–1,000, which is the standard LCS approach for reinforcement learning rewards).


4.5  Examining FPCS in an Online Scenario

This experiment was designed to demonstrate the online learning capability of the FPCS. In this experiment, the agent must adapt to dynamic situations where new classes of the problem are introduced to the system during runtime. An instance of the FPCS was executed where initially only two of the digit classes, 0 and 1, which are learned easily, were introduced to the system. The subsequent digit classes were added to the training every 200,000 generations. Figure 8 shows the performance of the FPCS. The online nature of LCSs enabled the system to partially recover from the performance drops caused by unseen classes of examples.

Figure 8:

Online learning with LCSs. In the first half of the training, a new digit class (sequentially 0 to 9) is introduced every 200,000 generations.


5  Automatic Adjustment of Rotation Angle in Unaligned Images

Although the performance of the FPCS is reasonably good (around 91%), it does not reach the performance of some other benchmarks (for a comparison with other techniques refer to Table 1, discussed in Section 6). We hypothesized that since most people write on a slant, being able to autonomously learn to adapt to an image’s angle would improve performance. In order to test this hypothesis, the angle was modeled as a precondition in the classifiers’ rules: when a classifier condition is examined for an image, the image is first rotated by the angle specified in the precondition, and the extracted feature values are then compared with the classifier’s rule condition.
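
Conceptually, the angle precondition is applied before any feature condition is evaluated. A minimal sketch follows, using SciPy's image rotation; integral_image and feature_matches stand for the hypothetical integral-image construction and single-feature test sketched in Section 3.2.2 and are passed in as callables here.

```python
from scipy.ndimage import rotate

def matches_with_angle(classifier, image, integral_image, feature_matches):
    """Apply the angle precondition first: rotate the image by the classifier's
    angle, rebuild the integral image, then test every Haar-like feature
    condition against the rotated image."""
    rotated = rotate(image, angle=classifier['angle'], reshape=False, order=1)
    ii = integral_image(rotated)
    return all(feature_matches(ii, f) for f in classifier['features'])
```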

Table 1:
Comparison of a number of classification techniques. The supervised methods only suit off-line scenarios while unsupervised methods can be employed in both online and off-line scenarios.
Depiction     System                      Method               Error rate      Learning
Subsymbolic   LeCun et al. (1998)         1-layer NN           12%             Supervised
              Larochelle et al. (2007)    polynomial SVM       3.69 ± 0.17%    Supervised
              Lee et al. (2009)           DBN                  0.80%           Unsupervised
              Ciresan et al. (2011)       Convolutional nets   0.27 ± 0.02%    Unsupervised
Haar          Fleuret and Sahbi (2003)    SVM                  3.93%           Supervised
              Casagrande (2005)           AdaBoost             1.31%           Supervised
              FPCS (original)             LCS                  ≈9%             Reinforcement
              FPCS (learning angle)       LCS                  5%              Reinforcement

5.1  Automatically Learning Angles of Images

We performed an experiment that enables classifiers to automatically learn the angles of images. When covering to complete the match set, the system constructed classifiers with angle preconditions drawn from a fixed range of angles in discrete increments. Mutation and crossover were allowed to occur on the precondition, but the precondition was always applied when evaluating a classifier rather than being subject to the match method itself. The experiment was run for 12,000,000 generations, requiring 10 to 12 days for each of the 30 runs. The number of generations is higher compared with the previous experiments since learning the angle increases the complexity of the problem; therefore, the agent requires more time to discover optimal rules. Rotating the images prior to classification is also time-consuming. Figure 9 shows the behavior of the FPCS in an off-line scenario when the agent autonomously learned classifiers’ angles. The system reached 99% accuracy on the training set after 1,000,000 generations and continued performing at the same level until the end of the experiment (12,000,000 generations in total). The rules produced after 12,000,000 generations achieved 95% accuracy on the MNIST test data (the decrease from 99% training performance indicates a lack of generalization, which was possibly caused by overfitting).

Figure 9:

Performance of the FPCS when learning the angles for classifiers on 30 repetitions measured internally on the training set.


The original implementation of the FPCS was also executed for 12,000,000 generations, so that the accuracy of the rotation-enabled FPCS could be compared with the results of the original FPCS. The original FPCS achieved 94% accuracy after 12,000,000 generations. This suggests that the original Haar-like features can cope with some rotation, and that explicit rotation adjustment improves test performance slightly. In addition, the slight overfitting observed in Figure 9 is the result of imprecise rotation of features.

5.2  Distribution of Classifiers’ Angles

The interpretable rules enable humans to gain insight into the problem by analyzing the rules. If it is assumed that the majority of the MNIST subjects were right-handed and write left-to-right, then a skew in the distribution of angles could have been expected. However, the results show that this assumption does not hold in the MNIST dataset.

Figure 10 shows the distribution of angles learned by classifiers after 12,000,000 generations. Our analysis reveals a normal distribution of the learned angles centered on 0° (upright), which is interesting because the LCSs could have selected any angle in the allowed range to rotate the images, so the distribution could have taken any shape, for example, flat. Therefore, the FPCS is capable of automatically learning the distribution of handwritten digit angles.
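
Because the angle is an explicit field of each rule, obtaining this distribution only requires reading the rule population, for example with a sketch such as the following (classifiers assumed to be dictionaries with hypothetical angle and numerosity entries, weighted by numerosity so that macroclassifiers count in proportion to the rules they represent).

```python
from collections import Counter

def angle_distribution(population):
    """Histogram of the angle preconditions in the rule population, weighted by
    numerosity so that macroclassifiers count once per rule they represent."""
    counts = Counter()
    for cl in population:
        counts[cl['angle']] += cl['numerosity']
    return dict(sorted(counts.items()))
```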

Figure 10:

Normal distribution of angles learned by the FPCS after 12,000,000 generations.


5.3  Wrapper-Based Approach to Learning Angles

A wrapper-based approach for learning classifiers’ angles was also implemented with the hope of producing the best classifiers more quickly. In this approach, a wrapper was developed for the classifier rules produced by the original implementation of the FPCS (2,500 rules). Since those rules were the final product of the FPCS, the genetic component of the wrapper, which performs mutation and crossover on the rules’ conditions, was turned off. In addition, the preconditions (with GA) were added so that the optimum preconditions (angles) can be learned.

We doubled the rules and assigned an upright (0°) angle to the introduced precondition of half of the classifiers, and angles drawn from a normal distribution over the allowed range to the rest of the classifiers. The system was then trained for additional generations on the training set. Using this approach, only 91% accuracy was achieved on the test set. This two-stage wrapper approach could have been more effective if the initial training had been on upright images only. It is considered that the Haar-like feature condition rules can already cope with the angles present in an image, so rotating those rules would not be beneficial. Instead, generating features and their associated rotation concurrently produced better results.

5.4  Testing the Impact of the Learning on FPCS

In order to demonstrate how learning angles for classifiers impacted the FPCS’s ability to recognize digits written at an angle, DigitApp was developed. DigitApp is a human interface that interacts with the learned rules. It allows users to draw a digit in a frame, and uses the classifiers produced by the two versions of the FPCS (the original FPCS and the FPCS that automatically learns angles). It is noted that once the classification has been learned, the rules classify in real time.

Figures 11 and 12 show snapshots of DigitApp. Each instance of DigitApp shows two tables: original and angled. The original table shows the system digit recognition values when using the original FPCS, and the angled table shows the same values using the FPCS that learned angles for classifiers. The first column of each table contains the digits ordered by their confidence values, and the second column contains the confidence values calculated by the systems.

Figure 11:

A snapshot of the DigitAPP. The first column displays the predicted number and the second column shows the learned prediction (out of 1,000).


Figure 12:

A snapshot of the DigitAPP. It compares the recognition rate of the original FPCS and the FPCS that learned the angles in a situation where the digit is rotated to the right.


Figure 11 shows how the system recognizes the digit 7 when the digit is written upright. In that position, both systems recognize the digit with high confidence values. The image was then rotated to the right. As Figure 12 shows, the confidence values for both systems dropped. The original FPCS has not been trained to learn angles and therefore the fall in its confidence value is expected. The drop in confidence values of the new system (angled) is due to the increase in the complexity of the problem (the system also had to learn angles). Despite the drop in confidence values, the FPCS that used active angle learning is more accurate at recognizing the rotated 7 than the original FPCS. In fact, this pattern also holds for the other digits (classes). The ensemble nature of LCSs is evident in the per-class confidence values, which could be beneficial in problem domains where decisions are based on digit recognition, for example, speed sign recognition.

6  Comparison of Various Classification Techniques

Table 1 shows the comparison of the proposed method to the performance of other known systems on the MNIST dataset. According to this table, subsymbolic approaches generally perform well on the MNIST dataset. However, it must be noted that most of these methods are supervised and therefore can only be applied to off-line scenarios. DBNs and convolutional networks are two examples of the subsymbolic methods that have achieved high accuracy and can be employed in unsupervised scenarios. However, these methods do not produce human-interpretable rules, and the knowledge built by the system is not readily available for human interpretation.

The FPCS utilizes reinforcement learning and is therefore suited to online scenarios. The original FPCS and the version that automatically adjusts the rotation angle take 15–20 hours and 10–12 days, respectively, on a single machine for training. Moreover, the FPCS offers human-analyzable rules; none of the other systems mentioned in the table offer this property. Human-interpretable rules are particularly useful when dealing with more complex problems where in-depth knowledge of the system is necessary for better understanding of the problem.

7  Discussion and Future Work

This work sought to adapt a technique that has hypothesized benefits to a domain, rather than achieving the highest classification accuracy regardless of other constraints and objectives. It focused on adapting LCSs to the image recognition task, which has several advantages including human-interpretable rules. This feature enabled the interpretation of the statistics and values associated with the classifiers. The statistical values are an inherent attribute of the LCS technique and are extremely useful in advancing the understanding of the problem and learned solutions.

The investigation revealed that LCSs deliver on the promise to form a generalizing model using human-interpretable rules. The current Haar-like multi-feature approach learns descriptions of patterns that are comparable to other Haar-related approaches, such as the AdaBoost cascade (Viola and Jones, 2001). AdaBoost algorithms create a set of classifiers by maintaining a set of weights over the training set and adjusting the weights after each boosting iteration. They are capable of creating generalized rules. However, AdaBoost suffers from a known accuracy-diversity dilemma: as the accuracy of two classifiers increases, there is less chance that they will disagree (Dietterich, 2000; Li et al., 2008). This impacts the balance between accuracy and diversity, and AdaBoost can only deliver good generalization performance when that balance is maintained.

The ability of the FPCS in an online learning scenario was demonstrated, as it could adapt to newly introduced classes. In online learning methods, the system automatically identifies changes in the environment state, so there is no need for human operators to determine when supervised off-line training must be resumed.

Several studies that have achieved high performance results on the MNIST dataset preprocessed the images so that they benefit from image realignment. We studied how the FPCS can benefit from autonomously learning such variation factors. We specifically studied learning the angles of images by adding a precondition to the classifier rules, and showed that it improved the accuracy of image classification. The implementation of DigitApp demonstrated how the system’s recognition of digits written on a slant was improved.

The FPCS takes 15–20 hours to be trained. This increased to 10 to 12 days when the complexity of the problem was increased as the system adapted to image angles. If the application domain is known in advance, then this slow training is acceptable since the evolved rules execute very fast (milliseconds). However, in online applications with novelty, for example, learning features in a new disaster zone, this technique would not be appropriate.

Further extensions to the FPCS will enable the system to be adapted for online scenarios, such as real-time speed sign recognition. Furthermore, other image feature types should be explored to determine which ones can be most effectively used with LCSs. Finally, this foundation work has enabled future versions to include scaling and translation.

Although the aim of this work was to develop human-interpretable methods and analysis for visual pattern recognition, it is worth considering how to improve performance on the given dataset. A supervised version of LCSs, for example, based on Bernadó-Mansilla and Garrell-Guiu (2003), could be implemented, which would have the advantage of being able to repair incorrect rules to the known class or cover gaps in knowledge. It would still have the advantage of human-interpretable rules, but it would be unlikely to have speed advantages (evolutionary computation is often slower compared with subsymbolic approaches). Many of the high-performing methods use the ‘kernel trick’ to alter the feature dimensions, which may prove an interesting avenue for feature manipulation in LCSs.

8  Conclusions

This work investigated how LCSs can be adjusted to work in the pattern recognition domain. Our investigations show that the LCS technique can be successfully applied to the field of pattern classification, and they demonstrate novel functionality and promising results. The generalization capability of the LCS in combination with the messy encoding enabled the formation of compact, accurate, and general classifiers.

The human-interpretable nature of production rules, which is anticipated to be required in many real-world domains, was assisted by the flexible encoding. This feature assists humans in gaining in-depth knowledge of the system. In addition, the FPCS was enabled to automatically adjust the rotation angle in unaligned images. Automatic rotation alignment improved the recognition accuracy of the FPCS on the MNIST dataset. This foundation work has enabled future versions to include scaling and translation. In order to further improve the classification rate of the FPCS, more high-level features capable of capturing crucial properties, such as curves and angles, may prove useful. The transfer of knowledge from off-line learning in similar domains to online scenarios will be investigated to leverage the FPCS’s online learning abilities while mitigating long training times.

References

Amit, Y., and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588.

Bernadó-Mansilla, E., and Garrell-Guiu, J. M. (2003). Accuracy-based learning classifier systems: Models, analysis and applications to classification tasks. Evolutionary Computation, 11(3):209–238.

Bernard, S., Heutte, L., and Adam, S. (2009). Towards a better understanding of random forests through the study of strength and correlation. In D.-S. Huang, K.-H. Jo, H.-H. Lee, H.-J. Kang, and V. Bevilacqua (Eds.), Emerging Intelligent Computing Technology and Applications, Vol. 5755 (pp. 536–545). Berlin: Springer-Verlag.

Bosch, A., Zisserman, A., and Munoz, X. (2007a). Representing shape with a spatial pyramid kernel. In Proceedings of the ACM International Conference on Image and Video Retrieval, pp. 401–408.

Bosch, A., Zisserman, A., and Munoz, X. (2007b). Image classification using random forests and ferns. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1–8.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Bull, L., Studley, M., Bagnall, A., and Whittley, I. (2007). Learning classifier system ensembles with rule-sharing. IEEE Transactions on Evolutionary Computation, 11(4):496–502.

Butz, M. V. (2006). Rule-based evolutionary online learning systems: A principled approach to LCS analysis and design. Berlin: Springer-Verlag.

Butz, M. V., Kovacs, T., Lanzi, P. L., and Wilson, S. W. (2004). Toward a theory of generalization and learning in XCS. IEEE Transactions on Evolutionary Computation, 8(1):28–46.

Casagrande, N. (2005). Automatic music classification using boosting algorithms and auditory features. MS thesis, Computer and Operational Research Department, University of Montreal.

Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2011). Convolutional neural network committees for handwritten character classification. In Proceedings of the International Conference on Document Analysis and Recognition, pp. 1135–1139.

Deselaers, T., Criminisi, A., Winn, J., and Agarwal, A. (2007). Incorporating on-demand stereo for real time recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.

Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157.

Drugowitsch, J. (2008). Design and analysis of learning classifier systems: A probabilistic approach. Berlin: Springer.

Fleuret, F., and Sahbi, H. (2003). Scale-invariance of support vector machines based on the triangular kernel. In Proceedings of the International Workshop on Statistical and Computational Theories of Vision.

Frey, P., and Slate, D. (1991). Letter recognition using Holland-style adaptive classifiers. Machine Learning, 6(2):161–182.

Grauman, K., and Darrell, T. (2005). The pyramid match kernel: Discriminative classification with sets of image features. In Proceedings of the IEEE International Conference on Computer Vision, Vol. 2, pp. 1458–1465.

Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554.

Huy, N. Q., Soon, O. Y., Hiot, L. M., and Krasnogor, N. (2009). Adaptive cellular memetic algorithms. Evolutionary Computation, 17(2):231–256.

Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In Proceedings of the IEEE International Conference on Computer Vision, pp. 2146–2153.

Kukenys, I., Browne, W. N., and Zhang, M. (2011a). Confusion matrices for improving performance of feature pattern classifier systems. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 181–182.

Kukenys, I., Browne, W. N., and Zhang, M. (2011b). Transparent, online image pattern classification using a learning classifier system. In Proceedings of the International Conference on Applications of Evolutionary Computation, pp. 183–193.

Lanzi, P. L., Loiacono, D., Wilson, S. W., and Goldberg, D. E. (2007). Generalization in the XCSF classifier system: Analysis, improvement, and extension. Evolutionary Computation, 15(2):133–168.

Lanzi, P. L., and Perrucci, A. (1999). Extending the representation of classifier conditions. Part II: From messy coding to S-expressions. In W. Banzhaf et al. (Eds.), Proceedings of the Genetic and Evolutionary Computation Conference, Vol. 1, pp. 345–352.

Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the International Conference on Machine Learning, pp. 473–480.

Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 2169–2178.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

LeCun, Y., Matan, O., Boser, B., Denker, J., Henderson, D., Howard, R., … Baird, H. (1990). Handwritten zip code recognition with multilayer networks. In Proceedings of the International Conference on Pattern Recognition, Vol. 2, pp. 35–40.

Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the International Conference on Machine Learning, pp. 609–616.

Li, X., Wang, L., and Sung, E. (2008). AdaBoost with SVM-based component classifiers. Engineering Applications of Artificial Intelligence, 21(5):785–795.

Maji, S., Berg, A., and Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.

Mao, K., Tan, K.-C., and Ser, W. (2000). Probabilistic neural-network structure determination for pattern classification. IEEE Transactions on Neural Networks, 11(4):1009–1016.

Maree, R., Geurts, P., Piater, J., and Wehenkel, L. (2005). Random subwindows for robust image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 34–40.

Moosmann, F., Nowak, E., and Jurie, F. (2008). Randomized clustering forests for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9):1632–1646.

Musavi, M., Chan, K., Hummels, D., and Kalantri, K. (1994). On the generalization ability of neural network classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6):659–663.

Nigrin, A. (1993). Neural networks for pattern recognition. Cambridge, MA: MIT Press.

Orriols-Puig, A., Bernadó-Mansilla, E., Goldberg, D., Sastry, K., and Lanzi, P. (2009). Facetwise analysis of XCS for problems with class imbalances. IEEE Transactions on Evolutionary Computation, 13(5):1093–1119.

Osuna, E., Freund, R., and Girosit, F. (1997). Training support vector machines: An application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 130–136.

Ranzato, M., Huang, F. J., Boureau, Y.-L., and LeCun, Y. (2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.

Romero, R. D., Touretzky, D. S., and Thibadeau, R. H. (1997). Optical Chinese character recognition using probabilistic neural networks. Pattern Recognition, 30(8):1279–1292.

Schölkopf, B., and Smola, A. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.

Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1):109–118.

Sutton, R., and Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Varma, M., and Ray, D. (2007). Learning the discriminative power-invariance trade-off. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1–8.

Viola, P., and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511–518.

Wilson, S. W. (1995). Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175.

Zhang, J., Marszalek, M., Lazebnik, S., and Schmid, C. (2006). Local features and kernels for classification of texture and object categories: A comprehensive study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 93–100.

Zhao, K., Liu, Y., and Deng, N. (2009). Unsupervised and semi-supervised Lagrangian support vector machines with polyhedral perturbations. In Proceedings of the Third International Symposium on Intelligent Information Technology Application, Vol. 1, pp. 228–231.

Note

1

Generations in LCSs are different from many other forms of evolutionary computation where in each generation all examples are evaluated.