Boosting is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm, AdaBoost, has been applied with great success to several benchmark machine learning problems using mainly decision trees as base classifiers. In this article we investigate whether AdaBoost works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and on weighting the cost function. The results suggest that random resampling of the training data is not the main explanation for the improvements brought by AdaBoost. This is in contrast to bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of on-line handwritten digits from more than 200 writers. A boosted multilayer network achieved 1.5% error on the UCI letters data set and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.
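The difference between sampling the training set and weighting the cost function can be illustrated with a minimal sketch. It assumes binary discrete AdaBoost with labels in {-1, +1} and uses decision stumps as a stand-in base learner rather than the neural networks studied here; the function names (`fit_stump`, `adaboost`, `predict`) and the `mode` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, sign) with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best[:3]

def stump_predict(stump, X):
    j, t, s = stump
    return s * np.where(X[:, j] <= t, 1, -1)

def adaboost(X, y, rounds=10, mode="weight", seed=0):
    """mode="weight": the base learner minimizes the weighted error directly.
    mode="resample": the base learner sees a bootstrap sample drawn from
    the current distribution and is trained with uniform weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    d = np.full(n, 1.0 / n)          # distribution over training examples
    ensemble = []
    for _ in range(rounds):
        if mode == "resample":
            idx = rng.choice(n, size=n, replace=True, p=d)
            stump = fit_stump(X[idx], y[idx], np.full(n, 1.0 / n))
        else:
            stump = fit_stump(X, y, d)
        pred = stump_predict(stump, X)
        eps = d[pred != y].sum()     # weighted error on the full training set
        if eps >= 0.5:               # no better than chance: stop
            break
        eps = max(eps, 1e-10)        # avoid division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        d *= np.exp(-alpha * y * pred)   # raise weight of misclassified examples
        d /= d.sum()
    return ensemble

def predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, stump in ensemble:
        score += alpha * stump_predict(stump, X)
    return np.sign(score)
```

In both modes the distribution update is identical; only the way the base learner sees that distribution changes, which is the comparison the abstract refers to.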
We present a new classification architecture based on autoassociative neural networks that are used to learn discriminant models of each class. The proposed architecture has several interesting properties compared with other model-based classifiers such as nearest neighbors or radial basis functions: it has a low computational complexity and uses a compact distributed representation of the models. The classifier is also well suited for the incorporation of a priori knowledge by means of a problem-specific distance measure. In particular, we will show that tangent distance (Simard, Le Cun, & Denker, 1993) can be used to achieve transformation invariance during learning and recognition. We demonstrate the application of this classifier to optical character recognition, where it has achieved state-of-the-art results on several reference databases. Relations to other models, in particular those based on principal component analysis, are also discussed.
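The idea of one model per class, with classification by the best-fitting model, can be sketched as follows. This is not the proposed architecture: it assumes a linear autoassociator per class, implemented as a principal-component subspace, uses plain squared Euclidean reconstruction error instead of tangent distance, and includes no discriminant training. The names `fit_class_models`, `reconstruction_error`, and `classify` are hypothetical.

```python
import numpy as np

def fit_class_models(X, y, n_components=2):
    """Fit one linear autoassociator (mean + principal subspace) per class."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        # Rows of vt are the principal directions of the centered class data.
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[c] = (mean, vt[:n_components])
    return models

def reconstruction_error(model, X):
    """Squared distance between each input and its reconstruction."""
    mean, basis = model
    centered = X - mean
    recon = centered @ basis.T @ basis   # project onto the class subspace
    return ((centered - recon) ** 2).sum(axis=1)

def classify(models, X):
    """Assign each input to the class whose model reconstructs it best."""
    classes = list(models)
    errors = np.stack(
        [reconstruction_error(models[c], X) for c in classes], axis=1
    )
    return np.array(classes)[errors.argmin(axis=1)]
```

Each class is stored as a mean vector and a few basis vectors rather than as a set of prototypes, which is one way to read the "compact representation" contrast with nearest-neighbor methods; a problem-specific distance would replace the squared error in `reconstruction_error`.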