A learning account for the problem of object recognition is developed within the probably approximately correct (PAC) model of learnability. The key assumption underlying this work is that objects can be recognized (or discriminated) using simple representations in terms of syntactically simple relations over the raw image. Although the potential number of these simple relations could be huge, only a few of them are actually present in each observed image, and a fairly small number of those observed are relevant to discriminating an object.
We show that these properties can be exploited to yield an efficient learning approach in terms of sample and computational complexity within the PAC model. No assumptions are needed on the distribution of the observed objects, and the learning performance is quantified relative to its experience. Most important, the success of learning an object representation is naturally tied to the ability to represent it as a function of some intermediate representations extracted from the image.
We evaluate this approach in a large-scale experimental study in which the SNoW learning architecture is used to learn representations for the 100 objects in the Columbia Object Image Library. Experimental results exhibit good generalization and robustness properties of the SNoW-based method relative to other approaches. SNoW's recognition rate degrades more gracefully when the training data contains fewer views, and it shows similar behavior in some preliminary experiments with partially occluded objects.