Abstract
We propose a computational model for detecting and localizing instances from an object class in static gray-level images. We divide detection into visual selection and final classification, concentrating on the former: drastically reducing the number of candidate regions that require further, usually more intensive, processing, but with a minimum of computation and missed detections. Bottom-up processing is based on local groupings of edge fragments constrained by loose geometrical relationships. They have no a priori semantic or geometric interpretation. The role of training is to select special groupings that are moderately likely at certain places on the object but rare in the background. We show that the statistics in both populations are stable. The candidate regions are those that contain global arrangements of several local groupings. Whereas our model was not conceived to explain brain functions, it does cohere with evidence about the functions of neurons in V1 and V2, such as responses to coarse or incomplete patterns (e.g., illusory contours) and to scale and translation invariance in IT. Finally, the algorithm is applied to face and symbol detection.