Abstract
A new type of biologically inspired multilayered network is proposed to model the properties of the primate visual system with respect to invariant visual recognition (IVR). This model is based on 10 major neurobiological and psychological constraints. The first five constraints shape the architecture and properties of the network.
1. The network model has a Y-like double-branched multilayered architecture, with one input (the retina) and two parallel outputs, the “What” and the “Where,” which model, respectively, the temporal pathway, specialized for “object” identification, and the parietal pathway, specialized for “spatial” localization.
2. Four processing layers are sufficient to model the main functional steps of the primate visual system that transform the retinal information into prototypes (object-centered reference frame) in the “What” branch and into an oculomotor command in the “Where” branch.
3. The distribution of receptive field sizes within and between the two functional pathways provides an appropriate tradeoff between discrimination and invariant recognition capabilities.
4. The two outputs are represented by population coding: the ocular command is computed as a population vector in the “Where” branch, and the prototypes are coded in a “semidistributed” way in the “What” branch. In the intermediate associative steps, processing units learn to associate prototypes (through feedback connections) with component features (through feedforward connections).
5. The basic processing units of the network do not model single cells but model the local neuronal circuits that combine different information flows organized in separate cortical layers.
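The population-vector readout named in constraint 4 can be illustrated with a minimal sketch. The abstract does not give the formula used by the model, so this assumes the common activity-weighted average of preferred positions; the grid of units, the Gaussian tuning, and all function names are hypothetical choices made only for illustration.

```python
import math

def gaussian_activity(preferred, target, sigma=1.0):
    """Hypothetical tuning: each unit fires most when the target sits at its preferred position."""
    tx, ty = target
    return [
        math.exp(-((px - tx) ** 2 + (py - ty) ** 2) / (2 * sigma ** 2))
        for px, py in preferred
    ]

def population_vector(preferred, activity):
    """Activity-weighted mean of the units' preferred positions (assumed readout)."""
    total = sum(activity)
    if total == 0:
        return (0.0, 0.0)  # no active unit: no displacement command
    x = sum(a * px for a, (px, _) in zip(activity, preferred)) / total
    y = sum(a * py for a, (_, py) in zip(activity, preferred)) / total
    return (x, y)

# A 5 x 5 grid of units covering a small patch of the retina (assumption).
preferred = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
activity = gaussian_activity(preferred, target=(1.0, 0.0))
command = population_vector(preferred, activity)
print(command)
```

With a finite grid, the estimate is pulled slightly toward the grid center because units beyond the edge are missing; no single unit needs to code the target location exactly.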
Such a biologically constrained model shows shift-invariant and size-invariant capabilities that resemble those of humans (psychological constraints):
6. During the Learning session, a set of patterns (26 capital letters and 2 geometric figures) is presented to the network: a single presentation of each pattern in one position (at the center) and with one size is sufficient to learn the corresponding prototypes (internal representations).
These patterns are then presented in widely varying new sizes and positions during the Recognition session:
7. The “What” branch of the network succeeds in immediate recognition for patterns presented in the central zone of the retina with the learned size.
8. The recognition by the “What” branch is resistant to changes in size within a limited range of variation related to the distribution of receptive field (RF) sizes in the successive processing steps of this pathway.
9. Even when ocular movements are not allowed, the recognition capabilities of the “What” branch are unaffected by changing positions around the learned one. This significant shift-invariance of the “What” branch is also related to the distribution of RF sizes.
10. When both sizes and locations vary, the “What” and the “Where” branches cooperate for recognition: the location coding in the “Where” branch can command, under the control of the “What” branch, an ocular movement that brings peripheral patterns back toward the central zone of the retina until recognition succeeds.
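The cooperation described in constraint 10 can be pictured as a simple loop. This is only a sketch under stated assumptions: recognition is reduced to a test on retinal eccentricity, each ocular movement moves the pattern a fixed fraction of the way to the center, and size tolerance is left out. The function name and the parameters `central_radius`, `gain`, and `max_saccades` are hypothetical and do not come from the model.

```python
def recognize_with_saccades(position, central_radius=2.0, gain=0.5, max_saccades=10):
    """Return (recognized, saccades_made, final_retinal_position).

    "What" step: recognition succeeds only inside the central zone (assumption).
    "Where" step: when recognition fails, an eye movement shifts the pattern
    toward the retinal center by a fraction `gain` of its eccentricity.
    """
    x, y = position
    saccades = 0
    while True:
        # "What" branch: attempt recognition at the current retinal position.
        if (x * x + y * y) ** 0.5 <= central_radius:
            return True, saccades, (x, y)
        if saccades == max_saccades:
            return False, saccades, (x, y)
        # "Where" branch: ocular command that re-centers the pattern.
        x, y = x * (1.0 - gain), y * (1.0 - gain)
        saccades += 1

print(recognize_with_saccades((8.0, 6.0)))
```

A pattern that starts inside the central zone is recognized with no eye movement, which corresponds to the immediate recognition of constraint 7.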
This model results in predictions about anatomical connections and physiological interactions between temporal and parietal cortices.