A neural model is described of how the brain may autonomously learn a body-centered representation of a three-dimensional (3-D) target position by combining information about retinal target position, eye position, and head position in real time. Such a body-centered spatial representation enables accurate movement commands to the limbs to be generated despite changes in the spatial relationships between the eyes, head, body, and limbs through time. The model learns a vector representation—otherwise known as a parcellated distributed representation—of target vergence with respect to the two eyes, and of the horizontal and vertical spherical angles of the target with respect to a cyclopean egocenter. Such a vergence-spherical representation has been reported in the caudal midbrain and medulla of the frog, as well as in psychophysical movement studies in humans. A head-centered vergence-spherical representation of foveated target position can be generated by two stages of opponent processing that combine corollary discharges of outflow movement signals to the two eyes. Sums and differences of opponent signals define angular and vergence coordinates, respectively. The head-centered representation interacts with a binocular visual representation of nonfoveated target position to learn a visuomotor representation of both foveated and nonfoveated target position that is capable of commanding yoked eye movements. This head-centered vector representation also interacts with representations of neck movement commands to learn a body-centered estimate of target position that is capable of Commanding coordinated arm movements. Learning occurs during head movements made while gaze remains fixed on a foveated target. An initial estimate is stored and a VOR-mediated gating signal prevents the stored estimate from being reset during a gaze-maintaining head movement. As the head moves, new estimates are compared with the stored estimate to compute difference vectors which act as error signals that drive the learning process, as well as control the on-line merging of multimodal information.