Abstract

Although humans are known to use a number of visual cues to perceive the three-dimensional (3-D) shape of an object, including luminance, orientation, binocular disparity, and motion, the exact mechanisms involved are not known (DeYoe and Van Essen 1988). An important approach to understanding the computations performed by the visual system is to develop algorithms (Marr 1982) or neural network models (Lehky and Sejnowski 1988; Siegel 1987) that can compute shape from specific cues in the visual image. In this study we investigated the ability of observers to see the 3-D shape of an object from motion cues, so-called structure-from-motion (SFM). We measured human performance in a two-alternative forced-choice task using novel dynamic random-dot stimuli with limited point lifetimes. We show that the human visual system integrates motion information both spatially and temporally (across several point lifetimes) as part of the process of computing SFM. We conclude that SFM algorithms must include surface interpolation to account for human performance. Our experiments also provide evidence that the human visual system solves the SFM problem using local velocity information, not position information derived from discrete views of the image (as some algorithms propose).
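To make "limited point lifetimes" and the velocity-based account concrete, the following is a minimal sketch, not the authors' stimulus code. It assumes a hypothetical stimulus: dots on the surface of a vertical cylinder rotating about its axis under orthographic projection (the abstract does not name the shape). Each dot is displayed for a fixed number of frames and then reappears at an unrelated location, so no single dot can be followed through many views. For a surviving dot, the horizontal image velocity is dx/dt = r·ω·cos θ = ω·z, so depth can be read off a two-frame velocity sample. All names (`Dot`, `make_dots`, `step`, `velocity_samples`) are hypothetical, and the rotation speed ω is assumed known; under orthographic projection the sign of depth is otherwise ambiguous.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Dot:
    theta: float  # angular position on a hypothetical rotating cylinder (radians)
    y: float      # vertical position on the cylinder
    age: int      # frames this dot has already been displayed


def new_dot(rng, height, age=0):
    """Place a dot at a random point on the cylinder surface."""
    return Dot(rng.uniform(0.0, 2.0 * math.pi),
               rng.uniform(-height / 2.0, height / 2.0), age)


def make_dots(n, rng, lifetime, height):
    """Create n dots with staggered ages so they do not all expire on the same frame."""
    return [new_dot(rng, height, rng.randrange(lifetime)) for _ in range(n)]


def project(dot, radius):
    """Orthographic projection onto the image plane: (x, y) = (r*sin(theta), y)."""
    return radius * math.sin(dot.theta), dot.y


def step(dots, rng, omega, lifetime, height):
    """Advance one frame: rotate each dot by omega (radians/frame), then replace expired dots."""
    for i, d in enumerate(dots):
        d.theta += omega
        d.age += 1
        if d.age >= lifetime:
            # The old dot vanishes and a new one appears at an unrelated location.
            dots[i] = new_dot(rng, height)


def velocity_samples(dots, rng, omega, lifetime, height, radius):
    """Advance one frame and return (x, y, z_est) for every dot that survived it.

    Image velocity of a rotating point is dx/dt = r*omega*cos(theta) = omega*z,
    so depth is estimated from a two-frame displacement as (x1 - x0) / omega.
    """
    before = [(d, project(d, radius)) for d in dots]
    step(dots, rng, omega, lifetime, height)
    samples = []
    for i, (old, (x0, y0)) in enumerate(before):
        if dots[i] is old:  # same object, so the dot was not respawned
            x1, _ = project(old, radius)
            samples.append((x0, y0, (x1 - x0) / omega))
    return samples
```

For example, `velocity_samples(make_dots(200, rng, 4, 2.0), rng, 0.02, 4, 2.0, 1.0)` with `rng = random.Random(0)` yields depth estimates for the dots that survive the frame, and each (x, z_est) pair lies close to the unit circle, recovering the cylinder's cross-section from local velocities alone. With `lifetime=1` every dot is replaced on every frame, so no velocity samples exist and this estimate has nothing to work with.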
