Time-to-contact (TTC) estimation is beneficial for visual navigation. It can be estimated from an image projection, either in a camera or on the retina, by looking at the rate of expansion of an object. When expansion rate (E) is properly defined, TTC = 1/E. Primate dorsal MST cells have receptive field structures suited to the estimation of expansion and TTC. However, the role of MST cells in TTC estimation has been discounted because of large receptive fields, the fact that neither they nor preceding brain areas appear to decompose the motion field to estimate divergence, and a lack of experimental data. This letter demonstrates mathematically that template models of dorsal MST cells can be constructed such that the output of the template match provides an accurate and robust estimate of TTC. The template match extracts the relevant components of the motion field and scales them such that the output of each component of the template match is an estimate of expansion. It then combines these component estimates to provide a mean estimate of expansion across the object. The output of model MST provides a direct measure of TTC. The ViSTARS model of primate visual navigation was updated to incorporate the modified templates. In ViSTARS and in primates, speed is represented as a population code in V1 and MT. A population code for speed complicates TTC estimation from a template match. Results presented in this letter demonstrate that the updated template model of MST accurately codes TTC across a population of model MST cells. We conclude that the updated template model of dorsal MST simultaneously and accurately codes TTC and heading regardless of receptive field size, object size, or motion representation. It is possible that a subpopulation of MST cells in primates represents expansion in this way.
TTC estimation is not strictly necessary for obstacle avoidance. Several species have been shown to perform avoidance maneuvers when a stimulus reaches a certain angular size (Nakagawa & Hongjian, 2010). However, Lee (1976) demonstrated that humans use a measure similar to TTC for playing ball games and driving a car. In the psychophysics literature, the term looming is often used to describe an object that expands on the image plane. Cells that respond to looming have been found in a variety of animals (Hayes & Saiff, 1967; Judge & Rind, 1997; Schiff, Caviness, & Gibson, 1962; Wang & Frost, 1992). Looming responses may be more behaviorally relevant than literal TTC responses since an expansion-sensitive cell will become more active as the threat becomes greater rather than decreasing as a function of threat. Note that a theoretical TTC proportional response cell would have a high tonic activation and become less active as the object approaches. Cells may also be considered TTC responsive if activation rises to a peak and then decreases proportionally to TTC. In both cases, for the range of TTC that is being accurately encoded, activation decreases as TTC gets shorter. However, in the latter case, it has also been shown that peak timing can signify important behavioral information. For example, bullfrogs appear to have cells tuned for a peak response at a particular TTC and initiate avoidance maneuvers when those cells fire (Nakagawa & Hongjian, 2010). Pigeons have been shown to have cells that respond to both expansion and TTC (Wang & Frost, 1992).
In primates, there is no strong evidence for a TTC response in cellular data; however, in dorsal MST (MSTd), some cells respond to global patterns of expansion motion (Duffy & Wurtz, 1991a, 1991b; Tanaka, Fukada, & Saito, 1989). These cells generally have large receptive fields, on average over 1000 deg² (Raiguel et al., 1997). These MST cell responses are consistent with inputs from a large number of smaller receptive field MT cells, and the organization of the cells’ selectivity for the MT inputs seems to define the pattern preference of the cells. If preferred directions of MT cells are arranged radially, then the resulting MST cell prefers expansion or contraction (Tanaka et al., 1989). MST cells that respond to expansive motion patterns can determine the focus of expansion (FoE) of the motion pattern, which often coincides with the observer's heading (Gibson, 1955). As a result, expansion-sensitive MSTd cells are often characterized as coding heading.
Perrone (1992) and Perrone and Stone (1994) demonstrated that MSTd could be modeled by a number of templates describing behaviorally important patterns of motion, primarily coding different headings. Each template defines the organization of inputs from MT to which the MST cell responds optimally. Template models of MSTd are able to explain human heading perception data in static environments (Browning, Grossberg, & Mingolla, 2009a; Perrone & Stone, 1994, 1998) and in the presence of independently moving objects (Layton, Mingolla, & Browning, 2012). Template models of MSTd can explain rotation data through the use of extraretinal signals to remove the effects of rotation before the template is applied (Beintema & van den Berg, 1998; Elder, Grossberg, & Mingolla, 2009). Template models thereby provide a functionally accurate model of primate MSTd expansion cells. Template models have been demonstrated to code relative depth, which is proportional to TTC in a static world (Browning et al., 2009a; Grossberg, Mingolla, & Pack, 1999; Perrone, 1992; Perrone & Stone, 1994). However, the output of these models is independent of neither the receptive field size of the cell nor the size of the object. For the purposes of producing depth maps of the environment, where the receptive field sizes of the cells are either known or constant, this may be sufficient. However, these template definitions are insufficient for the estimation of TTC for approaching objects that do not fill the receptive field of the cell, in environments where the receptive field size is unknown, or for the precise estimation of TTC. Lappe (2004) further argues that the large receptive field sizes of MST cells are inconsistent with TTC estimation. However, if we assume that in general, TTC cells are not object specific, then to obtain an object-size independent TTC response, a cell will require a large receptive field to account for a range of objects throughout their approach trajectories.
The work described in this letter analyzes a general template model of MSTd to determine how TTC can be coded in MSTd regardless of the receptive field size of the cell, independent of the size of the object. This analysis is used to update the ViSTARS model (Browning, Grossberg, & Mingolla, 2009a, 2009b) to demonstrate how V1 and MT encode the required information to enable TTC estimation in MSTd.
2. Time-to-Contact Estimation in a Template Model of MSTd
A template model of MSTd consists of a number of templates, each corresponding to a particular motion pattern. In general, each motion pattern characterizes the expected motion for a particular heading direction (see Figure 2). In the model, a template is represented as a multidimensional array across space and motion. When using a standard 2D motion representation, there is a 2D vector (u, v) representing the expected motion at each spatial position across the input space. For a model consisting of an N-D representation of direction of motion, there is an N-D motion vector at each spatial location. Motion vectors, estimated from the image measurements, are compared against the template via the inner product between the template and the estimated motion. In its simplest form, the template with the highest inner product is considered the best match, and the motion pattern corresponding to the template is considered to be veridical. For example, the heading corresponding to the template with the best match is considered to be the current heading.
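The template match described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ViSTARS implementation: it uses a standard 2D (u, v) motion representation, unit-magnitude radial templates, and a synthetic noise-free expansion field. The function names `radial_template` and `best_heading`, the grid of candidate FoEs, and the flow parameters are all hypothetical.

```python
import numpy as np

def radial_template(shape, foe):
    """Unit-magnitude expansion template centred on a candidate FoE (row, col)."""
    rows, cols = np.indices(shape, dtype=float)
    dy, dx = rows - foe[0], cols - foe[1]
    dist = np.hypot(dx, dy)
    dist[dist == 0] = 1.0  # avoid 0/0 at the FoE itself; that vector stays (0, 0)
    return np.stack([dx / dist, dy / dist], axis=-1)  # shape (H, W, 2)

def best_heading(flow, candidate_foes):
    """Return the candidate FoE whose template has the largest inner product with flow."""
    scores = [np.sum(radial_template(flow.shape[:2], foe) * flow)
              for foe in candidate_foes]
    return candidate_foes[int(np.argmax(scores))], scores

# Synthetic pure-expansion flow about (row=12, col=20); the rate 0.1 is arbitrary.
shape, true_foe, rate = (32, 32), (12, 20), 0.1
rows, cols = np.indices(shape, dtype=float)
flow = np.stack([rate * (cols - true_foe[1]), rate * (rows - true_foe[0])], axis=-1)

# Hypothetical coarse grid of candidate headings, one template per candidate.
candidates = [(r, c) for r in range(4, 32, 8) for c in range(4, 32, 8)]
foe, _ = best_heading(flow, candidates)
```

In this idealized case the template centred on the true FoE wins, because at every location the projection of the motion vector onto a unit template vector cannot exceed the motion magnitude and reaches it only when the two are aligned.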
We define y and v to contain one vector each, y (from the template) and v (from the estimated motion), for every spatial location represented by the cell (see Figure 2).
If the templates are constructed such that the vectors in y are represented with unit magnitude, preserving the angle but not the magnitude of the distance from the FoE (see Figure 2A), then the same properties are evident, but the inner product scales linearly rather than as a square of receptive field size, making the response more robust to noise. This was the template definition used in the ViSTARS model (Browning et al., 2009a, 2009b).
This analysis demonstrates that large receptive field template models of MSTd can be constructed to provide accurate expansion/TTC estimates without decomposing the optic flow field into component parts. The response of such model cells is independent of receptive field size and the number of elements in the template.
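The independence from receptive field size can be checked numerically. The sketch below is an assumption-laden illustration rather than a reproduction of the template equations, which are not shown in this section: we assume that each component of the match is the motion projected onto a unit-magnitude radial template vector and then divided by that location's distance from the FoE, and that the components are averaged. The name `expansion_estimate`, the square receptive field, and the noise-free flow are hypothetical choices.

```python
import numpy as np

def expansion_estimate(flow, foe, radius):
    """Mean of per-location expansion estimates inside a square RF of half-width `radius`."""
    rows, cols = np.indices(flow.shape[:2], dtype=float)
    dy, dx = rows - foe[0], cols - foe[1]
    dist = np.hypot(dx, dy)
    inside = (np.abs(dx) <= radius) & (np.abs(dy) <= radius) & (dist > 0)
    # Projection of the motion onto the unit radial template vector ...
    radial_speed = (flow[..., 0] * dx + flow[..., 1] * dy)[inside] / dist[inside]
    # ... scaled by distance from the FoE (assumed), so each location estimates E.
    return float(np.mean(radial_speed / dist[inside]))

rate = 0.25  # true expansion rate E in 1/s, so TTC = 1/E = 4 s
foe = (32, 32)
rows, cols = np.indices((65, 65), dtype=float)
flow = np.stack([rate * (cols - foe[1]), rate * (rows - foe[0])], axis=-1)

# Same stimulus, three receptive field sizes (and hence three template sizes).
estimates = {r: expansion_estimate(flow, foe, r) for r in (4, 16, 32)}
ttc = {r: 1.0 / e for r, e in estimates.items()}
```

Under these assumptions every receptive field size returns the same expansion rate, up to floating-point error, because each location contributes its own estimate of E and the mean does not depend on how many locations are included.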
3. Integration of Time-to-Contact into the ViSTARS Model
We integrated the updated template into a difference-equation version of the ViSTARS model (dViSTARS) to demonstrate that a distributed representation of motion, such as that found in primate V1 and MT, could support the estimation of TTC in MSTd. The dViSTARS model is based on the dynamical systems models described in Browning et al. (2009a, 2009b) and Layton et al. (2012).
3.1. The dViSTARS Model.
3.2. Simulation Performance.
A 256 × 256 pixel random dot input was generated simulating an approach to a frontal plane from 10 s time to contact to 1 s time to contact with a frame rate of 100 fps. Random dots were generated using the Matlab rand() function with 1% of the pixels given a value of 1. All other pixels had a value of zero. To ensure sufficient dots in the image projection toward the end of the simulation, the center 65 × 65 pixels were replaced with a random dot array where 10% of the pixels had a value of 1. To provide a realistic image projection, we conceptualized the distance of the frontal plane at the start of the simulation as 10 m, the velocity of the camera as 1 m/s, and the focal length of the camera as 0.1 m. These units and values are arbitrary, provided they maintain the same ratios.
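A comparable dot field and its ground-truth TTC can be sketched as follows. This is an approximation under stated assumptions, not the original Matlab script: NumPy's random generator stands in for rand(), dots are set by thresholding uniform noise (so the densities are 1% and 10% in expectation rather than exactly), and the seed and the name `random_dot_frame` are hypothetical. The sketch builds only the initial dot array and the true TTC per frame; it does not render the expanding image projection.

```python
import numpy as np

def random_dot_frame(rng, size=256, density=0.01, center=65, center_density=0.10):
    """Binary random-dot image with a denser central patch."""
    frame = (rng.random((size, size)) < density).astype(float)
    start = (size - center) // 2
    patch = (rng.random((center, center)) < center_density).astype(float)
    frame[start:start + center, start:start + center] = patch
    return frame

# Geometry from the text: plane at 10 m, approach speed 1 m/s, sampled at 100 fps.
z0, speed, fps = 10.0, 1.0, 100
t = np.arange(0, 9 * fps + 1) / fps   # 0 s to 9 s inclusive
true_ttc = (z0 - speed * t) / speed   # 10 s down to 1 s
true_expansion = 1.0 / true_ttc       # E = 1 / TTC

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability only
frame = random_dot_frame(rng)
```

With these values the true expansion rate rises from 0.1 per second at the first frame to 1 per second at the last, which is the range the model outputs are compared against.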
The model was configured with α = 0.775, β = 1.5, γMT = 0.9, γMST = 0.99. Three scales were implemented: scale 1 processed the stimulus at its native resolution, scale 2 processed the stimulus with x and y dimensions reduced to 0.75 times those of the original stimulus, and scale 3 processed the stimulus with x and y dimensions reduced to 0.5 times those of the original stimulus, with σ1 = 3000, σ2 = 5000, σ3 = 14000. Larger scales correspond to larger cell receptive field sizes in model V1 and MT. The receptive field sizes of model MSTd cells were the same for all scales and covered the whole input space.
Figure 4A shows MSTd outputs for each scale in response to the stimulus. No single scale has an output that accurately codes the expansion rate of the stimulus. However, each scale appears to provide a good approximation of expansion for some range of values. Figure 4B shows how MSTd outputs relate to an estimate of TTC.
In order to demonstrate that the population of MSTd cells represents expansion/TTC accurately, we replaced the scale-independent temporal accumulation in model MSTd with a temporal filter that combined the outputs of each scale. If the output from scale 3 was above 0.7, scale 3 was input to the temporal filter; if scale 3 was below 0.7 and scale 2 was above 0.2, scale 2 was input to the temporal filter; otherwise scale 1 was input to the temporal filter. These cut-off values were chosen empirically to provide a reasonable TTC estimate. The motivation here is to demonstrate that the population accurately codes TTC. It is not intended to reflect how the population response is decoded in the primate brain. Figure 5 shows how this combined estimate produces an accurate expansion/TTC estimate.
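The selection rule can be written out directly. The sketch below follows the thresholds stated in the text literally; the temporal filter is a hypothetical first-order exponential smoother with an arbitrary constant `gamma`, since the form of the dViSTARS filter is not given in this section, and the per-frame values are made up for illustration.

```python
def select_scale(s1, s2, s3, hi=0.7, lo=0.2):
    """Pick which scale's MSTd output feeds the temporal filter, per the stated rule."""
    if s3 > hi:
        return s3
    if s3 < hi and s2 > lo:
        return s2
    return s1

def smooth(values, gamma=0.9):
    """Hypothetical first-order temporal filter (exponential smoothing)."""
    out, state = [], values[0]
    for v in values:
        state = gamma * state + (1.0 - gamma) * v
        out.append(state)
    return out

# Made-up per-frame outputs (scale 1, scale 2, scale 3), for illustration only.
frames = [(0.05, 0.10, 0.30), (0.10, 0.35, 0.50), (0.20, 0.60, 0.85)]
selected = [select_scale(*f) for f in frames]
filtered = smooth(selected)
```

For these inputs the rule hands over from scale 1 to scale 2 to scale 3 as the outputs grow, which mirrors the idea that each scale is trusted only over the range of expansion rates it approximates well.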
Simulations were repeated multiple times with different random dot stimuli, and results were qualitatively the same in each case. To simulate different-sized objects, we repeated the analysis but removed stimulus components toward the periphery of the input space; only the center 65 × 65 pixels contained dots at the start of the stimulus stream. Smaller cell receptive fields were simulated by zeroing out 20 pixels from each edge of the stimulus. For both smaller objects and smaller receptive fields, results were qualitatively the same as those shown above but generally displayed more noise in the MSTd outputs at small expansion rates (higher TTCs). We also tested the model using stimuli based on geometric shapes and on one video of an approach to a human; our results in all cases were qualitatively the same. The geometric shapes were chosen to produce different variations of the aperture problem in V1: circles do not produce an aperture problem; squares and compound triangle shapes had oriented edges that were not orthogonal to the direction of motion and so introduced an aperture problem. Shapes were generated in Matlab to contain no texture in their interior so that the model motion estimates were forced to be sparse around the boundaries of the object. With the exception of the video, which had a different frame rate and pixel intensity distribution, the model parameters were exactly the same for all stimuli. The video was recorded at 60 fps using a ContourHD camera mounted on a John Deere Gator and manually driven away from a human (we then processed the video backward). Accurate speed and position information were unavailable. True TTC was defined as the actual time between the current image frame and the camera touching the human. Results for all stimuli and the modified parameters used for the video are shown in Figure 6.
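The two stimulus manipulations for object size and receptive field size amount to simple masks. A minimal sketch, assuming a square frame stored as a NumPy array; the helper names `keep_center` and `zero_border` are hypothetical, and an all-ones frame is used only to make the masked regions easy to count.

```python
import numpy as np

def keep_center(frame, size=65):
    """Smaller object: keep only the central size x size region, zero elsewhere."""
    out = np.zeros_like(frame)
    r0 = (frame.shape[0] - size) // 2
    c0 = (frame.shape[1] - size) // 2
    out[r0:r0 + size, c0:c0 + size] = frame[r0:r0 + size, c0:c0 + size]
    return out

def zero_border(frame, width=20):
    """Smaller receptive field: zero `width` pixels along each edge."""
    out = frame.copy()
    out[:width, :] = 0
    out[-width:, :] = 0
    out[:, :width] = 0
    out[:, -width:] = 0
    return out

frame = np.ones((256, 256))
small_object = keep_center(frame)
small_rf = zero_border(frame)
```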
The mathematical analysis and insights described in this letter come directly from the representation of time-to-contact as a function of expansion rate, equation 1.1. This representation clearly describes how expansion (looming/eta) relates to TTC (tau). Equation 2.3 demonstrates a template model of a heading-sensitive cell that responds proportionally to the expansion rate of the input stimulus. This elegant solution to the estimation of TTC indicates that a simple neural circuit, as documented in primate dorsal MST, is capable of accurately estimating TTC directly from an optic flow field while concurrently estimating heading. Provided that motion estimates are accurate, this method provides a robust way of incorporating multiple motion vectors into a single object-based measure of time to contact. The technique could be expanded further to differentially weight motion vectors through manipulation of the parameter N. For example, if N were constructed to incorporate a normalized confidence metric for each of the motion estimates, then the template match equation, 2.6, could perform a weighted mean of all of the component TTC estimates. This could help remove outliers and reduce the reliance on a large value of N to provide accurate results in noisy data. Temporal filtering of equation 2.6, as we did in dViSTARS, further reduces the effect of noise and relaxes the constraint on a large N for any given image frame provided N is large over some subset of image frames.
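The confidence-weighting idea can be illustrated with a toy example. This is a sketch of the suggestion, not a tested extension of the model: the confidence values, the outlier, and the name `weighted_expansion` are hypothetical, and the weights are simply normalized to sum to 1 so that the result is a weighted mean of the component estimates.

```python
import numpy as np

def weighted_expansion(component_estimates, confidence):
    """Confidence-weighted mean of per-location expansion estimates."""
    w = np.asarray(confidence, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    return float(np.dot(w, np.asarray(component_estimates, dtype=float)))

# Nine estimates of E = 0.2 (1/s) plus one outlier carrying low confidence.
estimates = np.array([0.2] * 9 + [2.0])
confidence = np.array([1.0] * 9 + [0.01])

plain = float(np.mean(estimates))  # unweighted mean, pulled toward the outlier
weighted = weighted_expansion(estimates, confidence)
```

In this toy case the unweighted mean is pulled well away from 0.2 by a single bad motion estimate, whereas the weighted mean stays close to it, which is the sense in which weighting could reduce the reliance on a large N.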
The primary limitation of our model is that it requires that objects be segmented from each other and the background. If they are not, then the resulting TTC estimate will be a mixture of the TTCs for all objects within the receptive field. Furthermore, the model assumes that objects are roughly planar. Any object that has a large, protruding element or highly irregular surface will produce a biased TTC estimate. These limitations are shared by any method that attempts to estimate a single TTC value for an object of arbitrary 3D shape. Note that in our video example, we did not explicitly segment the object, but the background was sufficiently far away to introduce few, if any, motion signals.
Section 3 shows that when motion direction and speed are represented across a population of units, as they are in primate V1 and MT, the template model accurately represents TTC across the population of templates. Based on these results, we claim that our proposed template definition is sufficient for accurate TTC estimation from a distributed representation of motion, such as that found in primate V1 and MT.
In our analysis, we combined the outputs of different scales after the template match. This is not the optimal method of scale integration and is not sufficient for the robust estimation of TTC across large slanted planar surfaces or highly nonplanar objects. In general, this method will tend toward the smallest TTC (largest expansion rate) component of a given object. In any expanding object, the motion vectors closer to the FoE are smaller than the motion vectors far from the FoE. Scale integration before the template match would allow for this distribution of speeds, all corresponding to the same TTC, to be captured by a single template match. This should improve reliability by increasing the number of valid motion estimates in the template match. Whether or how this may be implemented in the brain is unknown. For computational applications, this is irrelevant since speed is represented by the magnitude of a vector rather than by a population code, and as a result, no combination of scales is required.
Neurophysiological data from MST cells in response to stimuli designed to elicit a TTC response are inconclusive. The analysis shown here may provide insights into how to process the neural data to find TTC responses across a population of cells. In primates, speed is represented across a population of neurons. We show that TTC could be coded across a population of neurons. The experimenter should therefore look for cells that respond proportionally to a small range of TTC values and demonstrate that the population codes a behaviorally beneficial range of TTC values. We predict that TTC is coded in MSTd and is represented by expansion rather than time. However, analyzing individual neurons over small ranges of TTC and neuronal populations over large ranges of TTC will allow for investigation of TTC in any brain area. Moreover, the work described here makes a specific prediction that TTC and heading estimation are performed by the same circuits. Human and primate researchers could investigate this through stimuli designed to show that heading estimation is not affected by TTC and, by construction of contrived inputs, that TTC estimation, or cell activity, does not require a consistent FoE in the stimulus space.
In summary, TTC estimates from our proposed template model of MSTd are accurate regardless of the receptive field size of the cell, the object size, or whether motion is coded as a vector or distributed across a population of cells.
Appendix: TTC/Expansion Equivalence
I thank the three anonymous reviewers for their thoughtful and helpful feedback. This work has been supported in part by the Office of Naval Research (ONR N00014-11-1-0535).