Saccadic reaction time to visual targets tends to be shorter when a stimulus from another modality (in particular, audition or touch) is presented in close temporal or spatial proximity, even when subjects are instructed to ignore this accessory input (focused attention task). Multisensory interaction effects measured in neural structures involved in saccade generation (in particular, the superior colliculus) have demonstrated a similar spatio-temporal dependence. Neural network models of multisensory spatial integration have been shown to generate convergence of the visual, auditory, and tactile reference frames, as well as the sensorimotor coordinate transformations necessary for coordinated head and eye movements. However, because these models do not capture the temporal coincidences critical for multisensory integration to occur, they cannot easily predict multisensory effects observed in behavioral data such as saccadic reaction times. This article proposes a quantitative stochastic framework, the time-window-of-integration model, to account for the temporal rules of multisensory integration. Saccadic responses collected from a visual–tactile focused attention task are shown to be consistent with the predictions of the time-window-of-integration model.
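To make the temporal rule concrete, here is a minimal Python sketch of a two-stage race with a time window. It rests on several assumptions that are not taken from this article: first-stage peripheral processing times for the visual target (V) and the tactile accessory (T) are independent exponentials; `tau` is the accessory onset relative to the target onset; integration is assumed to occur only when the accessory finishes first and the target finishes within a window of `omega` ms after it; and integration is assumed to shorten the second stage by `delta` ms. The function names and all parameter values are hypothetical and for illustration only.

```python
import math
import random
import statistics


def simulate_twin(lam_v, lam_t, tau, omega, mu, delta, n=100_000, seed=1):
    """Monte Carlo sketch of a time-window race (illustrative assumptions only).

    lam_v, lam_t: assumed exponential rates (1/ms) of the visual and tactile
    first-stage processes; tau: accessory onset relative to target onset (ms,
    negative = tactile first); omega: window width (ms); mu: second-stage
    duration (ms); delta: assumed second-stage shortening on integration (ms).
    Returns (mean reaction time, proportion of trials with integration).
    """
    rng = random.Random(seed)
    rts = []
    n_integrated = 0
    for _ in range(n):
        v = rng.expovariate(lam_v)        # target's first-stage finishing time
        t = tau + rng.expovariate(lam_t)  # accessory's finishing time on the target's clock
        # Assumed rule: accessory wins the race, target falls inside the window.
        integrated = t < v < t + omega
        n_integrated += integrated
        rts.append(v + mu - (delta if integrated else 0.0))
    return statistics.fmean(rts), n_integrated / n


def p_integration_closed_form(lam_v, lam_t, tau, omega):
    """Closed-form probability of integration under the same assumptions, tau >= 0 only."""
    if tau < 0:
        raise ValueError("this closed form only covers tau >= 0")
    return lam_t / (lam_v + lam_t) * math.exp(-lam_v * tau) * (1 - math.exp(-lam_v * omega))


if __name__ == "__main__":
    # Hypothetical parameter values (ms), chosen only for illustration.
    params = dict(lam_v=1 / 50, lam_t=1 / 30, mu=150, delta=30)
    # omega = 0 means integration can never occur: a unimodal baseline.
    unimodal_rt, _ = simulate_twin(tau=0, omega=0, **params)
    for tau in (-100, -50, 0, 50):
        rt, p = simulate_twin(tau=tau, omega=200, **params)
        print(f"tau={tau:5d} ms  P(I)={p:.3f}  mean RT={rt:6.1f} ms  "
              f"facilitation={unimodal_rt - rt:5.1f} ms")
```

Because every call reuses the same seed and draws the visual time first, the baseline and bimodal runs share their first-stage visual times, so the simulated facilitation equals `delta` times the proportion of integrated trials. Under these assumptions, facilitation depends on how often the accessory opens a window that the target then falls into, which is the kind of temporal dependence the abstract attributes to the model.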