Although it is well documented that the occurrence of an irrelevant and nonpredictive sound facilitates motor responses to a subsequent target light appearing nearby, the cause of this “exogenous spatial cuing effect” remains under discussion. On the one hand, it has been postulated to result from a shift of visual spatial attention, possibly triggered by parietal and/or cortical supramodal “attention” structures. On the other hand, the effect has been attributed to multisensory integration based on the activation of multisensory convergence structures in the brain. Recent reaction time (RT) experiments have suggested that multisensory integration and exogenous spatial cuing differ in their temporal profiles of facilitation: When the nontarget occurs 100–200 msec before the target, facilitation is likely driven by crossmodal exogenous spatial attention, whereas multisensory integration effects are still seen when target and nontarget are presented nearly simultaneously. Here, we develop an extension of the time-window-of-integration model that combines both mechanisms within the same formal framework. The model is illustrated by fitting it to data from a focused attention task with a visual target and an auditory nontarget presented at horizontally or vertically varying positions. Results show that spatial cuing and multisensory integration may coexist in a single trial in bringing about crossmodal facilitation of RT. Moreover, the formal analysis via the time window of integration makes it possible to predict and quantify the contribution of each mechanism across different spatiotemporal conditions.