Multimodal virtual environments (VEs) are more successful than single-sensory technologies at creating a sense of presence. We hypothesize that the underlying cognitive mechanism is related to faster mental processing of multimodal events. Comparing simple detection times for unimodal (auditory, visual, and haptic) events with those for bimodal and trimodal combinations, we show that mental processing times follow the order unimodal > bimodal > trimodal. Given this processing-speed advantage, users of multimodal VEs begin cognitive processing sooner; within a similar exposure time, they can therefore attend to more informative cues and subtle details in the environment and integrate them creatively. This richer, more complete, and more coherent experience may contribute to an enhanced sense of presence.