When a single flash of light is presented between two brief auditory stimuli separated by 60–100 msec, subjects typically report perceiving two flashes [Shams, L., Kamitani, Y., & Shimojo, S. Visual illusion induced by sound. Brain Research, Cognitive Brain Research, 14, 147–152, 2002; Shams, L., Kamitani, Y., & Shimojo, S. Illusions. What you see is what you hear. Nature, 408, 788, 2000]. Using ERP recordings, we previously found that perception of the illusory extra flash was accompanied by a rapid dynamic interplay between auditory and visual cortical areas that was triggered by the second sound [Mishra, J., Martínez, A., Sejnowski, T. J., & Hillyard, S. A. Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. Journal of Neuroscience, 27, 4120–4131, 2007]. In the current study, we investigated the effect of attention on the ERP components associated with the illusory extra flash in 15 individuals who perceived this cross-modal illusion frequently. All early ERP components in the cross-modal difference wave associated with the extra flash illusion were significantly enhanced by selective spatial attention. The earliest attention-related modulation was an amplitude increase of the positive-going PD110/PD120 component, which was previously shown to be correlated with an individual's propensity to perceive the illusory second flash [Mishra, J., Martínez, A., Sejnowski, T. J., & Hillyard, S. A. Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. Journal of Neuroscience, 27, 4120–4131, 2007]. The polarity of the early PD110/PD120 component did not differ between stimuli presented in the upper and lower visual fields. This polarity invariance, together with the source localization of the component, suggests that its principal generator lies in extrastriate visual cortex.
These results indicate that neural processes previously shown to be associated with the extra flash illusion can be modulated by attention, and thus are not the result of a wholly automatic cross-modal integration process.