Cross-modal fusion phenomena suggest specific interactions of auditory and visual sensory information both within the speech and nonspeech domains. Using whole-head magnetoencephalography, this study recorded M50 and M100 fields evoked by ambiguous acoustic stimuli that were visually disambiguated to perceived /ta/ or /pa/ syllables. As in natural speech, visual motion onset preceded the acoustic signal by 150 msec. Control conditions included visual and acoustic nonspeech signals as well as visual-only and acoustic-only stimuli. (a) Both speech and nonspeech motion yielded a consistent attenuation of the auditory M50 field, suggesting a visually induced “preparatory baseline shift” at the level of the auditory cortex. (b) Within the temporal domain of the auditory M100 field, visual speech and nonspeech motion gave rise to different response patterns (nonspeech: M100 attenuation; visual /pa/: left-hemisphere M100 enhancement; /ta/: no effect). (c) These interactions could be further decomposed using a six-dipole model. One of these three pairs of dipoles (V270) was fitted to motion-induced activity at a latency of 270 msec after motion onset, that is, the time domain of the auditory M100 field, and could be attributed to the posterior insula. This dipole source responded to nonspeech motion and visual /pa/, but was found suppressed in the case of visual /ta/. Such a nonlinear interaction might reflect the operation of a binary distinction between the marked phonological feature “labial” versus its underspecified competitor “coronal.” Thus, visual processing seems to be shaped by linguistic data structures even prior to its fusion with auditory information channel.