The neural activity of speech sound processing (the N1 component of the auditory event-related potential, ERP) can be suppressed if a speech sound is accompanied by concordant lip movements. Here we demonstrate that this audiovisual interaction is neither speech specific nor linked to humanlike actions but can be observed with artificial stimuli if their timing is made predictable. In Experiment 1, a pure tone synchronized with a deformation of a rectangle induced a smaller auditory N1 than auditory-only presentations if the temporal occurrence of this audiovisual event was made predictable by two moving disks that touched the rectangle. Local autoregressive average source estimation indicated that this audiovisual interaction may be related to integrative processing in auditory areas. When the moving disks did not precede the audiovisual stimulus, making the onset unpredictable, there was no N1 reduction. In Experiment 2, the predictability of the leading visual signal was manipulated by introducing a temporal asynchrony between the audiovisual event and the collision of the moving disks. Audiovisual events occurred at the moment the disks collided on the rectangle, before the collision (too “early”), or after it (too “late”). When asynchronies varied from trial to trial, rendering the moving disks unreliable temporal predictors of the audiovisual event, the N1 reduction was abolished. These results demonstrate that the N1 suppression is induced by visual information that both precedes and reliably predicts audiovisual onset, without a necessary link to human action-related neural mechanisms.