Previous research has demonstrated a link between language and action in the brain. The present study investigates the strength of this neural relationship by focusing on a potential interface between the two systems: cospeech iconic gesture. Participants performed a Stroop-like task in which they watched videos of a man and a woman speaking and gesturing about common actions. The videos differed as to whether the gender of the speaker and gesturer was the same or different and whether the content of the speech and gesture was congruent or incongruent. The task was to identify whether a man or a woman produced the spoken portion of the videos while accuracy rates, RTs, and ERPs were recorded to the words. Although not relevant to the task, participants paid attention to the semantic relationship between the speech and the gesture, producing a larger N400 to words accompanied by incongruent versus congruent gestures. In addition, RTs were slower to incongruent versus congruent gesture–speech stimuli, but this effect was greater when the gender of the gesturer and speaker was the same versus different. These results suggest that the integration of gesture and speech during language comprehension is automatic but also under some degree of neurocognitive control.