The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant–vowel–consonant word (“Ted”) and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of “Ted” or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type × Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.