A critical subroutine of self-monitoring during speech production is detecting deviations between expected and actual auditory feedback. Here we investigated the associated neural dynamics using MEG recording in a mental-imagery-of-speech paradigm. Participants covertly articulated the vowel /a/; their own (individually recorded) speech was then played back, with parametric manipulation of four levels of pitch shift crossed with four levels of onset delay. Early auditory responses showed a nonmonotonic function when the onset delay was shorter than 100 msec: suppression for normal playback but enhancement for pitch-shifted playback; the magnitude of enhancement decreased, however, at the largest pitch shift, which fell outside the pitch range of normal conversation, as suggested by two behavioral experiments. No differences were observed among playback types when the onset delay exceeded 100 msec. These results suggest that the prediction suppresses the response to normal feedback, mediating source monitoring; when auditory feedback does not match the prediction, an “error term” is generated, which underlies deviance detection. Based on the observed nonmonotonic function, we argue that a frequency window (bounding spectral differences) and a time window (constraining temporal differences) jointly regulate the comparison between prediction and feedback in speech.