How are we able to easily and accurately recognize speech sounds despite the lack of acoustic invariance? One proposed solution is the existence of a neural representation of speech syllable perception that transcends its sensory properties. In the present fMRI study, we used two different audiovisual speech contexts both intended to identify brain areas whose levels of activation would be conditioned by the speech percept independent from its sensory source information. We exploited McGurk audiovisual fusion to obtain short oddball sequences of syllables that were either (a) acoustically different but perceived as similar or (b) acoustically identical but perceived as different. We reasoned that, if there is a single network of brain areas representing abstract speech perception, this network would show a reduction of activity when presented with syllables that are acoustically different but perceived as similar and an increase in activity when presented with syllables that are acoustically similar but perceived as distinct. Consistent with the long-standing idea that speech production areas may be involved in speech perception, we found that frontal areas were part of the neural network that showed reduced activity for sequences of perceptually similar syllables. Another network was revealed, however, when focusing on areas that exhibited increased activity for perceptually different but acoustically identical syllables. This alternative network included auditory areas but no left frontal activations. In addition, our findings point to the importance of subcortical structures much less often considered when addressing issues pertaining to perceptual representations.