To recognize phonemes across variation in talkers, listeners can use information about vocal characteristics, a process referred to as “talker normalization.” The present study uses fMRI to investigate the cortical mechanisms underlying talker normalization. Listeners recognized target words presented in spoken lists produced either by a single talker (blocked-talker condition) or by a mix of different talkers (mixed-talker condition). Both conditions activated an extensive cortical network. However, relative to the blocked-talker condition, recognizing words in the mixed-talker condition activated middle/superior temporal and superior parietal regions to a greater degree. This temporal–parietal network may be associated with selectively attending to and processing the spectral and spatial acoustic cues required to recognize speech in a mixed-talker condition.