The discrimination of concurrent sounds is paramount to speech perception. During social gatherings, listeners must extract information from a composite acoustic wave that is the sum of several simultaneously active voices. Observers' ability to identify two simultaneously presented vowels improves as the separation between the fundamental frequencies (f0) of the two vowels increases. Event-related potentials to stimuli presented during attend and ignore conditions revealed activity between 130 and 170 msec after sound onset that reflected the f0 difference between the two vowels. A second negative wave, more posterior, right-lateralized, and maximal at 250 msec, and a centroparietal slow negativity were observed only during vowel identification and may index stimulus categorization. This sequence of neural events supports a multistage model of auditory scene analysis in which the spectral pattern of each constituent vowel is automatically extracted and then matched against representations of those vowels in working memory.
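To make the stimulus manipulation concrete, here is a minimal sketch of a simplified "double vowel": two harmonic complexes whose f0s differ by a chosen number of semitones, added sample by sample into one composite wave. It is not the authors' stimulus-generation procedure. It assumes equal-amplitude harmonics with no vowel-specific formant shaping, which real vowel stimuli would require. The function names, sampling rate, base f0, and harmonic count are hypothetical choices made only for illustration.

```python
import math


def f0_from_semitones(base_f0: float, semitones: float) -> float:
    """Return the f0 lying `semitones` above `base_f0` (equal-tempered scale)."""
    return base_f0 * 2.0 ** (semitones / 12.0)


def harmonic_complex(f0: float, n_harmonics: int, duration_s: float,
                     sample_rate: int) -> list[float]:
    """Sum of the first n_harmonics of f0 (hypothetical helper).

    Harmonics have equal amplitude and no formant filtering. Each is scaled
    by 1/n_harmonics, so the complex stays within [-1, 1].
    """
    n_samples = int(round(duration_s * sample_rate))
    nyquist = sample_rate / 2.0
    signal = [0.0] * n_samples
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        if freq >= nyquist:  # stop before harmonics that would alias
            break
        for i in range(n_samples):
            signal[i] += math.sin(2.0 * math.pi * freq * i / sample_rate) / n_harmonics
    return signal


def double_vowel(base_f0: float, delta_semitones: float, n_harmonics: int = 20,
                 duration_s: float = 0.2, sample_rate: int = 16000):
    """Composite wave: the sample-by-sample sum of two harmonic complexes.

    The second complex's f0 lies `delta_semitones` above `base_f0`.
    Returns (mixture, f0_a, f0_b).
    """
    f0_a = base_f0
    f0_b = f0_from_semitones(base_f0, delta_semitones)
    a = harmonic_complex(f0_a, n_harmonics, duration_s, sample_rate)
    b = harmonic_complex(f0_b, n_harmonics, duration_s, sample_rate)
    mixture = [x + y for x, y in zip(a, b)]
    return mixture, f0_a, f0_b


if __name__ == "__main__":
    mixture, f0_a, f0_b = double_vowel(100.0, 4.0)
    print(len(mixture))      # 3200 samples (0.2 s at 16 kHz)
    print(round(f0_b, 2))    # 125.99 Hz, four semitones above 100 Hz
```

Setting `delta_semitones` to 0 gives two complexes with identical f0, and the harmonics of the two sources coincide. Increasing it gives each source its own interleaved harmonic series. This offers one intuition for why a larger f0 separation can make the constituents easier to segregate.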