Humans have an exceptional ability to extract specific audio streams of interest in a noisy environment; this is known as the cocktail party effect. It is widely accepted that this ability is related to selective attention, a mental process that enables individuals to focus on a particular object. Evidence suggests that sensory neurons can be modulated by top-down signals transmitted from the prefrontal cortex. However, exactly how the projection of attention signals to the cortex and subcortex influences the cocktail party effect is unclear. We constructed computational models to study whether attentional modulation is more effective at earlier or later stages of the auditory pathway for solving the cocktail party problem. We modeled the auditory pathway using deep neural networks (DNNs), which can generate representational patterns that resemble those of the human brain. We constructed a series of DNN models whose main structures were autoencoders. We then trained these DNNs on a speech separation task derived from the dichotic listening paradigm, a common paradigm for investigating the cocktail party effect. We next analyzed the modulation effects of attention signals at all stages. Our results showed that attentional modulation is more effective at the lower stages of the DNNs. This suggests that the projection of attention signals to lower stages of the auditory pathway plays a more significant role in solving the cocktail party problem than projection to higher stages. This prediction could be tested using neurophysiological experiments.
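
The comparison described above, injecting an attention signal at different stages of an autoencoder-style network, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the layer sizes, the random untrained weights, the one-hot speaker cue, the sigmoid multiplicative gain, and the names `init_layers`, `forward`, `gates`, and `stage` are all hypothetical.

```python
import numpy as np

def init_layers(sizes, rng):
    """Random weights for a fully connected stack (hypothetical sizes)."""
    return [rng.standard_normal((m, n)) / np.sqrt(m)
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(mixture, attention, layers, gates, stage):
    """Run the autoencoder, applying attentional gain at one hidden stage.

    mixture   : (batch, n_features) features of the mixed speech
    attention : (batch, n_speakers) one-hot cue for the attended speaker
    gates     : list of (n_speakers, width) projections, one per hidden stage
    stage     : index of the hidden stage that receives the attention signal
    """
    h = mixture
    last = len(layers) - 1
    for i, w in enumerate(layers):
        h = h @ w
        if i < last:
            h = np.maximum(h, 0.0)  # ReLU on hidden stages only
            if i == stage:
                # Assumed form of top-down modulation: a sigmoid gain
                # computed from the cue and multiplied into the activity.
                gain = 1.0 / (1.0 + np.exp(-(attention @ gates[i])))
                h = h * gain
    return h  # linear output layer reconstructs the attended stream

rng = np.random.default_rng(0)
sizes = [64, 32, 16, 32, 64]            # encoder 64-32-16, decoder 16-32-64
layers = init_layers(sizes, rng)
gates = [rng.standard_normal((2, n)) for n in sizes[1:-1]]

mixture = rng.standard_normal((4, 64))  # stand-in for mixed-speech features
attend_a = np.tile([1.0, 0.0], (4, 1))  # cue: attend to speaker A
attend_b = np.tile([0.0, 1.0], (4, 1))  # cue: attend to speaker B

early_a = forward(mixture, attend_a, layers, gates, stage=0)
early_b = forward(mixture, attend_b, layers, gates, stage=0)
late_a = forward(mixture, attend_a, layers, gates, stage=2)
```

In a setup like this, one model per value of `stage` would be trained on the same separation objective, and the resulting separation quality compared across stages. The sketch only shows where the modulation enters the network; it does not train anything or reproduce the reported result.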
