In the visual modality, perceptual demand on a goal-directed task has been shown to modulate the extent to which irrelevant information can be disregarded at a sensory-perceptual stage of processing. In the auditory modality, the effect of perceptual demand on neural representations of task-irrelevant sounds is unclear. We compared simultaneous ERPs and fMRI responses associated with task-irrelevant sounds across parametrically modulated perceptual task demands in a dichotic-listening paradigm. Participants performed a signal detection task in one ear (Attend ear) while ignoring task-irrelevant syllable sounds in the other ear (Ignore ear). Results revealed modulation of syllable processing by auditory perceptual demand in an ROI in middle left superior temporal gyrus and in negative ERP activity 130–230 msec post stimulus onset. Increasing the perceptual demand in the Attend ear was associated with a reduced neural response in both fMRI and ERP to task-irrelevant sounds. These findings are in support of a selection model whereby ongoing perceptual demands modulate task-irrelevant sound processing in auditory cortex.
The stage in the information processing stream at which task-irrelevant information can be disregarded has been the topic of longstanding debate in cognitive science between theorists advocating early selection (Treisman, 1969; Broadbent, 1958) and those advocating late selection (Duncan & Humphreys, 1992; Duncan, 1980; Norman, 1968; Deutsch & Deutsch, 1963). In early selection models, attention shuts down or attenuates processing of irrelevant information at an early sensory-perceptual stage of processing. In late selection models, attention acts only after incoming relevant and irrelevant information has been fully processed. The load model of attention combines aspects of both views and holds that the level of perceptual demand (load) required for processing task-relevant stimuli determines the extent to which irrelevant information can be disregarded (Lavie, 1995, 2005; Lavie, Hirst, de Fockert, & Viding, 2004; Lavie & Tsal, 1994). While relying on the idea of limited attentional resources (Duncan, Martens, & Ward, 1997), the load model predicts that high perceptual load depletes attentional resources resulting in reduced perception of distractors (early selection view), whereas under low perceptual load, unused resources are directed automatically toward processing of irrelevant distractors (late selection view). There is considerable behavioral evidence in support of the load model, at least in the visual modality (Lavie, 1995, 2005, 2010; Lavie et al., 2004; Lavie & Tsal, 1994; cf. Benoni & Tsal, 2010). 
Importantly, neuroimaging studies have shown that perceptual demand modulates neural activity associated with irrelevant visual distractors (e.g., faces, moving dots, letters, flickering checkerboards) in the direction predicted by the model, namely smaller neural response in sensory-perceptual networks for distractors under high perceptual load and larger responses when perceptual load is low (Schwartz et al., 2005; Yi, Woodman, Widders, Marois, & Chun, 2004; Berman & Colby, 2002; O'Connor, Fukui, Pinsk, & Kastner, 2002; Vuilleumier, Armony, Driver, & Dolan, 2001; Rees, Frith, & Lavie, 1997). Nevertheless, these findings can be explained with at least one alternative theory according to which the control of attention is based on an inhibition mechanism (as opposed to limited resources) that becomes stronger as attention activity for relevant stimuli is increased with task demands (LaBerge, 1995, 2002).
In the auditory modality, the extent to which task-irrelevant information is processed has been studied widely with behavioral measures starting with Cherry's classic dichotic listening experiments (Koch, Lawo, Fels, & Vorlander, 2011; Dark, Johnston, Myles-Worsley, & Farah, 1985; Johnston & Heinz, 1979; Moray, 1959; Cherry, 1953). The effect of perceptual demand in an auditory central task on sensory-perceptual processing of irrelevant sounds has not been studied systematically with neuroimaging. There is some evidence for greater processing of task-irrelevant sound features in auditory cortex when the demands of an auditory task are higher (Sabri, Liebenthal, Waldron, Medler, & Binder, 2006), inconsistent with findings in the visual modality. One reason could be the lack of spatial separation between relevant and irrelevant information in that study. Facilitatory effects of high perceptual demand were observed in a visual Stroop task, where the target word and distractor color were contained within a single stimulus (Chen, 2003). In such paradigms, the greater attention channeled to task targets under high demand is also directed to the irrelevant information contained in them (Lavie, 2005). A recent dichotic listening study, in which relevant and irrelevant information were presented to opposite ears, observed greater activity for the latter in auditory cortex as task demands decreased (Rinne, 2010). However, this effect was weak and did not reach significance, possibly due to relatively low statistical power (n = 9).
Here we investigated the extent to which sensory-perceptual processing of task-irrelevant sounds is modulated by the perceptual demand of a primary auditory task, in a dichotic listening paradigm, using simultaneous recordings of ERPs and fMRI. In the primary task (detection of a tone in noise), signal-to-noise ratio (SNR) was modulated parametrically to create four perceptual load levels, while keeping the noise level constant. Task-relevant and -irrelevant information was spatially separated using dichotic presentation. To examine the effects of perceptual demand on task-irrelevant information, neural responses to ignored syllables were compared between the lowest and highest loads in the ERP and in a localizer-defined speech-sensitive area in auditory cortex. To determine whether the load manipulation was related linearly to the BOLD signal in auditory cortex and to the ERP elicited by the syllables, contrasts weighted by the four load levels were employed. Our findings corroborate and extend those in the visual modality, demonstrating reduced activity for task-irrelevant sounds in a sensory-perceptual auditory ROI under high compared with low perceptual demand. These findings clarify the mechanism by which the brain manages the processing of multiple sources of auditory information and provide support for a model involving selection at a sensory-perceptual processing stage as modulated by perceptual demand.
Participants were 24 healthy adults (10 men, mean age = 24 years, SD = 3) with no history of neurological or hearing impairments and normal or corrected-to-normal visual acuity. The participants were native English speakers, and all were right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971). Data from eight participants were excluded from ERP analysis (six due to noisy EEG and two due to equipment failure). Data from one participant were excluded from fMRI analysis (due to excessive motion artifact). Informed consent was obtained from each participant before the experiment, in accordance with the Medical College of Wisconsin Institutional Review Board.
Task Design and Procedure
The study employed an event-related design with individual trials blocked by condition and a dichotic listening paradigm. There were 10 simultaneous ERP/fMRI dichotic-listening runs, each divided into eight blocks of 51 sec/block. Each block was composed of seventeen 1.2-sec trials (Figure 1). Image acquisition (1.8 sec) immediately followed each trial. In the Attend ear, stimulation consisted of a white noise burst (Noise; 1.2 sec) with a 50-msec, 800-Hz signal tone (Tone; p = .47) embedded in eight of the trials. The Tone was presented at a random time ranging from 200 to 1000 msec after the beginning of the trial. The SNR between the Tone and Noise was modulated parametrically to create four perceptual demand/load conditions ranging from low to high (Load 1, Load 2, Load 3, Load 4). The Noise was presented at a fixed intensity (112 dB) with the amplitude of the Tone varying to produce the desired SNR (88, 89, 90, 91 dB). The SNR (Load) was fixed within each 51-sec block.
In the Ignore ear, half of the blocks included syllables. In Syllable blocks, 10 different task-irrelevant syllables (/ba/, /da/, /bi/, /di/, /bu/, /du/, /be/, /de/, /bo/, /do/), each 180 msec in duration, were presented to the Ignore ear at a random time ranging from 200 to 1000 msec after the beginning of the trial. The tones (in the Attend ear) and syllables (in the Ignore ear) were presented such that they did not overlap temporally. Within a single 1.2-sec trial, either a syllable (in the Ignore ear) or a Tone (in the Attend ear) was presented, except for two trials, which included both (ISI ≥ 500 msec) in random order. In No-Syllable control blocks, speech sounds were not presented. Trials were randomized within each block. Eight blocks (4 Syllable, 4 No-Syllable) were delivered randomly within each run. The presentation order of the four load conditions was randomized with equal probability. Each block was followed by a 12-sec rest period. The ISI of the syllables was jittered exponentially between 3 and 15 sec. In the entire experiment, there were 100 ignored syllable events per load condition.
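The timing figures in the task description can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constant and function names are introduced here, and the per-block syllable count rests on the assumption (not stated explicitly in the text) that each run contained exactly one Syllable block per load condition.

```python
# Timing constants taken from the task description.
TRIAL_SOUND_S = 1.2        # sound presentation window per trial (sec)
ACQUISITION_S = 1.8        # image acquisition following each trial (sec)
TRIALS_PER_BLOCK = 17
TONE_TRIALS_PER_BLOCK = 8
RUNS = 10
SYLLABLES_PER_LOAD = 100   # ignored syllable events per load, whole experiment


def block_duration_s() -> float:
    """Block duration: each trial spans the sound window plus one acquisition."""
    return TRIALS_PER_BLOCK * (TRIAL_SOUND_S + ACQUISITION_S)


def tone_probability() -> float:
    """Proportion of trials in a block that contain the signal tone."""
    return TONE_TRIALS_PER_BLOCK / TRIALS_PER_BLOCK


def syllables_per_block() -> float:
    """Syllable events per Syllable block.

    Assumption: one Syllable block per load condition in each run.
    """
    return SYLLABLES_PER_LOAD / RUNS


if __name__ == "__main__":
    print(block_duration_s())             # block length in seconds
    print(round(tone_probability(), 2))   # tone probability per trial
    print(syllables_per_block())          # syllables per Syllable block
```

Under these assumptions, the block length and tone probability agree with the reported 51 sec and p = .47.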
During the experiment, participants performed a signal detection task in the Attend ear and were instructed to ignore the irrelevant speech sounds presented to the other ear. Attend and Ignore ear designation was fixed within a run. Participants were instructed to press Button 1 upon detection of a tone and Button 2 when they did not hear a tone. They were told that approximately half of the noise bursts in a block included a tone and that some of them would be harder to detect. The ear of delivery for the signal detection task was equiprobable and randomized between the runs. A cross-hair was presented in the middle of the screen to assist in minimizing eye movement.
An event-related localizer run, designed to identify areas sensitive to speech stimuli, followed the 10 dichotic-listening runs. In the localizer run, participants discriminated between randomly presented 180-msec binaural tones and syllables by pressing Buttons 1 and 2, respectively. The syllables were identical to those used in the dichotic-listening runs. Tones were 10 logarithmically spaced sinewaves ranging from 200 to 4000 Hz. Stimulation consisted of 40 syllable and 40 tone events presented in random order, each occurring during the 1.2 sec between image acquisitions. ISI was jittered exponentially between 3 and 9 sec (mean = ∼5 sec).
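One way to obtain 10 logarithmically spaced tone frequencies between 200 and 4000 Hz is to use a constant ratio between neighboring frequencies. The sketch below is a minimal illustration; the function name is hypothetical, and the exact frequencies used in the study are not listed in the text, so the output should not be read as the actual stimulus set.

```python
def log_spaced_frequencies(low_hz: float, high_hz: float, count: int) -> list[float]:
    """Return `count` frequencies from low_hz to high_hz with a constant ratio."""
    if count < 2:
        raise ValueError("count must be at least 2")
    # Constant multiplicative step between neighboring frequencies.
    ratio = (high_hz / low_hz) ** (1.0 / (count - 1))
    return [low_hz * ratio ** i for i in range(count)]


if __name__ == "__main__":
    for f in log_spaced_frequencies(200.0, 4000.0, 10):
        print(f"{f:.1f} Hz")
```

Equal ratios between neighbors correspond to equal steps on a logarithmic frequency axis.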
The syllables were recorded from a male native English speaker and normalized according to loudness. Sounds were delivered through MRI-compatible STAX SR-003 electrostatic ear inserts (STAX, Saitama Prefecture, Japan). The visual fixation stimulus was projected through an Epson LCD video projector onto an angled mirror located just above the eyes. Stimulus delivery was controlled by a personal computer running Presentation software (Neurobehavioral Systems, Inc., Albany, CA).
fMRI Acquisition and Analysis
Images were acquired on a 3T GE Excite scanner (GE Medical Systems, Milwaukee, WI). Functional data consisted of T2*-weighted, gradient-echo, echo-planar images (echo time = 20 msec, flip angle = 77°, acquisition time = 1.8 sec, delay = 1.2 sec), obtained using clustered acquisition at 3-sec intervals. Sound stimulation (Noise alone, Noise and Tone, or Syllable) was presented during the 1.2-sec period between image acquisitions to avoid perceptual masking by the acoustic noise of the scanner. Functional images were composed of 35 axially oriented 3-mm slices with a 0.5-mm interslice gap covering the whole brain, with field of view = 192 mm and 64 × 64 matrix, resulting in 3.0 × 3.0 × 3.5 mm voxel dimensions. A total of 1720 images were acquired across the 10 dichotic-listening runs (172 per run). A total of 168 images were acquired in the localizer run. High-resolution anatomical images of the entire brain were obtained using a 3-D spoiled gradient-echo sequence (SPGR) as a set of 130 contiguous axial slices with 0.938 × 0.938 × 1.0 mm voxel dimensions.
Image analysis was conducted using the AFNI software package (Cox, 1996). Within-subject analysis consisted of spatial registration to minimize motion artifacts (Cox & Jesmanowicz, 1999) and coregistration of functional and anatomical images (Saad et al., 2009). In the dichotic-listening runs, analyses focused on task-irrelevant syllables. Voxel-wise multiple linear regression was applied to individual time series, with reference functions separately representing the occurrence of a syllable, a tone in the Syllable and No-Syllable blocks, or syllable and tone in the four load conditions. Another regressor was added to code Noise alone trials. The shape and magnitude of the hemodynamic response function (HRF) were estimated using the program 3dDeconvolve. Coefficient maps were generated for Syllables in each load condition representing the lags of the HRF. The individual coefficient maps were projected into standard stereotaxic space (Talairach & Tournoux, 1988) by linear resampling and then smoothed with a Gaussian kernel of 6 mm FWHM.
The localizer run was analyzed in a similar fashion. The reference functions in the multiple regression represented the occurrence of a syllable or a tone. A general linear test between syllables and tones was conducted at the response peak to obtain regions sensitive specifically to speech sounds. Group maps were created using a random-effects analysis. The group maps were thresholded at a voxel-wise p < .01 and corrected for multiple comparisons by removing clusters smaller than 1008 μl, resulting in a corrected map-wise two-tailed α = .05. This cluster threshold was determined through Monte Carlo simulations that provide the chance probability of spatially contiguous voxels exceeding the voxel-wise p threshold.
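The thresholds above can be related to more familiar units with simple arithmetic. The following Python sketch converts a two-tailed voxel-wise p value to a z threshold and expresses a cluster volume as a voxel count. It assumes that cluster volume is counted in acquisition-sized voxels (3.0 × 3.0 × 3.5 mm); the maps may have been resampled to a different grid, in which case the voxel count would differ. All names are introduced here for illustration.

```python
from statistics import NormalDist

# Functional voxel dimensions from the acquisition description (mm).
VOXEL_MM = (3.0, 3.0, 3.5)


def z_threshold(p_two_tailed: float) -> float:
    """z value whose two-tailed normal probability equals p_two_tailed."""
    return NormalDist().inv_cdf(1.0 - p_two_tailed / 2.0)


def voxel_volume_ul(dims_mm=VOXEL_MM) -> float:
    """Voxel volume in microliters (1 cubic mm equals 1 microliter)."""
    x, y, z = dims_mm
    return x * y * z


def cluster_size_voxels(cluster_ul: float, dims_mm=VOXEL_MM) -> float:
    """Number of voxels of the given size that make up cluster_ul."""
    return cluster_ul / voxel_volume_ul(dims_mm)


if __name__ == "__main__":
    print(round(z_threshold(0.01), 4))    # z for voxel-wise p < .01
    print(voxel_volume_ul())              # microliters per voxel
    print(cluster_size_voxels(1008.0))    # voxels in the minimum cluster
```

Under this assumption, the 1008-μl minimum cluster corresponds to 32 contiguous voxels at the acquisition resolution.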
An ROI analysis was carried out within a speech-sensitive area in auditory cortex. An ROI in middle left superior temporal gyrus (STG) was identified based on the localizer. The average BOLD signal in the identified ROI was extracted for the task-irrelevant syllables in each load condition at the peak height of the HRF, for each participant, and subjected to a paired t test between the two extreme loads. In addition, a test for linear trend of the loads (1 > 2 > 3 > 4) was performed on the mean signals using a repeated-measures ANOVA with a weighted contrast vector.
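The two ROI tests can be sketched as follows. In a repeated-measures design, a single-degree-of-freedom contrast is equivalent to a one-sample t test on per-participant contrast scores, with F(1, n − 1) = t². The weights (3, 1, −1, −3) are the standard linear polynomial weights for four equally spaced levels and are an assumption here, since the contrast vector is not given in the text. The numbers in the example are invented for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev

# Assumed linear contrast weights for Loads 1-4 (decreasing trend).
LINEAR_WEIGHTS = (3, 1, -1, -3)


def one_sample_t(values) -> float:
    """t statistic testing whether the mean of values differs from zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


def paired_t(a, b) -> float:
    """Paired t statistic, computed on within-participant differences."""
    return one_sample_t([x - y for x, y in zip(a, b)])


def linear_trend_f(rows, weights=LINEAR_WEIGHTS) -> float:
    """F(1, n - 1) for the contrast; rows holds one tuple of load means per participant."""
    scores = [sum(w * y for w, y in zip(weights, row)) for row in rows]
    t = one_sample_t(scores)
    return t * t


if __name__ == "__main__":
    # Hypothetical ROI signal (Loads 1-4) for four made-up participants.
    rows = [
        (0.9, 0.7, 0.6, 0.3),
        (1.1, 1.0, 0.6, 0.5),
        (0.8, 0.8, 0.5, 0.4),
        (1.0, 0.7, 0.7, 0.2),
    ]
    load1 = [r[0] for r in rows]
    load4 = [r[3] for r in rows]
    print("paired t (Load 1 vs. Load 4):", paired_t(load1, load4))
    print("linear trend F:", linear_trend_f(rows))
```

The degrees of freedom implied by this formulation, (1, n − 1), match the form of the trend tests reported in the Results.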
ERP Acquisition and Analysis
Sixty-four-channel EEG activity was acquired using the Maglink system (Neuroscan, Inc.) in a continuous mode and the Quik-Cap electrode positioning system (Neuroscan, Inc.). Activity was recorded at full bandwidth and digitally sampled at 500 Hz per channel. Electrode sites conformed to the International 10–20 System with CPz serving as the reference. Vertical eye movements and electrocardiogram were each monitored with bipolar recordings. Interelectrode resistance was kept below 5 kΩ.
EEG analysis was conducted using the Scan 4.3 software package (Compumedics Neuroscan), focusing on task-irrelevant syllables. Initial within-subject analysis consisted of bandpass filtering at 0.1–30 Hz, ballistocardiogram artifact removal, creating epochs of −100 to +450 msec from each sound onset, baseline-correction of each epoch by removing the mean voltage value of the whole sweep, and rejection of epochs with voltage values exceeding ±150 μV. The remaining epochs were then averaged according to each load condition. Each waveform was baseline corrected by subtracting the mean voltage of the prestimulus period from each point in the post stimulus interval. Grand-averaged waveforms were computed for syllable events in the four load conditions. The resulting waveforms were digitally rereferenced to the mastoids. Group level analyses were performed using MATLAB (MathWorks, Inc., Natick, MA) and STATISTICA (StatSoft, Inc., Tulsa, OK). Mean amplitudes were extracted for each participant and averaged across 16 frontal electrodes (F7, F8, AF7, AF8, F5, F6, F3, F4, AF3, AF4, FP1, FP2, F1, F2, Fz, FPz) in the 130–230 msec time window in each condition and subjected to a paired t test between the two extreme loads and a repeated-measures ANOVA with a weighted contrast vector to test for linear trend of the loads.
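The mean-amplitude measure can be illustrated with a short sketch. At a 500-Hz sampling rate, an epoch from −100 to +450 msec has one sample every 2 msec, so latencies map directly onto sample indices. The code below is not the Scan 4.3 or MATLAB pipeline used in the study; the function names and the choice to include both window endpoints are assumptions made for illustration.

```python
from statistics import mean

FS_HZ = 500            # sampling rate from the acquisition description
EPOCH_START_MS = -100  # epoch onset relative to sound onset


def sample_index(latency_ms: float) -> int:
    """Index of the sample at latency_ms within the epoch."""
    return round((latency_ms - EPOCH_START_MS) * FS_HZ / 1000)


def baseline_correct(waveform):
    """Subtract the mean prestimulus voltage from every sample."""
    n_pre = sample_index(0)  # samples before sound onset
    baseline = mean(waveform[:n_pre])
    return [v - baseline for v in waveform]


def window_mean(waveform, start_ms=130, end_ms=230) -> float:
    """Mean voltage in the latency window (both endpoints included)."""
    lo, hi = sample_index(start_ms), sample_index(end_ms)
    return mean(waveform[lo:hi + 1])


def roi_mean_amplitude(channel_waveforms) -> float:
    """Average the window mean across a set of electrodes."""
    return mean(window_mean(baseline_correct(w)) for w in channel_waveforms)


if __name__ == "__main__":
    # Synthetic epoch: 2 microvolts before onset, -3 microvolts afterward.
    n_samples = sample_index(450) + 1
    n_pre = sample_index(0)
    wave = [2.0] * n_pre + [-3.0] * (n_samples - n_pre)
    print(roi_mean_amplitude([wave, wave]))
```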
The d′ (z[hit] − z[false alarm]) measure of perceptual sensitivity was calculated for each load. Signal detection performance in Syllable blocks varied by load [F(3, 69) = 20.264, p < .001], with a linear decrease in d′ as perceptual load increased [F(1, 23) = 63.98, p < .001] (Figure 2). An ANOVA with Load (Loads 1, 2, 3, 4) and Block Type (Syllable, No-Syllable) as repeated measures revealed a main effect of Load [F(3, 69) = 26.374, p < .001]. The effect of Block Type and the Load × Block Type interaction were not significant [F(1, 23) = .158, p = .69; F(3, 69) = .615, p = .61], confirming no predictive relationship between relevant and irrelevant sound delivery.
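As an illustration of the sensitivity measure defined above, the following Python sketch computes d′ from hit and false-alarm rates using the inverse of the standard normal cumulative distribution. The function names are introduced here, and the log-linear correction for rates of exactly 0 or 1 is a common convention assumed for the example; the text does not say how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def corrected_rates(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Hit and false-alarm rates with a log-linear correction.

    Assumption: add 0.5 to each count and 1 to each total so that
    neither rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


if __name__ == "__main__":
    # Hypothetical counts for one block: 8 tone trials, 9 noise-only trials.
    hr, far = corrected_rates(hits=7, misses=1, false_alarms=2, correct_rejections=7)
    print(d_prime(hr, far))
```

Equal hit and false-alarm rates give d′ = 0, and larger values indicate better separation of signal from noise.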
Localizer: Syllables > Tones
The focus of the current study is on the effect of perceptual demand on sensory-perceptual processing of task-irrelevant speech sounds. To identify neural regions specifically related to speech processing, we contrasted Syllable and Tone activation in the localizer run. The contrast Syllables–Tones is presented in Figure 3A. Significantly greater activation for syllables than for tones was observed in one cluster (x = −59, y = −13, z = −2; threshold z > 2.57, cluster-corrected α = .05, 1008 μl) that included the anterior-middle portion of the left STG and STS and the anterior-lateral portion of Heschl's gyrus (HG). No other statistically significant activation clusters were observed. There were no areas of significantly greater activation for tones than for syllables.
Load Effects on Irrelevant Syllable Processing in Speech-sensitive Auditory Region Defined in the Localizer
The Syllable over Tone cluster identified in the localizer run was used as an ROI in the dichotic-listening runs. The average BOLD signal in the left STG ROI, as a function of perceptual load, is depicted in Figure 3B. Mean activation at the HRF peak (6 sec; Figure 3C) was significantly different between the low load (Load 1) where the activation was strongest and the high load (Load 4) where the activation was lowest [t(22) = 2.2233, p = .036]. As the level of load increased, the BOLD signal for irrelevant syllables decreased, as indicated by a linear trend [F(1, 22) = 4.49, p = .04].
A whole-brain analysis was performed to examine whether there were differential load activations beyond the defined ROI. There were no significant differences in activations across the load conditions at a corrected whole-brain threshold (threshold z > 1.96, cluster correction α = .05, 5040 μl). The extent of activation in auditory cortex for Load 1 and Load 4 against baseline is depicted in Figure 4 (threshold z > 3.29, cluster correction α = .05, 347 μl).
Load Effects on Irrelevant Syllable Processing
A fronto-central negativity was observed in the N1 time window in response to irrelevant syllables in all load conditions (Figures 5 and 6). Differences across load conditions for irrelevant syllables were observed approximately 130–230 msec after stimulus onset, predominantly in frontal electrodes (Figure 7). This effect was quantified by computing the mean amplitude in this time range on frontal electrodes, for each perceptual load and for each participant (Figure 8). The mean negativity was significantly higher in amplitude in the low load (Load 1) compared with the high load (Load 4) condition [t(15) = 3.49, p = .003]. The test for linear trend of the loads was also significant [F(1, 15) = 10.77, p = .005].
The extent of processing of task-irrelevant syllable sounds was assessed using fMRI and ERP measures of brain activity. Modulation of syllable processing by auditory perceptual demands was observed in an ROI encompassing primarily the middle portion of the left STG, and in a negative ERP with onset at 130 msec, the N1 component. High perceptual load, as determined in a psychophysical auditory task, was associated with a reduced neural response in the fMRI and ERP for task-irrelevant syllables, whereas a low load level produced the greatest responses. A linear trend was observed in the fMRI and ERP data, demonstrating increased neural response for task-irrelevant syllables as task demands decreased.
The N1 component is a potential elicited in response to auditory stimulation and associated with sensory processing (Näätänen & Picton, 1987). The amplitude of N1 increases with attention (Woldorff & Hillyard, 1991; Näätänen, 1990; Sams, Aulanko, Aaltonen, & Näätänen, 1990; Hansen, Dickstein, Berka, & Hillyard, 1983; Hillyard, Hink, Schwent, & Picton, 1973), suggesting that this component is susceptible to top–down influences. In the current study, the largest negativity in the N1 time range was observed under the lowest perceptual load, in line with the fMRI results, suggesting greater sensory processing of task-irrelevant complex sounds in that condition. The sources of the N1 were estimated previously to include parts of HG and anterior-middle and posterior STG/planum temporale depending on sound characteristics and dipole estimate methods (Ahveninen et al., 2011; Jääskeläinen et al., 2004; Picton et al., 1999; Fujiwara, Nagamine, Imai, Tanaka, & Shibasaki, 1998; Scherg, Vasjar, & Picton, 1989; Scherg & Von Cramon, 1986). Portions of the left anterior-middle STG/anterior HG region were encompassed in the fMRI ROI. It is likely that the effects of perceptual demands on N1 are reflected to some extent in the differential BOLD signal observed in this ROI.
A converging body of evidence from neuroimaging studies suggests that the middle STG/STS, specifically in the left hemisphere, plays a prominent role in phonemic perception and prelexical processing (DeWitt & Rauschecker, 2012; Leaver & Rauschecker, 2010; Liebenthal et al., 2010; Specht, Osnes, & Hugdahl, 2009; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Davis & Johnsrude, 2003; Specht & Reul, 2003). Attention to phonetic material enhances activation in this region (Ahveninen et al., 2011; Woods, Herron, Kang, Cate, & Yund, 2011; Woods & Alain, 2009). The posterior part of STG, planum temporale (not included here as an ROI), has been implicated in processing of spectrally and temporally complex sounds, independent of phonetic contents (Specht & Reul, 2003; Jancke, Wustenberg, Scheich, & Kaplan Layer, 2002; Binder, Frost, Hammeke, Rao, & Cox, 1996). The linear trend observed between load level and BOLD signal in the middle STG region suggests that processing of unattended sounds is reduced as demands increase (i.e., successful selection). At lower loads, irrelevant sounds are processed regardless of task instructions possibly due to availability of attentional resources or capacity (Lavie, 1995, 2005, 2010; Duncan et al., 1997; Lavie & Tsal, 1994) or reduced inhibition (LaBerge, 1995, 2002).
Our findings are consistent with imaging studies of perceptual demand manipulations in attention paradigms in the visual modality (Schwartz et al., 2005; Yi et al., 2004; Berman & Colby, 2002; O'Connor et al., 2002; Vuilleumier et al., 2001; Rees et al., 1997). The study was not designed to resolve the controversy between competing theories in accounting for the effects of perceptual demand, namely limited resources versus inhibition. According to the load theory, in conditions of high perceptual load, the processing of irrelevant information is gated at an early sensory-perceptual stage due to limited perceptual resources, in line with early selection accounts (Treisman, 1969; Broadbent, 1958). In conditions of low perceptual load, however, available resources are thought to automatically “spill over” toward processing of irrelevant information until available resources are exhausted (Lavie et al., 2004; Lavie, 1995; Lavie & Tsal, 1994), requiring an additional late selection mechanism (Lavie, 2005, 2010; Yi et al., 2004). According to the inhibition account, inhibition weakens as attention for targets decreases due to low perceptual demands, resulting in greater processing of task-irrelevant information.
The pattern of results reported here might be specific to the type of task employed (perceptual). It has been demonstrated in the visual modality that cognitive control demand (e.g., in a working memory task) modulates processing of task-irrelevant stimuli in the direction opposite to that of perceptual demand (Lavie, 2005; Lavie & De Fockert, 2005; Yi et al., 2004; de Fockert, Rees, Frith, & Lavie, 2001). High load on cognitive control was associated with failure of selection. Future studies are needed to investigate the effects of cognitive control demand on processing task-irrelevant information in the auditory modality.
We thank Suzanne Pendl for assistance with data collection, Doug Ward for statistical assistance, and two anonymous reviewers for their valuable comments and suggestions. This work was supported by the National Institute on Deafness and Other Communication Disorders (R03 DC008399; R01 DC006287) and the Clinical and Translational Science Award (CTSA) program of the National Center for Research Resources (UL1RR031973).
Reprint requests should be sent to Dr. Merav Sabri, Department of Neurology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, or via e-mail: firstname.lastname@example.org.