Abstract
Understanding the contribution of cognitive processes and their underlying neurophysiological signals to behavioral phenomena has been a key objective in recent neuroscience research. Using a diffusion model framework, we investigated to what extent well-established correlates of spatial attention in the electroencephalogram contribute to behavioral performance in an auditory free-field sound localization task. Younger and older participants were instructed to indicate the horizontal position of a predefined target among three simultaneously presented distractors. The central question was whether posterior alpha lateralization and amplitudes of the anterior contralateral N2 subcomponent (N2ac) predict sound localization performance (accuracy, mean RT) and/or diffusion model parameters (drift rate, boundary separation, non-decision time). Two age groups were compared to explore whether the brain–behavior relationship differs between older adults (who struggle in multispeaker environments) and younger adults. Regression analyses revealed that N2ac amplitudes predicted drift rate and accuracy, whereas alpha lateralization was related to neither behavioral nor diffusion model parameters. This was true irrespective of age. The results indicate that more efficient attentional filtering and selection of information within an auditory scene, reflected by increased N2ac amplitudes, was associated with faster information uptake (higher drift rate) and better localization performance (higher accuracy), while the underlying response criteria (boundary separation), mean RTs, and non-decisional processes remained unaffected. The lack of a behavioral correlate of poststimulus alpha power lateralization contrasts with the well-established notion that prestimulus alpha power reflects a functionally relevant attentional mechanism. This highlights the importance of distinguishing anticipatory from poststimulus alpha power modulations.
INTRODUCTION
When multiple sources of acoustic information are simultaneously present, selective filtering of the available information is necessary to, for instance, focus on a talker of interest while ignoring traffic noise, background music, or other people's conversations. This capacity of the human auditory system is especially astonishing, given that the incoming auditory signals often overlap in time, space, or spectral content. The behavioral effects of such selective orienting of attention in noisy, multispeaker environments, usually referred to as “cocktail party scenarios” (Cherry, 1953), have been studied for decades (for a review, see Bronkhorst, 2015). However, the contribution of neural signals to observable behavioral performance and its underlying cognitive processes is still poorly understood. Here, we investigated the relationship between well-established correlates of spatial attention in the electroencephalogram (EEG) and behavioral performance in an auditory sound localization task. In particular, we examined the role of modulations in the alpha frequency band and of the anterior contralateral N2 subcomponent (N2ac; Gamble & Luck, 2011) in sound localization performance.
Lateralized modulations of alpha power have been shown to reflect the orienting of spatial attention in visual (Foster, Sutterer, Serences, Vogel, & Awh, 2017; Ikkai, Dandekar, & Curtis, 2016; Rihs, Michel, & Thut, 2007; Worden, Foxe, Wang, & Simpson, 2000), tactile (Haegens, Luther, & Jensen, 2012; Haegens, Händel, & Jensen, 2011), and auditory space (Klatt, Getzmann, Wascher, & Schneider, 2018b; Wöstmann, Vosskuhl, Obleser, & Herrmann, 2018; Wöstmann, Herrmann, Maess, & Obleser, 2016). Typically, alpha power decreases contralaterally to the attended location (Kelly, Gomez-Ramirez, & Foxe, 2009; Sauseng et al., 2005) or increases contralaterally to the unattended or ignored location (Kelly, Lalor, Reilly, & Foxe, 2006; Worden et al., 2000). Consistently across modalities, this lateralized pattern of alpha-band activity has been linked to visual detection performance (Händel, Haarmeier, & Jensen, 2011; van Dijk, Schoffelen, Oostenveld, & Jensen, 2008; Thut, Nietzel, Brandt, & Pascual-Leone, 2006), tactile discrimination acuity (Craddock, Poliakoff, El-deredy, Klepousniotou, & Lloyd, 2017; Haegens et al., 2011), and listening performance (Tune, Wöstmann, & Obleser, 2018; Wöstmann et al., 2016). Going beyond a purely correlational approach, recent studies applying stimulation techniques, such as TMS or continuous transcranial alternating current stimulation, suggest a causal role of alpha oscillations in the processing of incoming information (Wöstmann et al., 2018; Romei, Gross, & Thut, 2010). Two major (not necessarily mutually exclusive) mechanisms have been proposed to underlie these asymmetric modulations of alpha power: target enhancement (Noonan et al., 2016; Yamagishi, Goda, Callan, Anderson, & Kawato, 2005) and distractor inhibition (Schneider, Göddertz, Haase, Hickey, & Wascher, 2019; Rihs et al., 2007; Kelly et al., 2006; Worden et al., 2000).
Whereas the majority of previous studies investigated prestimulus alpha oscillations as an index of the anticipatory allocation of spatial attention in young adults, we focused on poststimulus alpha lateralization in a sound localization task simulating a “cocktail party scenario.” Such an experimental setup more closely resembles frequent real-life situations, in which a person searches for a sound of interest (e.g., a voice or a ringing phone) without knowing in advance where to look for it. In fact, there is initial evidence that distinct attentional mechanisms contribute to the preparation for, as opposed to the ongoing processing of, a stimulus (van Ede, Szebényi, & Maris, 2014). In addition, we explored whether the proposed mechanistic function of alpha oscillations extends to older participants, which remains a matter of ongoing debate (Tune et al., 2018; Mok, Myers, Wallis, & Nobre, 2016; Hong, Sun, Bengson, Mangun, & Tong, 2015; Vaden, Hutcheson, McCollum, Kentros, & Visscher, 2012).
A second neural measure of interest, indicating the allocation of attention within an auditory scene, is the N2ac. The N2ac has been shown to be evoked in the N2 latency range (starting at around 200 msec) when detecting or localizing a target sound in the presence of one or multiple distractor stimuli, using artificial sounds (Gamble & Luck, 2011), animal vocalizations (Klatt, Getzmann, Wascher, & Schneider, 2018a; Lewald & Getzmann, 2015), or spoken numerals (Lewald, Hanenberg, & Getzmann, 2016). Although the N2ac was originally suggested to reflect the allocation of selective attention to the target (Gamble & Luck, 2011), analogously to the visual posterior contralateral N2 subcomponent (N2pc; Eimer, 1996; Luck & Hillyard, 1994), its functional significance remains ambiguous. Here, we aimed to provide further evidence on the functional significance of the N2ac by investigating its relationship to sound localization performance.
In this study, the diffusion modeling approach (Ratcliff, 1978) was applied, allowing for a more detailed understanding of behavioral patterns in discrimination tasks (for recent reviews, see Voss, Nagler, & Lerche, 2013; Ratcliff & McKoon, 2008). Although diffusion models are still rarely used in cognitive neuroscience research (see, e.g., Schubert, Nunez, Hagemann, & Vandekerckhove, 2019; Nunez, Vandekerckhove, & Srinivasan, 2017; Schubert, Hagemann, Voss, Schankin, & Bergmann, 2015; Ratcliff, Philiastides, & Sajda, 2009; Philiastides, Ratcliff, & Sajda, 2006), interest in and application of this methodological approach have increased considerably during the past decade. The general purpose of diffusion models is to decompose the cognitive processes underlying a binary decision. A major advantage of the diffusion model is that the estimation procedure is not limited to single mean or median values but takes the whole RT distribution into account. The resulting separation of processing components offers great potential to provide more detailed descriptions of cognitive processes and to generate more accurate predictions for behavioral and neurophysiological data (Turner, Rodriguez, Norcia, McClure, & Steyvers, 2016; Ratcliff & McKoon, 2008).
The diffusion model assumes that, in order for a decision to be made and a response to be executed, evidence for either response is accumulated in a noisy process until it reaches either the decision boundary of response A or that of response B (see Figure 2 in Voss et al., 2013 for an illustration of this evidence accumulation process). The basic diffusion model includes the following parameters: The drift rate v describes the speed at which evidence is accumulated (or “the rate of accumulation of information”; Ratcliff & McKoon, 2008, p. 3), with higher drift rates resulting in shorter RTs and fewer errors. Threshold separation a indicates the amount of information considered until a decision is made. That is, conservative response criteria, which are associated with slower but more accurate responses, result in large estimates of a, whereas more liberal response criteria result in smaller estimates of a. Threshold separation and drift rate have been shown to be negatively correlated because individuals with higher drift rates tend to adopt more liberal response criteria (i.e., smaller threshold separation values; Schmiedek, Oberauer, Wilhelm, Süß, & Wittmann, 2007). A priori biases toward one of the decision thresholds are reflected by the starting point z. Beyond that, the model accounts for non-decisional processes, such as stimulus encoding, working memory access, or response execution, which are captured by the non-decision time constant t0. Typically, older adults show a slowing in this decision-unrelated domain (Ratcliff, Thapar, Gomez, & McKoon, 2004; Ratcliff, Thapar, & McKoon, 2001). Finally, trial-to-trial variability in drift rate (sv), non-decision time (st0), and starting point (sz), as well as the proportion of contaminated trials (pdiff; e.g., trials reflecting non-diffusion-like processes), can be accounted for.
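The accumulation process described above can be illustrated with a small simulation. This is not fast-dm's estimation routine: it is a sketch of the basic model only (no sv, st0, or sz) using an Euler approximation, and it assumes the common scaling convention of a diffusion constant s = 1 and a starting point expressed relative to a (z_rel = 0.5 means halfway between the boundaries). Function names and parameter values are illustrative.

```python
import math
import random


def simulate_trial(v, a, z_rel, t0, rng, dt=0.001, s=1.0):
    """Simulate one trial of the basic diffusion model.

    Evidence starts at z_rel * a and changes by v * dt plus Gaussian noise
    (diffusion constant s) per step until it leaves the interval (0, a).
    Returns (RT in seconds, True if the upper boundary was reached).
    """
    x = z_rel * a
    t = 0.0
    noise_sd = s * math.sqrt(dt)
    while 0.0 < x < a:
        x += v * dt + noise_sd * rng.gauss(0.0, 1.0)  # Euler-Maruyama step
        t += dt
    return t0 + t, x >= a


def summarize(v, a, z_rel=0.5, t0=0.3, n_trials=1000, seed=1):
    """Proportion of upper-boundary (here: correct) responses and mean RT."""
    rng = random.Random(seed)
    trials = [simulate_trial(v, a, z_rel, t0, rng) for _ in range(n_trials)]
    accuracy = sum(upper for _, upper in trials) / n_trials
    mean_rt = sum(rt for rt, _ in trials) / n_trials
    return accuracy, mean_rt


if __name__ == "__main__":
    # Same threshold separation, two drift rates: higher v gives faster, more accurate responses.
    for v in (1.0, 3.0):
        acc, rt = summarize(v=v, a=1.5)
        print(f"v = {v}: accuracy = {acc:.2f}, mean RT = {rt:.3f} s")
```

Raising a while keeping v fixed would instead trade speed for accuracy, in line with the interpretation of threshold separation given above.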
In summary, we aimed to characterize the relation between electrophysiological correlates of attentional orienting within a complex auditory scene (i.e., alpha lateralization and N2ac) and sound localization performance, which was assessed by classical RT and accuracy measures as well as by diffusion model parameters. We hypothesized that, if the cognitive processes reflected by alpha power modulations and N2ac amplitudes contribute to the successful selection of the target from a sound array containing simultaneously present distractors, they should in turn contribute to the information accumulation process that results in the localization of the target. Hence, alpha power modulations and N2ac amplitudes should predict drift rate (i.e., the speed of information accumulation) and, in turn, RT and accuracy.
The data analyzed here were taken from a separate study on the effects of auditory training on cocktail party listening performance in younger and older adults (Hanenberg, Getzmann, & Lewald, unpublished). Only pretraining data from this study were used. The sample analyzed here included both age groups. Although we did not primarily aim to investigate age effects, we considered age differences in sound localization performance, alpha lateralization, and N2ac, as well as in the relation between these electrophysiological correlates of attentional orienting and sound localization performance. Irrespective of the expected age-related decline, we hypothesized that this brain–behavior relationship would hold for both age groups.
METHODS
Participants
The original sample included 28 older adults and 24 younger adults. Data for three younger participants were discarded because of technical problems with the EEG recording. In addition, two older participants were excluded from analysis because their performance was below (14% correct) or very close to (30%) chance level (25%). Consequently, the final sample included 26 older adults (mean age = 69 years, range = 56–76 years, 13 women) and 21 younger adults (mean age = 24 years, range = 19–29 years, 11 women). All participants were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971).
Pure-tone audiometry at 11 frequencies (0.125–8 kHz; Oscilla USB100, Inmedico) was conducted. Hearing thresholds in the speech frequency range (<4 kHz) indicated normal hearing (≤25 dB) for all younger participants and mild impairments (≤40 dB) for older participants. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethical Committee of the Leibniz Research Centre for Working Environment and Human Factors. All participants gave their written informed consent for participation.
Experimental Setup, Procedure, and Stimuli
The original study, in which data were collected (Hanenberg et al., unpublished), comprised three training sessions on 3 days, with three experimental blocks per session (15 min pretraining, 15 min posttraining, 1 hr posttraining) and with intervals of 1–3 weeks between sessions. For the present reanalysis, only the data obtained in the pretraining blocks, pooled across the three sessions, were used. The experiment was conducted in a dimly lit, echo-reduced, sound-proof room. Participants were seated in a comfortable chair that was positioned at equal distances from the left, right, and front walls of the room. Participants' head position was stabilized by a chin rest. A semicircular array of nine broadband loudspeakers (SC5.9; Visaton; housing volume 340 cm³) was mounted in front of the participant at a distance of 1.5 m from the participant's head. Only four loudspeakers, located at azimuthal positions of −60°, −20°, 20°, and 60°, were used in this study. A red light-emitting diode (diameter = 3 mm, luminous intensity = 0.025 mcd) was attached right below the central loudspeaker in the median plane of the participant's head at eye level. The light-emitting diode was continuously on and served as a central fixation point.
The sound localization task applied in this study was a modification of the multiple-sources approach that has been used in several previous studies on auditory selective spatial attention in “cocktail party scenarios” (Lewald, 2016, 2019; Lewald & Getzmann, 2015; Zündorf, Karnath, & Lewald, 2011, 2014; Zündorf, Lewald, & Karnath, 2013). Details of the present task version have been previously described (Lewald et al., 2016). Briefly, participants indicated the position of a predefined target numeral that was presented simultaneously with three distractor numerals. The target was kept constant for each participant and was counterbalanced across participants and age groups such that each numeral served as a target an equal number of times within the overall experiment. Four 1-syllable numerals (“eins,” 1; “vier,” 4; “acht,” 8; “zehn,” 10), spoken by two male (mean pitch = 141 Hz) and two female (mean pitch = 189 Hz) native German speakers, served as sound stimuli (Lewald et al., 2016). All numerals were presented equally often at each of the four possible loudspeaker positions (located at −60°, −20°, 20°, 60° azimuth). Numerals presented in each trial were spoken by four different speakers. The overall sound pressure level of the sound arrays was 66 dB(A), as measured at the position of the participant's head using a sound-level meter with a 0.5-in. free-field measuring microphone (Types 2226 and 4175, Brüel & Kjær).
The target was present in each trial, with target position, distractor positions, and speakers varying between trials following a fixed pseudorandom order. The stimulus duration was 600 msec, followed by a response period of 2 sec and an intertrial interval of 525 msec, resulting in a total trial duration of 3.125 sec. Responses were given by pressing one of four response buttons with the index finger of the right hand. The response buttons were arranged in a semicircular array corresponding to the four possible target locations (i.e., far left, inner left, inner right, far right). Each block consisted of 288 trials, resulting in a duration of 15 min per block. As mentioned above, data from three blocks, assessed on different days, were pooled, yielding a total of 864 trials per participant. On each of the 3 days, participants completed a short training sequence of 10 trials before the experiment to familiarize themselves with the task.
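The trial structure described above can be checked with a short sketch. It builds one hypothetical 288-trial block in which every arrangement of the four numerals over the four loudspeakers occurs equally often (24 permutations × 12 repetitions), pairs each arrangement with a permutation of the four speakers so that every numeral in a trial has a different voice, and shuffles the result with a fixed seed. This is our assumption about how such a balanced pseudorandom order could be generated, not the order used in the original study; the speaker labels are placeholders.

```python
import itertools
import random
from collections import Counter

NUMERALS = ("eins", "vier", "acht", "zehn")
POSITIONS_DEG = (-60, -20, 20, 60)
SPEAKERS = ("male_1", "male_2", "female_1", "female_2")  # hypothetical labels

# Trial timing in milliseconds (integers avoid floating-point rounding).
STIMULUS_MS, RESPONSE_MS, ITI_MS = 600, 2000, 525
TRIAL_MS = STIMULUS_MS + RESPONSE_MS + ITI_MS  # 3125 ms


def make_block(n_trials=288, seed=0):
    """Return a list of trials; each trial maps azimuth -> (numeral, speaker)."""
    numeral_perms = list(itertools.permutations(NUMERALS))  # 24 arrangements
    speaker_perms = list(itertools.permutations(SPEAKERS))  # 24 voice assignments
    reps, remainder = divmod(n_trials, len(numeral_perms))
    if remainder:
        raise ValueError("n_trials must be a multiple of 24 for full balancing")
    rng = random.Random(seed)  # fixed seed -> fixed pseudorandom order
    voices = speaker_perms * reps
    rng.shuffle(voices)  # decouple voice assignment from numeral arrangement
    trials = [
        dict(zip(POSITIONS_DEG, zip(nums, spk)))
        for nums, spk in zip(numeral_perms * reps, voices)
    ]
    rng.shuffle(trials)
    return trials


def target_position(trial, target):
    """Azimuth at which the participant's target numeral was presented."""
    return next(pos for pos, (num, _) in trial.items() if num == target)


if __name__ == "__main__":
    block = make_block()
    print(len(block), "trials,", len(block) * TRIAL_MS / 60000, "min per block")
    print(Counter(target_position(t, "acht") for t in block))
```

With this construction, each numeral (and therefore each participant's fixed target) appears 72 times at each loudspeaker position per block, and three blocks give the 864 trials reported above.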
EEG Recording and Preprocessing
The continuous EEG was recorded from 58 passive Ag/AgCl electrodes at a sampling rate of 1 kHz using a QuickAmp-72 amplifier (Brain Products). The electrode montage was arranged according to the international 10/10 system. Two electrodes were positioned on the left and right mastoids. In addition, two vertical EOG electrodes were placed below and above the right eye, and two horizontal EOG electrodes were placed at the outer canthi of the left and right eye. The ground electrode was positioned right above the nasion, in the center of the forehead. The average of all electrodes served as the online reference. Electrode impedances were kept below 10 kΩ.
Offline preprocessing of the data was conducted using the open-source toolbox EEGLAB (v14.1.2b; Delorme & Makeig, 2004) for MATLAB (R2018a). The continuous EEG data were high-pass filtered at 0.5 Hz (6601-point FIR filter, transition bandwidth = 0.5 Hz, cutoff frequency = 0.25 Hz) and low-pass filtered at 30 Hz (441-point FIR filter, transition bandwidth = 7.5 Hz, cutoff frequency = 33.75 Hz). Using the automated channel rejection procedure implemented in EEGLAB, channels with a normalized kurtosis exceeding 5 SDs from the mean were rejected. The data were re-referenced to the average of all remaining EEG electrodes (including the two mastoid electrodes) and segmented into epochs ranging from −1000 to 3125 msec relative to sound array onset. The 200-msec interval preceding sound array onset served as baseline. Independent component analysis was run on a subset of the original data, downsampled to 200 Hz and containing only every second trial. The resulting independent component (IC) decomposition was then projected onto the original data set with a 1-kHz sampling rate comprising all trials. Using the DIPFIT plugin of the EEGLAB toolbox, a single equivalent current dipole model was computed for each of the IC scalp maps by means of a spherical head model (Kavanagh, Darcey, Lehmann, & Fender, 1978). Artifactual ICs were identified and excluded in two subsequent steps: First, the automated algorithm ADJUST (Mognon, Jovicich, Bruzzone, & Buiatti, 2011) was applied to identify and reject components related to blinks, eye movements, and generic discontinuities. Second, because artifactual ICs usually do not resemble the projection of a single dipole (Onton & Makeig, 2006), all components whose dipole solution had a residual variance exceeding 40% were rejected. The resulting IC solution was visually inspected for any additional artifactual components that were not detected by this automated rejection procedure.
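The filter lengths and cutoffs reported above follow from the passband edges if one assumes the default heuristic that, to our understanding, EEGLAB's pop_eegfiltnew applies to a Hamming-windowed sinc FIR filter: a transition bandwidth of 25% of the passband edge (at least 2 Hz, but no wider than the distance to 0 Hz or to the Nyquist frequency), a filter order of 3.3 / (transition bandwidth / sampling rate) rounded up to an even number, and a −6 dB cutoff half a transition band away from the edge. The sketch below reproduces only that arithmetic; it neither designs nor applies the filter, and the heuristic should be checked against the EEGLAB source.

```python
import math

HAMMING_FACTOR = 3.3  # normalized transition width of a Hamming window


def default_transition_bw(edge_hz, fs_hz, kind):
    """Assumed default transition bandwidth (Hz) for a given passband edge."""
    df = max(0.25 * edge_hz, 2.0)
    if kind == "highpass":
        return min(df, edge_hz)  # transition band cannot extend below 0 Hz
    if kind == "lowpass":
        return min(df, fs_hz / 2 - edge_hz)  # nor beyond the Nyquist frequency
    raise ValueError("kind must be 'highpass' or 'lowpass'")


def windowed_fir_spec(edge_hz, fs_hz, kind):
    """Return (filter length in points, transition bandwidth in Hz, -6 dB cutoff in Hz)."""
    df = default_transition_bw(edge_hz, fs_hz, kind)
    order = HAMMING_FACTOR * fs_hz / df
    # Round up to an even order (odd length); round() guards against floating-point noise.
    order = math.ceil(round(order, 6) / 2) * 2
    cutoff = edge_hz - df / 2 if kind == "highpass" else edge_hz + df / 2
    return order + 1, df, cutoff


if __name__ == "__main__":
    print(windowed_fir_spec(0.5, 1000, "highpass"))  # high-pass filter described above
    print(windowed_fir_spec(30, 1000, "lowpass"))    # low-pass filter described above
```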
On average, 28 ICs (out of 51–58 ICs) were rejected for each participant (range = 12–40). Finally, the automated artifact rejection procedure implemented in EEGLAB (threshold limit = 1000 μV, probability threshold = 5 SDs) was performed. On average, the procedure rejected 207 trials (range = 92–317), that is, 23% of trials (range = 10.6–36.6%). Only trials with correct responses were submitted to further analyses of EEG data (cf. ERP Analysis section and Time–Frequency Data section). Data from channels, which were originally rejected, were reconstructed using EEGLAB's spherical interpolation procedure.
Data Analyses
Behavioral Data
Behavioral performance was quantified by means of mean RTs and mean accuracy (proportion of correct trials), as well as diffusion model parameters. The proportion of correct trials included only responses given within the maximum response period of 2 sec. A total of 83 trials (i.e., on average two trials per participant, range = 0–16, median = 1) were counted as incorrect because the response was missing or exceeded the maximum response period.
The diffusion modeling framework was applied to the present auditory localization task, in which participants were instructed to localize the position of a given target within a four-sound array of one-syllable spoken numerals. Because the diffusion model was originally developed for two-choice decision tasks, the decision process is here assumed to represent a continuous accumulation of evidence for the true target location relative to the three nontarget locations. Previous applications of the diffusion model have shown that it can validly describe decision-making in four-alternative choice tasks (Schubert et al., 2015, 2019). To eliminate outliers that could bias model results (Voss, Voss, & Lerche, 2015), extremely fast (<150 msec) and extremely slow (>3000 msec) RTs were discarded. Subsequently, RTs were log-transformed and z-standardized for each participant, and all trials with RTs deviating more than ±3 SDs from that participant's mean were excluded.
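A minimal sketch of the two-step trimming described above, applied to one participant's RTs in milliseconds. The text does not state whether the sample or the population SD was used for z-standardization; the sketch assumes the sample SD, and the function name is hypothetical.

```python
import math
import statistics


def trim_rts(rts_ms, lower_ms=150.0, upper_ms=3000.0, z_crit=3.0):
    """Two-step RT trimming for a single participant.

    Step 1 removes absolute outliers (< lower_ms or > upper_ms).
    Step 2 log-transforms the remaining RTs, z-standardizes them
    (sample SD assumed), and removes trials with |z| > z_crit.
    """
    kept = [rt for rt in rts_ms if lower_ms <= rt <= upper_ms]
    logs = [math.log(rt) for rt in kept]
    mean, sd = statistics.fmean(logs), statistics.stdev(logs)
    return [rt for rt, lg in zip(kept, logs) if abs(lg - mean) / sd <= z_crit]


if __name__ == "__main__":
    # Placeholder RTs: 50 plausible values, one very slow response, one anticipation.
    rts = [400 + 10 * i for i in range(50)] + [2900, 120]
    kept = trim_rts(rts)
    print(len(rts), "->", len(kept), "trials kept")
```

The log transformation makes the right-skewed RT distribution more symmetric, so the ±3 SD criterion treats slow and fast deviations more evenly than it would on raw RTs.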
The free software fast-dm (Voss & Voss, 2007) was used to fit a diffusion model to the RT distributions of the present data. The model parameters were estimated with an iterative search procedure based on the Kolmogorov–Smirnov (KS) test statistic. The relative starting point z was set to 0.5, presuming that participants were not biased toward either of the two response categories (correct target location vs. distractor locations). The parameters a, v, and t0 were allowed to vary freely. In addition, parameters sv and st0 were estimated because they led to a notable improvement of model fit. Trial-to-trial variability of starting point (sz), the difference in speed of response execution (d), and the percentage of contaminants (pdiff) were set to 0. To graphically evaluate model fit, we plotted observed versus predicted accuracy as well as observed versus predicted values of the RT distribution for the first (.25), second (.50), and third (.75) quartile. Predicted values were derived using the construct-samples tool of fast-dm (Voss & Voss, 2007). That is, 500 data sets were generated for each participant based on that individual's estimated parameter values and number of trials. Finally, the mean quartile values and mean response accuracy across these simulated data sets were calculated for each participant. Pearson correlations were calculated to quantify the relationship between empirical data and model predictions for both age groups. If the majority of data points lie close to the line of perfect correlation, good model fit can be assumed.
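To illustrate the KS criterion: as we understand the approach of Voss and Voss (2007), RTs of responses at the lower boundary (here, localization errors) are multiplied by −1 so that correct and error responses form a single distribution, and the fit criterion is the maximum distance between observed and predicted cumulative distribution functions. fast-dm computes the predicted CDF from the model itself; the sketch below instead compares two samples (e.g., observed data and one simulated data set), so it is a simplified stand-in, not fast-dm's implementation. Function names are hypothetical.

```python
import bisect


def signed_rts(rts, correct):
    """Sorted RTs with lower-boundary (error) responses coded as negative values."""
    return sorted(rt if ok else -rt for rt, ok in zip(rts, correct))


def ks_distance(a_sorted, b_sorted):
    """Two-sample Kolmogorov-Smirnov statistic for two sorted samples:
    the largest absolute difference between their empirical CDFs."""
    n_a, n_b = len(a_sorted), len(b_sorted)
    d = 0.0
    for x in a_sorted + b_sorted:  # the supremum is attained at a data point
        cdf_a = bisect.bisect_right(a_sorted, x) / n_a
        cdf_b = bisect.bisect_right(b_sorted, x) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Placeholder RTs in seconds; False marks an error response.
    observed = signed_rts([0.52, 0.61, 0.70, 0.83, 0.95], [True, True, False, True, True])
    predicted = signed_rts([0.50, 0.58, 0.66, 0.79, 1.02], [True, True, True, False, True])
    print(observed, ks_distance(observed, predicted))
```

Coding errors as negative RTs lets a single statistic penalize misfit in both the accuracy and the RT distributions, which is why the method uses the whole distribution rather than summary values.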
ERP Analysis
To investigate the N2ac component (Gamble & Luck, 2011), we computed the mean contralateral and ipsilateral ERP amplitudes at frontocentral electrodes FC3/4 for older adults and FC5/6 for younger adults. The contralateral portion comprised the average signal at left-hemispheric electrodes in right-target trials and at right-hemispheric electrodes in left-target trials, whereas the ipsilateral portion comprised the average signal at left-hemispheric electrodes in left-target trials and at right-hemispheric electrodes in right-target trials. Mean amplitude was measured from 477 to 577 msec relative to sound array onset. This measurement window was a 100-msec interval centered on the 50% fractional area latency (FAL; Luck, 2014; Hansen & Hillyard, 1980) of the grand-averaged contralateral-minus-ipsilateral difference curve, averaged across age groups and electrodes (50% FAL = 527 msec). To determine the FAL, the area under the difference curve was measured in a broad time window ranging from 200 to 800 msec relative to sound array onset; the latency that divides this area into two equal halves denotes the 50% FAL. We determined a common analysis time window for both age groups because a prior control analysis did not reveal any significant difference between the 50% FAL for younger (M = 525.86 msec) and older adults (M = 517.50 msec), Z = 0.26, p = .80, U3 = 0.48. The respective electrodes of interest (i.e., FC3/4 and FC5/6) were chosen to include the scalp sites with the most pronounced asymmetry (i.e., peak asymmetry in the age-specific grand-averaged waveform) for each age group. This age-specific mean amplitude was measured in the time window specified above.
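The 50% fractional area latency procedure can be sketched as follows. Assumptions not stated in the text: the difference wave is sampled in milliseconds, only its negative-going area is integrated (the N2ac is a negativity, so positive portions are ignored), the area is computed with the trapezoidal rule, and the crossing point is linearly interpolated within one sample interval. With the reported 50% FAL of 527 msec, a 100-msec window centered on it gives the 477–577 msec measurement window.

```python
import math


def fractional_area_latency(times_ms, diff_uv, start_ms=200.0, stop_ms=800.0, fraction=0.5):
    """Latency (msec) at which the cumulative negative area of a contralateral-minus-
    ipsilateral difference wave reaches `fraction` of its total negative area
    within [start_ms, stop_ms]."""
    # Keep samples inside the window; rectify so that only negative deflections count.
    pts = [(t, max(-v, 0.0)) for t, v in zip(times_ms, diff_uv) if start_ms <= t <= stop_ms]
    # Trapezoidal area of each interval between neighboring samples.
    segments = [(ta, tb, 0.5 * (ya + yb) * (tb - ta)) for (ta, ya), (tb, yb) in zip(pts, pts[1:])]
    total = sum(area for _, _, area in segments)
    if total <= 0.0:
        raise ValueError("no negative area in the analysis window")
    target = fraction * total
    cumulative = 0.0
    for ta, tb, area in segments:
        if area > 0.0 and cumulative + area >= target:
            # Linearly interpolate the crossing point within this interval.
            return ta + (tb - ta) * (target - cumulative) / area
        cumulative += area
    return segments[-1][1]  # reached only if rounding leaves cumulative just below target


def measurement_window(fal_ms, width_ms=100.0):
    """Window of width_ms centered on the fractional area latency."""
    return fal_ms - width_ms / 2, fal_ms + width_ms / 2


if __name__ == "__main__":
    times = list(range(-200, 1001))  # 1-kHz sampling, msec
    # Toy negativity peaking at 520 msec (placeholder, not study data).
    wave = [-2.0 * math.exp(-(((t - 520) / 60) ** 2)) for t in times]
    print(round(fractional_area_latency(times, wave), 1), measurement_window(527))
```

Unlike a peak latency, the 50% area latency is less sensitive to noise and to the exact shape of the component, which makes it a robust anchor for a fixed measurement window.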
Time–Frequency Data
Multiple Regression
To investigate to what extent alpha lateralization and N2ac amplitudes predict behavior in the given auditory localization task, we applied regression analyses. Separate multiple linear regression models were evaluated for mean RT, drift rate v, threshold separation a, and non-decision time t0 as response variables, using the fitlm function implemented in the MATLAB Statistics and Machine Learning Toolbox (R2018a). Because accuracy proportions are bounded between 0 and 1, a beta regression was calculated for accuracy as response variable using the R betareg package by Cribari-Neto and Zeileis (2010). For all five regression analyses, N2ac amplitudes, the alpha lateralization index (ALI), and age group served as predictors. In addition, to assess whether the relationship between electrophysiological correlates and behavioral outcomes differed between age groups, two interaction terms were included (i.e., age:N2ac, age:ALI). Effects coding was used as the contrast scheme for the age group variable to allow a proper interpretation of lower- and higher-order effects. Model assumptions were verified by examination of residual plots: Pearson residuals were plotted against fitted values and against predictor variables to assess nonconstant error variance (heteroscedasticity) and deviations from linearity, respectively. In addition, normal probability plots were examined to evaluate normality of residuals. If the Durbin–Watson test was nonsignificant, with a test statistic close to 2, residuals were assumed to be uncorrelated. Variance inflation factors were inspected for signs of multicollinearity. Finally, to check for influential cases, leverage and Cook's distance were examined. Values exceeding 1 for Cook's distance (Cook & Weisberg, 1982) or 3 × (k + 1)/n (with k indicating the number of predictors and n indicating the sample size) for leverage (Pituch & Stevens, 2016) were set as cutoffs for further inspection.
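The predictor structure and the leverage cutoff described above can be sketched in Python with NumPy; this is not a reimplementation of MATLAB's fitlm. Assumptions: age group is effects coded as +1 for older and −1 for younger adults (the text does not state which group received which sign), the predictors are N2ac, ALI, age, and the two interaction terms (so k = 5), and leverages are the diagonal of the hat matrix. Because the hat-matrix diagonal sums to the number of coefficients (k + 1), the 3 × (k + 1)/n rule flags cases with roughly three times the average leverage.

```python
import numpy as np


def design_matrix(n2ac, ali, is_older):
    """Columns: intercept, N2ac, ALI, age, age:N2ac, age:ALI (age effects coded)."""
    n2ac = np.asarray(n2ac, dtype=float)
    ali = np.asarray(ali, dtype=float)
    age = np.where(np.asarray(is_older, dtype=bool), 1.0, -1.0)  # assumed: older = +1, younger = -1
    return np.column_stack([np.ones_like(n2ac), n2ac, ali, age, age * n2ac, age * ali])


def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X', computed via a QR decomposition."""
    q, _ = np.linalg.qr(X)  # reduced QR: H = Q Q'
    return np.sum(q**2, axis=1)


def leverage_cutoff(k, n):
    """Rule-of-thumb cutoff 3 (k + 1) / n for k predictors and n cases."""
    return 3 * (k + 1) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # random placeholder data, not study data
    is_older = np.array([True] * 26 + [False] * 21)
    X = design_matrix(rng.normal(size=47), rng.normal(size=47), is_older)
    h = leverages(X)
    cutoff = leverage_cutoff(k=X.shape[1] - 1, n=X.shape[0])
    print(f"cutoff = {cutoff:.3f}; flagged cases: {np.flatnonzero(h > cutoff)}")
```

With effects coding, the N2ac and ALI coefficients describe the average slope across both age groups, while the interaction terms capture how far each group's slope deviates from that average.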
The inspection of residual plots indicated deviations from normality for the drift rate regression model. Refitting the model with a base-10 log transformation of the drift rate values (v + 1; a constant was added to avoid taking the logarithm of nonpositive values) resulted in approximately normally distributed residuals. Thus, ordinary least squares regression was applied. For the models regarding threshold separation, non-decision time, and RT, the residual probability plots indicated some outliers. To reduce outlier effects, we fitted robust regression models using an iteratively reweighted least squares procedure with a bisquare weight function. Adjusted R-squared (denoted as R2) is reported as a goodness-of-fit statistic. Because separate multiple regression analyses were conducted for each of the five dependent variables, p values for regression coefficients were corrected using the Bonferroni–Holm procedure (Holm, 1979). Note that in each case the five p values belonging to the same type of estimate (i.e., intercept, N2ac fixed effect, ALI fixed effect, age fixed effect, N2ac:age interaction term, or ALI:age interaction term) were corrected for multiple testing. To visualize the relationship between single predictors and outcomes, marginal effects plots (ggeffect function from the ggeffects package; Lüdecke, 2018) and adjusted response functions (plotInteraction and plotAdjustedResponse functions) were used for the beta regression model (in R) and the linear regression models (in MATLAB), respectively. Adjusted response functions describe the relationship between the fitted response and a specific predictor, with the other predictors averaged out by averaging the fitted values over the data used in the fit. Adjusted response values are computed by adding the residual to the adjusted fitted value for each observation (The MathWorks, 2019). When plotting marginal effects using ggeffect, the other factors are held constant at an average value (Lüdecke, 2018).
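The Bonferroni–Holm step-down correction applied within each family of five p values can be sketched in plain Python (the function name is hypothetical). The p values are sorted in ascending order, the one at rank i (counting from 0) is multiplied by (m − i), monotonicity is enforced with a running maximum, and adjusted values are capped at 1.

```python
def holm_adjust(p_values):
    """Holm (1979) step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p value (rank starts at 0) is multiplied by m - rank;
        # the running maximum keeps adjusted values monotone, and 1.0 caps them.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # Hypothetical p values for one coefficient type (e.g., N2ac) across the five models.
    print(holm_adjust([0.004, 0.020, 0.300, 0.012, 0.650]))
```

Compared with a plain Bonferroni correction (multiplying every p value by m), the step-down procedure controls the family-wise error rate equally well while being less conservative for all but the smallest p value.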
Statistical Tests and Effect Sizes
Data were considered normally distributed if the Lilliefors test (Lilliefors, 1967) yielded nonsignificant results (p > .05). For normally distributed data, parametric two-sample Welch's t tests were applied, with degrees of freedom estimated using Satterthwaite's approximation to account for unequal variances. The Wilcoxon rank-sum test served as the nonparametric counterpart in case of nonnormality. To test for significance within an age group, a parametric one-sample t test or the nonparametric Wilcoxon signed-rank test was applied. Effect sizes were calculated using the MES toolbox provided by Hentschke and Stüttgen (2011). For parametric one- and two-sample t tests, g1 and Hedges' g (in the following referred to as g) are reported, respectively. For both measures, absolute values of 0.2 are typically referred to as small, 0.5 as medium, and 0.8 as large. For nonparametric tests, Cohen's U3 is reported. Cohen's U3 is a measure of the overlap of two distributions, with 0.5 indicating maximal overlap and 0 or 1 indicating minimal overlap. The significance of effects was assessed at α = .05. The Bonferroni–Holm procedure was applied to correct for multiple comparisons when appropriate (Holm, 1979). Adjusted p values are denoted as padj.
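For reference, the two-sample quantities named above can be written out in plain Python. This is a sketch, not the MES toolbox: it applies the common small-sample correction to Hedges' g (J = 1 − 3/(4N − 9), with N the total sample size), which may or may not match the toolbox's default, and it defines U3 as the proportion of the first sample lying above the median of the second (ties counted as half). p values are omitted because they require the t distribution.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    se2_x = statistics.variance(x) / nx  # squared standard error of each mean
    se2_y = statistics.variance(y) / ny
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x**2 / (nx - 1) + se2_y**2 / (ny - 1))
    return t, df


def hedges_g(x, y, bias_correct=True):
    """Mean difference divided by the pooled SD, optionally bias corrected."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    g = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var)
    return g * (1 - 3 / (4 * (nx + ny) - 9)) if bias_correct else g


def cohens_u3(x, y):
    """Share of x above the median of y (ties count half); 0.5 = complete overlap."""
    med = statistics.median(y)
    return sum(1.0 if v > med else 0.5 if v == med else 0.0 for v in x) / len(x)


if __name__ == "__main__":
    younger = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values
    older = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(welch_t(younger, older), hedges_g(younger, older), cohens_u3(younger, older))
```

The Satterthwaite degrees of freedom fall between the smaller group size minus 1 and the pooled value nx + ny − 2, which is why the fractional df values reported in the Results lie just below the total sample size minus 2.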
Because p values from standard inferential statistics do not allow conclusions about whether the null hypothesis is true, we additionally report the Bayes factor (BF) to strengthen the interpretability of effects in this study. In essence, the BF provides a continuous measure of how much more likely the observed data are under one hypothesis than under an alternative hypothesis (for an introduction to Bayesian statistics, see Quintana & Williams, 2018; Wagenmakers et al., 2018). A BF of 1 indicates that the data are equally likely under both hypotheses (i.e., the null and the alternative hypothesis). A BF < 1 provides increasing evidence in favor of the null hypothesis relative to the alternative hypothesis, whereas a BF > 1 provides increasing evidence favoring the alternative hypothesis over the null hypothesis (Dienes, 2014). To facilitate the interpretation of BFs, the classification scheme originally proposed by Jeffreys (1961) is applied: A BF > 3 and > 10 provides moderate and strong evidence for the alternative hypothesis, respectively, whereas a BF < 0.33 or < 0.1 suggests moderate and strong evidence in favor of the null hypothesis, respectively. BFs between 0.33 and 3 are interpreted as anecdotal evidence. However, these cutoffs have no absolute meaning (Dienes, 2014): evidence is continuous and directly interpretable in terms of an odds ratio (Quintana & Williams, 2018). The notation BF10 denotes the Bayes factor in favor of the alternative hypothesis (i.e., that the group means differ) relative to the null hypothesis. BF functions implemented in MATLAB by Krekelberg (2019) and the BayesFactor package implemented in R (function: linearReg.R2stat) by Morey and Rouder (2018) were used to calculate BFs for t tests and regression, respectively.
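The classification scheme above can be expressed as a small lookup, shown here as a hedged sketch with hypothetical function names. The thresholds are those stated in the text, BF01 = 1/BF10 gives the same evidence from the null hypothesis' perspective, and, as noted, the labels are conventions: the BF itself is the factor that converts prior odds into posterior odds.

```python
def classify_bf10(bf10):
    """Verbal label for BF10 using the cutoffs stated in the text."""
    if bf10 <= 0:
        raise ValueError("Bayes factors are positive")
    if bf10 > 10:
        return "strong evidence for H1"
    if bf10 > 3:
        return "moderate evidence for H1"
    if bf10 < 0.1:
        return "strong evidence for H0"
    if bf10 < 0.33:
        return "moderate evidence for H0"
    return "anecdotal evidence"


def posterior_odds(bf10, prior_odds=1.0):
    """Posterior odds of H1 versus H0: the Bayes factor rescales the prior odds."""
    return bf10 * prior_odds


if __name__ == "__main__":
    for bf in (14.21, 6.93, 0.35):  # BF10 values reported in the Results
        print(bf, classify_bf10(bf), "BF01 =", round(1 / bf, 2))
```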
To obtain a BF for a specific coefficient in our regression model (BFcoef), the BFs for the full and the restricted model were compared according to BFcoef = BFfull/BFrestr, where BFfull indicates the BF for the full model, including all predictors, and BFrestr indicates the BF for the restricted model, omitting the coefficient of interest. Default priors were applied, that is, the Jeffreys–Zellner–Siow prior for t tests and a mixture of g-priors according to Liang, Paulo, Molina, Clyde, and Berger (2008) for regression. Because these packages do not support the calculation of BFs for beta regression, no Bayesian statistics are provided for the regression analysis of accuracy data.
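The ratio used for BFcoef can be justified in one step, assuming (as we understand the regression BFs returned by linearReg.R2stat) that BFfull and BFrestr are both computed against the same intercept-only null model M0, so that its marginal likelihood cancels:

```latex
\mathrm{BF}_{\mathrm{coef}}
  = \frac{\mathrm{BF}_{\mathrm{full}}}{\mathrm{BF}_{\mathrm{restr}}}
  = \frac{p(D \mid M_{\mathrm{full}}) \,/\, p(D \mid M_0)}{p(D \mid M_{\mathrm{restr}}) \,/\, p(D \mid M_0)}
  = \frac{p(D \mid M_{\mathrm{full}})}{p(D \mid M_{\mathrm{restr}})}
```

A BFcoef > 1 thus indicates that the data D are more likely under the model that includes the coefficient of interest than under the model that omits it.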
RESULTS
Behavioral Results
Figure 1 shows the proportion of correct responses (Figure 1A) and mean RTs (Figure 1B) separately for both age groups. Diffusion parameters are depicted in Figure 2. On average, younger adults responded more accurately (t(43.05) = −3.36, p = .002, padj = .01, g = 0.92, BF10 = 14.21) and faster than older adults (t(38.56) = 2.80, p = .008, padj = .038, g = −0.83, BF10 = 6.93). The BFs indicated that the data were about 14 and 7 times more likely under the alternative than under the null model, respectively, thus providing strong and moderate support for a difference between age groups.
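The fractional degrees of freedom (e.g., t(43.05)) suggest Welch's unequal-variance t test. The following Python sketch shows how such a statistic, its Welch–Satterthwaite degrees of freedom, and a bias-corrected Hedges' g can be computed. It assumes that g uses the pooled SD with the usual small-sample correction, which this section does not state; the sign of both statistics depends on which group is passed first.

```python
import numpy as np


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vx, vy = x.var(ddof=1) / x.size, y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    return float(t), float(df)


def hedges_g(x, y):
    """Standardized mean difference with pooled SD and small-sample correction."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = x.size, y.size
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    d = (x.mean() - y.mean()) / pooled_sd
    return float(d * (1 - 3 / (4 * (nx + ny) - 9)))
```

A two-sided p value then follows from the t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.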
Mean RTs alone offer no insight into the causes of prolonged RTs. Diffusion parameters, in contrast, allow for a closer look at different possible explanations for the observed difference between age groups, including a slowdown of information uptake (i.e., lower drift rate v), a more conservative response criterion (i.e., higher threshold separation a), or delayed response execution (i.e., higher response constant t0). In our sample, older adults showed a significantly reduced drift rate (t(44.89) = −2.51, p = .016, padj = .047, g = 0.70, BF10 = 3.01), higher non-decision time (t(41.31) = 2.81, p = .008, padj = .038, g = −0.82, BF10 = 6.59), and higher variability of non-decision time (t(40.26) = 5.25, p < .001, padj < .001, g = −1.43, BF10 = 153.9). Threshold separation values (t(44.24) = −0.66, p = .513, padj = .513, g = 0.19, BF10 = 0.35) and trial-to-trial variability of drift rate (Z = −1.21, p = .226, padj = .453, U3 = 0.29, BF10 = 0.35) did not differ significantly between age groups. Although the BFs supported the results of classical inferential statistics for significant effects (BFs > 3), for nonsignificant effects they remained above 0.33 and thus fell short of the criterion for moderate evidence in favor of the null hypothesis. To graphically assess the fit of the estimated diffusion models, observed RT quartiles (.25, .5, .75) and observed accuracy were plotted against the corresponding values of the predicted distributions. As can be seen in Figure 3, the majority of data points lie close to the identity line (i.e., perfect agreement between observed and predicted values), indicating adequate model fit.
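To make the roles of v, a, and t0 concrete, the following Python sketch simulates a basic Wiener diffusion process with an Euler–Maruyama scheme. It assumes an unbiased starting point (a/2), a diffusion constant s = 1, and no trial-to-trial variability in any parameter, so it is a simplification of the full model fitted here and not the estimation procedure used in this study; all parameter values in the usage are illustrative. Lowering v reduces accuracy and slows responses, raising a slows responses, and t0 shifts the whole RT distribution without affecting accuracy.

```python
import numpy as np


def simulate_diffusion(v, a, t0, n_trials=2000, s=1.0, dt=0.001, max_t=10.0, seed=0):
    """Simulate RTs and accuracy from a simple Wiener diffusion model.

    Evidence starts at a / 2 and accumulates with mean rate v and noise SD s
    until it reaches a (correct response) or 0 (error). t0 (non-decision time)
    is added to each decision time. Trials not finished by max_t are dropped.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, a / 2.0)
    t = np.zeros(n_trials)
    active = np.ones(n_trials, dtype=bool)
    for _ in range(int(max_t / dt)):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Euler-Maruyama step: drift plus Gaussian diffusion noise
        x[idx] += v * dt + s * np.sqrt(dt) * rng.standard_normal(idx.size)
        t[idx] += dt
        # A trial stops once evidence leaves the open interval (0, a)
        active[idx] = (x[idx] > 0.0) & (x[idx] < a)
    done = ~active
    return t[done] + t0, x[done] >= a


def rt_quartiles(rts):
    """RT quartiles (.25, .5, .75), as used for the graphical fit check."""
    return np.quantile(rts, [0.25, 0.5, 0.75])
```

Comparing `rt_quartiles` and the proportion of correct trials between simulated and observed data mirrors, in simplified form, the graphical fit check described above.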
N2 Anterior Contralateral Component
Figure 4 presents the ERPs at frontocentral electrodes FC3/4 for older adults and at electrodes FC5/6 for younger adults, together with the corresponding topographies based on the contralateral minus ipsilateral difference wave in the analysis time window. N2ac amplitudes (i.e., contralateral minus ipsilateral differences) did not differ significantly between younger (M = −0.37, SD = 0.47) and older adults (M = −0.38, SD = 0.40, t(39.37) = −0.09, p = .926, g = 0.03, BF10 = 0.29). Although this BF falls just below the conventional cutoff of 0.33, we interpret it cautiously, given that such cutoffs have no absolute meaning, as not decisively supporting either the null or the alternative hypothesis. A one-sample t test confirmed that, across both age groups, N2ac amplitudes were significantly different from zero (t(46) = −6.02, p < .001, padj < .001, g1 = −0.88, BF10 > 1000). However, the original analysis time window was based on the 50% FAL in the grand-averaged difference waveform across both age groups; this procedure therefore favors a significant result when testing overall N2ac amplitudes against zero. To avoid this problem of “double dipping,” we performed a second one-sample t test using a broader analysis time window of 400–600 msec after sound array onset. This test yielded comparable results (t(46) = −4.41, p < .001, padj < .001, g1 = −0.64, BF10 > 1000). Consistently, the BF provided strong evidence in favor of the presence of an N2ac component across both age groups.
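The N2ac measures described above (a contralateral minus ipsilateral difference wave, its mean amplitude within a time window, and a 50% fractional area latency used to place that window) can be sketched in Python as follows. The sketch assumes that only the negative-going part of the difference wave counts toward the area and that latency is resolved to the sampling grid; the function names, window limits, and signals are hypothetical.

```python
import numpy as np


def mean_amplitude(times, contra, ipsi, window):
    """Mean of the contralateral minus ipsilateral difference within a window."""
    diff = np.asarray(contra, float) - np.asarray(ipsi, float)
    mask = (times >= window[0]) & (times <= window[1])
    return float(diff[mask].mean())


def fractional_area_latency(times, diff_wave, window, fraction=0.5, polarity=-1.0):
    """Time at which the component's area reaches `fraction` of its total.

    polarity=-1 counts only negative deflections (as for the N2ac);
    deflections of the opposite polarity are set to zero before integrating.
    """
    mask = (times >= window[0]) & (times <= window[1])
    t = times[mask]
    y = np.clip(polarity * np.asarray(diff_wave, float)[mask], 0.0, None)
    # Cumulative trapezoidal area, starting at 0 at the first sample
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))])
    if cum[-1] <= 0.0:
        return float("nan")  # no deflection of the expected polarity
    idx = int(np.searchsorted(cum, fraction * cum[-1]))
    return float(t[idx])
```

For a symmetric negative deflection peaking at 500 msec, the 50% fractional area latency falls at (approximately) the peak, whereas for skewed components it can differ from the peak latency.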
Alpha Lateralization
The time–frequency plots in Figure 5 illustrate the asymmetric modulation of alpha power (8–12 Hz) at electrodes PO7/8, time-locked to sound array onset, for younger (Figure 5A) and older adults (Figure 5B). In addition, the corresponding topographies based on the normalized ipsilateral minus contralateral difference in alpha power are depicted. Although younger adults appeared to show larger alpha power lateralization than older adults, the analysis revealed no significant difference in alpha power lateralization between age groups (t(41.23) = −1.43, p = .161, g = 0.42, BF10 = 1.13). The BF suggested that the data were insensitive, that is, they could not distinguish the null hypothesis (no difference in lateralization between age groups) from the alternative hypothesis (a difference between age groups). Yet, a one-sample t test confirmed that alpha lateralization across both age groups was significantly different from zero (t(46) = 6.07, p < .001, padj < .001, g1 = 0.89, BF10 > 1000), and the BF consistently indicated strong evidence for the alternative hypothesis. As mentioned above (cf. N2 Anterior Contralateral Component section), the analysis time window (determined based on the 50% FAL in the grand-averaged waveform) favors a significant result when testing lateralization across both age groups against zero. Thus, a second one-sample t test was performed, based on a broader analysis time window of 600–900 msec after sound array onset, yielding comparable results (t(46) = 5.91, p < .001, padj < .001, g1 = 0.86, BF10 > 1000).
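A minimal Python sketch of the alpha lateralization index (ALI) and of the one-sample t test against zero. It assumes the commonly used normalization of the ipsilateral minus contralateral difference by the sum of both hemispheres, which this section does not spell out; function names are hypothetical, and power values would in practice come from the 8–12 Hz time–frequency estimates at PO7/8.

```python
import numpy as np


def alpha_lateralization_index(ipsi_power, contra_power):
    """Normalized ipsilateral minus contralateral alpha power.

    Positive values mean relatively lower alpha power over the hemisphere
    contralateral to the target (assumed normalization: divide by the sum).
    """
    ipsi = np.asarray(ipsi_power, float)
    contra = np.asarray(contra_power, float)
    return (ipsi - contra) / (ipsi + contra)


def window_mean(times, values, window):
    """Average a time-resolved measure within [window[0], window[1]]."""
    mask = (times >= window[0]) & (times <= window[1])
    return float(np.mean(np.asarray(values, float)[mask]))


def one_sample_t(x, mu=0.0):
    """One-sample t statistic against mu and its degrees of freedom."""
    x = np.asarray(x, float)
    t = (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(x.size))
    return float(t), x.size - 1
```

With one window-averaged ALI per participant (47 values here), `one_sample_t` yields a statistic with 46 degrees of freedom, matching the t(46) tests reported above.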
Regression Analyses
We examined the relationship between mean alpha power lateralization, N2ac amplitudes, and behavioral performance (including diffusion model parameters) using multiple linear regression. The estimated parameters are provided in Table 1. Participants with greater N2ac amplitudes showed higher accuracy (Z = −3.93, p < .001, padj < .001) and higher drift rate (t(41) = −2.79, p = .008, padj = .032, BFcoef = 7.75), whereas alpha lateralization had no significant effect on these performance outcomes (accuracy: Z = −1.54, p = .124, padj = .499; drift rate: t(41) = −0.37, p = .712, padj = 1.067, BFcoef = 0.43). For both models, there was no significant interaction with age (all padj ≥ .160). The corresponding BFs (only available for the drift rate model; cf. Statistical Tests and Effect Sizes section) were below 3 (BFcoef ≤ 0.65) but above 0.33, thus providing insufficient evidence for either the null or the alternative hypothesis. The full models, including all predictors, explained 26% and 36% of the variance in drift rate (R2adj = .26, F(5, 41) = 4.15, p = .004) and accuracy (pseudo-R2 = .36, precision parameter phi = 9.73, SE = 1.96, z = 4.97, Pr(>|z|) < .001), respectively. For all other models tested, neither N2ac amplitudes nor alpha power lateralization nor their interactions with age group were statistically significant predictors (all padj ≥ .095; cf. Table 1). For all but one of these coefficients, the corresponding BFs were inconclusive (0.33 < BFcoef < 3), providing substantial support neither for the alternative nor for the null hypothesis. However, for the regression model predicting non-decision time, the BF for the interaction term N2ac*Age (padj = .095) lent moderate evidence in favor of the alternative hypothesis (BFcoef = 5.92), suggesting that, in older adults, less pronounced N2ac amplitudes were associated with higher non-decision times, whereas this relationship appeared absent in younger adults.
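A minimal Python sketch of the linear models summarized in Table 1 (outcome ~ N2ac + ALI + Age + N2ac×Age + ALI×Age), fitted by ordinary least squares. The effect coding of age group (−1/+1) and the function name are assumptions for illustration, and the beta regression used for accuracy is not covered. With n = 47 and five predictors, the residual degrees of freedom are 41, matching the F(5, 41) and t(41) values reported above.

```python
import numpy as np


def fit_interaction_model(y, n2ac, ali, age):
    """OLS for y ~ N2ac + ALI + Age + N2ac:Age + ALI:Age.

    `age` is assumed to be effect-coded (e.g., -1 = younger, +1 = older).
    Returns coefficients, their t values, R2, adjusted R2, F, and the F dfs.
    """
    y = np.asarray(y, float)
    n2ac, ali, age = (np.asarray(v, float) for v in (n2ac, ali, age))
    X = np.column_stack([np.ones_like(y), n2ac, ali, age, n2ac * age, ali * age])
    n, k = X.shape  # k counts the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df_res = n - k
    sigma2 = resid @ resid / df_res
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    # Adjusted R2 penalizes the number of predictors (k - 1)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / df_res
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / df_res)
    return {"beta": beta, "t": beta / se, "r2": r2, "r2_adj": r2_adj,
            "F": f_stat, "df": (k - 1, df_res)}
```

With effect coding, the intercept estimates the unweighted mean of the two age groups at zero N2ac and ALI, and each interaction coefficient describes half the difference between the groups' slopes.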
Age group, not surprisingly, significantly predicted non-decision time (t(41) = 3.00, p = .005, padj = .018, BFcoef = 15.27), accuracy (Z = 3.03, p = .002, padj = .012), and drift rate (t(41) = −2.86, p = .007, padj = .020, BFcoef = 8.78). Although age group failed to serve as a significant predictor for RT in the regression model framework (t(41) = 1.82, p = .075, padj = .151, BFcoef = 2.38), the results largely confirm the behavioral age differences reported in the Behavioral Results section. The BF of 2.38 suggests that the data may simply be underpowered to reveal a relation between RT and age group in the present regression model. Figure 6 visualizes the reported results for those outcomes that were significantly predicted by N2ac amplitudes.
Predictor | v: b (SE) [95% CI] | v: t | a: b (SE) [95% CI] | a: t | t0: b (SE) [95% CI] | t0: t | Accuracy: b (SE) | Accuracy: z | RT: b (SE) [95% CI] | RT: t
---|---|---|---|---|---|---|---|---|---|---
Intercept | 0.22*** (0.05) [0.12, 0.31] | 4.43, p < .001, padj < .001 | 1.55*** (0.09) [1.36, 1.74] | 16.47, p < .001, padj < .001 | 0.77*** (0.04) [0.70, 0.86] | 19.77, p < .001, padj < .001 | 0.89*** (0.16) | 5.63, p < .001, padj < .001 | 1.24*** (0.04) [1.15, 1.33] | 28.95, p < .001, padj < .001
N2ac | −0.22* (0.08) [−0.38, −0.06] | −2.79, p = .008, padj = .032 | −0.32 (0.15) [−0.62, −0.01] | −2.10, p = .042, padj = .126 | 0.08 (0.06) [−0.05, 0.21] | 1.29, p = .203, padj = .407 | −1.02*** (0.26) | −3.93, p < .001, padj < .001 | 0.08 (0.07) [−0.06, 0.22] | 1.14, p = .261, padj = .407
ALI | −2.52 (6.77) [−16.20, 11.16] | −0.37, p = .712, padj = 1.067 | −10.87 (13.05) [−37.22, 15.49] | −0.83, p = .410, padj = 1.230 | 10.18 (5.45) [−0.83, 21.19] | 1.87, p = .069, padj = .345 | −33.48 (21.81) | −1.54, p = .124, padj = .499 | 3.73 (5.94) [−8.27, 15.74] | 0.63, p = .533, padj = 1.230
Age | −0.14* (0.05) [−0.24, −0.04] | −2.86, p = .007, padj = .020 | −0.07 (0.09) [−0.26, 0.12] | −0.76, p = .452, padj = .452 | 0.12* (0.04) [0.03, 0.20] | 3.00, p = .005, padj = .018 | −0.48* (0.16) | 3.03, p = .002, padj = .012 | 0.08 (0.04) [−0.01, 0.16] | 1.82, p = .075, padj = .151
N2ac*Age | −0.16 (0.08) [−0.32, 0.00] | −2.02, p = .050, padj = .160 | −0.11 (0.15) [−0.41, 0.20] | −0.72, p = .474, padj = .474 | 0.15 (0.06) [0.03, 0.28] | 2.44, p = .018, padj = .095 | −0.53 (0.26) | −2.05, p = .039, padj = .160 | 0.09 (0.04) [−0.04, 0.24] | 1.43, p = .160, padj = .319
ALI*Age | −4.67 (6.78) [−18.35, 9.02] | −0.69, p = .495, padj = .965 | −10.06 (13.05) [−36.41, 16.30] | −0.77, p = .445, padj = 1.336 | 11.23 (5.45) [0.22, 22.24] | 2.06, p = .046, padj = .229 | −15.30 (21.78) | −0.70, p = .483, padj = 1.336 | 9.19 (5.94) [−2.81, 21.19] | 1.55, p = .130, padj = .519
Adjusted/pseudo-R2 | .26 |  | .03 |  | .32 |  | .36 |  | .14 | 
F-statistic | F(5, 41) = 4.15, p = .004 |  | F(5, 41) = 1.25, p = .302 |  | F(5, 41) = 5.29, p = .001 |  | – |  | F(5, 41) = 2.46, p = .048 | 
v, a, t0, and RT denote drift rate, threshold separation, non-decision time, and mean RT, respectively. SE = standard error; CI = confidence interval. Adjusted R2 is given for the linear regression models (v, a, t0, and RT); pseudo-R2 is given for the beta regression (accuracy). p denotes uncorrected p values; padj denotes p values corrected for multiple comparisons using the Bonferroni–Holm procedure (Holm, 1979). Asterisks denote significant estimates based on adjusted p values: *padj < .05, ***padj < .001.
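The Bonferroni–Holm step-down procedure named in the table note can be sketched in Python as follows. Which p values were grouped into one correction family is not specified here, so the hypothetical function simply adjusts whatever vector it receives. In the standard procedure, the adjusted value for the i-th smallest of m p values is the running maximum of (m − i + 1)·p(i) over the sorted p values, capped at 1.

```python
import numpy as np


def holm_adjust(pvals):
    """Holm (1979) step-down adjusted p values, returned in the input order."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p value (0-based rank i) is multiplied by (m - i)
    scaled = (m - np.arange(m)) * p[order]
    # Running maximum enforces monotonicity; values are capped at 1
    adj_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj
```

For example, `holm_adjust([0.01, 0.04, 0.03])` returns approximately `[0.03, 0.06, 0.06]`: the second-ranked value (0.03 × 2 = 0.06) also bounds the largest one from below.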
DISCUSSION
In this study, we investigated the contribution of poststimulus alpha power lateralization and N2ac amplitudes to sound localization performance in a sample of younger and older adults. Both measures have been associated with the deployment of attention in auditory space. We hypothesized that if the cortical processes reflected by alpha lateralization and N2ac amplitudes contribute to successful target selection, their magnitudes should be related to the information accumulation process (i.e., drift rate; cf. diffusion model framework, as outlined in the Introduction) and, in turn, to localization accuracy and RTs. Our findings only partially confirmed this hypothesis: N2ac amplitudes significantly predicted both drift rate and accuracy, whereas alpha lateralization was not associated with any of the behavioral outcomes. We thus propose that N2ac and alpha lateralization reflect distinct aspects of attentional orienting in auditory scenes. Classical frequentist inferential statistics suggested that the observed relationship did not depend on age and that both age groups showed comparable neural signatures. However, Bayesian alternatives to classical hypothesis testing raised doubts about these claims, suggesting that the data are inconclusive with respect to age effects in the electrophysiological data. Age differences in behavioral performance are briefly reviewed below.
Cocktail Party Sound Localization in Older and Younger Adults
As expected, older adults showed fewer correct responses and slower RTs than younger adults. This is in line with the often-described difficulties of older people in following a conversation in noisy (“cocktail party”) environments, an ability that depends on the integrity of both sensory and cognitive functions (Shinn-Cunningham, 2017). The poorer performance of older adults in the present task is likely related to age-related deficits in concurrent sound segregation (Hanenberg, Getzmann, & Lewald, 2019; Alain & McDonald, 2007; Snyder & Alain, 2005). Traditionally, such deficits have been interpreted as a result of a general sensory-cognitive decline (e.g., Pichora-Fuller, Alain, & Schneider, 2017), assuming that all aspects of processing in an experimental task are globally slowed in aging adults (Myerson, Hale, Wagstaff, Poon, & Smith, 1990). The diffusion model allows one to differentiate between the aspects of processing that might be affected by age (Ratcliff, Spieler, & McKoon, 2000): Consistent with previous results (Ratcliff, Thapar, & McKoon, 2003, 2011; Ratcliff et al., 2001), we found an increase in non-decision time for older adults. In addition, older participants' non-decision time varied more strongly from trial to trial, indicating that this process was noisier in older adults (Spaniol, Madden, & Voss, 2006). However, atypically, the two age groups did not differ in their threshold separation values. This contradicts the widespread assumption that older adults usually aim to minimize errors (leading to more conservative response criteria), whereas younger adults focus on balancing speed and accuracy (Starns & Ratcliff, 2010). The observed lack of differences in response criteria between older and younger adults could be due to the relatively long response period in this study, which may have induced a change in task goals in younger adults.
Alternatively, as the corresponding BFs were rather inconclusive, we cannot exclude that the data are simply underpowered and therefore fail to reveal significant differences in our sample. Furthermore, in line with evidence showing age differences in the rate of information accumulation in some contexts (Ratcliff et al., 2004, 2011; Spaniol et al., 2006), older adults had significantly lower drift rates. Given the current state of research, the conditions under which drift rate decreases with age remain unclear. Here, drift rate was significantly predicted by N2ac amplitudes: Participants with larger N2ac amplitudes (i.e., more negative difference waves) had higher drift rates, whereas participants with smaller N2ac amplitudes tended to have lower drift rates. Hence, differences in drift rate may reflect differences in the ability to extract relevant information from a perceptual scene (in this case, an array of concurrently presented sounds). In the following section, we discuss this relationship in more detail.
N2ac Amplitudes Predict Drift Rate and Accuracy
To date, little is known about the functional relevance of the N2ac component. The regression analysis conducted here revealed that N2ac amplitudes significantly predicted variations in accuracy as well as drift rate, whereas they were unrelated to mean RTs, threshold separation, or non-decision time. These findings add to the sparse literature that has so far investigated the N2ac component in different contexts (Klatt et al., 2018b; Lewald et al., 2016; Gamble & Woldorff, 2015a, 2015b; Lewald & Getzmann, 2015; Gamble & Luck, 2011). In addition, to the best of our knowledge, this is the first study to show an N2ac component in a sample of older adults. Gamble and Luck (2011) originally proposed that the N2ac arises to resolve the competition between simultaneously present stimuli and reflects attentional orienting toward a target. They further suggested that this may be based on a biasing of neural coding toward the attended stimulus, as observed in the visual modality. The observed relationship between N2ac amplitudes and drift rate may support this line of reasoning: Drift rate conceptually reflects the quality of the relevant information derived from sensory input that eventually drives the decision process (Ratcliff et al., 2000). Hence, the better participants are able to resolve competition between concurrent sounds by focusing on the target (i.e., N2ac amplitude), the better the quality of the information that prompts them to make a decision (i.e., drift rate; in other words, the higher the rate of evidence accumulation in favor of a given response). In turn, the better or more consistently participants are able to focus their attention on a relevant target sound (i.e., N2ac amplitude), the higher their overall accuracy should be.
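A standard closed-form result for the simplest diffusion model illustrates why a higher drift rate should translate into higher accuracy. Assuming no trial-to-trial parameter variability, a starting point z between the error boundary (0) and the correct boundary (a), and a diffusion constant s (the symbols z and s are introduced here for illustration only), the probability of terminating at the correct boundary is:

```latex
P(\text{correct}) = \frac{1 - e^{-2vz/s^{2}}}{1 - e^{-2va/s^{2}}}
\qquad\text{and, for an unbiased start } z = a/2, \qquad
P(\text{correct}) = \frac{1}{1 + e^{-va/s^{2}}}
```

For a fixed threshold separation a > 0, this expression increases monotonically with v, and the non-decision time t0 does not enter it at all.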
Interestingly, in addition to the similar N2ac amplitudes for younger and older adults, we found no significant interactions between N2ac amplitudes and age, neither for accuracy nor for drift rate (cf. Table 1). This may suggest that the variance within age groups contributes more strongly to the observed relationship than the variance between age groups. However, the difficulty of interpreting a null effect, such as a missing interaction with age, needs to be considered as a caveat here. Although the regression lines in Figure 6 show a trend toward an interaction of N2ac amplitude and age group, the calculated BFs (cf. Regression Analyses section) suggest that the data are insensitive to age group differences, providing no substantial evidence in favor of either the null or the alternative hypothesis. Nevertheless, one may ask why, if lower N2ac amplitudes are associated with lower drift rates and poorer performance, older adults did not show reduced N2ac amplitudes, given that they performed significantly worse than younger adults. On the one hand, the well-pronounced N2ac component in older adults may, at least in part, have resulted from the recruitment of additional top–down resources to allow for more efficient target selection. This interpretation would be in line with the decline-compensation hypothesis (Cabeza, Anderson, Locantore, & McIntosh, 2002; for a review, see Schneider, Pichora-Fuller, & Daneman, 2010), which proposes that age-related declines in peripheral and central auditory processing are compensated for by an increased allocation of cognitive resources. Increases in attentional focusing, however, might not be sufficient to fully offset the performance deficit of the older group. On the other hand, we cannot exclude that we simply failed to find a significant difference in N2ac amplitudes due to a lack of power, as the BFs provided no substantial evidence in favor of the null hypothesis.
Is Poststimulus Alpha Power Lateralization Functionally Relevant?
This study also investigated alpha lateralization as a measure of attentional orienting within an auditory scene. Typically, alpha lateralization manifests as a bilateral decrease in alpha power that is more pronounced over the hemisphere contralateral to a target or cue. This spatially specific modulation of oscillatory activity has been repeatedly associated with the top–down controlled, voluntary allocation of attention (Ikkai et al., 2016; Haegens et al., 2011; Thut et al., 2006; Foxe, Simpson, & Ahlfors, 1998). Here, we replicated this consistently observed response in the alpha frequency band in a sample of younger and older participants who performed an auditory localization task that required them to indicate the location of a predefined target stimulus among three concurrently presented distractors. Our results suggest that older adults may, in principle, be able to recruit the same oscillatory mechanisms as younger adults when searching for a target among simultaneously present distractors (Klatt et al., 2018b). Although Bayesian statistics were indecisive as to whether the nonsignificant difference in alpha lateralization between age groups reflects a true null effect, the preserved poststimulus alpha lateralization is consistent with a number of studies showing intact alpha lateralization in older adults when anticipating an upcoming (lateralized) stimulus (Heideman et al., 2018; Leenders, Lozano-Soldevilla, Roberts, Jensen, & De Weerd, 2018; Tune et al., 2018). However, other recent studies did not find alpha lateralization in older adults, even though the older participants were still able to perform the task as well as their younger counterparts (van der Waal, Farquhar, Fasotti, & Desain, 2017; Hong et al., 2015). This raises the question of the extent to which lateralized alpha dynamics are functionally relevant for behavior.
It is relatively undisputed that alpha power lateralization tracks the locus and timing of spatial attention (Bae & Luck, 2018; Foster et al., 2017; Samaha, Iemi, & Postle, 2017). In addition, a growing body of evidence supports the notion that the alpha rhythm as a correlate of spatial attention, so far predominantly investigated in the visual attention literature, operates analogously in other modalities (Klatt et al., 2018a, 2018b; Wöstmann et al., 2016, 2018; Thorpe, D'Zmura, & Srinivasan, 2012; Haegens et al., 2011). Yet, it remains a matter of debate (1) how alpha power lateralization aids selective spatial attention and (2) whether it is a necessary prerequisite for successful behavioral performance. Regarding the how, two prevailing views exist: The gating by inhibition theory, proposed by Jensen and Mazaheri (2010), suggests that the relative increase of alpha power over the ipsilateral hemisphere inhibits regions processing irrelevant information. Alternatively, it has been suggested that the relative decrease of alpha power over the contralateral hemisphere results in increased cortical excitability, allowing for enhanced processing of the target (Noonan et al., 2016; Yamagishi et al., 2005). The two mechanisms are not necessarily mutually exclusive. In fact, Foster and Awh (2019) recently pointed out that much of the empirical evidence is compatible with either the target enhancement or the distractor suppression account. Recent evidence suggests that both mechanisms might contribute independently to attentional orienting (Schneider et al., 2019). In line with these latter findings, Capilla, Schoffelen, Paterson, Thut, and Gross (2014) proposed distinct sources and behavioral correlates for the ipsilateral and contralateral portions of the alpha power signal.
Addressing the second question, that is, whether alpha lateralization reflects a necessary prerequisite for successful behavioral performance, a range of spatial-cueing studies has provided compelling evidence that behavioral performance is predicted by the degree of alpha lateralization (Haegens et al., 2011; Kelly et al., 2009; Thut et al., 2006). In contrast, our findings question the notion that alpha power lateralization reflects a behaviorally relevant attentional mechanism: Surprisingly, we did not find any association between alpha lateralization and diffusion model parameters, mean RTs, or accuracy. This could be explained by the fact that, unlike the majority of previous studies, this study investigated alpha power modulations following stimulus presentation. That is, although alpha lateralization may in fact be necessary to successfully shift one's attention in anticipation of an upcoming stimulus, it does not appear to be a required neural response in the attentional processing following the presentation of a multisound array. This is in line with the proposal previously made by van Ede et al. (2014), who similarly concluded that the relevance of attentional modulations might be “restricted to situations in which attention influences perception through anticipatory processes” (p. 139). However, in contrast to our results, these authors found that alpha lateralization was completely abolished during the processing of an ongoing tactile stimulus.
Alternatively, the lack of a relationship with behavioral performance may be due to the fact that we calculated a relative measure of alpha amplitudes, that is, the difference between ipsilateral and contralateral alpha power. In a cued somatosensory detection task, van Ede et al. (2014) found only contralateral alpha power amplitudes to be related to tactile detection performance, whereas fluctuations in the contralateral minus ipsilateral difference failed to predict performance. Similarly, other studies using a relative index of alpha power modulations did not find a strong relationship with behavioral performance (Tune et al., 2018; Limbach & Corballis, 2017). These (null) findings might support the emerging view that target enhancement (i.e., contralateral alpha power decrease) and distractor suppression (i.e., ipsilateral alpha power increase) contribute differentially to task performance (Schneider et al., 2019) and that this should be taken into account when analyzing the contribution of alpha power oscillations to behavior. Yet, some studies have successfully demonstrated an effect of the relative strength of alpha lateralization on task performance (Haegens et al., 2011; Kelly et al., 2009), suggesting that the reasons for these diverging results are likely more complex than a mere methodological artifact. Also, the respective BFs (below 1, but above 0.33) were rather indecisive; thus, although our data do not seem to support a significant relationship between alpha lateralization and behavioral performance, they cannot provide compelling evidence for a true null effect either.
Critically, one question remains unanswered: If alpha lateralization is not a necessary component of poststimulus attentional processing in an auditory scene, what does it reflect? Poststimulus alpha lateralization might be an “optional response” that results in more effective target enhancement or distractor inhibition when a specific strategy is applied. Hence, because different participants use different strategies, there might be no overall relationship between alpha lateralization and behavior when analyzed across all participants (Limbach & Corballis, 2017; Rihs, Michel, & Thut, 2009). Alternatively, as shown in a previous study using a very similar task design, auditory poststimulus alpha lateralization might be more closely related to the spatial specificity of the task (Klatt et al., 2018b). In that study, a lateralization of alpha power was only evident when participants were instructed to localize (instead of simply detect) a target sound within a multisound array. Hence, we proposed that, in poststimulus attentional processing, the lateralization of alpha power indexes access to a spatiotopic template that is used to generate a spatially specific response (Klatt et al., 2018b). If alpha lateralization reflects such a process, one may argue that alpha lateralization should be absent or substantially reduced in incorrect trials, and thus that it should in fact be associated with behavioral performance. Such differences in ALI amplitudes for correct versus incorrect trials have indeed been reported (Tune et al., 2018; Wöstmann et al., 2016, 2018). The fact that we calculated ALIs based on each participant's mean alpha power in correctly answered trials may explain why we failed to capture such differences with a rather coarse, dichotomous measure of behavioral performance such as accuracy.
Conclusion
In summary, fluctuations in N2ac amplitude predicted the rate of information accumulation (i.e., drift rate) as well as overall accuracy. We conclude that the N2ac component reflects the participants' ability to resolve competition between co-occurring sounds by focusing on the target. This ability, in turn, determines the quality of the information accumulated during the decision-making process and thereby affects overall accuracy levels. In contrast, alpha lateralization was unrelated to behavioral performance, suggesting that successful attentional orienting within an auditory scene (as opposed to orienting in anticipation of an upcoming target sound) does not rely on alpha lateralization. Our findings strengthen the proposal that alpha lateralization is not specific to the visual domain but may reflect a supramodal attentional mechanism that generalizes to the auditory domain (Thorpe et al., 2012; Kerlin, Shahin, & Miller, 2010). Yet, we highlight the importance of distinguishing between cue-related, anticipatory modulations of alpha power and poststimulus alpha power lateralization.
Acknowledgments
This work was supported by the German Federal Ministry of Education and Research in the framework of the TRAIN-STIM project (Grant Number 01GQ1424E). The authors are grateful to David Schmude, Jonas Heyermann, Stefan Weber, and Michael-Christian Schlüter for their help in running the experiments; to Peter Dillmann and Tobias Blanke for preparing software and parts of the electronic equipment; and to two anonymous reviewers for valuable comments on a previous version of this manuscript.
Reprint requests should be sent to Laura-Isabelle Klatt, Leibniz Research Centre for Working Environment and Human Factors, Ardeystraße 67, 44139 Dortmund, Germany, or via e-mail: [email protected].