Attentional capture by previously reward-associated stimuli has predominantly been measured in the visual domain. Recently, behavioral studies of value-driven attention have demonstrated involuntary attentional capture by previously reward-associated sounds, emulating behavioral findings within the visual domain and suggesting a common mechanism of attentional capture by value across sensory modalities. However, the neural correlates of the modulatory role of learned value on the processing of auditory information have not been examined. Here, we conducted a neuroimaging study on human participants using a previously established behavioral paradigm that measures value-driven attention in an auditory target identification task. We replicate behavioral findings of both voluntary prioritization and involuntary attentional capture by previously reward-associated sounds. When task-relevant, the selective processing of high-value sounds is supported by reduced activation in the dorsal attention network of the visual system (FEF, intraparietal sulcus, right middle frontal gyrus), implicating cross-modal processes of biased competition. When task-irrelevant, in contrast, high-value sounds evoke elevated activation in posterior parietal cortex and are represented with greater fidelity in the auditory cortex. Our findings reveal two distinct mechanisms of prioritizing reward-related auditory signals, with voluntary and involuntary modes of orienting that are differently manifested in biased competition.
Attention can be biased toward features that align with task goals (e.g., Folk, Remington, & Johnston, 1992; Wolfe, Cave, & Franzel, 1989), physically salient stimuli (e.g., Theeuwes, 1991, 1992), and stimuli that have previously been prioritized by attention, now commonly referred to as selection history (e.g., Awh, Belopolsky, & Theeuwes, 2012). Within the context of selection history, reward-associated stimuli receive elevated attentional priority (e.g., Hickey, Chelazzi, & Theeuwes, 2010; Della Libera & Chelazzi, 2006), and such attentional biases can persist even when previously reward-associated stimuli are nonsalient and task-irrelevant as has been demonstrated using the value-driven attentional capture (VDAC) paradigm (Anderson, Laurent, & Yantis, 2011). The influence of reward history on attention has primarily been investigated in the visual domain (see works of Anderson, 2016a, 2019, for reviews), and a mechanistic understanding of learning-dependent attentional biases in other sensory systems is limited.
Cross-modal designs have investigated interactions across multiple sensory networks in attentional processing, commonly to understand how bimodal stimulation is processed and integrated (e.g., Stormer, McDonald, & Hillyard, 2009; McDonald, Teder-Salejarvi, Di Russo, & Hillyard, 2005; McDonald, Teder-Salejarvi, & Hillyard, 2000). Behavioral evidence suggests that reward associations can influence how competition between sensory modalities is resolved (Sanz, Vuilleumier, & Bourgeois, 2018; Anderson, 2016b). However, recent evidence suggests that the reward value of visual stimuli can dominate over that of auditory stimuli when in direct competition (Cheng, Saglam, Andre, & Pooresmaeili, 2020), highlighting the importance of measuring attentional biases specifically in the auditory domain to uncover the full breadth of the underlying mechanisms.
In a task using only auditory stimuli, attention has been shown to be preferentially allocated to task-relevant auditory streams as a function of reward history (Asutay & Västfjäll, 2016). We have expanded on these initial findings by showing that previously reward-associated but currently task-irrelevant auditory stimuli interfere with auditory target identification (Kim, Lee, & Anderson, 2021), demonstrating involuntary attentional capture by previously reward-associated sounds. This result provides a parallel to demonstrations of VDAC in the visual domain (Anderson, 2016a; Anderson et al., 2011) using exclusively auditory stimuli. Such converging behavioral evidence suggests a common principle of involuntary attentional prioritization of previously reward-associated stimuli across sensory modalities. However, the neural mechanisms supporting such value-based prioritization in the auditory system have not been clarified.
The neurobiology of value-driven attention has been widely investigated in the visual domain, consistently revealing a value-driven attention network of regions in the brain in which high-value stimuli evoke elevated responses, including the early and ventral visual cortex, posterior parietal cortex, and caudate tail (see the work of Anderson, 2019, for a review). Furthermore, value has been shown to modulate the amplitude of stimulus-evoked activity and tune neuronal population profiles in favor of more valuable stimuli within the spatially selective areas of the early visual cortex (Itthipuripat, Vo, Sprague, & Serences, 2019; Serences & Saproo, 2010; Serences, 2008). In electrophysiological studies, reward-associated sounds have been shown to produce an elevated N1 response over auditory cortex (Folyi & Wentura, 2019; Folyi, Liesefeld, & Wentura, 2016), suggesting a potential parallel to these findings in the visual domain, although the stimulus specificity of this response and the contribution of other brain regions to it remain unexplored. The influence of reward on auditory processing has been robustly examined in nonhuman species (see works of Irvine, 2018; Kraus & White-Schwoch, 2015, for reviews). Reward has been shown to influence neural responses in the auditory cortex as a function of value in ferrets (David, Fritz, & Shamma, 2012) and nonhuman primates (Brosch, Selezneva, & Scheich, 2011). Furthermore, an fMRI study on rhesus macaques demonstrated that such reward-associated activity in the auditory cortex interacts with neural structures that are associated with dopaminergic (nucleus accumbens) and cholinergic (nucleus basalis) pathways (Wikman, Rinne, & Petkov, 2019). 
Interestingly, in humans, listening to music has consistently been shown to engage neural networks of reward via the dopaminergic system, further supporting the role of projections between the limbic system and the auditory cortex in representing the value of sounds (e.g., Ferreri et al., 2019; Gold, Pearce, Mas-Herrero, Dagher, & Zatorre, 2019; Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015; Salimpoor et al., 2013). However, these investigations have been limited to the role of reward in auditory perception or evaluation, and the modulatory influence of reward on the cognitive processing of auditory stimuli (e.g., attentional control) in humans is yet unclear.
In this study, we conducted an fMRI experiment using our previously established behavioral paradigm (Kim et al., 2021) to elucidate the neural correlates of value-driven auditory attention. An advantage of this paradigm, unlike paradigms frequently used to investigate value-driven attention in the visual domain (e.g., Anderson & Halpern, 2017; Anderson, 2016a; Anderson et al., 2011), is that robust effects of reward on attention are evident in response to both task-relevant and task-irrelevant stimuli, permitting measurement of reward's influence on both voluntary and involuntary attention. The neural correlates of the processing of different sounds as a function of learned value were probed both in terms of the amplitude of the stimulus-evoked response and in terms of the information contained within the pattern of activation, as revealed by multivariate pattern analysis (MVPA).
In our previous behavioral study, a sample of 35 participants yielded significant effects on response time (RT) in both the training and test phases (Kim et al., 2021). To replicate these results, we proposed to again recruit 35 participants for this study. This sample size is similar to and, in most cases, exceeds that of prior fMRI studies of VDAC (e.g., Kim & Anderson, 2019b, 2020a, 2020b; Itthipuripat et al., 2019; Anderson, 2017; Barbaro, Peelen, & Hickey, 2017; Anderson et al., 2016; Anderson, Laurent, & Yantis, 2014; Hickey et al., 2010). Thirty-nine participants were recruited from the university community. All participants were English-speaking and reported normal or corrected-to-normal visual acuity and normal color vision. However, two participants did not meet the safety eligibility criteria for neuroimaging and two participants voluntarily withdrew from the study before completion. Thus, 35 participants were fully scanned and completed the experiment. Following data exclusion procedures (see Data Analysis and Exclusion Criteria section), we analyzed 31 complete behavioral and neuroimaging data sets (13 women, ages 18–35 [M = 23.1 years, SD = 4.0 years]).
All procedures were approved by the university institutional review board and were conducted in accordance with the principles expressed in the Declaration of Helsinki. Written informed consent was obtained for each participant.
Participants were scheduled for an initial in-laboratory visit for 1 hr, and each eligible participant underwent fMRI in a single 1.5-hr session at the scan center on the following day. During their initial appointment, participants came into the laboratory for consenting, MRI safety screening, and screening for adequate performance on the behavioral task. During the initial in-laboratory session, the experiment began with a brief hearing test in which participants indicated when they perceived five tones of 300–700 Hz (sine waveform, increments of 100 Hz), which were presented at intervals that randomly varied between 3000 and 11,000 msec (increments of 2000 msec). Each tone was played to each ear separately, in random order, and volume was adjusted if needed until the participant was 100% correct in identifying the tones. The computer volume was originally set to ∼56 dB, and all participants were 100% accurate in the hearing test without adjustment, resulting in the original intensity being retained for the entire experiment in all cases. Participants then completed the test phase task once (to become familiar with the task procedure without interfering with prior learning), followed by the training phase twice to establish learning of the stimulus–reward associations (Kim & Anderson, 2020a, 2020b). During the fMRI session, participants completed two runs of the training phase, three runs of the test phase, an anatomical scan, an additional run of the training phase (to mitigate possible extinction effects), and two further runs of the test phase. Participants were compensated the total monetary reward accumulated at the end of the last training phase or the combined amount of $10/hr spent in the initial appointment session and $20/hr spent in the fMRI session, whichever amount was higher.
All auditory stimuli were recorded using a Spark SL condenser microphone (Baltic Latvian Universal Electronics LLC.), with an Arrow audio interface (Universal Audio Inc.), on a 2017 MacBook Pro (Apple Inc.). The recordings were sampled and modified using the built-in functions on the Logic Pro X software (Apple Inc.). All recorded samples of the numbers and letters were cut to begin at the same time, compressed to make the sound intensity equal, and condensed to be 300 msec in duration to ensure acoustic similarities across all stimuli. Importantly, all letter-to-value assignments were counterbalanced across participants so that acoustical differences between stimuli would not bias any of our statistical comparisons. The auditory stimuli were the same as those used in Kim et al.'s study (2021).
During the initial in-laboratory visit, all tasks were completed on a Dell OptiPlex 7040 computer equipped with MATLAB software (The Mathworks, Inc.) and Psychophysics Toolbox extensions (Brainard, 1997). Stimuli were presented on a Dell P2717H monitor. The participants viewed the monitor from a distance of approximately 70 cm in a dimly lit room. Participants also wore Beyerdynamic DT 770 Pro 250Ω professional studio headphones to listen to all sounds.
For the fMRI portion of the experiment, stimulus presentation was controlled by an Invivo SensaVue display system. The eye-to-screen distance was approximately 125 cm. Key responses were entered using two Cedrus Lumina two-button response pads. Output sounds were sent to a Pyle PCA1 stereo power amplifier connected to Sensimetrics Model S14 fMRI earphones (Sensimetrics Corporation).
Each run of the training phase consisted of 72 trials. Each trial began with a fixation display (1800 msec), followed by the target/distractor (300 msec), an ISI, auditory/visual feedback (1500 msec), and an intertrial interval (ITI; see Figure 1). Throughout each trial, a fixation cross (0.7° × 0.7° visual angle) was presented at the center of the screen. During the presentation of the target/distractor, participants would simultaneously hear a spoken letter played to one ear and a spoken number played to the other ear. The possible letters were U, I, and O, and the possible numbers were 1, 2, 3, and 4 (participants were informed of these possibilities beforehand). These letters and numbers were chosen based on their phonetics (not rhyming and similar intonation) and their close proximity on the keyboard. The possible letter–number combinations and what side they were presented on the earphones were fully counterbalanced, and the order of trials was randomized in each run. Participants were instructed to listen for the letter they heard and press the respective key on the keyboard. They were told that correct responses could result in monetary reward, but no information was given about reward–letter contingencies. We also specified to participants that they would receive the total monetary reward attained throughout the task or the base rate ($10/hr spent in the initial appointment session and $20/hr spent in the fMRI session), whichever was higher. Furthermore, participants were informed that if they did not complete the full experiment, they would be paid at the base rate regardless of task earnings. The ISI lasted for 1500, 2700, or 3900 msec (equally often, order randomized). Next, participants were given feedback based on what key they pressed. 
If the participant did not respond before the end of the ISI, they were presented with the words “Too Slow” and their accumulated total earnings, and if they pressed the wrong key, they were presented with the words “Incorrect” and their accumulated total earnings (no sound was presented during such feedback). For each participant, each letter was associated with high (20 cents), low (4 cents), or no reward (0 cents). The letter-to-value mapping was counterbalanced across participants. For correct responses, participants were shown their corresponding reward earnings and their accumulated total earnings, in addition to an audible cue for 500 msec (sine waveform, high reward = 650 Hz, low reward = 500 Hz, no reward = 350 Hz). The visual feedback remained on the screen for the entire duration of the feedback (1500 msec) whereas the audible cue was followed by silence for the remainder of the feedback period. We included the auditory feedback to help ensure that participants robustly processed the feedback, because it was possible to perform the task without actually looking at or otherwise processing the visual display. Lastly, the ITI lasted for 900, 2700, or 4500 msec (exponentially distributed, with the shorter time lengths being more frequent). The fixation cross disappeared for the last 200 msec of the ITI to indicate to the participant that the next trial was about to begin. The auditory stimuli and timing of trial events exactly matched the training phase of Kim et al.'s study (2021).
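The feedback rule described above can be summarized in a short Python sketch. The reward amounts and tone frequencies come from the text; the function name, argument names, and the wording of the message shown for correct responses are hypothetical and serve only as illustration.

```python
# Hypothetical sketch of the training-phase feedback rule (not the task code).
REWARD_CENTS = {"high": 20, "low": 4, "none": 0}   # values stated in the text
TONE_HZ = {"high": 650, "low": 500, "none": 350}   # 500-msec sine cue frequencies


def feedback(response, target, value, total_cents):
    """Return (message, tone_hz, new_total_cents) for one training trial.

    response: key pressed, or None if no key was pressed before the ISI ended.
    target:   the spoken letter presented on this trial.
    value:    "high", "low", or "none" (this participant's mapping for target).
    """
    if response is None:
        return "Too Slow", None, total_cents       # no sound, no earnings
    if response != target:
        return "Incorrect", None, total_cents      # no sound, no earnings
    earned = REWARD_CENTS[value]
    # ASSUMPTION: the exact on-screen wording for correct trials is not given.
    return f"+{earned} cents", TONE_HZ[value], total_cents + earned
```

For example, `feedback("U", "U", "high", 100)` returns the reward message, a 650-Hz cue, and an updated total of 120 cents, whereas a missed response leaves the total unchanged.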
Each run of the test phase consisted of 72 trials. Each trial began with a fixation display (1800 msec), followed by the target/distractor (300 msec), and an ITI (see Figure 1). Throughout each trial, a fixation cross (0.7° × 0.7° visual angle) was presented at the center of the screen. During the presentation of the target/distractor, participants would again simultaneously hear a letter and a number (design identical to the training phase). However, participants were now instructed to listen for the number they heard and press the respective number key on the keyboard, with the letters now serving as value-associated but task-irrelevant distractors. Lastly, the ITI lasted for 2100, 3900, or 5700 msec (exponentially distributed, with the shorter time lengths being more frequent). The fixation cross again disappeared for the last 200 msec of the ITI to indicate to the participant that the next trial was about to begin. The auditory stimuli and timing of trial events exactly matched the test phase of Kim et al.'s study (2021). The trial sequence (including order of trials, ISIs, and ITIs) was fully randomized and newly created for each run for each participant (for both training and test phases) to ensure variability across participants.
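As an illustration of how such a run could be assembled, the following Python sketch builds one randomized 72-trial sequence. It assumes that the 72 trials arise from 3 letters × 4 numbers × 2 sides repeated 3 times, which is one reading of "fully counterbalanced"; the ITI weights are hypothetical because the text states only that shorter ITIs were more frequent; and all names are illustrative.

```python
import itertools
import random

LETTERS = ["U", "I", "O"]
NUMBERS = ["1", "2", "3", "4"]
SIDES = ["left", "right"]                 # ear receiving the letter
ISI_MS = [1500, 2700, 3900]               # training phase only, equally often
ITI_MS = {"training": [900, 2700, 4500], "test": [2100, 3900, 5700]}
ITI_WEIGHTS = [4, 2, 1]                   # ASSUMPTION: exact proportions not reported


def make_run(phase, rng):
    """Return a list of 72 trial dicts for one run of the given phase."""
    combos = list(itertools.product(LETTERS, NUMBERS, SIDES))  # 24 combinations
    trials = combos * 3                                        # 72 trials
    rng.shuffle(trials)
    isis = ISI_MS * (len(trials) // len(ISI_MS))               # 24 of each ISI
    rng.shuffle(isis)
    itis = rng.choices(ITI_MS[phase], weights=ITI_WEIGHTS, k=len(trials))
    run = []
    for (letter, number, side), isi, iti in zip(trials, isis, itis):
        trial = {"letter": letter, "number": number,
                 "letter_side": side, "iti_ms": iti}
        if phase == "training":            # the test phase has no ISI or feedback
            trial["isi_ms"] = isi
        run.append(trial)
    return run
```

Calling `make_run("training", random.Random())` once per run and participant yields a newly randomized sequence each time, in the spirit of the procedure described above.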
MRI Data Acquisition
Images were acquired using a Siemens 3-Tesla MAGNETOM Verio scanner with a 32-channel head coil. High-resolution whole-brain anatomical images were acquired using a T1-weighted magnetization prepared rapid gradient echo pulse sequence (150 coronal slices, voxel size = 1 mm isotropic, repetition time = 7.9 msec, echo time = 3.65 msec, 8° flip angle). Whole-brain functional images were acquired using a T2*-weighted EPI multiband pulse sequence (56 axial slices, repetition time = 600 msec, echo time = 29 msec, 52° flip angle, image matrix = 96 × 96, field of view = 240 mm, slice thickness = 2.5 mm with no gap). Each EPI pulse sequence began with dummy pulses to allow the magnetic resonance signal to reach steady state and concluded with an additional 6-sec blank epoch.
MRI Data Analyses
All preprocessing was conducted using the AFNI software package (Cox, 1996). Each EPI run for every participant was motion corrected using 3dvolreg, utilizing the first image following the anatomical scan as a reference. The anatomical image was skull-stripped using 3dSkullStrip and nonlinearly registered to the Talairach brain (Talairach & Tournoux, 1988) using auto_warp.py. EPI images were coregistered to the corresponding anatomical image for each participant using align_epi_anat.py, and the EPI data were then converted to percent signal change normalized to the mean of each run. Lastly, EPI images were nonlinearly warped to the Talairach brain by applying the warp parameters from the anatomical image using 3dNwarpApply and then spatially smoothed to a resulting 5-mm FWHM using 3dBlurToFWHM.
All statistical analyses were performed using the AFNI software package (Cox, 1996). To determine whether stimulus-evoked responses were elevated or suppressed by value, we used a general linear model (GLM) approach to analyze the training and test phase data. The GLM for the training phase included the following regressors of interest: (1) high-value target letter on left side, (2) low-value target letter on left side, (3) no-value target letter on left side, (4) high-value target letter on right side, (5) low-value target letter on right side, (6) no-value target letter on right side, (7) high-reward feedback, (8) low-reward feedback, and (9) no-reward feedback. The GLM for the test phase included the following regressors of interest: (1) high-value distractor on left side, (2) low-value distractor on left side, (3) no-value distractor on left side, (4) high-value distractor on right side, (5) low-value distractor on right side, and (6) no-value distractor on right side. Each of these regressors was modeled using 16 finite impulse response functions, beginning at the onset of the respective stimulus (Kim & Anderson, 2020a, 2020b); that is, target/distractor regressors were time-locked to the target/distractor and feedback regressors to the feedback display/sound. Six degrees of head motion and drift in the scanner signal were modeled using nuisance regressors. The peak beta value for each condition from 3 to 6 sec post stimulus presentation was extracted (e.g., Kim & Anderson, 2020a, 2020b). Incorrect trials were not excluded from analysis because there were too few from which to derive a modeled response and participants were still exposed to the same stimulus input, the processing of which was of primary interest.
Given that no significant behavioral differences were found between low- and no-value conditions in RT, replicating prior results using this paradigm (Kim et al., 2021), we averaged the peak beta values for these two conditions (henceforth referred to as lesser-value) in both the training and test phase data for ANOVAs, such that the conditions used in these analyses correspond to the behavioral effects of reward evident in this paradigm. In the training and test phase, using the AFNI program 3dANOVA3, a three-way ANOVA was conducted comparing the peak response on trials with Value-Associated Target/Distractor (high vs. lesser) and Side (left vs. right) as fixed effects and Participant as a random effect. Multiple comparison corrections were implemented using the AFNI program 3dClustSim, with the smoothness of the data estimated using the auto-correlation function method via the AFNI program 3dFWHMx (clusterwise α < .05, voxelwise p < .005). Of interest for both the training and test phase was the ANOVA contrast on the effect of high-value target/distractor versus lesser-value target/distractor.
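The peak extraction and the averaging of low- and no-value conditions can be sketched as follows. This is a minimal illustration, not the analysis code: it assumes that FIR regressor k estimates the response k × TR after stimulus onset (TR = 600 msec) and that "peak" means the largest beta within the 3- to 6-sec window. The function names are hypothetical.

```python
TR_MS = 600                 # repetition time of the functional scans
N_TAPS = 16                 # FIR regressors per condition
WINDOW_MS = (3000, 6000)    # 3 to 6 sec after stimulus onset


def peak_beta(fir_betas):
    """Return the largest FIR beta whose latency lies inside the window.

    ASSUMPTION: tap k corresponds to a latency of k * TR_MS after onset.
    """
    if len(fir_betas) != N_TAPS:
        raise ValueError(f"expected {N_TAPS} FIR betas, got {len(fir_betas)}")
    in_window = [beta for k, beta in enumerate(fir_betas)
                 if WINDOW_MS[0] <= k * TR_MS <= WINDOW_MS[1]]
    return max(in_window)


def lesser_value(low_peak, no_peak):
    """Average the low- and no-value peaks into one lesser-value estimate."""
    return (low_peak + no_peak) / 2
```

Under these assumptions the window covers taps 5 through 10 (3.0 to 6.0 sec), so a large value at a later tap does not affect the extracted peak.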
Next, we more specifically probed for effects of reward on the strength of stimulus-evoked activity in the test phase within the frontoparietal attention network (FPN) and the auditory cortex using an ROI approach. ROIs were identified from the training phase ANOVA, which provided independent data. The ROI for the FPN was taken from the main contrast on the effect of high-value target versus lesser-value targets and included four clusters identified in the parietal cortex and the FEFs given their overlap with regions previously identified in the value-driven attention network (Anderson, 2019; see Figure 3). The ROI for the auditory cortex was also identified from the GLM of the training phase. In this case, we calculated the average response for each of the six target conditions in the GLM. We then averaged over the six conditions and determined the ROI based on the intersection of the functional activation and the Talairach Atlas' definition of auditory cortex (see Figure 4). We used the AFNI program 3dmaskave to extract peak beta values from each of the six conditions (Reward × Side) in the test phase and submitted this to the same ANOVA as the voxelwise whole-brain analysis.
Lastly, to determine whether information about the high-value distractor is more robustly represented than information about lesser-value distractors in the auditory cortex after they become task-irrelevant, we conducted two MVPAs. For this purpose, the aforementioned GLMs for the training and test phase were separated by run and the peak beta value for each condition was calculated in the same way, providing one beta-weight estimate per condition per run (Anderson, 2017). In our first MVPA, we confirmed whether different target sounds were reliably associated with different patterns of activation during the training phase. Beta-weights for high-, low-, and no-value targets in the training phase for each side were extracted from the contralateral ROI in the mask created for the auditory cortex. These beta-weights were then standardized (z scored), separately for each run and participant, and subjected to MVPA using the linear support vector machine classifier (fitcsvm) in MATLAB. Linear support vector machine (SVM) classification was performed using leave-one-run-out cross-validation, such that the SVM was iteratively trained on the beta-weights from n − 1 of the runs and tested on the left-out run for each participant, resulting in three classification accuracies (as in Anderson, 2017; Xu et al., 2017) for each hemisphere (high- vs. low-value, high- vs. no-value, low- vs. no-value). These accuracies were averaged over the two hemispheres to generate the mean classification accuracy per participant, which was then averaged across participants to compute a grand mean.
The probability of the observed grand mean classification accuracy under the null hypothesis was determined using a randomization procedure in which a distribution of mean classification accuracy was computed under conditions in which the training labels were randomly shuffled for each participant in 10,000 iterations (Anderson, 2017; Xu et al., 2017); the order of trials in the random condition assignment was matched with the original sequence, but the six conditions (high-value target on the right, low-value target on the right, no-value target on the right, high-value target on the left, low-value target on the left, and no-value target on the left) were permuted for each run.
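The logic of leave-one-run-out cross-validation with a label-shuffling null distribution can be illustrated with a small, dependency-free Python sketch. Several simplifications are assumed and should not be read as the method used here: a nearest-centroid classifier stands in for the linear SVM, each pattern is z scored across voxels, labels are shuffled within every run rather than only in the training set, and the test is run for a single participant rather than on the grand mean. All names are hypothetical.

```python
import random
import statistics


def zscore(pattern):
    """Standardize one voxel pattern (ASSUMPTION: across voxels)."""
    mu = statistics.fmean(pattern)
    sd = statistics.pstdev(pattern)
    return [(v - mu) / sd for v in pattern]


def centroid(patterns):
    """Voxelwise mean of several patterns."""
    return [statistics.fmean(col) for col in zip(*patterns)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loro_accuracy(runs, labels):
    """runs: list of dicts {label: voxel pattern}, one dict per run."""
    correct = total = 0
    for held_out in range(len(runs)):
        train = [r for i, r in enumerate(runs) if i != held_out]
        cents = {lab: centroid([zscore(r[lab]) for r in train]) for lab in labels}
        for true_lab in labels:
            test = zscore(runs[held_out][true_lab])
            # Nearest centroid replaces the SVM decision in this sketch.
            guess = min(labels, key=lambda lab: sq_dist(test, cents[lab]))
            correct += guess == true_lab
            total += 1
    return correct / total


def permutation_p(runs, labels, n_perm, rng):
    """Return (observed accuracy, p) from a within-run label-shuffling null."""
    observed = loro_accuracy(runs, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for r in runs:
            perm = labels[:]
            rng.shuffle(perm)
            shuffled.append({lab: r[src] for lab, src in zip(labels, perm)})
        hits += loro_accuracy(shuffled, labels) >= observed
    return observed, (hits + 1) / (n_perm + 1)
```

With only a handful of runs, the number of distinct within-run shufflings is small, which is one reason a group-level null distribution over many participants, as described above, is more informative than a single-participant test.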
Next, we used a similar MVPA approach to investigate whether the high-value distractor in the test phase produced a more robust pattern of activation in the auditory cortex compared to lesser-value stimuli. To the degree that information about the distractor is represented in spite of its task-irrelevance, the pattern of activation that it evokes should be more similar to the pattern evoked by the same stimulus during the training phase when it is actually attended. In addition to the acquired beta-weights for the high-, low-, and no-value targets in the training phase, beta-weights for the high-, low-, and no-value distractors in the test phase were similarly extracted. We utilized a similar MVPA and randomization test approach as for the training phase. However, in this case, linear SVM was performed by training the classifier on the beta-weights for the four training-phase runs and predicting on the four test-phase runs. In this case, higher classification accuracy means that a distractor sound was processed more similarly to when it was task-relevant and attended. As with our ANOVAs, we sought classification accuracies with respect to lesser-value distractors in order to match our behavioral results. In this case, however, to avoid biases in classification accuracy that could arise from averaging conditional beta values, we classified high- versus low-value and high- versus no-value distractors separately and subsequently averaged the two results for each participant to obtain an overall estimate of high- versus lesser-value discriminability. Classification accuracies between conditions were directly compared using the same randomization approach.
A randomization procedure using random sign flipping on the resulting classification accuracies (either above/below 0.5 or the difference score when comparing two classification accuracies) was also performed on all MVPA results, which achieved the same conclusions with respect to statistical significance and is therefore not reported.
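A sign-flipping randomization test of this kind can be sketched in a few lines of Python. The sketch assumes a one-sided test of the mean deviation from chance (or of the mean difference score, by passing differences with `chance=0`); the function name and defaults are hypothetical.

```python
import random
import statistics


def sign_flip_p(accuracies, chance=0.5, n_iter=10000, rng=None):
    """One-sided sign-flip test that the mean of (accuracy - chance) exceeds 0."""
    if rng is None:
        rng = random.Random()
    devs = [a - chance for a in accuracies]
    observed = statistics.fmean(devs)
    hits = 0
    for _ in range(n_iter):
        # Under the null, each participant's deviation is equally likely
        # to be positive or negative, so its sign is flipped at random.
        flipped = [d if rng.random() < 0.5 else -d for d in devs]
        hits += statistics.fmean(flipped) >= observed
    return (hits + 1) / (n_iter + 1)
```

The returned value is the proportion of sign-flipped means at least as large as the observed mean, with the usual add-one correction so that the estimate is never exactly zero.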
Data Analysis and Exclusion Criteria
Behavior results are presented from only the fMRI session and not the in-laboratory session the day before such that the behavioral and fMRI data correspond to the same trials. RT was measured from the onset of the target/distractor. Only correct trials were included in the RT analyses. RTs more than 2.5 SDs above or below the mean for a given condition for a given participant were trimmed (Kim et al., 2021; Kim & Anderson, 2020a, 2020b). In addition, we excluded as outliers the data of two participants whose mean accuracy or RT was more than 2.5 SDs below or above the group mean (see the work of Anderson, 2016b). Furthermore, two participants made significant head movements during their neuroimaging scan and 25.8% and 6.4% of images, respectively, would have needed to be censored because of excessive motion exceeding half the width of a voxel during a single dynamic (in comparison to an average of 0.2% of images among all other participants); these participants were also excluded from the final analysis. In the end, 31 complete behavior and neuroimaging data sets were submitted to final analyses.
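The RT trimming rule can be expressed as a short Python sketch, applied separately to the correct-trial RTs of each condition for each participant. The use of the sample standard deviation is an assumption, and the function name is hypothetical.

```python
import statistics


def trim_rts(rts, n_sd=2.5):
    """Drop RTs more than n_sd standard deviations from the condition mean."""
    mean = statistics.fmean(rts)
    sd = statistics.stdev(rts)          # ASSUMPTION: sample (n - 1) SD
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]
```

For instance, in a list of twelve RTs near 500 msec plus a single 2500-msec response, only the 2500-msec response is removed.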
Data and Code Availability Statement
The full data set, including the raw MRI data, is available upon reasonable request made to the corresponding author and will be provided under the provision that the data be used strictly for academic research purposes and not be shared with others without the express written approval of the corresponding author. Data sharing for this article complies with the requirements of the funding agencies and the stipulations of the university institutional review board approvals.
In the training phase, a repeated-measures ANOVA revealed that RTs significantly differed among the three target conditions, F(2, 60) = 17.73, p < .001, ηp2 = .371. Post hoc comparisons revealed that participants were significantly faster to report high-value targets compared to both unrewarded targets, t(30) = 5.42, p < .001, dz = 0.973, and low-value targets, t(30) = 4.02, p < .001, dz = 0.720, but only a marginally significant difference was found comparing low-value and unrewarded targets, t(30) = 1.96, p = .060 (see Figure 2A). Accuracy did not significantly differ among the three target conditions, F(2, 60) = 1.11, p = .338 (see Figure 2B).
In the test phase, a repeated-measures ANOVA revealed that RTs differed significantly among the three distractor conditions, F(2, 60) = 3.69, p = .031, ηp2 = .110. Post hoc comparisons revealed that RTs were significantly slower on high-value distractor trials compared to both no-value distractor trials, t(30) = 2.40, p = .023, dz = 0.432, and low-value distractor trials, t(30) = 2.24, p = .032, dz = 0.404, but no significant differences were found comparing low-value and no-value distractor trials, t(30) = 0.58, p = .564 (see Figure 2C). Accuracy did not significantly differ among the three distractor conditions, F(2, 60) = 0.40, p = .670 (see Figure 2D).
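For readers who wish to relate the reported effect sizes to the test statistics, the standard conversions can be sketched in Python. These formulas are common conventions and are assumed here; the text does not state how the effect sizes were computed, and values derived from rounded t statistics can differ from the reported dz in the third decimal place.

```python
import math


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f * df_effect) / (f * df_effect + df_error)


def cohens_dz(t, n):
    """Cohen's dz for a paired comparison, from t and the number of pairs."""
    return t / math.sqrt(n)


print(round(partial_eta_squared(17.73, 2, 60), 3))  # 0.371 (training phase RT)
print(round(partial_eta_squared(3.69, 2, 60), 3))   # 0.11  (test phase RT)
print(round(cohens_dz(5.42, 31), 3))                # 0.973 (high vs. unrewarded)
```

The first two values agree with the reported ηp2 of .371 and .110, and the third agrees with the reported dz for the high-value versus unrewarded target comparison.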
From our ANOVA contrast on the effect of high-value versus lesser-value (low- and no-value) targets in the training phase, we identified significantly less neural activity evoked by the high-value target in the insula (INS), inferior frontal gyrus (IFG), middle frontal gyrus (MFG), cingulate gyrus (CG), intraparietal sulcus (IPS), and FEFs (see Figure 3A, Figure 4, and Table 1). From our ANOVA contrast on the effect of high-value distractor versus lesser-value distractors (low- and no-value) in the test phase, we identified elevated neural activity evoked by the high-value distractor in the posterior parietal cortex (see Figure 3B, Figure 4, and Table 1). To specifically probe for an effect of reward on the magnitude of stimulus-evoked activity within the FPN and auditory cortex during the test phase that might have been too subtle to detect at the whole-brain level, we first created masks for these regions of interest (see Methods section; see also Figure 5). For the FPN, a 2 × 2 ANOVA over factors Reward (high- vs. lesser-value) and Side (left vs. right) found no main effects or interaction: main effect of Side, F(1, 30) = 3.08, p = .089, other Fs < 1.03, ps > .31. For the auditory cortex, the same 2 × 2 ANOVA similarly found no main effects or interaction, Fs < 1.93, ps > .175.
| Region | x | y | z | Volume (ml) |
|---|---|---|---|---|
| Right anterior INS | −28.8 | −18.8 | 11.2 | 0.406 |
| Left inferior frontal gyrus | 38.8 | −6.2 | 21.2 | 0.469 |
| Left parietal cortex | 41.2 | 38.8 | 43.8 | 0.328 |
| Left posterior parietal cortex | 26.2 | 73.8 | 36.2 | 0.422 |
x, y, and z refer to the Talairach coordinates of the peak voxel of the cluster.
We first established that the differently valued stimuli in the training phase produced reliably different patterns of activation in the auditory cortex. Our MVPA significantly classified each pair of conditions (high- vs. low-value, accuracy = 55%, high- vs. no-value, accuracy = 56%, low- vs. no-value, accuracy = 57%, ps < .014). Then, we trained classifiers on these patterns, when the stimuli were task-relevant and attended, to determine whether and to what degree this pattern was maintained when the same stimuli were presented as task-irrelevant distractors. Classification was significantly above chance in distinguishing high-value distractors from lesser-value distractors (accuracy = 53%, p = .003), but not for distinguishing between low-value and no-value distractors (accuracy = 49%, p = .479). The difference between these two accuracies was also significant, p = .031 (see Figure 6).
RTs were significantly faster in the training phase when the target was associated with high value, demonstrating a voluntary attentional bias driven by the motivational effects of reward. In addition, RTs were significantly slower in the test phase when the same stimulus was presented as a task-irrelevant distractor, demonstrating an involuntary attentional bias driven by learned value, that is, VDAC. Thus, we replicated behavioral evidence of two forms of value-based attentional biases in the auditory domain (Kim et al., 2021). Our fMRI data permitted an assessment of the neural correlates of each of these two influences on the control of attention, with implications for our understanding of mechanisms of value-driven attention in the auditory domain and the extent to which voluntary and involuntary modes of value-based orienting are distinct.
In the test phase, we found elevated stimulus-evoked activity in the posterior parietal cortex by the high-value distractor, consistent with studies of VDAC using visual stimuli (e.g., Kim & Anderson, 2020b; Anderson, 2017; Hickey & Peelen, 2015; Anderson et al., 2014; Qi, Zeng, Ding, & Li, 2013). The posterior parietal cortex plays a broad role in attentional selection and is a core region of the dorsal frontoparietal network, which is thought to selectively bias the representation of task-relevant or otherwise pertinent stimuli (see works of Corbetta, Patel, & Shulman, 2008; Behrmann, Geng, & Shomstein, 2004; Corbetta & Shulman, 2002, for reviews). In contrast, we did not find evidence for priority signals in the caudate tail, which plays a central role in value-driven attention in the visual domain (e.g., Kim & Anderson, 2020a, 2020b; Anderson, 2017, 2019; Anderson et al., 2014, 2016; Kim & Hikosaka, 2013; Yamamoto, Kim, & Hikosaka, 2013; Yamamoto, Monosov, Yasuda, & Hikosaka, 2012; Orban, Van Essen, & Vanduffel, 2004), or any other region of the brain implicated in VDAC by visual stimuli, arguing against a supramodal network supporting value-based attention.
In the training phase, in contrast, we found reduced priority signals by the high-value target in the FEFs, IPS, MFG, IFG, CG, and INS. That is, the representation of high-value sounds demonstrated a relative suppression of neural activity within the dorsal FPN typically implicated in the processing of visual information (Corbetta et al., 2008; Corbetta & Shulman, 2002), without any apparent increases in the auditory system. Such a finding echoes an earlier behavioral demonstration of increased interference from previously reward-associated sounds in a visual task (Anderson, 2016b), suggesting that behavioral impairments in such cross-modal designs may at least in part reflect reduced priority to sensory input outside of the auditory system. Targeted ROI analyses of the test phase data provide no evidence for the maintenance of this pattern of reduced priority signals into the test phase. It is also possible that, in at least some regions identified, particularly those that have been linked to affective information processing such as the INS (e.g., Berntson et al., 2011; Winston, Gottfried, Kilner, & Dolan, 2005; Norris, Chen, Zhu, Small, & Cacioppo, 2004), the observed difference in activation is the result of elevated processing of lesser-value stimuli, possibly because of these stimuli being represented as comparatively aversive. The relative nature of the BOLD response is necessarily ambiguous with respect to the distinction between selective enhancement versus suppression of an evoked response.
Collectively, these contrasting results indicate divergent mechanisms for modulating attentional priority depending on the relationship between value and task goals. Whereas both the high-value target in the training phase and high-value distractor in the test phase received elevated attentional priority, neural activity was generally suppressed in relative terms when processing task-relevant reward cues and elevated when processing task-irrelevant, previously reward-associated distractors. Our results offer neural evidence against the idea that VDAC merely reflects a persistence of motivated attentional processes, which would have predicted a similar pattern of stimulus-evoked activity across phases. Such a finding is broadly consistent with behavioral evidence for an independent role for (implicit) associative learning and target history effects in the control of attention (e.g., Grégoire, Kim, & Anderson, 2021; Kim & Anderson, 2019a, 2021; Anderson & Britton, 2019; Anderson, Chiu, DiBartolo, & Leal, 2017; see also the work of Kim & Anderson, 2019b) as well as inhibitory accounts of the selective processing of a target (e.g., Gaspelin & Luck, 2018a, 2018b, 2018c, 2019; Gaspelin, Leonard, & Luck, 2015, 2017; Moher, Lakshmanan, Egeth, & Ewen, 2014).
In the visual domain, stimulus-specific information pertaining to previously reward-associated stimuli has been demonstrated in early sensory cortices (Itthipuripat et al., 2019; Serences & Saproo, 2010). In this study, we provide a parallel demonstration of this phenomenon in the auditory system, suggesting that early sensory enhancement of reward-associated signals reflects a modality-general process at play across multiple sensory systems. When the sounds were task-irrelevant, only previously high-value stimuli could be decoded against the other distractor conditions, suggesting that stimulus-specific information about high-value sounds was maintained across phases of the experiment, whereas this was not the case for lesser-value sounds, which were more effectively ignored. Future research could explore the nature of this stimulus-specific information enhancement in more detail, separating feature-specific components (e.g., frequency, as with color in the visual system, or location) from complex identity information (paralleling visual objects) to isolate different stages of information processing.
In probing the neural mechanisms of reward's influence on attention in the auditory domain, several broader principles emerge. Value-driven auditory attention, like value-driven attention in the visual domain, is reflected in biased competition within sensory systems, consistent with an early-stage influence. The posterior parietal cortex seems to play a role in value-driven attention that spans sensory modalities, but beyond this, our results do not suggest a widespread supramodal network of value-based attentional prioritization, as could be suggested from common behavioral influences of reward across vision and audition and the neural correlates of goal-directed and stimulus-driven orienting across modalities throughout the dorsal and ventral attention networks (see works of Macaluso, 2010; Macaluso & Driver, 2005, for reviews). Our findings also highlight a notable distinction between reward's influence on motivated attention and involuntary attentional capture in the case of audition, with motivated attention reflecting the selective suppression of information in other sensory systems and learning-dependent prioritization reflected in priority signals in the parietal cortex along with stimulus-specific sensory enhancement. In this respect, our findings provide neural evidence for a distinction between value-driven attentional processes and the perseveration of motivated attention, which has been a topic of controversy in the attention literature (e.g., Anderson, 2016a; Kim & Anderson, 2019a).
This study was supported by grants from the National Institutes of Health (R01-DA046410) to B. A. A. We thank David S. Lee for assistance in creating the auditory stimuli.
The corresponding author has transitioned to a new position. Reprint requests should be sent to Andy J. Kim, Department of Gerontology, University of Southern California, 3715 McClintock Ave., Los Angeles, CA 90089, or via e-mail: email@example.com.
Andy J. Kim: Conceptualization; Data curation; Formal analysis; Writing—Original draft. Laurent Grégoire: Data curation; Writing—Review & editing. Brian A. Anderson: Conceptualization; Formal analysis; Writing—Review & editing.
Brian A. Anderson, National Institute on Drug Abuse (https://dx.doi.org/10.13039/100000026), grant number: R01-DA046410.
Diversity in Citation Practices
A retrospective analysis of the citations in every article published in this journal from 2010 to 2020 has revealed a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .408, W(oman)/M = .335, M/W = .108, and W/W = .149, the comparable proportions for the articles that these authorship teams cited were M/M = .579, W/M = .243, M/W = .102, and W/W = .076 (Fulvio et al., JoCN, 33:1, pp. 3–7). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
The main effect of target value was also significant for RT in the in-laboratory portion, demonstrating a significant effect of learning.