It has long been known that listening to speech activates inferior frontal (pre-)motor regions in addition to a more dorsal premotor site (dPM). Recent work shows that dPM, located adjacent to laryngeal motor cortex, responds to low-level acoustic speech cues including vocal pitch and the speech envelope, in addition to higher-level cues such as phoneme categories. An emerging hypothesis is that dPM is part of a general auditory-guided laryngeal control circuit that plays a role in producing speech and other voluntary auditory–vocal behaviors. We recently reported a study in which dPM responded to vocal pitch during a degraded speech recognition task, but only when speech was rated as unintelligible; dPM was more robustly modulated by the categorical difference between intelligible and unintelligible speech. Contrary to the general auditory–vocal hypothesis, this suggests intelligible speech is the primary driver of dPM. However, the same pattern of results was observed in pitch-sensitive auditory cortex. Crucially, vocal pitch was not relevant to the intelligibility judgment task, which may have facilitated processing of phonetic information at the expense of vocal pitch cues. The present fMRI study (n = 25) tests the hypothesis that, for a multitalker task that emphasizes pitch for talker segregation, left dPM and pitch-sensitive auditory regions will respond to vocal pitch regardless of overall speech intelligibility. This would suggest that pitch processing is indeed a primary concern of this circuit, apparent during perception only when the task demands it. Spectrotemporal modulation distortion was used to independently modulate vocal pitch and phonetic content in two-talker (male/female) utterances across two conditions (Competing, Unison), only one of which required pitch-based segregation (Competing).
A Bayesian hierarchical drift-diffusion model was used to predict speech recognition performance from patterns of spectrotemporal distortion imposed on each trial. The model's drift rate parameter, a d′-like measure of performance, was strongly associated with vocal pitch for Competing but not Unison. Using a second Bayesian hierarchical model, we identified regions where behaviorally relevant acoustic features were related to fMRI activation in dPM. We regressed the hierarchical drift-diffusion model's posterior predictions of trial-wise drift rate, reflecting the relative presence or absence of behaviorally relevant acoustic features from trial to trial, against trial-wise activation amplitude. A significant positive association with overall drift rate, reflecting vocal pitch and phonetic cues related to overall intelligibility, was observed in left dPM and bilateral auditory cortex in both conditions. A significant positive association with “pitch-restricted” drift rate, reflecting only the relative presence or absence of behaviorally relevant pitch cues, regardless of the presence or absence of phonetic content (intelligibility), was observed in left dPM, but only in the Competing condition. Interestingly, the same effect was observed in bilateral auditory cortex but in both conditions. A post hoc mediation analysis ruled out the possibility that decision load was responsible for the observed pitch effects. These findings suggest that processing of vocal pitch is a primary concern of the auditory-cortex–dPM circuit, although, during perception, core pitch processing is carried out by auditory cortex with a potential modulatory influence from dPM.

Regions of the left premotor cortex involved in speech production are also activated during listening to speech (Skipper, Devlin, & Lametti, 2017; Buchsbaum et al., 2011; Pulvermuller et al., 2006; Wilson, Saygin, Sereno, & Iacoboni, 2004; Buchsbaum, Hickok, & Humphries, 2001). While numerous authors have suggested this supports a role for motor mechanisms in speech perception (Skipper et al., 2017; Schwartz, Basirat, Ménard, & Sato, 2012; Tremblay & Small, 2011; Pulvermuller & Fadiga, 2010; Rizzolatti & Craighero, 2004), several recent studies show that left premotor cortex responds to auditory speech features including the spectrotemporal envelope, pitch contour, rhythmic phrasal structure, and auditory-phonetic features (Berezutskaya, Baratin, Freudenburg, & Ramsey, 2020; Keitel, Gross, & Kayser, 2018; Cheung, Hamilton, Johnson, & Chang, 2016). This raises the possibility that at least some portion of the left premotor cortex processes speech in auditory rather than (or in addition to) articulatory-phonetic terms. This most likely reflects the fact that, for some speech motor circuits, (i) the sensory “targets” for production (Hickok, Houde, & Rong, 2011; Tourville, Reilly, & Guenther, 2008) are primarily auditory (i.e., as opposed to visual, somatosensory, or multimodal) and (ii) these “targets” can be understood in terms of simple acoustic properties, without requiring a phonological abstraction (Indefrey & Levelt, 2004) or nonlinear transformation to motor coordinates (Parrell, Ramanarayanan, Nagarajan, & Houde, 2019).

Recent studies showing premotor responses to acoustic speech features have consistently identified a more dorsal region of the precentral gyrus adjacent to laryngeal motor cortex (cf. Belyk et al., 2021; Correia, Caballero-Gaudes, Guediche, & Carreiras, 2020; Eichert, Papp, Mars, & Watkins, 2020; Simonyan, 2014; Brown, Ngan, & Liotti, 2008) rather than more ventral precentral regions associated with the orofacial articulators (e.g., lips and tongue). Studies showing motor activation during speech perception tasks, including somatotopic effects and potential contributions to task performance, have focused on these more ventral regions (e.g., lips and tongue; Schomers, Kirilina, Weigand, Bajbouj, & Pulvermuller, 2015; D'Ausilio et al., 2009; Mottonen & Watkins, 2009; Meister, Wilson, Deblieck, Wu, & Iacoboni, 2007; Pulvermuller et al., 2006; Watkins & Paus, 2004; Watkins, Strafella, & Paus, 2003; Fadiga, Craighero, Buccino, & Rizzolatti, 2002). Several groups, including our own, have suggested the orofacial motor regions may be recruited to process speech that is embedded in background noise or otherwise degraded, or when the listening task involves phonological working memory or decision-related processes (Nuttall, Kennedy-Higgins, Hogan, Devlin, & Adank, 2016; Du, Buchsbaum, Grady, & Alain, 2014; Mottonen, van de Ven, & Watkins, 2014; Krieger-Redwood, Gaskell, Lindsay, & Jefferies, 2013; D'Ausilio, Bufalari, Salmas, & Fadiga, 2012; Hervais-Adelman, Carlyon, Johnsrude, & Davis, 2012; Osnes, Hugdahl, & Specht, 2011). In contrast, perhaps a defining feature of the more dorsal premotor site is its tendency to activate during passive listening when no explicit response or motor planning is required (Chen, Penhune, & Zatorre, 2009; Wilson et al., 2004; but see also Panouillères, Boyles, Chesters, Watkins, & Möttönen, 2018) and, as in temporal lobe speech regions, when speech is more rather than less intelligible (Okada et al., 2010). 
One possibility, as noted above, is that laryngeal motor circuits are modulated by rhythmic and/or prosodic acoustic cues, many of which are associated with voicing and vocal pitch (Dichter, Breshears, Leonard, & Chang, 2018; D'Ausilio, Bufalari, Salmas, Busan, & Fadiga, 2011). Indeed, recent work suggests that articulatory-phonetic information (e.g., place of articulation) is not encoded in dorsal or ventral premotor activations during listening to speech (Arsenault & Buchsbaum, 2016; Evans & Davis, 2015), which supports the notion that these regions function differently when speech is processed in auditory terms. Notably, the acoustic-to-articulatory mapping for pitch is likely much simpler than that for phonetic cues (Parrell et al., 2019), perhaps facilitating auditory–motor interactions in dorsal laryngeal circuits during listening.

Using fMRI, we recently identified a region in the left dorsal premotor cortex (dPM) that responds preferentially to intelligible speech, encodes vocal pitch, is functionally connected with early auditory regions, and is proximal to the dorsal laryngeal motor cortex (Venezia, Richards, & Hickok, 2021). Using a single-talker, yes–no intelligibility judgment task with a spectrotemporal modulation filtering paradigm that independently degraded acoustic features related to phonetic content versus vocal pitch, we further showed that, paradoxically, left dPM and pitch-sensitive regions of auditory cortex (AC) responded to vocal pitch only when speech was rated as unintelligible by listeners. The dominant response in left dPM and pitch-sensitive auditory regions was to phonetic content, which was also strongly associated with intelligibility judgments, whereas vocal pitch was not. Therefore, by definition, activation was driven largely by the categorical difference between intelligible and unintelligible speech. We speculated that left dPM and pitch-sensitive auditory regions responded to pitch only during unintelligible trials because pitch information was irrelevant to the task when phonetic content was also present—namely, because intelligibility judgments relied only on phonetic content and not vocal pitch, brain regions involved in processing vocal pitch activated to pitch-related features only when phonetic content was unavailable to the listener.

This dual response profile, dominated by a response to phonetic content with a secondary response to pitch, was somewhat unsurprising given that previous work, including our own, had shown a similar response profile in left dPM and pitch-sensitive auditory regions (Hamilton, Oganian, Hall, & Chang, 2021; Venezia, Martin, Hickok, & Richards, 2019; Cheung et al., 2016). Nonetheless, we assumed that the response to pitch was a defining characteristic of these regions given that no such response was observed in other temporal and frontal lobe regions that responded robustly to phonetic content. An extension of this assumption, related to the discussion above, is that pitch-sensitive regions in left dPM and AC comprise an auditory–motor network driven primarily by pitch and related low-level auditory cues, whereas other regions in the lateral superior temporal lobe and inferior frontal cortex form a speech-motor network driven primarily by phonetic content and other high-level speech information (e.g., lexical representations). However, an alternative possibility is that the response to vocal pitch in left dPM and/or pitch-sensitive auditory regions is truly secondary, occurring only when phonetic information is unavailable and therefore speech is not processed qua speech. Stated in terms of motor control, this alternative would imply that the auditory-cortex–dPM circuit is dedicated to speech production, as opposed to a more general auditory–vocal control function that supports speech. We hypothesized that for a listening task that requires extraction of vocal pitch, dPM and pitch-sensitive auditory regions would respond to pitch-related features regardless of the presence of phonetic content (i.e., overall speech intelligibility), consistent with a primary or general (although not necessarily exclusive) role in processing pitch and related auditory cues.

Here, we use fMRI to test that hypothesis. We again employ our spectrotemporal modulation filtering technique, termed Auditory Bubbles (Venezia, Hickok, & Richards, 2016), to independently degrade spectrotemporal features related to phonetic content versus vocal pitch. However, whereas our previous study used a single talker in a quiet background, this study uses a competing speech task (Bolia, Nelson, Ericson, & Simpson, 2000) with target female and competing male talkers (see Venezia, Leek, & Lindeman, 2020). The male and female talkers are generated synthetically from the same original waveforms using a vocal gender-morph procedure, thus allowing two conditions: Competing, in which the female and male talkers generate similar utterances that differ by several keywords, and Unison, in which the female and male talkers generate the exact same utterance. The crucial difference is that only the Competing condition requires the listener to segregate the talkers using vocal pitch.

Twenty-five listeners completed 400 trials of a speeded, three-alternative forced-choice (3-AFC) speech recognition task in the Competing and Unison conditions during fMRI scanning. A hierarchical Bayesian drift diffusion model (HDDM; Vandekerckhove, Tuerlinckx, & Lee, 2011) was fitted to RT data to predict trial-by-trial fluctuations in speech recognition performance (the “drift rate” parameter of the HDDM) from the pattern of spectrotemporal distortion imposed via Auditory Bubbles on each trial. The model confirmed that drift rate was more strongly modulated by vocal pitch for Competing versus Unison and phonetic content for Unison versus Competing. A second hierarchical Bayesian model was then used to examine the relation between HDDM-predicted, trial-wise drift rate and trial-wise fMRI activation. Two versions of trial-wise drift rate were generated: overall drift rate and “pitch-restricted” drift rate, where the latter reflected only the pitch-related features that modulated task performance and the former reflected any/all acoustic features (pitch and/or phonetic content) that modulated task performance.

Crucially, trial-wise drift rates represented only the relative presence or absence, on a given trial, of the spectrotemporal modulations associated with drift rate across trials, thus capturing trial-by-trial variance in behaviorally relevant stimulus features. Also crucial was that the majority of trials were intelligible in both the Competing (67% correct) and Unison (> 80% correct) conditions. Therefore, we hypothesized that: (i) overall drift rate, which captures stimulus features related to overall intelligibility, would be correlated with fMRI activation in dPM and auditory speech regions in the Competing and Unison conditions and (ii) pitch-restricted drift rate, which captures stimulus features related to vocal pitch regardless of overall intelligibility, would be correlated with fMRI activation in dPM and pitch-sensitive auditory regions only in the Competing condition. The logic behind the latter hypothesis was that, following our previous work (Venezia et al., 2021), pitch-responsive regions should be activated by pitch only for unintelligible trials in the Unison condition. Because only a small minority of trials (< 20%) were incorrect in the Unison condition, a correlation across all trials between pitch-restricted drift rate and fMRI activation should not be observed in this condition. Conversely, in the Competing condition, we assumed that, given the essential role of pitch in performing the Competing task, pitch-sensitive regions would be activated by pitch for both intelligible and unintelligible trials, and thus, a correlation across all trials between pitch-restricted drift rate and fMRI activation should be observed in this condition.

In fact, Hypothesis (i) was supported by the data for both dPM and auditory-cortical regions. Hypothesis (ii) was supported by the data only for dPM; counter to expectations, activation in pitch-sensitive auditory regions was correlated with pitch-restricted drift rate in the Competing and Unison conditions. We speculate as to the explanation for this below. Post hoc analyses were then conducted to (i) confirm that the observed pattern of results could not be explained by trial-wise fluctuations in task difficulty that might be correlated with trial-wise, pitch-restricted drift rate, and (ii) examine how drift rate correlations differed by response category (correct, incorrect) across conditions (Competing, Unison).

The data in this study were reported previously by Herrera et al. (2021). Some general methodological details of that study are recapitulated here. This study is a re-analysis of the same data to answer a different experimental question.

Participants

Inclusion Criteria

Participants were previously deployed veterans between the ages of 18 and 60 years, with no restrictions on gender, ethnicity, or race. All participants were screened via phone and/or medical record review to verify: (1) age, (2) history of deployment, (3) no history of auditory conductive pathology including ear infections or ear surgeries, (4) English as a first language, and (5) no history of major neurological disorder such as stroke, tumor, epilepsy, or moderate-to-severe traumatic brain injury. Once screened, potential participants underwent a full in-person audiologic evaluation to verify: (1) air-conduction pure-tone average at 0.5, 1, 2, and 4 kHz was 35 dB HL or better in both ears; (2) air-conduction pure-tone thresholds were no worse than 40 dB HL at any two audiometric frequencies below 8 kHz; (3) air-conduction pure-tone thresholds did not exceed 10 dB HL asymmetry between the ears at more than one audiometric frequency below 4 kHz; and (4) no evidence of conductive pathology. Participants meeting these audiometric criteria can be described as having at least “near normal” hearing—that is, normal hearing in the speech frequency range and no more than mild hearing loss at higher frequencies, without significant inter-ear asymmetries or evidence of damage to the middle ear. In addition, all participants were required to score at least 25 on the Mini-Mental State Examination. No participants were excluded for active or historical psychiatric diagnoses or use of psychotropic medications, so long as these did not preclude the participant from completing the task.

Sample Characteristics

Twenty-five veterans (22 men, 3 women) completed the study. The average age of the participants was 49.4 years (SD = 9.2 years, range = 30–60 years). Sixteen participants had clinically significant symptoms of post-traumatic stress disorder (PTSD) (via medical record review or the PTSD checklist—military version; Wilkins, Lang, & Norman, 2011), eight had a history of mild traumatic brain injury (via medical record review or subject report), and 16 were diagnosed with another psychiatric disorder such as depression, anxiety, or substance use disorder (medical record review). Potential participants with major neurological disorders (e.g., stroke, tumor, epilepsy, or moderate-to-severe brain injuries) or major psychiatric disorders (e.g., schizophrenia, antisocial personality disorder, or any disorders resulting in manifestly violent or aggressive behavior) were not enrolled in the study. Notably, the prevalence of psychiatric disorders in our sample (64%) is not drastically different from the prevalence among college-aged adults in the United States (50%; Blanco et al., 2008), who commonly serve as research participants in studies on “neurotypical” brain function.

Sample Size Justification

Our previous study showed that dPM responded significantly to spectrotemporal modulations in the vocal pitch region of the speech modulation power spectrum (MPS; Venezia et al., 2021). Using data from that study, we averaged spectrotemporal receptive field coefficients from each participant (n = 10) across all significant voxels in dPM and all MPS pixels likely to contain vocal pitch. The standardized effect size (one-sample t test) was 1.15. We then simulated 10,000 data sets in which each “participant's” dPM response was drawn from a normal distribution with mean = 1.15 and standard deviation = 1. For different sample sizes, we calculated the proportion of simulated data sets for which the Bayes Factor (one-sample t test; prior width = 0.707) was greater than 100 (see Trial-Wise Predicted Drift Rate section below). A sample size of at least n = 23 was required to achieve 80% power. Our sample size is n = 25 (estimated power = 89.7%).
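The simulation logic can be sketched as follows. This is a minimal illustration rather than the analysis code: it assumes the Bayes factor is the default JZS (Cauchy-prior) Bayes factor for a one-sample t test with scale r = 0.707, evaluated by simple numerical integration, and that the test is two-sided. The function names (`jzs_bf10`, `critical_t`, `simulated_power`) are hypothetical, and the estimates it returns are not expected to reproduce the reported power values exactly.

```python
import math
import random

def jzs_bf10(t, n, r=0.707):
    """JZS Bayes factor (H1 over H0) for a one-sample t test (assumed form).

    The integral over g is evaluated on a log grid (u = ln g) with the
    trapezoid rule; the grid limits are ad hoc choices for this sketch.
    """
    nu = n - 1
    log_null = -(nu + 1) / 2 * math.log1p(t * t / nu)
    lo, hi, steps = -6.0, 30.0, 720
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        g = math.exp(u)
        a = 1.0 + n * g * r * r
        log_f = (-0.5 * math.log(a)
                 - (nu + 1) / 2 * math.log1p(t * t / (a * nu))
                 - 0.5 * math.log(2 * math.pi)
                 - 0.5 * u              # g^(-3/2) times the Jacobian g
                 - 1.0 / (2.0 * g))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(log_f - log_null)
    return total * h

def critical_t(n, threshold=100.0, r=0.707):
    """Smallest |t| whose Bayes factor exceeds the threshold (bisection).

    Valid because the Bayes factor increases monotonically with |t|.
    """
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if jzs_bf10(mid, n, r) > threshold:
            hi = mid
        else:
            lo = mid
    return hi

def simulated_power(n, effect=1.15, threshold=100.0, n_sims=10_000, seed=0):
    """Proportion of simulated data sets with Bayes factor > threshold."""
    rng = random.Random(seed)
    t_crit = critical_t(n, threshold)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(effect, 1.0) for _ in range(n)]
        mean = sum(x) / n
        var = sum((v - mean) ** 2 for v in x) / (n - 1)
        t = mean / math.sqrt(var / n)
        hits += abs(t) > t_crit
    return hits / n_sims
```

Calling `simulated_power(n)` over a range of sample sizes traces out an estimated power curve, from which the smallest n reaching a chosen power level can be read off.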

Ethics Authorization

This study, all documentation, and procedures were reviewed and approved by the VA Loma Linda Healthcare System (VALLHCS) institutional review board. Before enrollment, all participants provided their written informed consent. All study procedures were completed at either the VALLHCS Auditory Research Lab or at the Loma Linda University East Campus Radiology Department. All participants received monetary compensation for their participation.

Procedures

Audiologic Evaluation

Audiologic case history, otoscopy, middle ear immittance testing including tympanometry and ipsilateral acoustic reflex testing at 1 kHz, air-conduction thresholds from 0.25 to 8 kHz, and bone-conduction thresholds from 0.25 to 4 kHz were acquired. Equipment for testing included the Madsen Astera2 audiometer by Natus, Inc. for audiometric thresholds and the Tympstar from Grason-Stadler Inc. for immittance testing. Conductive pathology was ruled out by combining tympanometry and acoustic reflex testing with a requirement that gaps between air- and bone-conduction thresholds be 10 dB or less at a given frequency.

fMRI

Speech stimuli.

Items from the Coordinate Response Measure (CRM) speech corpus were obtained from source audio files (20-kHz sampling rate, 16-bit quantization) with a single male talker (Talker 0). Two versions of the items were derived from these audio files using a vocal-gender shifting procedure described by Venezia et al. (2020): (1) items spoken by a synthetic female talker with a mean fundamental frequency scaled by 1.7 and vocal tract length scaled by 0.84 relative to the original values, and (2) items spoken by the original (male) talker processed through the gender shift algorithm with mean fundamental frequency and vocal tract length each scaled by 1.0. Vocal-gender shifting was accomplished using the pitch-synchronous overlap-add procedure as implemented in Praat (Boersma, 2002), ensuring that synthetic female and male talkers would differ only in fundamental frequency and vocal tract length and not by other talker-specific speech patterns (i.e., speech rate or intonation contour). The difference in vocal pitch between the synthetic female and male talkers can be visualized on the MPS (Figure 1), which is obtained as the modulus of the 2-D Fourier transform of the log-magnitude speech spectrogram. On the MPS, the synthetic female talker appears at lower spectral modulation rates (higher pitch; ∼5–7.5 cyc/kHz) and the male talker appears at higher spectral modulation rates (lower pitch; ∼10–12.5 cyc/kHz). Phonetic speech content (formant-scale spectral components) shared across the two talkers appears at the lowest spectral modulation rates (< 2.5 cyc/kHz). Figure 1 can thus be used as a reference to the acoustic feature space underlying later plots showing the relative contributions of these features to task performance (see Figure 3); the figure further shows that the long-term average MPS is well matched across conditions (Unison, Competing).
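The correspondence between vocal pitch and spectral modulation rate follows from harmonic spacing: a voice with fundamental frequency f0 has harmonics spaced f0 Hz apart, which produces a spectral ripple of 1000/f0 cycles per kHz. The short sketch below illustrates this arithmetic. The male fundamental frequency of 90 Hz is a hypothetical value chosen for illustration (the corpus talker's mean f0 is not stated here); only the 1.7 scaling factor comes from the text.

```python
F0_SCALE_FEMALE = 1.7  # fundamental frequency scaling reported in the text

def harmonic_ripple_cyc_per_khz(f0_hz):
    """Spectral modulation rate (cyc/kHz) of harmonics spaced f0_hz apart."""
    return 1000.0 / f0_hz

male_f0_hz = 90.0                                # hypothetical mean f0
female_f0_hz = male_f0_hz * F0_SCALE_FEMALE      # 153 Hz after scaling

male_ripple = harmonic_ripple_cyc_per_khz(male_f0_hz)
female_ripple = harmonic_ripple_cyc_per_khz(female_f0_hz)

print(f"male:   f0 = {male_f0_hz:.0f} Hz, ripple = {male_ripple:.2f} cyc/kHz")
print(f"female: f0 = {female_f0_hz:.0f} Hz, ripple = {female_ripple:.2f} cyc/kHz")
```

Under this assumption, the male talker falls within the ∼10–12.5 cyc/kHz band and the synthetic female talker within the ∼5–7.5 cyc/kHz band, which is why higher pitch appears at lower spectral modulation rates on the MPS.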

Figure 1.

Average MPS over 400 randomly generated CRM items in the Unison (left panel) and Competing (right panel) conditions. The temporal modulation rate (Hz) is plotted along the abscissa, and the spectral modulation rate (cyc/kHz) is plotted along the ordinate. Speech energy clusters in three regions (marked by colored bars) that naturally separate along the ordinate: shared phonetic content (blue bar), female-talker vocal pitch (green bar), and male-talker vocal pitch (red bar).

In the CRM corpus, each item takes the form “Ready [call sign] go to [color] [number] now.” There are eight possible call signs (arrow, baron, charlie, eagle, hopper, laker, ringo, and tiger), four possible colors (blue, green, red, and white), and eight possible numbers (1–8), yielding 256 possible items. The color and the number are the targets for speech recognition, yielding 32 possible color/number targets. Here, we present two-talker (synthetic female and male) mixtures at a +3-dB target-to-masker ratio favoring the female talker. We further employ two conditions: Competing, in which the synthetic female and male talkers utter sentences with different call signs, colors, and numbers; and Unison, in which the synthetic female and male talkers utter sentences with the exact same call sign, color, and number (i.e., identical waveforms aside from the vocal gender-shift procedure). Listeners are asked to identify the color/number spoken by the female talker in the Competing condition and either/both talkers in the Unison condition. Crucially, only the Competing condition requires segregation of the target (synthetic female) talker from a competing (male) talker.
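The item counts and the two conditions can be illustrated with a short enumeration. This is a sketch only: the function names are hypothetical, and uniform random sampling of items is an assumption, since the randomization scheme is not described here.

```python
import random
from itertools import product

CALL_SIGNS = ["arrow", "baron", "charlie", "eagle",
              "hopper", "laker", "ringo", "tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))

ITEMS = list(product(CALL_SIGNS, COLORS, NUMBERS))   # 8 x 4 x 8 = 256 items
TARGETS = list(product(COLORS, NUMBERS))             # 4 x 8 = 32 targets

def sentence(call_sign, color, number):
    """Render one CRM item as text."""
    return f"Ready {call_sign} go to {color} {number} now."

def draw_pair(condition, rng):
    """Draw (female_item, male_item) for one trial (assumed uniform sampling)."""
    female = rng.choice(ITEMS)
    if condition == "Unison":
        return female, female
    # Competing: call sign, color, and number must all differ.
    candidates = [m for m in ITEMS
                  if all(a != b for a, b in zip(female, m))]
    return female, rng.choice(candidates)

rng = random.Random(0)
female, male = draw_pair("Competing", rng)
print(sentence(*female))
print(sentence(*male))
```

For any given female item, 7 × 3 × 7 = 147 male items differ in all three keywords and are therefore eligible in the Competing condition.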

Spectrotemporal modulation filtering.

During fMRI scanning, two-talker mixtures in the Unison and Competing conditions were further modified through a spectrotemporal modulation filtering procedure called Auditory Bubbles (Venezia et al., 2016). Briefly, in Auditory Bubbles, a quasirandomly generated, binary multiplicative filter is applied to the MPS of a two-talker mixture such that some spectrotemporal patterns are retained in the signal whereas others are removed. Here, filters were applied only to MPS regions with spectral modulations ≤ 15 cyc/kHz and temporal modulations ≤ 50 Hz. The bubbles filters were created by: (1) generating an all-zero image of dimensions equal to the MPS, (2) setting a randomly-selected subset of pixels to a value of one (where the number of pixels is the number of “bubbles” in the filter), (3) applying a 2-D Gaussian filter with standard deviation 0.5 cyc/kHz in the spectral modulation dimension and 2 Hz in the temporal modulation dimension, and (4) binarizing the resulting image with a cutoff value of 0.1. The filters were applied to individual CRM mixtures as follows: (1) obtain a log-magnitude spectrogram with Gaussian windows, (2) perform a two-dimensional fast Fourier transform on the spectrogram, (3) multiply the bubbles filter with the modulus of the 2-D fast Fourier transform, (4) invert the modified 2-D Fourier spectrum to generate a filtered magnitude spectrogram, and (5) invert the magnitude spectrogram via phase gradient heap integration to obtain a filtered waveform (Venezia et al., 2020).
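The four filter-generation steps can be sketched in a few lines. This illustration makes several assumptions that are not stated in the text: the pixel resolution of the MPS grid (0.25 cyc/kHz by 1 Hz) is hypothetical, and the Gaussian is scaled so that an isolated bubble has a peak value of one before binarization. It is written in plain Python for transparency; an array library would be the natural choice in practice. Applying the filter (steps 1 through 5 of the second list) requires a spectrogram and phase reconstruction and is not shown.

```python
import math
import random

# Hypothetical pixel grid covering the filtered MPS region.
SPEC_STEP = 0.25                      # cyc/kHz per pixel (assumption)
TEMP_STEP = 1.0                       # Hz per pixel (assumption)
N_SPEC = int(15 / SPEC_STEP) + 1      # 0-15 cyc/kHz
N_TEMP = int(50 / TEMP_STEP) + 1      # 0-50 Hz
SD_SPEC = 0.5                         # cyc/kHz, from the text
SD_TEMP = 2.0                         # Hz, from the text
CUTOFF = 0.1                          # binarization cutoff, from the text

def random_bubble_centers(n_bubbles, rng):
    """Step 2: choose n_bubbles distinct pixels at random."""
    pixels = [(i, j) for i in range(N_SPEC) for j in range(N_TEMP)]
    return rng.sample(pixels, n_bubbles)

def bubbles_filter(centers):
    """Steps 1, 3, and 4: smooth the impulse image and binarize it.

    Summing one peak-1 Gaussian per bubble is equivalent to convolving
    the all-zero image (with ones at the centers) with that Gaussian.
    """
    filt = []
    for i in range(N_SPEC):
        row = []
        for j in range(N_TEMP):
            value = 0.0
            for ci, cj in centers:
                ds = (i - ci) * SPEC_STEP / SD_SPEC
                dt = (j - cj) * TEMP_STEP / SD_TEMP
                value += math.exp(-0.5 * (ds * ds + dt * dt))
            row.append(1 if value >= CUTOFF else 0)
        filt.append(row)
    return filt

def retained_fraction(filt):
    """Proportion of MPS pixels that the binary filter passes."""
    return sum(map(sum, filt)) / (len(filt) * len(filt[0]))
```

Under these assumptions, an isolated bubble retains pixels out to roughly 2.1 standard deviations from its center, and fewer bubbles leave a smaller retained fraction, that is, more acoustic distortion.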

Task, procedure, and equipment.

In the scanner, each listener performed a modified 3-AFC version of the CRM task. After hearing a filtered two-talker mixture (see the sections Speech Stimuli and Spectrotemporal Modulation Filtering), listeners were presented immediately upon sound offset with three possible color/number targets displayed on a computer screen in a horizontal array (left, center, right) as filled squares (blue, green, red, or white) with a single digit in the center (1–8 in black font) with a uniform gray background. Of the three possible responses, one was always the color/number spoken by the target female talker. In the Competing condition, a second color/number was that spoken by the male talker and the third color/number was selected from one of five categories with equal probability: (i) same color as the female talker; (ii) same color as the male talker; (iii) same number as the female talker; (iv) same number as the male talker; or (v) neither the same color nor the same number as the female/male talkers. In the Unison condition, the second alternative was selected as if the male talker had uttered a different color/number than the female talker and the third alternative was chosen as described for the third alternative in the Competing condition.

Each trial was synchronized to the onset of an image acquisition period (repetition time [TR]). The auditory stimulus was presented 200 msec after TR onset and lasted 1.96 sec. The participant then had 2.5 sec to choose a response via button pad with the dominant hand. The next trial began 140 msec after the response period, yielding a total trial duration of 4.8 sec (four TRs). The correctness of the response and the RT were logged for each trial. Each listener completed 400 trials in the Competing condition on a first day of scanning and 400 trials in the Unison condition on a second day of scanning. One listener completed the Competing and then the Unison trials in separate sessions on the same day with a 2-hr break in between.
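The trial timeline can be checked with simple arithmetic, as in the sketch below. It assumes that the response window opens at sound offset (when the response alternatives appear) and uses the 1.2-sec TR of the functional acquisition; the names are illustrative.

```python
TR_MS = 1200  # repetition time of the functional acquisition, in msec

# (event, duration in msec), in the order described in the text
TRIAL_EVENTS = [
    ("pre-stimulus delay", 200),
    ("auditory stimulus", 1960),
    ("response window", 2500),
    ("gap before next trial", 140),
]

def trial_timeline(events=TRIAL_EVENTS):
    """Return (event, onset_ms, offset_ms) tuples relative to TR onset."""
    timeline, clock = [], 0
    for name, duration in events:
        timeline.append((name, clock, clock + duration))
        clock += duration
    return timeline

timeline = trial_timeline()
trial_ms = timeline[-1][2]            # total trial duration in msec
trs_per_trial = trial_ms // TR_MS     # whole TRs spanned by one trial

for name, onset, offset in timeline:
    print(f"{name:<22} {onset:>5} to {offset:>5} msec")
print(f"trial duration: {trial_ms} msec = {trs_per_trial} TRs")
```

Working in integer milliseconds avoids floating-point rounding and makes it easy to confirm that each trial spans a whole number of TRs, which keeps every trial onset aligned with an image acquisition.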

Listeners completed eight blocks of 50 trials in each condition with a break at the end of each block. In the Competing condition, the number of bubbles (see the section Spectrotemporal Modulation Filtering) applied on each trial was adjusted using a weighted staircase procedure (Kaernbach, 1991) targeting 66.67% correct performance. Two interleaved staircases with starting values of 50 and 150 bubbles, respectively, proceeded without being restarted at the beginning of each block. The number of bubbles was decreased by five (more acoustic distortion) with a correct response and increased by 10 (less acoustic distortion) with an incorrect response. In the Unison condition, the number of bubbles was set to exactly match the trial-by-trial trajectory in the Competing condition, thus ensuring an equivalent amount of overall acoustic distortion between the conditions. However, given the Unison task is easier than the Competing task at baseline, this also yielded better overall performance in the Unison condition (mean = 84.3% correct, SD = 4.4%) than the Competing condition (mean = 66.8% correct, SD = 1.3%).
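The convergence point of the weighted staircase follows from the step sizes: at equilibrium the expected downward and upward movements cancel, so p × 5 = (1 − p) × 10, giving p = 10/15 ≈ 66.67% correct. The sketch below illustrates this with a simulated listener. The psychometric function, the lower bound on the number of bubbles, and the strict alternation between the two staircases are all assumptions made for illustration.

```python
import math
import random
from fractions import Fraction

STEP_DOWN = 5    # bubbles removed after a correct response (from the text)
STEP_UP = 10     # bubbles added after an incorrect response (from the text)
FLOOR = 5        # hypothetical lower bound on the number of bubbles

def convergence_target(step_down=STEP_DOWN, step_up=STEP_UP):
    """Proportion correct at which expected up and down steps balance."""
    return Fraction(step_up, step_up + step_down)

def next_level(n_bubbles, correct):
    """Apply one staircase update."""
    if correct:
        return max(FLOOR, n_bubbles - STEP_DOWN)
    return n_bubbles + STEP_UP

def p_correct(n_bubbles):
    """Hypothetical listener: chance (1/3) with no bubbles, rising with more."""
    return 1 / 3 + (2 / 3) * (1 - math.exp(-n_bubbles / 80))

def simulate(n_trials=400, starts=(50, 150), seed=0):
    """Alternate between two staircases; return overall proportion correct."""
    rng = random.Random(seed)
    levels = list(starts)
    n_correct = 0
    for trial in range(n_trials):
        k = trial % 2                      # assumed strict alternation
        correct = rng.random() < p_correct(levels[k])
        n_correct += correct
        levels[k] = next_level(levels[k], correct)
    return n_correct / n_trials

print(convergence_target())          # 2/3
print(round(simulate(), 3))
```

Because the staircase self-corrects, the simulated proportion correct stays close to the 2/3 target regardless of the listener model, consistent with the small between-listener variability observed in the Competing condition.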

At the scanner, speech signals were delivered diotically via an external sound card (Scarlett 2i2, Focusrite, Inc.), amplifier (AMP100VS, AudioSource, Inc.), and MRI-compatible insert earphones (Model S14, Sensimetrics, Inc.). Each stimulus was upsampled to 44100 Hz, processed with a digital equalization filter provided by Sensimetrics to produce a flat frequency response, attenuated to an overall level of 85 dB SPL, and gated on and off with a 10-msec raised cosine ramp. Listeners were fitted with isolating foam eartips (Comply Canal Tips, Hearing Components Inc.) and silicone-backed foam earmuffs (Hibermate, Inc.) or custom, low-profile earmuffs (foam backed with engineered wood). Visual stimuli were presented on an MRI-compatible digital display (SensaVue, Philips, Inc.) and viewed on an angled-front-surface mirror mounted on the head coil. Responses were collected with a four-button MRI-compatible response pad (LS-LINE, Cedrus, Inc.) and button response unit (Lumina, Cedrus Inc.). Psychtoolbox v3 (Kleiner et al., 2007) was used to control stimulus presentation, timing, scanner synchronization, and delivery of instructions.

Each listener also completed a practice session outside the scanner including verbal and written instructions, a single CRM trial with no bubbles and an unlimited response period, 10 CRM trials with no bubbles and a 2.5-sec response period, and 50 trials otherwise identical to the trial blocks performed inside the scanner including use of the same adaptive staircase procedure described above for the in-scanner task. Recorded MRI-scanner noise was played in the background to simulate the acoustic environment in the scanner. At the practice session, acoustic stimuli were delivered diotically via the internal sound card (C610/X99 chipset, Intel, Inc.) of a Linux desktop (Silverback Workstation, System76, Inc.) to insert earphones (ER-4, Etymotic, Inc.). Visual stimuli were displayed on an LCD monitor (VX2457-MHD, ViewSonic, Inc.), and responses were collected via mechanical keyboard (Dell, Inc.). The practice trials were controlled via Psychtoolbox v3.

Image acquisition.

Images were acquired on a 3 T MAGNETOM Skyra (Siemens, Inc.) with a 32-channel head coil located at the Loma Linda University East Campus Radiology Center. At each scan session, an anatomical scout, T1 anatomical volume, field map scan, and eight runs of BOLD functional volumes were acquired in that order. The functional volumes were acquired with a simultaneous multislice (SMS) EPI sequence without partial Fourier or in-plane acceleration (Smith et al., 2013) and with a 10% slice gap to minimize interslice signal leakage (Cauley, Polimeni, Bhat, Wald, & Setsompop, 2014); there were 56 axial slices oriented parallel to the anterior commissure-posterior commissure axis with interleaved slice order, multiband acceleration factor = 4, TR = 1.2 sec, echo time (TE) = 33 msec, flip angle = 45°, echo spacing = 0.61 msec, phase encoding direction = anterior-posterior, bandwidth = 2055 Hz/Px, voxel size = 2.5 × 2.5 × 2.75 mm (including 10% gap), and matrix = 84 × 84. In each functional run, 210 volumes were acquired to allow for 50 CRM trials (four volumes per trial) plus 10 additional volumes to capture the hemodynamic response to the final trials. Thus, each scan session produced 1680 functional volumes (400 total CRM trials). The field map scan parameters were TR = 0.557 sec, TE1 = 4.92 msec, TE2 = 7.38 msec, flip angle = 60°, and bandwidth = 510 Hz/Px, with the same slice prescription, slice thickness, voxel size, and matrix size as the functional volumes. The T1 anatomical volume was acquired with a magnetization prepared rapid gradient echo sequence with 176 sagittal slices, TR = 1.95 sec, TE = 2.32 msec, TI = 917 msec, flip angle = 8°, generalized autocalibrating partially parallel acquisitions acceleration factor = 2, bandwidth = 200 Hz/Px, voxel size = 0.9 × 0.938 × 0.938 mm, and matrix = 256 × 256.
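The volume counts follow directly from the run structure, as the short calculation below shows. The variable names are illustrative, and the run duration is a derived quantity rather than a reported one.

```python
TR_MS = 1200                 # repetition time in msec
TRIALS_PER_RUN = 50
VOLUMES_PER_TRIAL = 4
EXTRA_VOLUMES_PER_RUN = 10   # captures the response to the final trials
RUNS_PER_SESSION = 8
N_SLICES = 56
MULTIBAND_FACTOR = 4

volumes_per_run = TRIALS_PER_RUN * VOLUMES_PER_TRIAL + EXTRA_VOLUMES_PER_RUN
volumes_per_session = volumes_per_run * RUNS_PER_SESSION
trials_per_session = TRIALS_PER_RUN * RUNS_PER_SESSION
run_duration_ms = volumes_per_run * TR_MS
slices_per_excitation_group = N_SLICES // MULTIBAND_FACTOR

print(f"volumes per run:     {volumes_per_run}")
print(f"volumes per session: {volumes_per_session}")
print(f"trials per session:  {trials_per_session}")
print(f"run duration:        {run_duration_ms / 1000:.0f} sec")
print(f"slice groups per TR: {slices_per_excitation_group}")
```

Each run therefore lasts 252 sec, and with a multiband factor of 4 the 56 slices are acquired as 14 groups of simultaneously excited slices per TR.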

Analysis

Image Preprocessing

Automated cortical surface reconstruction from the T1 anatomical volume collected during the Competing scan session was performed in Freesurfer v6.0 (https://surfer.nmr.mgh.harvard.edu). Right and left hemisphere cortical surface meshes were then converted to AFNI/SUMA (https://afni.nimh.nih.gov/Suma) format, co-registered to the native-space T1 volume, resampled to a standard topology via linear icosahedral tessellation with 64 edge divides, and merged into a single surface containing 81,924 nodes using the prep_afni_surf.py function of the surfing toolbox v0.6 (https://github.com/nno/surfing; Oosterhof, Wiestler, Downing, & Diedrichsen, 2011). A standard topology surface mesh was also generated from the Colin 27 template brain. The Unison T1 volume was aligned to the Competing T1 volume using a six-parameter rigid body transformation (antsRegistrationSyN.sh) in ANTs v3.0.0 (https://stnava.github.io/ANTs/). The preprocessed functional volumes from the Unison scan session were brought into alignment with the Competing data using the Unison-to-Competing T1 transformation matrix, and then co-registered to the Competing T1 volume to allow mapping to the standard topology cortical surface.

The images from the Competing and Unison scan sessions were otherwise preprocessed identically. Following conversion to NIFTI format using dcm2niix v1.0.2 (https://github.com/rordenlab/dcm2niix), images were slice-timing corrected in SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) using the nii_SliceTime.m MATLAB script (https://www.mccauslandcenter.sc.edu/crnl/tools/stc). Motion and distortion (field map) correction were then performed in SPM12 via the CONN toolbox (RRID:SCR_009550 www.nitrc.org/projects/conn) release 18.b (Whitfield-Gabrieli & Nieto-Castanon, 2012). Automated tissue segmentation of the Competing T1 volume was performed in CONN, and the Unison data were aligned with the Competing data as described above. Co-registration (12-parameter affine) to the Competing T1 volume was then performed in SPM12 via CONN followed by regression-based denoising with quadratic detrending and the following covariates: six parameter rigid body motion-correction parameters and their temporal derivatives, five principal components extracted from the signals within each of the white matter and cerebrospinal fluid regions of interest defined during tissue segmentation, and scrubbing covariates determined from the CONN Artifact Rejection Tool with the liberal setting. Next, the preprocessed functional data were mapped to the standard topology surface mesh and smoothed to a target level of 4 mm FWHM using AFNI v19.0 (https://afni.nimh.nih.gov/). Finally, trial-wise beta activation time series were generated separately for the Unison and Competing conditions using the Least Squares-Separate (LSS; Mumford, Turner, Ashby, & Poldrack, 2012) algorithm as implemented in AFNI (3dLSS). The final product was two length-400 time series volumes in the cortical surface domain, one for Unison and one for Competing.

Behavioral Analysis: Bayesian HDDMs

The Wiener diffusion model, also known as the drift diffusion model, aims to characterize the processes underlying speeded binary decisions (e.g., a 2-AFC task). In a binary choice task, the model assumes that evidence is accumulated over time until a decision is made when an upper (Choice 1) or lower (Choice 2) decision boundary is reached. Two-choice RTs can thus be modeled using four parameters (Figure 2, left): β, the initial bias toward the upper or lower boundary; τ, the non-decision time capturing basic sensorimotor processes unrelated to the decision itself; α, the degree of separation between the upper and lower boundaries; and δ, the drift rate or the average rate and direction (i.e., slope) of evidence accumulation. The drift rate, δ, is somewhat analogous to the signal detection measure, d′, whereas α and β are somewhat analogous to the signal detection measures of decision criterion and response bias, respectively. In the context of speeded responses, α reflects the speed-accuracy tradeoff, where larger α will lead to more accurate but slower responding. Notably, the model is extensible to speeded decision tasks with more than two choices if the responses are reduced to two alternatives, namely, “correct” and “incorrect.” Referred to as accuracy coding, this will be employed in the present study to model 3-AFC responses (cf. Muraskin et al., 2018). In general, it is valid to employ accuracy coding with more than two alternatives when it can be assumed that the same processes underlie the different responses and when performance (RT, accuracy) is similar across responses (Voss, Nagler, & Lerche, 2013). In our case, the response options are defined by location on the screen (left, center, right), and there is no reason to expect location to influence speech processing.
Even if we assume a difference in difficulty between responses (e.g., center easier than left/right) or a prepotent response bias (preferred location-button mapping), it seems sensible to “average out” such effects by accuracy coding the responses. Moreover, it is useful here to recall that target responses and each type of nontarget response (counterbalanced across trials) were equally likely to appear in each response position. Thus, we argue that accuracy coding is justifiable for the present experiment. In the accuracy-coded model, the β parameter must be fixed to 0.5 (midway between the upper and lower boundaries; Voss, Voss, & Lerche, 2015).
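To make the roles of the four parameters concrete, the following is a minimal simulation sketch of an accuracy-coded Wiener diffusion process, with the upper boundary coded as "correct," the lower boundary as "incorrect," and β fixed at 0.5. It is not the fitting procedure used in the study; the unit diffusion coefficient, the Euler time step, the time limit, and the function name simulate_ddm are assumptions made for illustration.

```python
import numpy as np

def simulate_ddm(delta, alpha, tau, beta=0.5, n_trials=2000,
                 dt=0.001, max_t=5.0, seed=0):
    """Euler simulation of a Wiener diffusion process with unit noise (assumed).

    delta: drift rate; alpha: boundary separation; tau: non-decision time (s);
    beta: relative starting point (0.5 = unbiased, as in accuracy coding).
    Returns (correct, rt). Trials that never reach a boundary within max_t
    keep correct=False and rt=NaN.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, beta * alpha)          # evidence starts at beta * alpha
    rt = np.full(n_trials, np.nan)
    correct = np.zeros(n_trials, dtype=bool)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(round(max_t / dt)) + 1):
        if not active.any():
            break
        # Accumulate evidence only for trials that have not yet terminated.
        noise = rng.standard_normal(int(active.sum()))
        x[active] += delta * dt + np.sqrt(dt) * noise
        hit_upper = active & (x >= alpha)        # "correct" boundary
        hit_lower = active & (x <= 0.0)          # "incorrect" boundary
        finished = hit_upper | hit_lower
        correct[hit_upper] = True
        rt[finished] = tau + step * dt           # RT = decision time + tau
        active &= ~finished
    return correct, rt

# Larger drift rates should give more accurate and faster responses.
for drift in (0.5, 2.0):
    ok, rts = simulate_ddm(delta=drift, alpha=2.0, tau=0.3)
    print(drift, round(float(ok.mean()), 3), round(float(np.nanmean(rts)), 3))
```

With β = 0.5 and unit noise, the probability of reaching the upper boundary has the closed form 1 / (1 + exp(−δα)), which offers a quick sanity check on the simulated accuracy.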

Figure 2.

Analysis schematic. Trial-wise bubbles filters (top, black/white) from all participants are used as predictors in a Bayesian HDDM. The fixed effects regression coefficients of the HDDM compose a Drift Rate Classification Image. Trial-wise posterior predictions of the drift rate are then entered in a Bayesian hierarchical linear model relating the drift rate to trial-wise fMRI activation (LSS Time Series and Drift Rate Time Series shown for one example participant). The output of the Bayesian Hierarchical Linear Model is a logBF Map showing the strength of the evidence favoring a correlation between the trial-wise drift rate and the trial-wise brain response across all participants. The drift diffusion model figure (left) is taken from Wabersich and Vandekerckhove (2014; CC-BY License).

Vandekerckhove et al. (2011) describe a hierarchical framework for fitting the drift diffusion model to data from multiple participants. In this HDDM, each of the parameters described above is characterized by a hierarchical linear regression equation with fixed (population-level) and random (participant-level) effects. Given computational difficulties with the drift diffusion model itself and with hierarchical models generally, application of the HDDM is accomplished using Bayesian statistical methods. Here, we implement the HDDM within the brms package v2.14.4 (Bürkner, 2017) in the R statistical computing environment (R Core Team, 2019). At the highest level, we describe the model as:
$$Y_i \sim \mathrm{Wiener}(\alpha, \tau, \beta, \delta) \qquad (1)$$
where Yi is the random vector (Xi, Ti) denoting the ith response (correct, incorrect) and RT, and Wiener is shorthand for the joint density function of hitting the boundary Xi at time Ti. Each of the individual parameters is then modeled using a hierarchical generalized linear regression with fixed- (population-level) and random (participant-level) design matrices as implemented by brms (Bürkner, 2017).

Our goal is to predict the HDDM parameters, namely, α and δ, from trial-wise patterns of idiosyncratic acoustic distortion introduced by Auditory Bubbles (see the Spectrotemporal Modulation Filtering section; Figure 2, “HDDM”). The result is a “classification image” (Venezia et al., 2016, 2020; Venezia, Martin, et al., 2019) for each parameter showing how the parameter changes with the relative presence or absence of acoustic energy at different locations in the MPS (Figure 2, “Drift Rate Classification Image”; see also Figure 1). We produce these classification images separately in the Unison and Competing conditions. We then later use trial-wise posterior predictions of the HDDM parameters to predict trial-wise fMRI activation time series in the Unison and Competing conditions (see the sections Image Preprocessing and Regression of Trial-Wise Drift Rate on Trial-Wise Brain Response). To ease the computational burden of generating HDDM-based classification images, a set of 48 continuous “bubbles” predictors was generated by down-sampling and vectorizing the binary, 2-D bubbles filters presented to each listener on each trial. Before vectorizing, the down-sampled bubbles filters were 12 pixels in the spectral modulation dimension (0–12 cyc/kHz, 1 cyc/kHz/pixel) and 4 pixels in the temporal modulation dimension (0–20 Hz, 5 Hz/pixel). Below, we describe individual HDDMs using R model notation shorthand, wherein fixed and random effects are specifically enumerated; we refer to the length-48 bubbles feature vector as “Bub.”
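The reduction of each binary bubbles filter to 48 continuous predictors can be sketched as follows. The text does not state how the down-sampling was performed, so averaging within a 12 × 4 grid of bins is an assumption, as are the (spectral × temporal) array orientation, the 121 × 40 input size, and the function name downsample_filter.

```python
import numpy as np

def downsample_filter(filt, n_spec=12, n_temp=4):
    """Average a 2-D filter within an n_spec x n_temp grid of bins.

    filt: array of shape (spectral, temporal); orientation is assumed.
    Returns a flat vector of n_spec * n_temp bin means. For a binary input,
    each value is the proportion of "open" pixels in that bin.
    """
    filt = np.asarray(filt, dtype=float)
    row_bins = np.array_split(np.arange(filt.shape[0]), n_spec)
    col_bins = np.array_split(np.arange(filt.shape[1]), n_temp)
    out = np.empty((n_spec, n_temp))
    for i, rows in enumerate(row_bins):
        for j, cols in enumerate(col_bins):
            out[i, j] = filt[np.ix_(rows, cols)].mean()
    return out.ravel()

# Hypothetical binary filter on a 121 x 40 modulation power spectrum grid.
rng = np.random.default_rng(0)
binary_filter = (rng.random((121, 40)) > 0.5).astype(float)
bub = downsample_filter(binary_filter)
print(bub.shape, float(bub.min()), float(bub.max()))
```

Bin averaging is one simple way for a binary filter to yield continuous predictors, which is consistent with the description above but not confirmed by it.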

Model specification.
The response vectors (accuracy coded RTs) from all trials (n = 400) in each condition (Unison, Competing) and each listener (n = 25) were submitted to HDDM analysis. Four different HDDMs were fitted to the data. A “full model,” in which both α and δ depend on Bub, was specified as follows:
(2)
where “1” denotes the intercept, “0” denotes suppression of the intercept, Cond denotes a factor variable encoding the experimental condition (Unison, Competing), Bub denotes the 48 “bubbles” features described above, “:” denotes an interaction, and terms enclosed in (…|Sub) denote random effects whereas unenclosed terms denote fixed effects. The term α0 was included as a nonlinear parameter in the regression equation for α so that separate population-level priors could be placed on the intercept and condition effects (1 + Cond, α) versus the stimulus effects (Cond:Bub; α0). The logit link function was imposed on α (bracketed terms only) as 3.4 * inv_logit(α) + 0.6 and τ as 0.4 * inv_logit(τ) + 0.1, and the identity link function was imposed on δ, β, and α0. For each model parameter, the fixed effects were assigned custom priors and the random effects were assigned the default brms priors. The fixed effects priors were set as follows: For δ, the intercept was assigned a Cauchy(0, 5) prior and the remaining fixed effects were jointly assigned the horseshoe prior (Carvalho, Polson, & Scott, 2009) with df = 3 and par_ratio = 3 (Bürkner, 2017); for α, the fixed effects were assigned a Normal(0, 1) prior; for α0, the fixed effects were jointly assigned the horseshoe prior with df = 3 and par_ratio = 3; for τ, the fixed intercept was assigned a Normal(0, 1) prior; and for β, the fixed effect intercept was assigned a constant value of 0.5. For α and τ, the use of Normal(0, 1) priors with scaled/shifted logit link functions sets hard upper and lower limits on the range of parameter values. For example, 3.4 * inv_logit(α) + 0.6 constrains α to the range [0.6, 4]. The use of horseshoe priors on the Cond:Bub terms for δ and α0 provides regularization to the model by shrinking the coefficients toward zero.
Note, the Cond:Bub term, which estimates the “bubbles” classification images described above, is included only as a fixed effect; that is, a single classification image is estimated for the entire group of listeners, rather than individualized classification images for each listener. This was necessary to ensure model convergence, but also justified given that listeners were expected to rely on very similar acoustic cues to perform the task (Venezia et al., 2020).
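The scaled and shifted logit links described above can be written out directly. The sketch below uses the constants given in the text; the helper names are illustrative, and the final line simply maps the central 95% of a Normal(0, 1) prior on the linear predictor onto the α scale.

```python
import math

def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def alpha_link(eta):
    """Boundary separation on its natural scale: 3.4 * inv_logit(eta) + 0.6."""
    return 3.4 * inv_logit(eta) + 0.6

def tau_link(eta):
    """Non-decision time (s) on its natural scale: 0.4 * inv_logit(eta) + 0.1."""
    return 0.4 * inv_logit(eta) + 0.1

# The links bound alpha within (0.6, 4.0) and tau within (0.1, 0.5),
# however extreme the linear predictor becomes.
for eta in (-50.0, 0.0, 50.0):
    print(eta, round(alpha_link(eta), 4), round(tau_link(eta), 4))

# Central 95% of a Normal(0, 1) prior on eta, mapped onto the alpha scale.
print(round(alpha_link(-1.96), 3), round(alpha_link(1.96), 3))
```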

In addition to the full model, “drift rate” and “threshold” models were generated in which only the drift rate and threshold parameters, respectively, were allowed to be modulated by bubbles features. This allowed for the selection of the most parsimonious model via formal model comparison. The “drift rate” model was generated according to Equation 2, but the α0 term was removed from the model entirely. Thus, a bubbles classification image was only generated for the δ parameter. Similarly, a “threshold model” was generated according to Equation 2, but the Cond:Bub term was removed from the δ component and the fixed and random intercepts were suppressed such that the remaining Cond term was estimated as condition-specific intercepts for Unison and Competing; the Cauchy(0, 5) prior was applied to the fixed intercept for both conditions. Thus, in the threshold model, a bubbles classification image was only generated for the α parameter. Finally, a “null model” was generated according to Equation 2, but here, the α0 term was removed and the Cond:Bub term was removed from the δ component as just described for the drift rate and threshold models. Thus, the null model estimated only condition means for the δ and α parameters and did not allow these parameters to vary from trial to trial according to the acoustic distortion patterns imposed by bubbles filtering. The null model was included to test whether inclusion of the bubbles predictors in the full, drift rate, and threshold models considerably improved the model fit relative to a means-only model.

Model fitting and comparison.

Model fitting was carried out in R using brms with the cmdstanr backend (v0.3.0.9000) and CmdStan v2.26.1 (Carpenter et al., 2017). Model comparison was carried out in R using brms to call the loo package v2.3.1 (Vehtari, Gelman, & Gabry, 2017). For all models (full, drift rate, threshold, null), the No-U-Turn sampling procedure (Hoffman & Gelman, 2014) was used to estimate the posterior distribution of model parameters, and 10 independent sampling chains were obtained with 1000 warmup samples and 3000 postwarmup samples, resulting in 30,000 total draws from the posterior. Such a high number of posterior samples was obtained because of high autocorrelation in the sampling chains for a subset of the parameters (in particular, fixed effects on α0 and τ). The values of adapt_delta and max_treedepth were set to 0.99 and 12, respectively, and all parameters were initialized to a randomly selected value within a manually defined range optimized for each individual parameter. Across all models, no divergent transitions were observed and no samples exceeded the maximum tree depth. Following Houpt and Bittner (2018), two criteria were used to ensure the samples adequately represented the posterior distribution: (1) Visual checks indicated good convergence and good mixing, and the Gelman-Rubin R̂ was less than 1.01 for all parameters; and (2) the smallest bulk and tail effective sample sizes (Carpenter et al., 2017) across all parameters were still adequately large to characterize the posterior density.
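For readers unfamiliar with the convergence criterion, the classic Gelman-Rubin statistic compares between-chain and within-chain variance for a single parameter. The sketch below implements that classic form on simulated draws; it is an illustration only, since current Stan tooling reports a refined (split, rank-normalized) variant, and the simulated chains are not draws from the models reported here.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_draws) of post-warmup draws.
    """
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # W: mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # B: between-chain variance
    var_plus = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_plus / within))

# Simulated example matching the sampling layout in the text: 10 chains x 3000 draws.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(10, 3000))                 # all chains target the same density
stuck = mixed + 0.5 * np.arange(10)[:, None]        # chains centered at different values

print(round(gelman_rubin_rhat(mixed), 4))
print(round(gelman_rubin_rhat(stuck), 4))
```

Well-mixed chains give values very close to 1, whereas chains that have settled in different regions inflate the between-chain term and push the statistic above the 1.01 criterion.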

The best fitting model was determined via approximate leave-one-out (LOO) cross-validation with exact LOO subsampling (Magnusson, Vehtari, Jonasson, & Andersen, 2020). A random sample of 1000 trials was drawn for LOO subsampling, and the same trial indices were maintained across all models to ensure a fair comparison. This method identifies the best fitting model as that with the maximum expected log predictive density (elpd_loo), which is to say the model with the best predictive performance and greatest ability to generalize to new, unseen data. A given model was taken to have significantly better predictive performance if its elpd_loo exceeded that of another model by more than four standard errors of the estimated difference in elpd_loo.
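The decision rule can be stated compactly. The sketch below applies the four-standard-error criterion to the elpd_diff values and standard errors reported in Table 1 (best model minus comparison model); the function name is illustrative, and the computation of elpd_loo itself (performed with the loo package) is not reproduced.

```python
def significantly_better(elpd_diff, se_diff, n_se=4.0):
    """True if the best model's elpd_loo exceeds the comparison model's
    by more than n_se standard errors of the estimated difference."""
    return elpd_diff > n_se * se_diff

# elpd_diff (SE) relative to the full model, as reported in Table 1.
table1 = {
    "Drift rate": (149.6, 18.8),
    "Threshold": (2230.1, 77.4),
    "Null": (2233.4, 77.4),
}

for name, (diff, se) in table1.items():
    ratio = diff / se
    print(f"{name}: {ratio:.1f} SE, significant = {significantly_better(diff, se)}")
```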

Trial-wise predicted drift rate.

Upon model comparison, the drift rate model was identified as the best model considering the combination of predictive performance and number of free parameters. An estimate of drift rate for each trial in each condition and each listener was then obtained from the model as the mean of the posterior samples of the expected value of the posterior predictive distribution (Figure 2, “Posterior Predict”). Notably, the only predictors that would cause the predicted drift rate to vary from trial to trial were the elements of the Cond:Bub term (Equation 2, δ), which captures the effect of the 48 “bubbles” acoustic distortion features (see section Behavioral Analysis: Bayesian Hierarchical Drift Diffusion Models) separately for each condition. In other words, the predicted drift rate reflects the relative presence or absence of behaviorally relevant stimulus features. A second drift rate estimate was then obtained in the same way except that all bubbles features other than those associated with the pitch of the target (female) talker were “masked” (set to all zero in the design matrix) from the analysis. We refer to this as the “pitch-restricted” drift rate, which reflects only those components of the drift rate that were modulated by the relative presence or absence of acoustic cues related to the vocal pitch of the female talker.
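The masking step that yields the pitch-restricted drift rate can be illustrated with a simplified linear predictor. The sketch below assumes a single condition, fixed effects only, and posterior-mean coefficients (with an identity link, predictions at the posterior-mean coefficients equal the posterior mean of the predictions). The coefficient values, the intercept, and the indices marked as pitch-related are hypothetical.

```python
import numpy as np

def predicted_drift(bub, coef, intercept=0.0, keep=None):
    """Linear prediction of trial-wise drift rate from bubbles features.

    bub: (n_trials, 48) feature matrix; coef: (48,) regression coefficients;
    keep: optional boolean mask (48,); features where keep is False are
    set to zero in the design matrix before prediction.
    """
    bub = np.asarray(bub, dtype=float)
    if keep is not None:
        bub = bub * np.asarray(keep, dtype=float)
    return intercept + bub @ np.asarray(coef, dtype=float)

rng = np.random.default_rng(0)
n_trials, n_feat = 400, 48
bub = rng.random((n_trials, n_feat))          # hypothetical trial-wise features
coef = rng.normal(scale=0.2, size=n_feat)     # hypothetical classification image
pitch_mask = np.zeros(n_feat, dtype=bool)
pitch_mask[[20, 21, 24, 25]] = True           # hypothetical target-pitch features

overall = predicted_drift(bub, coef, intercept=1.0)
pitch_restricted = predicted_drift(bub, coef, intercept=1.0, keep=pitch_mask)
print(round(float(overall.std()), 3), round(float(pitch_restricted.std()), 3))
```

Because only a subset of features is allowed to vary, the pitch-restricted estimate captures just the part of the trial-to-trial variation in drift rate that is attributable to those features.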

Regression of Trial-wise Drift Rate on Trial-wise Brain Response

A hierarchical Bayesian linear regression model (Figure 2, “Bayesian Hierarchical Linear Model”) was used to estimate the relation between trial-wise drift rate (see section Trial-Wise Predicted Drift Rate) and trial-wise fMRI activation (see section Image Preprocessing) across all listeners, separately for each condition (Unison, Competing). Henceforth, we refer to this model as a brain-behavior-stimulus model, because the model reflects the relation between stimulus features and the brain response after imposing a “behavioral relevance” filter (the HDDM) on the stimulus features. Before the analysis, drift rate and fMRI activation estimates were z-scored within each condition and each listener. Thus, the hierarchical linear model was essentially a hierarchical correlation analysis. The analysis was performed using the BayesFactor package v0.9.12–4.2 (Morey & Rouder, 2018). For the analysis, the z-scored drift rate estimates across all trials and listeners were separated by condition. Next, at each cortical surface node, the z-scored fMRI activation time series from all participants were concatenated separately for each condition, and the following model was estimated separately for the Unison and Competing data (here again using R model notation shorthand, as in Equation 2):
$$y \sim \mathrm{dr} + (\mathrm{dr} \mid \mathrm{sub}) \qquad (3)$$
where y is trial-wise fMRI activation at a given surface node, dr is trial-wise drift rate, and (…|sub) denotes the random (participant-level) effect of drift rate whereas the unenclosed term denotes the fixed (group-level) effect of drift rate. A second, “null” model was then estimated without the fixed effect of drift rate. The key outcome was the Bayes Factor expressing the ratio of the evidence in favor of the model described in Equation 3 to the evidence in favor of the null model (BF10). In other words, the BF10 expresses the evidence in favor of a group-level effect of drift rate on fMRI activation. The fixed-effect drift rate regression coefficient was also saved, primarily to capture the direction of the correlation between drift rate and fMRI activation. The BF10 was estimated at each cortical surface node separately for Unison and Competing (Figure 2, “Bayes Factor Map”). The resulting Unison and Competing BF10 maps were thresholded at BF10 > 100 to determine cortical surface nodes at which the correlation between drift rate and fMRI activation was statistically significant. This threshold, which can be expressed in terms of the log Bayes Factor (logBF) as |logBF| > 4.61, is in line with recommendations from Han and Park (2018), who suggest a second-level cutoff of logBF > 5 for the directional hypothesis testing for positive activation against no or negative activation. The entire brain-behavior-stimulus regression analysis was run separately with the overall drift rate and the “pitch-restricted” drift rate as the predictor in Equation 3.
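Two points in this passage can be verified numerically: after z-scoring both variables, a regression slope equals the Pearson correlation, and a threshold of BF10 > 100 corresponds to a natural-log Bayes factor of about 4.61. The sketch below demonstrates both for a single simulated participant using ordinary least squares; it does not reproduce the hierarchical Bayes factor computation, and the simulated data are purely illustrative.

```python
import numpy as np

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Simulated drift rates and activation amplitudes for one participant (400 trials).
rng = np.random.default_rng(1)
dr = rng.normal(size=400)
y = 0.3 * dr + rng.normal(size=400)

z_dr, z_y = zscore(dr), zscore(y)
slope = np.polyfit(z_dr, z_y, 1)[0]      # least-squares slope on z-scored data
r = np.corrcoef(dr, y)[0, 1]             # Pearson correlation on the raw data
print(round(float(slope), 6), round(float(r), 6))

# Threshold used for the Bayes factor maps, on the natural-log scale.
log_bf_threshold = np.log(100.0)
print(round(float(log_bf_threshold), 2))
```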

Post Hoc Analysis: Effect of Trial-wise Difficulty

Although trial-wise predicted drift rates reflect primarily information about the stimulus, it is nonetheless likely that predicted trial-wise drift rates are correlated with trial-wise difficulty because, by definition, predicted drift rates reflect those stimulus features that were relevant to task performance. In the case of a common decision boundary across trials, as is true for our drift rate model, trial-wise difficulty (inverse confidence) is proportional to the square root of the total decision time (RT minus non-decision time; Philiastides, Heekeren, & Sajda, 2014). Thus, for each trial, i, and participant, j, a “difficulty proxy” (DP) was obtained as:
$$\mathrm{DP}_{ij} = \sqrt{\mathrm{RT}_{i} - \tau_{j}} \qquad (4)$$
where RTi is the RT observed on a given trial and τj is the posterior mean of the participant-specific non-decision time parameter obtained from the HDDM. To determine whether the DP could account for the effect of predicted drift rate on the fMRI response in the brain-behavior-stimulus model, a mediation model was developed in which DP was treated as the mediator and pitch-restricted drift rate was treated as the effect of interest. We focus on pitch-restricted drift rate because this was the case for which it was most critical to disambiguate effects of trial-wise stimulus features from those of trial-wise difficulty. As such, the following multivariate regression model was developed:
(5)
where DP is the trial-wise DP, dr is the trial-wise pitch-restricted drift rate, resp is a categorical variable representing whether the response on a given trial was correct or incorrect, y is the trial-wise fMRI activation amplitude in an ROI identified by the brain-behavior-stimulus model, “*” represents an expanded interaction term, and (…|sub) denotes random (participant-level) effects. Separate models were estimated for the Competing and Unison conditions. The continuous variables were all z-scored within participant and condition for the analysis. The model was fitted using a multivariate, hierarchical-linear Bayesian framework in brms with uncorrelated response variables. Four independent sampling chains each with 1000 warmup samples and 3000 postwarmup samples were obtained. A Normal(0, 5) prior was set on all fixed effects, and brms default priors were set on all random effects. Evaluation of model convergence was performed just as for the HDDMs.
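The difficulty proxy defined above (the square root of RT minus the participant-specific non-decision time), followed by the within-participant z-scoring, can be sketched as below. The RT values and the non-decision time are hypothetical, and raising an error for RTs shorter than τ is a design choice of this sketch rather than a step described in the text.

```python
import numpy as np

def difficulty_proxy(rt, tau):
    """Square root of total decision time (RT minus non-decision time).

    rt: array of RTs (s) for one participant; tau: that participant's
    posterior-mean non-decision time (s).
    """
    decision_time = np.asarray(rt, dtype=float) - tau
    if np.any(decision_time < 0):
        raise ValueError("RT shorter than non-decision time")
    return np.sqrt(decision_time)

def zscore(values):
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical RTs (s) and non-decision time for one participant and condition.
rts = np.array([0.62, 0.85, 1.30, 0.74, 1.95])
tau_j = 0.30
dp = difficulty_proxy(rts, tau_j)
dp_z = zscore(dp)     # z-scored within participant and condition before modeling
print(np.round(dp, 3))
print(np.round(dp_z, 3))
```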

Trials were split by response category (resp; correct/incorrect) because it was expected that the relation between pitch-restricted drift rate and DP would be moderated by response category. Namely, we expected difficulty to increase with increasing pitch-restricted drift rate for incorrect trials and decrease with increasing pitch-restricted drift rate for correct trials, particularly in the Competing condition. The logic here is that speech on incorrect trials was likely to be heavily distorted, so absence of pitch (low pitch-restricted drift rate) on these trials would result in random guessing (low RT and thus low DP) because of a failure to segregate the already-distorted talkers, whereas presence of pitch (high pitch-restricted drift rate) would result in informed guessing (higher RT and thus higher DP). On the other hand, on correct trials for which speech was likely less distorted, presence of pitch would enable easier identification of the target talker (lower RT and thus lower DP), whereas absence of pitch would make it more difficult to identify the target talker (higher RT and thus higher DP). A similar effect might be expected in the Unison condition wherein presence of pitch would modulate DP when speech is heavily distorted (incorrect trials) but not when speech is clear (correct trials).

As such, if response category moderates the effect of pitch-restricted drift rate on DP, and if DP mediates the effect of pitch-restricted drift rate on fMRI response amplitude, then we expect a moderated mediation effect. The posterior distribution of the moderated mediation effect was obtained following Wang and Preacher (2015; their Equation 4). Briefly, the indirect (moderated mediation) effect of pitch-restricted drift rate on fMRI activation (i.e., that mediated by DP) is obtained as the product of the resp * dr effect on DP and the resp * DP effect on y in Equation 5. The direct effect of pitch-restricted drift rate on fMRI activation is obtained as the resp * dr effect on y in Equation 5. The moderated mediation was considered significant if the posterior 95% credible interval of the indirect effect did not contain zero. In the case of a significant mediation, a point estimate (posterior median) of the proportion of the effect mediated (indirect effect divided by the sum of the direct and indirect effects) was also obtained.
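The summary of the moderated mediation effect can be sketched from posterior draws of the three relevant coefficients. The draws below are synthetic and are not results from the study. Using an equal-tailed percentile interval and taking the median of the draw-wise proportion are assumptions about details the text leaves open, and the function name is illustrative.

```python
import numpy as np

def mediation_summary(a_draws, b_draws, direct_draws):
    """Summarize an indirect effect from posterior draws.

    a_draws: resp * dr effect on DP; b_draws: resp * DP effect on y;
    direct_draws: resp * dr effect on y. All arrays hold one value per draw.
    """
    a = np.asarray(a_draws, dtype=float)
    b = np.asarray(b_draws, dtype=float)
    direct = np.asarray(direct_draws, dtype=float)
    indirect = a * b                                   # product computed draw by draw
    lower, upper = np.percentile(indirect, [2.5, 97.5])
    return {
        "ci95": (float(lower), float(upper)),
        "significant": bool(lower > 0.0 or upper < 0.0),  # interval excludes zero
        "prop_mediated": float(np.median(indirect / (indirect + direct))),
    }

# Synthetic posterior draws (4 chains x 3000 draws = 12,000), for illustration only.
rng = np.random.default_rng(0)
n_draws = 12000
a = rng.normal(0.5, 0.05, n_draws)
b = rng.normal(0.4, 0.05, n_draws)
c = rng.normal(0.3, 0.05, n_draws)
print(mediation_summary(a, b, c))
```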

Bayesian HDDMs

Briefly, the primary goal of HDDM analysis was to determine which drift diffusion model parameters, if any, covaried with trial-by-trial distortion in the acoustic speech signal as characterized by the shape of spectrotemporal modulation filters generated quasirandomly on each trial (see section Behavioral Analysis: Bayesian Hierarchical Drift Diffusion Models). Table 1 shows the results of model comparisons for HDDMs in which the drift rate and threshold parameters (full model), drift rate parameter only (drift rate model), threshold parameter only (threshold model), or no parameters (null model) were allowed to be modulated by the acoustic distortion predictors. If the drift rate and/or threshold parameters covary significantly with acoustic distortion, then at least one of the full, drift rate, or threshold models should outperform the null model. Ultimately, the full model had the highest elpd_loo, which means that it was best able to predict trial-by-trial, accuracy-coded RTs in an LOO cross-validation procedure (i.e., on unseen data). In Table 1, the elpd_loo of the best (full) model is listed as zero, and the elpd_loo of the remaining models is described relative to the best model (elpd_diff; best model minus comparison model). For a given comparison model, the full model is taken to generate significantly better predictions on unseen data when the elpd_diff is more than 4 times its standard error. Thus, the full model made significantly better predictions than the remaining three models. However, inspection of model-predictive performance by condition (Unison, Competing) revealed that the difference between the full model and the drift rate model was driven almost entirely by the Unison condition (elpd_diff ∼ 140) as opposed to the Competing condition (elpd_diff ∼ 10). 
Thus, we proceeded with the simpler drift rate model whose predictions were significantly better than the threshold and null models in both conditions (all condition-wise elpd_diff > 900, all total elpd_diff > 20 SE). The trial-wise posterior mean estimates of drift rate were either perfectly correlated (Competing, r = 1) or nearly perfectly correlated (Unison, r = .9997) between the full and drift rate models, indicating that our subsequent analyses would be unaffected by choosing the drift rate model over the full model.

Table 1.

Results of Bayesian Model Comparisons Using Approximate LOO Cross-Validation with Subsampling

Model Name    elpd_diff (SE)
Full          0.0
Drift rate    149.6 (18.8)
Threshold     2230.1 (77.4)
Null          2233.4 (77.4)

To examine precisely which acoustic speech features modulated the drift rate parameter, we examine the fixed-effects regression coefficients for the acoustic distortion predictors (Equation 2, δ: Cond:Bub). Recall, these predictors were defined in the domain of the two-dimensional MPS (Figure 1), with 48 predictors spanning temporal modulations from 0 to 20 Hz in 5-Hz steps and spectral modulations from 0 to 12 cyc/kHz in 1 cyc/kHz steps. The regression coefficients thus define a “classification image” showing the relative direction and magnitude of the effect that filtering these discrete segments of the MPS exerted on the drift rate parameter. Large positive drift rates correspond to fast, accurate responses, and small positive drift rates correspond to slow, accurate responses. Correspondingly, large negative drift rates correspond to fast, inaccurate responses and small negative drift rates correspond to slow, inaccurate responses. Therefore, a positive regression coefficient suggests that the presence of a given MPS segment leads to more positive drift rates (better performance) and the absence of that segment leads to more negative drift rates (worse performance). Correspondingly, a negative regression coefficient suggests that the presence (absence) of a given MPS segment leads to worse (better) performance. For visualization in the figures that follow, the drift rate classification images were upsampled from 4 × 12 in the acoustic predictor space to 40 × 121, the original dimensions of the MPS spanning 0–20 Hz and 0–12 cyc/kHz.

Figure 3 plots the posterior predictive distribution of the drift rate parameter (A) along with drift rate classification images in the Competing (B) and Unison (C) conditions, and finally a “difference classification image” showing the contrast “Competing minus Unison” (D). Each of the classification images is shown unthresholded on the left and thresholded on the right such that pixels are colored white when the 95% highest density interval of the regression coefficient contains zero; otherwise, the color scale reflects the posterior mean of the regression coefficient. In Figure 3A, the drift rate distributions reflect all trials across all participants, but the participant-specific means have been subtracted and the grand mean added back into the data; thus, the distributions reflect trends in within-participant spread around the grand mean. From Figure 3A, we see that the distribution of overall drift rate (blue) was broad in both conditions (Unison, Competing), reflecting a range of performance levels across trials, but centered on more positive drift rates overall in the Unison condition, which is the expected pattern given the Unison task was easier overall. Furthermore, we see that the distribution of pitch-restricted drift rate (red) was narrower in both conditions, reflecting the fact that pitch-restricted drift rate is only a subcomponent of the overall drift rate. However, the distribution of pitch-restricted drift rate was especially narrow in the Unison condition, which reflects the fact that pitch contributed relatively little to performance in the Unison condition, as will become apparent from Figure 3B–D.

Figure 3.

Results of HDDM analysis. (A) Posterior predictive distributions (PPDs; histograms) of the overall (blue) and pitch-restricted drift rate (red) across all trials in all participants, shown for the Unison (left) and Competing (right) conditions. (B) Drift rate classification image (CImg) in the Competing condition (left), also shown thresholded (right) such that only pixels for which the 95% Bayesian highest density interval (HDI) does not contain zero are assigned a color from the heat map. (C) As in B, but for the Unison condition. (D) Direct contrast between the Competing drift rate CImg and the Unison drift rate CImg (left), also thresholded (right) such that only pixels for which the 95% HDI of the contrast coefficient does not contain zero are assigned a color from the heat map. Note, B–D are plotted in the MPS domain and marked with blue (shared phonetic content), green (female-talker vocal pitch), and red (male-talker vocal pitch) bars along the ordinate just as in Figure 1; the color scale reflects normalized regression coefficients (beta weights; B–C) or the linear contrast thereof (D).

From Figure 3B, we see in the Competing condition that MPS regions associated with shared phonetic content (blue bar, ordinate) and vocal pitch of the target talker (green bar, ordinate) made strong, significant positive contributions to drift rate. The vocal pitch of the competing talker (red bar, ordinate) and a subset of the phonetic content region at higher spectral modulation rates (> 1 cyc/kHz) made significant, but relatively small, negative contributions to drift rate, meaning that presence of this acoustic information in the signal tended to disrupt performance. From Figure 3C, we see in the Unison condition that, again, MPS regions associated with shared phonetic content and vocal pitch of the target talker made significant positive contributions to drift rate, but the magnitude of the pitch-related contributions was far lower than in the Competing condition; no MPS regions made significant negative contributions to drift rate. Figure 3D, the direct contrast between conditions, confirms that the vocal pitch of the target talker made a significantly larger, positive contribution to drift rate in the Competing condition than the Unison condition, reflected by a significant positive contrast for Competing versus Unison. Conversely, shared phonetic content made a significantly larger, positive contribution to drift rate in the Unison condition than the Competing condition, reflected by a significant negative contrast for Competing versus Unison. Finally, the MPS regions that made significant negative contributions to drift rate in the Competing condition but not the Unison condition—vocal pitch of the competing talker and shared phonetic content at higher spectral modulation rates—were also significantly negative in the direct contrast.

Together, these results suggest: (i) vocal pitch of the target talker contributed relatively more to performance in Competing than Unison; (ii) shared phonetic content contributed relatively more to performance in Unison than Competing; and (iii) interference from the competing talker was observed in Competing but not Unison. Crucially, posterior predictions of pitch-restricted drift rate were generated from the within-condition classification images using only MPS regions associated with the vocal pitch of the target talker that were significantly larger in the Competing condition than the Unison condition; that is, those regions for which the presence of target-talker vocal pitch allowed participants to perform better in the Competing condition by perceptually segregating the two talkers, whereas such segregation was not necessary to perform well in the Unison condition. Therefore, the range of predicted pitch-restricted drift rates was relatively narrow across trials in the Unison condition (Figure 3A). On the other hand, posterior predictions of overall drift rate were generated from the entire within-condition classification images, thus reflecting the totality of the influence that different MPS regions exerted on performance in each condition. It is also crucial to bear in mind that predicted drift rates reflect information about the stimulus features available to the listener on a given trial, subject to the constraint that such features are modified by a “behavioral relevance” filter: in concrete terms, predicted drift rates are the product of trial-wise bubbles filters with the drift rate classification images plotted in Figure 3B–C (left column).
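The relation between bubbles filters, classification images, and predicted drift rates can be made concrete with a small sketch. It assumes the "product" is a pixel-wise multiply-and-sum over the MPS grid and omits any intercept or participant-level terms the HDDM may include; the grid size, weights, and mask below are hypothetical.

```python
def predicted_drift(bubbles_filter, classification_image, mask=None):
    """Sum of pixel-wise products of a trial's bubbles filter and the
    classification image, optionally restricted to masked pixels."""
    total = 0.0
    for r, (f_row, w_row) in enumerate(zip(bubbles_filter, classification_image)):
        for c, (f, w) in enumerate(zip(f_row, w_row)):
            if mask is None or mask[r][c]:
                total += f * w
    return total

# Hypothetical 2 x 3 MPS grid (illustrative values only).
cimg = [[0.4, 0.1, -0.2],
        [0.3, 0.0, 0.5]]      # classification image weights
trial = [[1.0, 0.0, 1.0],
         [0.5, 1.0, 0.0]]     # share of each pixel retained on this trial
pitch_mask = [[True, False, False],
              [True, False, False]]  # assumed target-talker pitch pixels

overall = predicted_drift(trial, cimg)                    # all MPS regions
pitch_restricted = predicted_drift(trial, cimg, pitch_mask)  # pitch pixels only
```

In this toy case the overall value is pulled down by the negatively weighted pixel, whereas the pitch-restricted value ignores it, which mirrors how the two predictors can diverge from trial to trial.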

Bayesian Brain-Behavior-Stimulus Analysis

The key aim of this study was to map out brain regions for which posterior predictions of trial-wise drift rate (overall and pitch-restricted) were correlated with trial-wise fMRI activation, separately for each condition. This was accomplished via a Bayesian hierarchical linear model relating trial-wise predicted drift rate to trial-wise activation. We refer to the model as a brain-behavior-stimulus model because it maps the relation of behaviorally relevant stimulus features (predicted drift rates) to fMRI activation. The brain-behavior-stimulus model was fitted to all trials from all participants, separately for each cortical surface node, and included fixed and random effect terms. Recalling from the Introduction, we predicted that: (i) overall drift rate would be correlated with fMRI activation in dPM and temporal lobe speech regions in the Competing and Unison conditions and (ii) pitch-restricted drift rate would be correlated with fMRI activation in dPM and temporal regions sensitive to pitch and/or voice only in the Competing condition.

Figure 4A shows the results of the brain-behavior-stimulus model using the overall drift rate as the predictor. In both the Unison and Competing conditions, there was a significant (BF10 > 100) positive correlation between overall drift rate and the brain response in a bilateral network including early supratemporal auditory regions (Heschl's gyrus and surrounds); superior temporal gyrus; anterior temporal lobe; parieto-occipital junction; midline frontal, parietal, and occipital regions; and peri-central (pre-)motor and somatosensory regions extending into the parietal cortex (Table 2 gives the Montreal Neurological Institute [MNI] coordinates of cluster peak correlations with drift rate). Positive correlations in midline areas were generally stronger and more widespread in the Unison condition, as were positive correlations in the parieto-occipital junction including the angular gyrus. Positive correlations in AC were generally stronger and more widespread in the Competing condition, with significant correlations spreading into the posterior and anterior superior temporal gyrus/sulcus and the posterior middle temporal gyrus. Significant positive correlations also spanned more ventrally in the left speech-premotor cortex in the Competing condition, and, notably, positive correlations in the midline occipital cortex extended into the calcarine sulcus in the Competing condition but not the Unison condition. Finally, significant positive correlations were observed in left dPM and temporal lobe speech regions in both conditions, confirming our Hypothesis (i). In the Competing condition, the peak positive correlation in the motor cortex was located in left dPM.

Figure 4.

Bayesian Brain-Behavior-Stimulus Analysis. (A) Regression of overall drift rate on fMRI response amplitude in the Unison (left) and Competing (right) conditions. The heat map shows the magnitude of the fixed effect regression coefficient, which, given that drift rate and fMRI response amplitude were z-scored across trials within participants and conditions, is essentially a Pearson correlation value (r). Correlations are thresholded at BF10 > 100 for the fixed effect regression coefficient. (B) Regression of pitch-restricted drift rate on fMRI response amplitude in the Unison (left) and Competing (right) conditions. Heat map and thresholding as in (A). All plots displayed on an inflated cortical surface rendering of the Colin 27 template in MNI space.
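The caption's point that a regression coefficient on z-scored variables is essentially a Pearson correlation can be checked with a short sketch. It uses ordinary least squares for a single hypothetical participant rather than the hierarchical Bayesian model of the study, and all values are illustrative.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical trial-wise values for one participant and condition.
drift = [0.2, 0.8, 1.1, 1.9, 2.4]
activation = [0.1, 0.5, 0.3, 0.9, 1.2]

slope_z = ols_slope(zscore(drift), zscore(activation))
r = pearson_r(drift, activation)
```

Here `slope_z` and `r` agree to floating-point precision, which is why the heat map values can be read on a correlation scale.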

Table 2.

Location of Cluster Peaks for Brain-behavior-stimulus Model

Overall Drift Rate (Competing)    Overall Drift Rate (Unison)
Pos. Corr  SA (mm²)  x  y  z  r  logBF    Pos. Corr  SA (mm²)  x  y  z  r  logBF
L STG (TE 3) 2974 −61 −16 0.29 433 L STG (TE 3) 3429 −60 −17 0.26 359 
L Ling G (17) 2961 −12 −83 0.08 28 R STG (TE 3) 2703 66 −10 0.22 250 
R STG (TE 3) 1877 63 −6 −10 0.23 271 L Calc S (18) 2061 −10 −58 0.08 33 
R Calc S (17) 1484 17 −80 0.08 28 R Pos Cing G 1831 −44 19 0.08 31 
L PrCG (6) 925 −49 −5 42 0.11 54 L ANG (IPC PGa) 1813 −38 −55 20 0.09 41 
L MTG (IPC PGp) 891 −52 −67 18 0.07 24 L STG (TE 1.2) 1491 −53 −2 0.08 26 
R PreCun (17/18) 701 −49 14 0.06 18 L Ant Cing G 1110 −6 40 0.07 20 
L SMA (6) 521 −11 −2 49 0.05 R ANG (IPC PGp) 1029 42 −61 26 0.07 25 
R PreCun (SPL 5 L) 455 13 −59 70 0.07 19 R G Rectus 962 49 −19 0.07 22 
R MTG 301 58 −4 −22 0.06 16 L Med Front G 722 −5 53 36 0.07 19 
  
Neg. Corr  SA (mm²)  x  y  z  r  logBF    Neg. Corr  SA (mm²)  x  y  z  r  logBF
R IFGtr (45) 2958 53 24 10 −0.09 37 L INS 6007 −28 26 −3 −0.18 154 
R Mid Cing G 2813 24 34 −0.11 56 R INS 4590 31 28 −2 −0.17 135 
L Med Front G 2313 −6 21 41 −0.11 62 R SMA (6) 2921 17 48 −0.14 98 
L MFG 1685 −34 45 14 −0.07 20 L Med Front G 2372 −6 23 40 −0.16 122 
L INS 1192 −32 16 −1 −0.10 42 L IPL (IPC PF) 1567 −51 −37 40 −0.11 56 
L S Occ G (17) 1075 −16 −87 15 −0.08 28 L Mid Occ G (18) 990 −22 −86 15 −0.07 22 
R STG (OP 1) 806 44 −34 10 −0.07 25 R ANG (SPL 7A) 834 28 −65 49 −0.09 38 
L IPL (IPC PF) 654 −56 −42 44 −0.07 21 L PreCun (SPL 7P) 476 −8 −78 50 −0.08 30 
L MFG 631 −40 37 31 −0.06 17 L Mid Cing G 451 −8 −23 33 −0.07 21 
R INS (Id1) 457 40 −10 −12 −0.10 44 R SPL (SPL 7P) 413 16 −79 48 −0.07 23 
  
Pitch-Restricted Drift Rate (Competing)    Pitch-Restricted Drift Rate (Unison)
Pos. Corr  SA (mm²)  x  y  z  r  logBF    Pos. Corr  SA (mm²)  x  y  z  r  logBF
R STG (TE 3) 854 63 −4 −8 0.12 68 R STG (TE 3) 692 63 −6 −10 0.09 42 
L STG (TE 1.2) 812 −56 −8 0.12 66 L STG (TE 1.2) 596 −53 −11 0.09 41 
L PrCG (6) 54 −48 −4 48 0.04               
  
Neg. Corr  SA (mm²)  x  y  z  r  logBF    Neg. Corr  SA (mm²)  x  y  z  r  logBF
L IFGtr (44) 1010 −37 16 25 −0.06 13 R IFGtr (45) 31 48 19 23 −0.04 
L Med Front G 842 −6 21 41 −0.08 30               
R Mid Cing G 764 24 35 −0.07 19               
R IFGtr (45) 598 47 25 26 −0.07 21               
L INS 240 −31 17 −0.06 13               
R INS/IFGop 218 31 27 −4 −0.05 10               
R PreCun (SPL 7A) 190 10 −58 47 −0.05 11               
L IFGtr (44/45) 116 −44 21 13 −0.05 12               
L Mid Occ G (18) 102 −24 −91 −0.04               
R MFG 77 37 33 34 −0.06 13               

Top 10 clusters shown for regions with positive and negative fixed-effects correlations (r); if fewer than 10 clusters, all clusters with surface area (SA) greater than 30 mm² are shown. Cytoarchitectonic regions shown in parentheses after region name when available.

A large network of areas with significant negative correlations between overall drift rate and the brain response was observed in both conditions. This included bilateral middle and inferior frontal regions, inferior fronto-opercular regions, inferior parietal cortex, planum temporale, and supplementary motor area (Table 2). Negative correlations within this network were generally stronger and more widespread in the Unison condition. We have previously referred to this network broadly as the “distortion network” because it tends to respond in situations that require greater listening effort (Herrera et al., 2021), consistent with the negative drift rate correlation observed here.

Figure 4B shows the results of the brain-behavior-stimulus model using the pitch-restricted drift rate as the predictor. In both conditions, there was a significant positive correlation between pitch-restricted drift rate and the brain response in anterolateral Heschl's gyrus and the immediately surrounding superior temporal gyrus. This effect tended to be stronger and more widespread in the Competing condition. A significant positive correlation was also observed in bilateral calcarine sulcus, but only in the Competing condition. Significant negative correlations were observed within components of the “distortion network,” but only in the Competing condition. Crucially, a significant positive correlation between pitch-restricted drift rate and the brain response was observed in left dPM, but only in the Competing condition. Thus, our Hypothesis (ii) was confirmed for left dPM but not temporal lobe areas sensitive to pitch, which demonstrated significant positive correlations with pitch-restricted drift rates in both conditions. Although we did not perform a direct contrast between correlations in the Competing and Unison conditions, the geometric mean of BF10 in the region of left dPM for which a significant correlation with pitch-restricted drift rate was observed (Figure 4B, Competing) was 254 in the Competing condition, suggesting strong evidence for the presence of a correlation, and 0.22 in the Unison condition, suggesting modest evidence against the presence of a correlation (i.e., a Bayes Factor of 4.5 in favor of a null effect).
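The Bayes factor arithmetic in the preceding paragraph can be reproduced with a short sketch. A geometric mean is the natural summary because it averages evidence on the log scale, and a BF10 below 1 is converted to evidence for the null by taking its reciprocal. The node-level values below are hypothetical; only 0.22 is taken from the text.

```python
import math

def geometric_mean(values):
    """Geometric mean, i.e., the exponential of the mean log value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def bf01_from_bf10(bf10):
    """Evidence for the null relative to the alternative."""
    return 1.0 / bf10

# Hypothetical BF10 values at two surface nodes (illustrative only).
node_bf10 = [10.0, 1000.0]
roi_bf10 = geometric_mean(node_bf10)

# Reported Unison value for left dPM: BF10 = 0.22.
unison_bf01 = bf01_from_bf10(0.22)
```

The reciprocal of 0.22 is about 4.5, matching the Bayes factor in favor of a null effect quoted above.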

Post Hoc Analysis: Effect of Trial-wise Difficulty

As we have described, the brain-behavior-stimulus model is designed to identify brain regions that respond to behaviorally relevant stimulus features. Indeed, the predictor in the model, trial-wise drift rate, reflects primarily those stimulus features that are heavily weighted in the drift rate classification images produced by the HDDM (Figure 3B/C). This is perhaps especially true for pitch-restricted drift rate, which further ignores stimulus features related to overall speech intelligibility (shared phonetic content) and focuses only on the relative presence or absence of vocal pitch on a given trial. However, it remains possible that pitch-restricted drift rate is correlated with task difficulty across trials, particularly in the Competing condition where pitch makes a strong contribution to task performance (Figure 3D). To determine whether task difficulty mediated the effects of pitch-restricted drift rate in the brain-behavior-stimulus model (Figure 4B), we conducted a post hoc mediation analysis as described in section Post-Hoc Analysis: Effect of Trial-Wise Difficulty. The analysis focused on three ROIs identified by the brain-behavior-stimulus model of pitch-restricted drift rate in the Competing condition: (i) bilateral AC, (ii) left dPM, and (iii) left middle pFC. These ROIs were selected because they appeared in our a priori hypotheses (i/ii) or were shown in our previous work (Herrera et al., 2021) to be modulated by task difficulty (iii). ROIs were selected from the results in the Competing condition because, as expected, this yielded more robust and extensive responses to pitch. Trial-wise activation time series were extracted from each ROI and averaged across all surface nodes in the ROI.

The mediation analysis proceeded in three steps. First, a trial-wise DP was extracted from the HDDM as described in the section Post-Hoc Analysis: Effect of Trial-Wise Difficulty. Next, the effect of trial-wise pitch-restricted drift rate on trial-wise DP was estimated using a hierarchical Bayesian linear model. Finally, the effects of trial-wise DP and trial-wise pitch-restricted drift rate on trial-wise fMRI activation were estimated using a hierarchical Bayesian linear model. This was performed separately for each condition (Competing, Unison) and ROI. The effects of the predictors were allowed to interact with response category (correct vs. incorrect trials) because we expected the relation between pitch-restricted drift rate and trial-wise DP would be modulated by response category. Thus, the crucial question was whether the interactive effect of response category and pitch-restricted drift rate on fMRI activation was mediated by an interactive effect of response category and pitch-restricted drift rate on DP; that is, was there evidence for a moderated mediation effect?
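The logic of the second and third steps can be sketched with a deliberately simplified, non-Bayesian, non-hierarchical example. It uses ordinary least squares and the product-of-coefficients definition of an indirect effect, fitted separately within each response category instead of through interaction terms; the function names and all data are hypothetical and are constructed so that the true coefficients are known.

```python
from statistics import mean

def _centered(values):
    m = mean(values)
    return [v - m for v in values]

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def simple_mediation(x, m, y):
    """Return (a, b, direct, indirect) from two least-squares fits.

    a: effect of predictor x on mediator m (second step in the text)
    b, direct: effects of m and x on outcome y (third step in the text)
    indirect: a * b, the part of x's effect on y carried by m
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2  # 2 x 2 normal equations for y ~ x + m
    direct = (sxy * smm - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, direct, a * b

# Hypothetical trials: x = pitch-restricted drift rate, m = DP, y = activation.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # chosen to be uncorrelated with x

m_correct = [2.0 * xi + e for xi, e in zip(x, noise)]     # a = 2
y_correct = [0.5 * xi + 3.0 * mi for xi, mi in zip(x, m_correct)]
m_incorrect = [-1.0 * xi + e for xi, e in zip(x, noise)]  # a = -1
y_incorrect = [0.5 * xi + 3.0 * mi for xi, mi in zip(x, m_incorrect)]

fit_correct = simple_mediation(x, m_correct, y_correct)
fit_incorrect = simple_mediation(x, m_incorrect, y_incorrect)

# Difference in indirect effects between response categories: a rough
# stand-in for a moderated mediation effect in this simplified setting.
moderation_index = fit_correct[3] - fit_incorrect[3]
```

In this toy data set the direct effect is the same in both categories while the indirect effect changes sign, so the difference between categories arises entirely through the mediator.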

The results of the moderated mediation analysis are presented in Figure 5. We focus first on the effect of pitch-restricted drift rate on DP, which was the same regardless of the ROI to be tested in the mediation analysis. In the Unison condition (Figure 5A), DP tended to increase with increasing pitch-restricted drift rate for incorrect trials, whereas it remained relatively flat (but trended down with increasing pitch-restricted drift rate) for correct trials. As expected, there was also a main effect of response category such that incorrect trials tended to be more difficult than correct trials. In the Competing condition (Figure 5B), the same pattern of effects was observed but the tendency of DP to decrease with increasing pitch-restricted drift rate for correct trials was much more pronounced. This appeared to be driven by an increase in DP for low pitch-restricted drift rates on correct trials, wherein this increase was large enough that DP was essentially equivalent for correct and incorrect trials at the lowest pitch-restricted drift rates. This likely reflects the fact that absence of pitch on correct trials, for which speech tended to be less distorted overall, leads to increased DP because the speech signal is overall more intelligible but the talkers are difficult to segregate, whereas absence of pitch on incorrect trials, for which speech tended to be more distorted, leads to decreased DP because the speech signal is overall less intelligible and the listener tends to guess randomly (decreasing DP) because the talkers are also difficult to segregate. Crucially, the effect of pitch-restricted drift rate on DP interacted with response category for both Unison and Competing, which validates the moderated mediation approach.

Figure 5.

Post hoc moderated mediation analysis. (A–B) Hierarchical Bayesian linear regression of pitch-restricted drift rate on trial-wise DP in the Unison (left) and Competing (right) conditions. (C–D) Direct effect of pitch-restricted drift rate from hierarchical Bayesian linear regression of pitch-restricted drift rate and DP on fMRI activation in the bilateral auditory cortex (AC) ROI, shown separately for the Unison (left) and Competing (right) conditions. A binary ROI map (red) is plotted at center. (E–F) As in C–D, but for the left dorsal premotor (dPM) ROI. (G–H) As in C–D, but for the left middle pFC ROI. For all regression plots, response category is denoted by color (red = correct, blue = incorrect), solid lines indicate the posterior mean, and shaded regions indicate the posterior 95% credible interval. All ROI maps are displayed on an inflated cortical surface rendering of the Colin 27 template in MNI space.

The direct effects of pitch-restricted drift rate on fMRI activation (that is, those remaining after accounting for the effects of DP on fMRI activation in the final step of the mediation model) are shown for the three ROIs in Figure 5C–H. We focus first on the Competing condition (D, F, H) because this is the crucial condition relative to our a priori hypotheses regarding AC and left dPM. First, activation in bilateral AC and left dPM tended to increase with increasing pitch-restricted drift rate, regardless of whether the response was correct or incorrect (D, F), consistent with our a priori hypothesis regarding these ROIs. We must exercise extreme caution in interpreting these effects because, indeed, these ROIs were selected based on their significant overall correlation with pitch-restricted drift rate in the brain-behavior-stimulus model. However, the moderated mediation effect, which is unrelated to the brain-behavior-stimulus model, was not significant for AC or left dPM, suggesting that, in the Competing condition, the effect of pitch-restricted drift rate on fMRI activation in these ROIs was not mediated by DP. On the other hand, the direct effect of pitch-restricted drift rate on fMRI activation in the left middle pFC (H) patterned exactly as the effect of pitch-restricted drift rate on DP (B). Indeed, the moderated mediation effect was significant for pFC (posterior mean = 0.04, 95% CI [0.03, 0.05]; proportion mediated = 0.35). Thus, as expected, the effect of pitch-restricted drift rate on fMRI activation in pFC is strongly mediated by the effect of pitch-restricted drift rate on DP. Notably, there was a main effect of response category on fMRI activation in all three ROIs, but this effect took opposite directions for AC and left dPM (correct > incorrect) versus pFC (incorrect > correct), where the latter again matched the direction of the main effect of pitch-restricted drift rate on DP.

The direct effects of pitch-restricted drift rate on fMRI activation in the Unison condition (C, E, G) were similar overall to those in the Competing condition, but with some important differences. As in the Competing condition, effects in pFC (G) closely mirrored the effect of pitch-restricted drift rate on DP, whereas this was not the case for AC (C) and left dPM (E). Indeed, the moderated mediation effect was significant only for pFC (posterior mean = 0.01, 95% CI [0.001, 0.02]; proportion mediated = 0.23). Contrary to our a priori hypothesis, fMRI activation in AC again tended to increase with increasing pitch-restricted drift rate regardless of response category, which likely explains why the overall correlation between pitch-restricted drift rate and fMRI activation in AC was significant in the brain-behavior-stimulus model in the Unison condition (Figure 4B, Unison; subject again to the caveat of bias in ROI selection). However, the magnitude of the effect was reduced for correct trials. Consistent with our a priori hypothesis, fMRI activation in left dPM tended to increase with increasing pitch-restricted drift rate only for incorrect trials, which likely explains the absence of a significant overall effect in left dPM in the brain-behavior-stimulus model (Figure 4B, Unison), because the vast majority of Unison trials were correct. Notably, fMRI activation in AC and left dPM again tended to be higher on correct trials (although the effect was drastically reduced for left dPM), whereas the opposite was true for pFC, which again mirrored the direction of the effect of response category on DP.

We hypothesized that listening-related activation in the left dorsal speech-premotor cortex (left dPM) and temporal lobe voice/pitch regions would be driven more reliably by vocal pitch for a speech recognition task that requires extraction of vocal pitch versus one that does not. To test this hypothesis, we compared correlations of behaviorally relevant acoustic speech features with fMRI activation in two multitalker speech recognition tasks—Competing, which required talker segregation using vocal pitch, and Unison, which did not—and with two measures reflecting the relative presence or absence of behaviorally relevant acoustic speech features—overall drift rate, which was allowed to be modulated by vocal pitch and phonetic speech content, and pitch-restricted drift rate, which was allowed to be modulated only by vocal pitch (Figure 3). We predicted that (i) overall drift rate would be correlated with fMRI activation in dPM and classic auditory-speech regions in the Competing and Unison conditions; and (ii) pitch-restricted drift rate would be correlated with fMRI activation in dPM and auditory regions sensitive to pitch and/or voice only in the Competing condition. Hypothesis (i) was confirmed by the brain-behavior-stimulus analysis and Hypothesis (ii) was confirmed for dPM but not AC, where activation in the latter was correlated with pitch-restricted drift rate in the Unison and Competing conditions (Figure 4).

In a recent fMRI study (Venezia et al., 2021), we used Auditory Bubbles to degrade sentences spoken by a single female talker in a quiet background, and listeners were asked to make subjective (yes–no) intelligibility ratings. We found that both left dPM and early auditory regions responded to vocal pitch only within trials rated as unintelligible by the listeners. Crucially, behavioral analysis showed that the intelligibility judgments relied exclusively on phonetic speech content. As such, left dPM and early AC were primarily driven by phonetic speech content as required by the task. We suspect these areas responded to vocal pitch on unintelligible trials because phonetic content was largely absent from the signal, leaving pitch as the only potentially relevant acoustic speech feature within this group of trials. This suggests that pitch-responsive cortical regions including left dPM can operate in multiple modes: an auditory-pitch mode and a phonetic-speech mode. Numerous recent studies support this conclusion, showing responses to vocal pitch (e.g., the F0 contour) and phonetic content (or acoustic features correlated therewith, e.g., the speech envelope) in early auditory regions and left dPM (Hamilton et al., 2021; Berezutskaya et al., 2020; Forseth, Hickok, Rollo, & Tandon, 2020; Venezia, Thurman, Richards, & Hickok, 2019; Brodbeck, Presacco, & Simon, 2018; Dichter et al., 2018).

Here, we hypothesized that left dPM and AC would be pushed into the auditory-pitch mode during the Competing task but not the Unison task. The logic supporting this hypothesis was twofold. First, because the Competing task required extraction of vocal pitch to segregate two talkers, pitch-responsive regions would be activated by pitch regardless of overall speech intelligibility. Second, the Unison task was analogous to the single-talker task in Venezia et al. (2021) but overall speech intelligibility was much higher (∼85% correct performance), leaving very few unintelligible trials on which pitch-sensitive regions might be recruited in the auditory-pitch mode. Our HDDM analysis showed that overall drift rate was more strongly driven by vocal pitch for Competing than Unison (Figure 3). Correspondingly, activation in left dPM was more strongly correlated with overall drift rate for Competing than Unison (Figure 4A). However, this effect could have been driven by a categorical response to intelligible speech as in Venezia et al. (2021). First, overall drift rate is modulated by both vocal pitch and phonetic content (Figure 3B/C). Recall that Auditory Bubbles estimates these relative contributions by stochastically removing (or retaining) components of the speech signal related to vocal pitch and phonetic content (Figures 1/2). Because phonetic content is strongly related to overall signal intelligibility (Venezia, Martin, et al., 2019; Venezia et al., 2016) and makes strong contributions to overall drift rate in both conditions (Figure 3B/C), brain-behavior-stimulus correlations with overall drift rate may reflect a categorical response to intelligible versus unintelligible speech. Second, overall task difficulty was higher for Competing (67% correct) than Unison (∼85% correct). Indeed, overall drift rate was almost entirely positive (i.e., speech was almost always intelligible) in the Unison condition (Figure 3A). 
Any categorical effects of intelligibility were likely to be larger in the Competing condition, which may explain the more widespread positive correlations with overall drift rate in downstream auditory regions (Competing vs. Unison; Figure 4A) that often respond preferentially to intelligible speech (Stoppelman, Harpaz, & Ben-Shachar, 2013; Okada et al., 2010; Narain et al., 2003).

Therefore, it was crucial to also examine brain-behavior-stimulus correlations with pitch-restricted drift rate, which represents the independent correlation of behaviorally relevant vocal pitch cues with fMRI activation (i.e., regardless of overall intelligibility). In both conditions, pitch-restricted drift rate was positively correlated with activation in bilateral auditory networks including anterolateral Heschl's gyrus and neighboring superior temporal gyrus, consistent with prior work on pitch and voice processing (Hamilton et al., 2021; Pernet et al., 2015; Puschmann, Uppenkamp, Kollmeier, & Thiel, 2010; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Griffiths, Uppenkamp, Johnsrude, Josephs, & Patterson, 2001; Belin, Zatorre, Lafaille, Ahad, & Pike, 2000). Given the relatively small contribution of vocal pitch to drift rate in the Unison condition (Figure 3), as expected given the nature of the task (essentially single-talker) and the high overall intelligibility of the signal, it was somewhat surprising to see robust correlations with pitch-restricted drift rate in early auditory regions (Figure 4B, Unison). Indeed, this violated our Hypothesis (ii) with respect to AC. The presence of robust correlations with pitch-restricted drift rate, despite low intertrial variation in the measure (Figure 3A, left, red), suggests a more obligatory response to vocal pitch in AC. One possible reason why this was observed here and not by Venezia et al. (2021) is increased power to detect correlations with vocal pitch afforded by the distillation of pitch-related cues to a single drift-rate metric via the HDDM, essentially a form of dimension reduction in the stimulus space, whereas Venezia et al. (2021) employed a high-dimensional multivariate analysis of acoustic features across the entire speech MPS.

In line with our Hypothesis (ii), in left dPM, we observed a task-related modulation of the correlation with pitch-restricted drift rate, such that activation was positively correlated with pitch-restricted drift rate in the Competing condition but not in the Unison condition (Figure 4B; Competing BF10 = 254, Unison BF10 = 0.22). In the Competing condition, this was accompanied by the appearance of negative correlations with pitch-restricted drift rate in a bilateral frontoparietal network (Figure 4B). Given the proposed role for this network in effortful listening (Herrera et al., 2021), this suggests that absence of vocal pitch significantly modulated the difficulty of the Competing task. Together, these findings lend support to the conclusions that (i) the Competing task pushed left dPM into an auditory-pitch mode and (ii) left dPM may play a task-modulated role in pitch extraction for multitalker speech recognition.

Regarding conclusion (i), there are at least two counterarguments. First, left dPM may contain functionally heterogeneous subregions. Figure 6 plots left dPM as defined in our previous study (Venezia et al., 2021) alongside left dPM as defined in the present study (Figure 4B, Competing). We also plot a recent estimate (peak coordinate) of the gyral component of the dorsal laryngeal motor cortex (Belyk et al., 2021). As is clear, the dPM regions are largely nonoverlapping, with the previous-study region on the crown of the precentral gyrus and the present-study region on the adjacent posterior wall of the precentral gyrus abutting the gyral laryngeal motor cortex. Interestingly, we noted in our previous study that pitch-related responses tended to increase moving posteriorly within left dPM. Thus, it is possible that the present study has identified a pitch-selective subregion of left dPM. In Neurosynth (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011; https://neurosynth.org/ accessed on 10/10/2021), the center of mass of left dPM as defined in the present study ([−50, −4, 44]) was significantly associated with “pitch” (z = 4.16), “vocal” (z = 5.39), and “auditory” (z = 6.57), whereas none of these terms was associated with left dPM as defined by Venezia et al. (2021; [−52, 0, 38]). Second, and related, work by Hamilton and colleagues suggests that representations of different acoustic-phonetic speech features are spatially commingled within local patches of auditory-responsive cortex, including motor regions (Hamilton et al., 2021; Hamilton, Edwards, & Chang, 2018). Thus, the “multiple modes” of processing in left dPM may reflect subpopulations of neurons tuned to different speech features.

Figure 6.

Left dPM as defined in the present study (green) versus Venezia et al. (2021; blue) with overlap shown in yellow, plotted together with the peak coordinate for gyral laryngeal motor cortex (LMC) from Belyk et al. (2021) shown as a 2-mm radius sphere (red). Plots displayed on the standard topology white-matter-boundary surface derived from the Colin27 template in MNI space.

Regarding conclusion (ii), it is noteworthy that we generally observed positive correlations with speech recognition performance (drift rate) in auditory and lower-level (pre-)motor regions and negative correlations with performance in higher-level sensorimotor regions in inferior frontal and parietal cortices (Figure 4). This suggests the response in lower-level motor regions including left dPM is not purely compensatory, because these regions tend to activate more when speech recognition performance is better. However, because this study is correlational, we should not immediately conclude that motor regions play a role in speech recognition. Specifically, because we model trial-by-trial fluctuations in drift rate purely as a function of trial-by-trial fluctuations in the speech acoustics, those brain regions driven purely by speech acoustics may appear to be correlated with performance and vice versa. We have previously argued that motor activations during listening to speech may reflect a sensorimotor “resonance” between auditory regions that code acoustic-phonetic speech features and motor regions that use those features as “targets” to guide speech production (Venezia & Hickok, 2009). This argument cannot be ruled out here in relation to pitch-driven activations in left dPM. The fact that pitch-driven activations in early auditory regions are more obligatory while such activations in left dPM are more task sensitive supports the notion that core encoding of vocal pitch occurs in the AC, but leaves the door open to the possibility of top–down contributions to speech recognition by left dPM in certain listening scenarios such as in multitalker speech recognition (Stokes, Venezia, & Hickok, 2019). 
However, we also observed task-modulated activation by vocal pitch in the bilateral calcarine sulcus (Figure 4B), which we have previously suggested to reflect cross-modal processing of emotional or socially significant stimuli (Venezia et al., 2021) such as vocal pitch. Thus, increased attention to vocal pitch—driven by the demands of the speech recognition task but unrelated to speech recognition per se—may modulate pitch responses in some brain regions.
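The caveat about correlational inference raised above can be made concrete with a small simulation. The following is only a sketch under assumed, arbitrary parameters: the cue range, gains, noise level, and variable names are hypothetical and are not taken from the study. It shows that when predicted drift rate is a deterministic function of the stimulus acoustics, a simulated region that merely encodes the acoustics will correlate with predicted drift rate even though it plays no part in the decision.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 2000

# Hypothetical trial-wise strength of a behaviorally relevant acoustic cue.
acoustic_cue = rng.uniform(0.0, 1.0, size=n_trials)

# Predicted drift rate is, by construction, a function of the acoustics only.
predicted_drift = 0.2 + 1.5 * acoustic_cue

# A simulated region that encodes the acoustics but has no role in the decision.
acoustic_region = 2.0 * acoustic_cue + rng.normal(scale=1.0, size=n_trials)

# A simulated control region whose signal is unrelated to the stimulus.
unrelated_region = rng.normal(scale=1.0, size=n_trials)

r_acoustic = np.corrcoef(predicted_drift, acoustic_region)[0, 1]
r_unrelated = np.corrcoef(predicted_drift, unrelated_region)[0, 1]

print(f"acoustics-driven region: r = {r_acoustic:.2f}")
print(f"unrelated region:        r = {r_unrelated:.2f}")
```

A positive correlation in the acoustics-driven region here says nothing about whether that region contributes to recognition, which is why a correlation with drift rate alone cannot separate a causal contribution from stimulus-driven "resonance."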

It is also important to rule out the possibility that task-related modulation (Competing vs. Unison) of the response to vocal pitch in left dPM merely reflects the fact that the Competing task was more difficult than the Unison task, given that such difficulty was, by definition, associated with pitch because of the talker segregation demands of the Competing task. That is, we expect the relative presence or absence of vocal pitch across trials in the Competing task to be associated with fluctuations in difficulty across trials, such that fluctuations in difficulty may be driving the correlation with pitch-restricted drift rate in left dPM. To address this, we ran a post hoc mediation analysis (Figure 5) testing whether the correlation with pitch-restricted drift rate in left dPM was mediated by a trial-wise DP, reflecting essentially the decision-related component of RT (higher RT = higher DP). For comparison, the mediation analysis also included bilateral AC, for which we expected no such mediation by DP, and left middle pFC, for which we strongly expected mediation by DP. Mediation effects were split by response type (correct, incorrect) because we expected the direction of the relation between pitch-restricted drift rate and DP to be modulated by response type, which was indeed confirmed by the mediation analysis (Figure 5A/B).
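The logic of such a mediation test can be illustrated with a minimal simulation. The sketch below is not the Bayesian moderated mediation model used in the study. It substitutes an ordinary least-squares product-of-coefficients estimate on simulated data, it ignores the split by response type, and every variable name, effect size, and sample size is an assumption chosen for illustration. One simulated region responds only through the decision-related mediator (the pattern expected for pFC), and the other responds directly to the pitch-related predictor (the pattern expected for AC).

```python
import numpy as np

def indirect_effect(x, m, y):
    """Return (indirect, direct) effects of x on y, with m as the mediator.

    indirect = a * b, where a is the slope of m on x and b is the
    coefficient of m in the regression y ~ 1 + x + m.
    """
    # Path a: slope of the mediator regressed on the predictor.
    a = np.polyfit(x, m, 1)[0]
    # Path b and the direct path: fit y on an intercept, x, and m.
    design = np.column_stack([np.ones_like(x), x, m])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    direct, b = coefs[1], coefs[2]
    return a * b, direct

rng = np.random.default_rng(0)
n = 5000

# Hypothetical stand-ins for pitch-restricted drift rate and the mediator.
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)

# Region driven only through the mediator (simulated indirect effect 0.8 * 0.5).
y_mediated = 0.5 * m + rng.normal(size=n)
# Region driven directly by the predictor (simulated indirect effect 0).
y_direct = 0.5 * x + rng.normal(size=n)

ie_med, de_med = indirect_effect(x, m, y_mediated)
ie_dir, de_dir = indirect_effect(x, m, y_direct)

print(f"mediated region: indirect = {ie_med:.2f}, direct = {de_med:.2f}")
print(f"direct region:   indirect = {ie_dir:.2f}, direct = {de_dir:.2f}")
```

In this toy setting the indirect effect is close to the product of the simulated path strengths for the mediated region and close to zero for the directly driven region, which is the qualitative distinction a mediation analysis of this kind is designed to draw.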

Focusing on the crucial Competing condition, DP tended to increase with increasing pitch-restricted drift rate on incorrect trials and decrease with increasing pitch-restricted drift rate on correct trials (Figure 5B). DP was also greater overall for incorrect trials than correct trials. The effects of pitch-restricted drift rate on activation in pFC followed a nearly identical profile, suggesting mediation of the pitch-restricted drift rate effect by DP. Indeed, per our expectation, the mediation effect was significant for pFC. In contrast, no such mediation was observed for AC or left dPM (Figure 5D/F). In these regions, activation tended to increase with increasing pitch-restricted drift rate regardless of whether the response was ultimately correct or incorrect. That is, AC and left dPM were activated by behaviorally relevant vocal pitch cues whether the presence of those cues made the task more effortful (incorrect trials) or less effortful (correct trials). Thus, the mediation effect of DP was not significant in either AC or left dPM. Interestingly, AC, but not left dPM, showed the same pattern in the Unison condition (Figure 5C/E), in agreement with the modulation of the pitch-restricted drift rate effect by task in left dPM but not AC in the main brain-behavior-stimulus analysis (Figure 4).

Finally, it should be noted that we cannot rule out unequivocally that pitch-related activations in AC and left dPM were related in some way to intelligibility rather than pitch per se. As noted, the presence of vocal pitch tended to improve performance in the Unison and Competing tasks (although dramatically more so in the Competing task; Figure 3), which suggests that pitch may have contributed to speech intelligibility. Such a contribution is likely in the Competing condition where the presence of pitch would aid in talker segregation and thus increase the availability of potentially intelligible speech cues. Consistent with this explanation, activation in AC and left dPM tended to be highest on correct trials for which pitch cues were present in the signal (high pitch-restricted drift rates; Figure 5D/F). However, this result is essentially “baked in” to the study design because (i) AC and left dPM tend to activate more to intelligible speech and (ii) the presence of vocal pitch increases speech intelligibility, at least when operationalized in terms of drift rate for the Competing task. That is, one would expect a region sensitive to both phonetic (intelligible) speech content and vocal pitch to show exactly the pattern of results observed in AC and left dPM. One source of evidence supporting a specific role for AC and left dPM in processing pitch is the fact that correlations with pitch-restricted drift rates did not extend into regions in the lateral superior temporal lobe that classically respond selectively to intelligible speech. Thus, we suggest the most likely explanation of the data is that AC and left dPM play a role in processing vocal pitch, where, at least during perception, the contribution of AC is more obligatory and the role of left dPM is more modulatory. 
Although it is possible this modulatory contribution reflects a recoding of pitch in laryngeal articulatory terms, it is not clear why such a transformation would be beneficial given suggestions of perceptual invariance for pitch (McPherson & McDermott, 2022) and a linear acoustic-to-articulatory mapping for pitch (Parrell et al., 2019), but it may be relevant that there is a perceptual “voice disadvantage” (i.e., worse performance for vocal compared to nonvocal stimuli) for even simple discrimination of pitch (Gao & Oxenham, 2022). A possible role for left dPM during speech recognition with multiple talkers may be to regulate the extent to which voice- or speech-related representations in AC are facilitated relative to fine-grained pitch cues.

This material is the result of work supported with resources from and the use of facilities at the VALLHCS, Loma Linda, CA. The contents do not represent the views of the U.S. Department of Veterans Affairs or the U.S. government.

Reprint requests should be sent to Jonathan H. Venezia, VALLHCS, Loma Linda, CA 92357; Loma Linda University School of Medicine, Loma Linda, CA 92350, or via e-mail: [email protected].

All code used for stimulus generation/presentation and fMRI preprocessing, as well as a full set of example stimuli, can be found at https://osf.io/ftmpa/. Code for the HDDM, Bayesian brain-behavior-stimulus, and moderated mediation analyses can be found at https://osf.io/ne2x4/. Results of hierarchical Bayesian fMRI analyses in cortical surface format (text files that can be mapped to the cortical surface using SUMA) as well as the results of HDDM and moderated mediation analyses (R data files) can be found at https://osf.io/ne2x4/. Code for generating the final plots is available in the repository where applicable. In accordance with VA policy, interested parties may obtain the raw MRI and behavioral data only once a Data Use Agreement has been finalized between the requesting institution and the VALLHCS, U.S. Department of Veterans Affairs.

Jonathan H. Venezia: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Visualization; Writing–Original draft; Writing–Review & editing. Christian Herrera: Data curation; Investigation; Methodology; Project administration; Validation; Writing–Review & editing. Nicole Whittle: Data curation; Investigation; Project administration; Resources; Writing–Review & editing. Marjorie R. Leek: Conceptualization; Supervision; Writing–Review & editing. Samuel Barnes: Data curation; Methodology; Resources; Writing–Review & editing. Barbara Holshouser: Resources; Supervision; Writing–Review & editing. Alex Yi: Methodology; Resources; Writing–Review & editing.

This work was supported by the U.S. Department of Veterans Affairs, Veterans Health Administration, Rehabilitation Research & Development Service Award IK2RX002702 to J. H. V.

Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance. The authors of this article report its proportions of citations by gender category to be as follows: M/M = .632; W/M = .162; M/W = .132; W/W = .074.

Arsenault, J. S., & Buchsbaum, B. R. (2016). No evidence of somatotopic place of articulation feature mapping in motor cortex during passive speech perception. Psychonomic Bulletin & Review, 23, 1231–1240.

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.

Belyk, M., Brown, R., Beal, D. S., Roebroeck, A., McGettigan, C., Guldner, S., et al. (2021). Human larynx motor cortices coordinate respiration for vocal-motor control. Neuroimage, 239, 118326.

Berezutskaya, J., Baratin, C., Freudenburg, Z. V., & Ramsey, N. F. (2020). High-density intracranial recordings reveal a distinct site in anterior dorsal precentral cortex that tracks perceived speech. Human Brain Mapping, 41, 4587–4609.

Blanco, C., Okuda, M., Wright, C., Hasin, D. S., Grant, B. F., Liu, S.-M., et al. (2008). Mental health of college students and their non–college-attending peers: Results from the national epidemiologic study on alcohol and related conditions. Archives of General Psychiatry, 65, 1429–1437.

Boersma, P. (2002). Praat, a system for doing phonetics by computer. Glot International, 5, 341–345.

Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065–1066.

Brodbeck, C., Presacco, A., & Simon, J. Z. (2018). Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension. Neuroimage, 172, 162–174.

Brown, S., Ngan, E., & Liotti, M. (2008). A larynx area in the human motor cortex. Cerebral Cortex, 18, 837–845.

Buchsbaum, B. R., Baldo, J., Okada, K., Berman, K. F., Dronkers, N., D'Esposito, M., et al. (2011). Conduction aphasia, sensory-motor integration, and phonological short-term memory—An aggregate analysis of lesion and fMRI data. Brain and Language, 119, 119–128.

Buchsbaum, B. R., Hickok, G., & Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25, 663–678.

Bürkner, P.-C. (2017). Brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28.

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32.

Carvalho, C. M., Polson, N. G., & Scott, J. G. (2009). Handling sparsity via the horseshoe. In Artificial intelligence and statistics (pp. 73–80).

Cauley, S. F., Polimeni, J. R., Bhat, H., Wald, L. L., & Setsompop, K. (2014). Interslice leakage artifact reduction technique for simultaneous multislice acquisitions. Magnetic Resonance in Medicine, 72, 93–102.

Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2009). The role of auditory and premotor cortex in sensorimotor transformations. Annals of the New York Academy of Sciences, 1169, 15–34.

Cheung, C., Hamilton, L. S., Johnson, K., & Chang, E. F. (2016). The auditory representation of speech sounds in human motor cortex. eLife, 5, e12577.

Correia, J. M., Caballero-Gaudes, C., Guediche, S., & Carreiras, M. (2020). Phonatory and articulatory representations of speech production in cortical and subcortical fMRI responses. Scientific Reports, 10, 4529.

D'Ausilio, A., Bufalari, I., Salmas, P., Busan, P., & Fadiga, L. (2011). Vocal pitch discrimination in the motor system. Brain and Language, 118, 9–14.

D'Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48, 882–887.

D'Ausilio, A., Pulvermuller, F., Salmas, P., Bufalari, I., Begliomini, C., & Fadiga, L. (2009). The motor somatotopy of speech perception. Current Biology, 19, 381–385.

Dichter, B. K., Breshears, J. D., Leonard, M. K., & Chang, E. F. (2018). The control of vocal pitch in human laryngeal motor cortex. Cell, 174, 21–31.

Du, Y., Buchsbaum, B. R., Grady, C. L., & Alain, C. (2014). Noise differentially impacts phoneme representations in the auditory and speech motor systems. Proceedings of the National Academy of Sciences, U.S.A., 111, 7126–7131.

Eichert, N., Papp, D., Mars, R. B., & Watkins, K. E. (2020). Mapping human laryngeal motor cortex during vocalization. Cerebral Cortex, 30, 6254–6269.

Evans, S., & Davis, M. H. (2015). Hierarchical organization of auditory and motor representations in speech perception: Evidence from searchlight similarity analysis. Cerebral Cortex, 25, 4772–4788.

Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.

Forseth, K. J., Hickok, G., Rollo, P. S., & Tandon, N. (2020). Language prediction mechanisms in human auditory cortex. Nature Communications, 11, 5240.

Gao, Z., & Oxenham, A. J. (2022). Voice disadvantage effects in absolute and relative pitch judgments. Journal of the Acoustical Society of America, 151, 2414.

Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001). Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience, 4, 633–637.

Hamilton, L. S., Edwards, E., & Chang, E. F. (2018). A spatial map of onset and sustained responses to speech in the human superior temporal gyrus. Current Biology, 28, 1860–1871.

Hamilton, L. S., Oganian, Y., Hall, J., & Chang, E. F. (2021). Parallel and distributed encoding of speech across human auditory cortex. Cell, 184, 4626–4639.

Han, H., & Park, J. (2018). Using SPM 12's second-level Bayesian inference procedure for fMRI analysis: Practical guidelines for end users. Frontiers in Neuroinformatics, 12, 1.

Herrera, C., Whittle, N., Leek, M. R., Brodbeck, C., Lee, G., Barcenas, C., et al. (2021). Cortical networks for recognition of speech with simultaneous talkers. PsyArXiV.

Hervais-Adelman, A. G., Carlyon, R. P., Johnsrude, I. S., & Davis, M. H. (2012). Brain regions recruited for the effortful comprehension of noise-vocoded words. Language and Cognitive Processes, 27, 1145–1166.

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69, 407–422.

Hoffman, M. D., & Gelman, A. (2014). The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623.

Houpt, J. W., & Bittner, J. L. (2018). Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression. Vision Research, 148, 49–58.

Indefrey, P., & Levelt, W. J. (2004). The spatial and temporal signatures of word production components. Cognition, 92, 101–144.

Kaernbach, C. (1991). Simple adaptive testing with the weighted up–down method. Perception & Psychophysics, 49, 227–229.

Keitel, A., Gross, J., & Kayser, C. (2018). Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features. PLoS Biology, 16, e2004473.

Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., & Broussard, C. (2007). What's new in Psychtoolbox-3. Perception, 36, 1–16.

Krieger-Redwood, K., Gaskell, M. G., Lindsay, S., & Jefferies, E. (2013). The selective role of premotor cortex in speech perception: A contribution to phoneme judgements but not speech comprehension. Journal of Cognitive Neuroscience, 25, 2179–2188.

Magnusson, M., Vehtari, A., Jonasson, J., & Andersen, M. (2020). Leave-one-out cross-validation for Bayesian model comparison in large data. In International conference on artificial intelligence and statistics (pp. 341–351).

McPherson, M. J., & McDermott, J. H. (2022). Invariance in pitch perception. bioRxiv.

Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D., & Iacoboni, M. (2007). The essential role of premotor cortex in speech perception. Current Biology, 17, 1692–1696.

Morey, R. D., & Rouder, J. N. (2018). BayesFactor: Computation of Bayes factors for common designs (R package version 0.9.12–4.2). Retrieved from https://CRAN.R-project.org/package=BayesFactor.

Mottonen, R., van de Ven, G. M., & Watkins, K. E. (2014). Attention fine-tunes auditory–motor processing of speech sounds. Journal of Neuroscience, 34, 4064–4069.

Mottonen, R., & Watkins, K. E. (2009). Motor representations of articulators contribute to categorical perception of speech sounds. Journal of Neuroscience, 29, 9819–9825.

Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. Neuroimage, 59, 2636–2643.

Muraskin, J., Brown, T. R., Walz, J. M., Tu, T., Conroy, B., Goldman, R. I., et al. (2018). A multimodal encoding model applied to imaging decision-related neural cascades in the human brain. Neuroimage, 180, 211–222.

Narain, C., Scott, S. K., Wise, R. J., Rosen, S., Leff, A., Iversen, S., et al. (2003). Defining a left-lateralized response specific to intelligible speech using fMRI. Cerebral Cortex, 13, 1362–1368.

Nuttall, H. E., Kennedy-Higgins, D., Hogan, J., Devlin, J. T., & Adank, P. (2016). The effect of speech distortion on the excitability of articulatory motor cortex. Neuroimage, 128, 218–226.

Okada, K., Rong, F., Venezia, J., Matchin, W., Hsieh, I. H., Saberi, K., et al. (2010). Hierarchical organization of human auditory cortex: Evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex, 20, 2486–2495.

Oosterhof, N. N., Wiestler, T., Downing, P. E., & Diedrichsen, J. (2011). A comparison of volume-based and surface-based multi-voxel pattern analysis. Neuroimage, 56, 593–600.

Osnes, B., Hugdahl, K., & Specht, K. (2011). Effective connectivity analysis demonstrates involvement of premotor cortex during speech perception. Neuroimage, 54, 2437–2445.

Panouillères, M. T., Boyles, R., Chesters, J., Watkins, K. E., & Möttönen, R. (2018). Facilitation of motor excitability during listening to spoken sentences is not modulated by noise or semantic coherence. Cortex, 103, 44–54.

Parrell, B., Ramanarayanan, V., Nagarajan, S., & Houde, J. (2019). The FACTS model of speech motor control: Fusing state estimation and task-based control. PLoS Computational Biology, 15, e1007321.

Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron, 36, 767–776.

Pernet, C. R., McAleer, P., Latinus, M., Gorgolewski, K. J., Charest, I., Bestelmeyer, P. E., et al. (2015). The human voice areas: Spatial organization and inter-individual variability in temporal and extra-temporal cortices. Neuroimage, 119, 164–174.

Philiastides, M. G., Heekeren, H. R., & Sajda, P. (2014). Human scalp potentials reflect a mixture of decision-related signals during perceptual choices. Journal of Neuroscience, 34, 16877–16889.

Pulvermuller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11, 351–360.

Pulvermuller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences, U.S.A., 103, 7865–7870.

Puschmann, S., Uppenkamp, S., Kollmeier, B., & Thiel, C. M. (2010). Dichotic pitch activates pitch processing centre in Heschl's gyrus. Neuroimage, 49, 1641–1649.

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for statistical computing.

Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.

Schomers, M. R., Kirilina, E., Weigand, A., Bajbouj, M., & Pulvermuller, F. (2015). Causal influence of articulatory motor cortex on comprehending single spoken words: TMS evidence. Cerebral Cortex, 25, 3894–3902.

Schwartz, J.-L., Basirat, A., Ménard, L., & Sato, M. (2012). The perception-for-action-control theory (PACT): A perceptuo-motor theory of speech perception. Journal of Neurolinguistics, 25, 336–354.

Simonyan, K. (2014). The laryngeal motor cortex: Its organization and connectivity. Current Opinion in Neurobiology, 28, 15–21.

Skipper, J. I., Devlin, J. T., & Lametti, D. R. (2017). The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception. Brain and Language, 164, 77–105.

Smith, S. M., Beckmann, C. F., Andersson, J., Auerbach, E. J., Bijsterbosch, J., Douaud, G., et al. (2013). Resting-state fMRI in the human connectome project. Neuroimage, 80, 144–168.

Stokes, R. C., Venezia, J. H., & Hickok, G. (2019). The motor system's [modest] contribution to speech perception. Psychonomic Bulletin & Review, 26, 1354–1366.

Stoppelman, N., Harpaz, T., & Ben-Shachar, M. (2013). Do not throw out the baby with the bath water: Choosing an effective baseline for a functional localizer of speech processing. Brain and Behavior, 3, 211–222.

Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. Neuroimage, 39, 1429–1443.

Tremblay, P., & Small, S. L. (2011). On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception. Neuroimage, 57, 1561–1571.

Vandekerckhove, J., Tuerlinckx, F., & Lee, M. D. (2011). Hierarchical diffusion models for two-choice response times. Psychological Methods, 16, 44–62.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.

Venezia, J. H., & Hickok, G. (2009). Mirror neurons, the motor system and language: From the motor theory to embodied cognition and beyond. Language and Linguistics Compass, 3, 1403–1416.

Venezia, J. H., Hickok, G., & Richards, V. M. (2016). Auditory “bubbles”: Efficient classification of the spectrotemporal modulations essential for speech intelligibility. Journal of the Acoustical Society of America, 140, 1072.

Venezia, J. H., Leek, M. R., & Lindeman, M. P. (2020). Suprathreshold differences in competing speech perception in older listeners with normal and impaired hearing. Journal of Speech, Language, and Hearing Research, 63, 2141–2161.

Venezia, J. H., Martin, A.-G., Hickok, G., & Richards, V. M. (2019). Identification of the spectrotemporal modulations that support speech intelligibility in hearing-impaired and normal-hearing listeners. Journal of Speech, Language, and Hearing Research, 62, 1051–1067.

Venezia, J. H., Richards, V. M., & Hickok, G. (2021). Speech-driven spectrotemporal receptive fields beyond the auditory cortex: STRFs beyond auditory cortex. Hearing Research, 408, 108307.

Venezia, J. H., Thurman, S. M., Richards, V. M., & Hickok, G. (2019). Hierarchy of speech-driven spectrotemporal receptive fields in human auditory cortex. Neuroimage, 186, 647–666.

Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology: A practical introduction. Experimental Psychology, 60, 385–402.

Voss, A., Voss, J., & Lerche, V. (2015). Assessing cognitive processes with diffusion model analyses: A tutorial based on fast-dm-30. Frontiers in Psychology, 6, 336.

Wabersich, D., & Vandekerckhove, J. (2014). The RWiener package: An R package providing distribution functions for the wiener diffusion model. R Journal, 6, 49–56.

Wang, L., & Preacher, K. J. (2015). Moderated mediation analysis using Bayesian methods. Structural Equation Modeling: A Multidisciplinary Journal, 22, 249–263.

Watkins, K., & Paus, T. (2004). Modulation of motor excitability during speech perception: The role of Broca's area. Journal of Cognitive Neuroscience, 16, 978–987.

Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, 989–994.

Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity, 2, 125–141.

Wilkins, K. C., Lang, A. J., & Norman, S. B. (2011). Synthesis of the psychometric properties of the PTSD checklist (PCL) military, civilian, and specific versions. Depression and Anxiety, 28, 596–606.

Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, 701–702.

Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8, 665–670.