Right Posterior Temporal Cortex Supports Integration of Phonetic and Talker Information

Abstract
Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and the other of whom produced it in /∫/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.


Methods
After preprocessing, we conducted a univariate regression analysis for each participant. To construct the regressors of interest, stimulus onset times for each condition (ambiguous exposure trials, unambiguous exposure trials, and each of the four continuum steps presented in the phonetic categorization task, modeled separately for each talker) were convolved with a gamma function to generate idealized hemodynamic response functions. Each regression also included a third-order polynomial term (to account for scanner drift over the course of the run) as well as the six motion parameters estimated during the alignment step of preprocessing.
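The regressor-construction step described above can be sketched in Python. This is a minimal illustration, not the actual AFNI pipeline; the TR, run length, onset times, and motion values below are invented for the example.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0        # repetition time in seconds (illustrative; not stated in the text)
N_VOLS = 200    # volumes per run (illustrative)

def gamma_hrf(tr=TR, duration=32.0, shape=6.0, scale=0.9):
    """Idealized hemodynamic response: a gamma density sampled at the TR."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, a=shape, scale=scale)
    return h / h.sum()

def condition_regressor(onsets_sec, tr=TR, n_vols=N_VOLS):
    """Place a unit impulse at each stimulus onset, then convolve with the HRF."""
    stick = np.zeros(n_vols)
    stick[(np.asarray(onsets_sec) / tr).astype(int)] = 1.0
    return np.convolve(stick, gamma_hrf(tr), mode="full")[:n_vols]

# Nuisance terms: third-order polynomial drift (Legendre basis, orders 0-3)
frame_times = np.linspace(-1.0, 1.0, N_VOLS)
drift = np.column_stack(
    [np.polynomial.legendre.Legendre.basis(k)(frame_times) for k in range(4)]
)

# The six motion parameters would come from the alignment step of preprocessing;
# random values stand in for them here.
motion = np.random.default_rng(0).normal(scale=0.05, size=(N_VOLS, 6))

onsets = [10.0, 50.0, 90.0, 130.0]  # example onsets for one condition (seconds)
X = np.column_stack([condition_regressor(onsets), drift, motion])
# X: one task regressor + 4 drift terms + 6 motion regressors = 11 columns
```

In the full design, one such task regressor would be built per condition and talker before fitting the regression to each voxel's time series.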
Group-level analyses were performed using the 3dLME command and tested for fixed effects of Step, Bias (sh-bias, s-bias), and Talker (female, male); random by-subject intercepts were also included in the model. Results were masked to only include voxels that were (a) imaged in all 20 participants and (b) in regions that are broadly associated with speech processing, namely the inferior frontal, middle frontal, insular, superior temporal, transverse temporal, middle temporal, supramarginal and angular cortices bilaterally, as defined in the AFNI Talairach atlas.
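As a rough analogue of the 3dLME model, the per-voxel mixed-effects structure (fixed effects plus random by-subject intercepts) could be fit in Python with statsmodels. The data below are synthetic and the formula is an illustration of the model structure, not the exact AFNI specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic single-voxel data: 20 subjects x 4 steps x 2 biases x 2 talkers
rng = np.random.default_rng(1)
rows = []
for subject in range(20):
    subject_intercept = rng.normal(scale=0.5)   # random by-subject intercept
    for step in range(4):
        for bias in ("s-bias", "sh-bias"):
            for talker in ("female", "male"):
                bold = subject_intercept + 0.3 * step + rng.normal()
                rows.append(dict(subject=subject, step=step,
                                 bias=bias, talker=talker, bold=bold))
df = pd.DataFrame(rows)

# Fixed effects of Step, Bias, and Talker (with a Bias x Talker interaction);
# "groups" gives each subject their own random intercept.
model = smf.mixedlm("bold ~ step + bias * talker", df, groups="subject")
result = model.fit()
```

In the actual analysis this model is fit independently at every voxel inside the group mask, and the resulting statistical maps are then cluster-corrected.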
This group mask is visualized in Figure 3.6A. To correct for multiple comparisons, we first applied the 3dFWHMx command, which uses a spatial autocorrelation function to estimate the smoothness of the residual (error) time series from the regression analysis. To assess the likelihood of noise-only clusters, we then used the 3dClustSim command to perform a series of Monte Carlo simulations on our group mask using the mean estimated smoothness values. These simulations indicated that we needed at least 116 contiguous voxels for a statistically significant cluster (with a voxel-wise p value of 0.05 and a cluster-wise alpha level of 0.05).
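The Monte Carlo logic behind 3dClustSim can be illustrated with a toy simulation: generate smooth noise, threshold it at the voxel-wise p value, and record the largest surviving cluster on each iteration; the cluster-size threshold is then the 95th percentile of that null distribution. The grid size, smoothness, and iteration count below are placeholders, not the values from the actual analysis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

rng = np.random.default_rng(0)
shape = (30, 30, 30)       # toy grid standing in for the group mask
fwhm_vox = 2.0             # smoothness in voxels (3dFWHMx estimates the real value)
sigma = fwhm_vox / 2.3548  # convert FWHM to a Gaussian sigma
z_thresh = 1.645           # one-sided voxel-wise p = 0.05

max_cluster_sizes = []
for _ in range(200):                          # real analyses run ~10,000 iterations
    noise = gaussian_filter(rng.standard_normal(shape), sigma)
    noise /= noise.std()                      # re-standardize after smoothing
    labeled, n = label(noise > z_thresh)      # 3D connected components
    sizes = np.bincount(labeled.ravel())[1:]  # drop the background label
    max_cluster_sizes.append(int(sizes.max()) if n else 0)

# Cluster-size threshold: 95th percentile of the null max-cluster distribution
k_thresh = int(np.percentile(max_cluster_sizes, 95))
```

Any observed cluster larger than `k_thresh` voxels would then be unlikely to arise from smooth noise alone at the chosen alpha level (note that scipy's default face-connectivity differs from some of 3dClustSim's neighbor definitions).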

Results
Our univariate analysis examined how functional activation during the phonetic categorization task depended on Step (how /s/-like or /∫/-like the continuum step was), Bias (whether the talker had previously produced ambiguous fricatives in /s/-biased or /∫/-biased contexts), and Talker (whether the female or male talker produced the stimulus). Results are summarized in Table S1 and visualized in Figure S1; the full set of voxels we considered is shown in Figure 8 of the main text. Figure S1A shows that several regions, notably including the bilateral superior temporal cortex, showed stronger activation when speech was presented compared to silent trials. As shown in Figure S1B, a broad set of bilateral temporal regions was sensitive to the particular continuum step participants heard, as were the left middle frontal gyrus and the inferior frontal cortex bilaterally. A planned comparison contrasted the activation for the most ambiguous continuum steps (i.e., those near the /s/-/∫/ phonetic category boundary) and unambiguous steps (i.e., the clear /s/ and the clear /∫/). We found that parts of the bilateral temporal cortex responded more strongly to the unambiguous tokens than to the ambiguous ones, as did the left middle frontal gyrus (Figure S1C). We observed sensitivity to the Bias manipulation in bilateral temporoparietal cortex as well as in the bilateral inferior / middle frontal gyri (Figure S1D). By contrast, a relatively limited set of regions was differentially sensitive to which particular talker listeners heard, as shown in Figure S1E; specifically, we observed an effect of Talker in left posterior auditory cortex, the right superior and middle temporal gyri (extending into the right parietal cortex), and right inferior frontal cortex. Finally, we observed a significant Bias × Talker interaction in the right middle frontal gyrus.

Figure S1. The univariate analysis considered activation during the phonetic categorization task and was limited to (A) left-hemisphere regions that have been implicated in language processing and the corresponding regions in the right hemisphere. Functional activation was significantly modulated by (B) which step on the sign-shine continuum was presented, with (C) a subset of these regions responding more strongly to unambiguous tokens than ambiguous tokens. Activation was also modulated by (D) whether the talker had previously produced ambiguous fricatives in /s/-biased or /∫/-biased contexts, (E) whether the male or female voice was being presented, and (F) the interaction between these two factors. Volumetric clusters were projected to a surface reconstruction using FreeSurfer (Fischl, 2012).

Table S1. Results of the univariate analysis of fMRI data (voxel-wise p < 0.05, cluster-level α < 0.05).

Discussion
Univariate analyses demonstrated that bilateral temporal and frontal regions were sensitive to whether the talker had previously produced ambiguous fricatives in /s/-biased or /∫/-biased contexts. The clusters observed in the current study were generally comparable to (albeit more widespread than) the related clusters observed by Myers and Mesite (2014), who found differences in the response of the RIFG and the LSTG/MTG as a function of the type of biasing exposure listeners had received. However, we suggest caution in interpreting these data given that we did not observe behavioral effects of the lexically biasing exposure in our fMRI experiment.
Effects of talker identity (female or male) were observed in several regions associated with vocal identity processing (Luthra, 2021). In particular, we found sensitivity to talker gender in the posterior superior temporal lobe bilaterally, consistent with the idea that this region supports discrimination between talkers (Belin et al., 2000; Bestelmeyer, Belin, & Grosbras, 2011), as well as in the right anterior temporal lobe, which has been implicated in mapping from acoustic information to a known vocal identity (Belin & Zatorre, 2003; Luzzi et al., 2018; Van Lancker & Kreiman, 1987). We also observed an effect of talker gender in the right inferior frontal gyrus.

Finally, univariate analyses revealed that the recruitment of the right MFG was jointly influenced by which specific talker listeners were hearing (female or male) as well as by whether the talker had previously produced ambiguous fricatives in /s/-biased or /∫/-biased contexts. The right frontal cortex is not thought to play a substantial role in phonetic processing, though it is possible that this region is relatively more important when talker identity uniquely determines the mapping from acoustics to phonetic categories. However, we encourage caution in reading too strongly into this effect, as no effects of the specific biasing context were observed in the behavioral data. Instead, we suggest that future work should more thoroughly scrutinize the contributions of right frontal cortex to phonetic processing. To do so, it would be important to ensure that the talkers' voices are perceptually matched, such that naïve listeners do not exhibit different phonetic categorization functions for the different talkers (as they did in Experiment 2 of the current study).

Searchlight Analysis
Searchlight analyses presented in the main text provide evidence that the local pattern of activation in the right superior temporal sulcus contains information about phonetic identity as well as talker. Figure S2 serves as a complement to Figure 9D, visualizing this cluster on both pial (A) and inflated (B) brain surfaces.
Figure S2. A set of searchlight analyses identified voxels that were sensitive to phonetic identity and talker. Here, this cluster is visualized on (A) pial and (B) inflated brain surfaces (color scale: z = 2.0 to 3.3).