Successful perception of speech in everyday listening conditions requires effective listening strategies to overcome common acoustic distortions, such as background noise. Convergent evidence from neuroimaging and clinical studies identifies activation within the temporal lobes as key to successful speech perception. However, current neurobiological models disagree on whether the left temporal lobe is sufficient for successful speech perception or whether bilateral processing is required. We addressed this issue using TMS to selectively disrupt processing in either the left or right superior temporal gyrus (STG) of healthy participants to test whether the left temporal lobe is sufficient or whether both left and right STG are essential. Participants repeated keywords from sentences presented in background noise in a speech reception threshold task while receiving online repetitive TMS separately to the left STG, right STG, or vertex or while receiving no TMS. Results show an equal drop in performance following application of TMS to either left or right STG during the task. A separate group of participants performed a visual discrimination threshold task to control for the confounding side effects of TMS. Results show no effect of TMS on the control task, supporting the notion that the results of Experiment 1 can be attributed to modulation of cortical functioning in STG rather than to side effects associated with online TMS. These results indicate that successful speech perception in everyday listening conditions requires both left and right STG and thus have ramifications for our understanding of the neural organization of spoken language processing.
Since the initial observations of Carl Wernicke in 1874 that the posterior superior temporal gyrus (STG) must be the sensory speech center of the brain, the notion that Wernicke's area is the neural locus of auditory speech processing has become one of the most widely accepted concepts in cognitive neuroscience. Despite the intervening years, fundamental questions relating to the neurobiology of speech perception still exist, such as how exactly the two hemispheres contribute to speech perception. Two prominent neurobiological models of speech perception are the unilateral model of Rauschecker and Scott (2009) and the bilateral model of Hickok and Poeppel (2000). Both build on the notion that speech perception occurs in the context of a dual stream of processing with a ventral pathway involved in mapping sound to meaning and a dorsal pathway mapping sound to articulatory motor processes. Rauschecker and Scott argue that “speech perception and production are left lateralized in the human brain” (p. 720), with the locus of successful speech perception in the left anterior STG (Rosen, Wise, Chadha, Conway, & Scott, 2011; Scott, Blank, Rosen, & Wise, 2000). In contrast, Hickok and Poeppel (2000) argue that speech perception is processed bilaterally and propose the existence of a speech perception pathway in each hemisphere capable of processing speech sounds up to and including the mental lexicon (Hickok & Poeppel, 2007). Moreover, effective speech perception is thought to rely on sites both anterior and posterior of the transverse temporal gyrus, with phonological processing especially occurring bilaterally and semantic processing being more left dominant (Hickok, 2009).
The argument in favor of bilateral speech perception comes mainly from patient data where unilateral lesions or unilateral anaesthetization during Wada testing of either the left or right hemisphere (Hickok et al., 2008; McGlone, 1984) partially impairs speech perception. In contrast, patients with bilateral lesions encompassing both left and right superior temporal regions are more likely to suffer from verbal auditory agnosia, an inability to understand spoken language despite preservation of other language capabilities, that is, reading or writing (Buchman, Garron, Trost-Cardamone, Wichter, & Schwartz, 1986). Of the 63 well-detailed cases of verbal auditory agnosia, there were ∼70% with bilateral lesions, supporting the notion that both left and right hemispheres are critical to speech perception. Yet, the remaining 30% had unilateral damage (only one patient had a right hemisphere lesion) and still had auditory agnosia (Slevc & Shell, 2015), suggesting that the left STG may be sufficient for speech recognition. Results from patient data are, therefore, inconclusive, with respect to the unilateral or bilateral organization of speech processing. The damage caused by lesions or strokes, however, is not constrained to functional or anatomical boundaries. Furthermore, the neuroplastic changes and reorganization that occur after brain injury make it difficult to localize the specific origin of the resultant cognitive deficit.
Although such issues can be overcome by investigating the neural processing of healthy human participants, neuroimaging studies have thus far also reported mixed results with respect to the question of whether speech processing occurs unilaterally or bilaterally. Several neuroimaging studies on processing of intelligible speech report a left-lateralized locus of processing (Evans & McGettigan, 2017; Evans, McGettigan, Agnew, Rosen, & Scott, 2016; McGettigan et al., 2012; Narain et al., 2003; Scott et al., 2000; Binder et al., 1997), whereas later studies have reported bilateral involvement of temporal areas in speech perception (Evans et al., 2014; Rosen et al., 2011; Friederici, Kotz, Scott, & Obleser, 2010; Okada et al., 2010; Harris, Dubno, Keren, Ahlstrom, & Eckert, 2009; Obleser, Eisner, & Kotz, 2008; Zekveld, Heslenfeld, Festen, & Schoonhoven, 2006). It is likely that the lack of consistency with respect to the laterality of STG involvement during speech is due not only to methodological differences between studies but also to the correlational nature of such methods where observed changes in the BOLD signal can either be functionally relevant or epiphenomenal.
TMS is a neurophysiological technique that allows for noninvasive stimulation of the human brain through the application of strong but short magnetic pulses, which modulate the underlying neural activity in conscious, healthy human participants. By inducing electrical currents in the brain that modulate and disrupt the ongoing activation within a given region, TMS can be used to demonstrate causality between a cognitive process and specific brain regions. As a result, TMS can be used to complement other neuropsychological techniques (such as fMRI and EEG), which are correlational in nature (Sack, 2006; Paus, 2005). Given the potential for causal conclusions, TMS is a valuable method to study the neurobiology of speech perception (Adank, Nuttall, & Kennedy-Higgins, 2017). It is perhaps surprising that few studies have been published in this research area.
Thus far, TMS has been found to impair both semantic and phonological judgments after left posterior STG stimulation (Krieger-Redwood, Gaskell, Lindsay, & Jefferies, 2013) as well as prosodic judgments (Alba-Ferrara, Ellison, & Mitchell, 2012) and human voice perception after right posterior STG stimulation (Bestelmeyer, Belin, & Grosbras, 2011). Given the critical importance of these regions in speech perception, it is perhaps not surprising that the application of TMS disrupted performance across these studies. Krieger-Redwood et al. (2013) conclude that the impairment is the result of TMS increasing the “ambiguity of the auditory input to the system which necessarily impacts on processing at all levels” (p. 2185). Taken together, this research suggests that TMS can be used to further our understanding in a way that complements other research techniques.
Yet, TMS also has limitations that need to be considered before adopting it as a viable technique. The most referred to concept in TMS research is the creation of a “virtual lesion” (Pascual-Leone, Bartres-Faz, & Keenan, 1999). A common misunderstanding of this phrase is that the induced “virtual lesion” results in a complete loss of cognitive ability within the region being stimulated, that is, that TMS is capable of inducing deficits akin to cortical deafness. Although some such effects have been observed in a limited fashion within the visual system (Amassian et al., 1989), generally, the effects in other cortical regions are far more subtle, and experiments rely on more fine-grained distinctions in performance across tasks/stimulation sites shown through RTs, error rates, or motor-evoked potentials. This is a general property of TMS rather than specific to neurobiology of language research; however, it must be considered when assessing the research conducted so far. Meister, Wilson, Deblieck, Wu, and Iacoboni (2007) found no effect on discrimination of two consonant–vowel syllables in noise after left STG stimulation despite finding an impairment of tone discrimination, and Drager, Breitenstein, Helmke, Kamping, and Knecht (2004) found no effect relative to baseline in a picture–word verification task. Beauchamp, Nath, and Pasalar (2010) did find that participants were significantly less likely to report a McGurk effect after single-pulse TMS to STG, but they conclude that this result is best explained as interference with audio-visual integration rather than as evidence that TMS can interfere with speech perception. Thus, although TMS does provide the opportunity to establish causal brain–behavior links, the very subtle effect that TMS has on the overall network makes the task of establishing causal relations far more complex, especially within a network that is as highly redundant as the speech perception network (Price & Friston, 2002). Indeed, Meister et al. (2007) theorize that the network for speech perception within the temporal lobes is too extensive to be compromised by TMS “because of compensatory processes within the contralateral temporal cortex” (p. 1695).
The view that it is challenging to disrupt speech perception by targeting a single area using TMS is supported by Andoh and Paus (2011). They combined 1-Hz off-line repetitive TMS with functional imaging to investigate the impact that stimulating the superior temporal region of each hemisphere would have on the activation in the contralateral hemisphere. The results showed a task-related increase in activation in the homologue areas contralateral to the site of stimulation, that is, stimulation of the left posterior temporal region resulted in a task-related increase in activation in the right superior and middle temporal gyri and the left cerebellum. Andoh and Paus (2011) suggest that these results are evidence of the brain compensating for the TMS-induced disruption to one hemisphere by drawing on additional resources from the opposite hemisphere. The authors suggest that this interhemispheric compensatory process is the reason why behavioral effects are not always observed after application of TMS. Moreover, they argue that this interhemispheric compensation is likely to represent the early stages of reorganization that occur in patients following neurological trauma, with individual differences in the degree of interhemispheric compensation explaining the variable impact of unilateral or bilateral damage.
The aim of the current study was to evaluate claims regarding the involvement of the STG made by the neurobiological models of Rauschecker and Scott (2009) and Hickok and Poeppel (2000) by using repetitive TMS to temporarily and selectively disrupt processing in either the left or right hemispheres of healthy human participants and measure the effect on participants' speech perception. Participants' ability to perceive speech in noise was assessed by comparing their performance on a speech reception threshold (SRT) task (Plomp & Mimpen, 1979a, 1979b) without TMS (control condition) and while receiving TMS separately to the left STG, right STG (experimental sites), or vertex (control site). In Experiment 1, we emulated everyday listening conditions by presenting participants with spoken sentences embedded in background noise. The bilateral model of Hickok and Poeppel (2000) predicts that TMS to the left or to the right STG will result in poorer performance relative to no TMS, whereas the unilateral model of Rauschecker and Scott (2009) predicts that only rTMS to the left STG should impair performance, with no effect of right STG stimulation. Experiment 2 replicated the TMS parameters of Experiment 1 but involved the use of a nonspeech, visual discrimination task in place of the SRT task of Experiment 1. This is a task that does not engage the STG of either hemisphere and was therefore included to test whether the results from Experiment 1 could be explained by nonspecific side effects of rTMS, for example, the distraction that arises from facial twitching. Following Andoh and Paus (2011), the current study adopted an online rTMS paradigm to maximize cortical modulation within the STG while minimizing the possibility of interhemispheric compensation reducing or eliminating the behavioral effects of the stimulation.
Sixteen individuals took part in this study (18–41 years old; mean = 23.25 years, SD = 6.94; 11 women). All participants were native British English speakers, with no reported history of speech, language, neurological, or psychiatric disorders. Hearing thresholds were not explicitly measured; however, all participants reported no history of hearing difficulty, the stimuli were presented suprathreshold (i.e., at a level higher than 20 dB HL), and no participant showed any sign of potential hearing difficulty in the baseline no TMS condition, that is, all participants performed within the expected range. All were safety screened according to University College London's protocols and presented no contraindications to either MRI or TMS. All participants gave informed, written consent for both the MRI and TMS test sessions, with all sessions approved by the university research ethics committee (#0599/001). Participants were paid or received course credit.
Participants' ability to perceive speech in noise was assessed by comparing their performance on the SRT task (Plomp & Mimpen, 1979a, 1979b). All sentences occurred in the presence of speech-shaped noise with the signal-to-noise ratio (SNR) varying adaptively depending on individual participant performance. The first sentence was presented at a favorable SNR, for example, +20 dB. Correct repetition of three or more keywords resulted in a reduction of 10 dB on subsequent trials, until participants were unable to correctly repeat more than two keywords. At this point, the SNR increased in steps of 6 dB until another reversal occurred, with all subsequent changes occurring in steps of 4 dB. In all cases, the level of the speech signal remained constant, with the noise file varying in intensity. A reversal refers to the shift in direction of SNR change from one trial to the next. For example, if a participant repeated three or more keywords for four sentences in a row, the SNR decreased after each sentence, making each subsequent sentence harder to perceive. If on the fifth sentence the participant was unable to repeat at least three of the keywords, the SNR increased, making the sixth trial easier to understand. Such a change in direction from decreasing to increasing (or vice versa) SNR represents a reversal. Participants' SRTs were computed by taking the mean SNR from all trials where a reversal occurred (Schoof & Rosen, 2014; Plomp & Mimpen, 1979a, 1979b).
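The adaptive rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the script used in the study: the function name `run_staircase`, the argument names, and the convention that a reversal is logged at the SNR of the trial whose response changed the direction of the track are all assumptions.

```python
from statistics import mean


def run_staircase(keywords_correct, start_snr=20.0, steps=(10.0, 6.0, 4.0), criterion=3):
    """Simulate the adaptive SNR track for a sequence of keyword scores (0-5).

    Returns (snr_per_trial, reversal_snrs, srt), where srt is the mean SNR
    over all trials on which a reversal occurred (None if there were none).
    """
    snr = start_snr
    snr_per_trial = []
    reversal_snrs = []
    last_direction = None  # -1: SNR decreasing (harder), +1: SNR increasing (easier)
    for n_correct in keywords_correct:
        snr_per_trial.append(snr)
        # Three or more keywords correct makes the next trial harder.
        direction = -1 if n_correct >= criterion else +1
        if last_direction is not None and direction != last_direction:
            # Assumed convention: the reversal is logged at this trial's SNR.
            reversal_snrs.append(snr)
        # 10 dB before the first reversal, 6 dB after it, 4 dB after the second.
        step = steps[min(len(reversal_snrs), len(steps) - 1)]
        snr += direction * step
        last_direction = direction
    srt = mean(reversal_snrs) if reversal_snrs else None
    return snr_per_trial, reversal_snrs, srt


# Hypothetical keyword scores for seven trials.
track, reversals, srt = run_staircase([5, 5, 4, 1, 3, 2, 4])
print(track)      # [20.0, 10.0, 0.0, -10.0, -4.0, -8.0, -4.0]
print(reversals)  # [-10.0, -4.0, -8.0, -4.0]
print(srt)        # -6.5
```

Because the speech level is fixed, a lower SRT means that the listener tolerated more noise, so higher (less negative) SRTs indicate poorer performance.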
After presentation of each sentence, participants were asked to repeat verbatim what they heard. Responses were scored online immediately after each trial using a graphical user interface on a standard computer screen that was not visible to participants. Each sentence contained five keywords upon which scoring was based, for example, “The MEAL was COOKED BEFORE the BELL RANG” (keywords in uppercase letters). Keywords were also judged to be correct if participants changed the grammatical number of presented words, for example, “Meals” (plural) instead of “Meal” (singular). All other variations were scored as incorrect with no feedback given. Scoring of all responses was performed by the same author (P. A.). There was no interjudge reliability measure. It is for this reason that the scoring occurred under the very strict parameters outlined above, that is, only changes in number were permitted, all other errors were scored as incorrect. Where the participant did not respond clearly, the scorer asked them to repeat their response.
The order of the sentence lists was counterbalanced using a Latin square technique. All sentences were pseudorandomly ordered such that the order of presentation was different between participants, but each sentence was only played once per participant (Figure 1).
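As an illustration of Latin square counterbalancing, the sketch below builds a cyclic 4 × 4 square that pairs four sentence lists with the four stimulation conditions. The text does not state which square was used or whether condition order was also counterbalanced, so the cyclic construction, the list labels, and the helper names are assumptions.

```python
from collections import Counter


def cyclic_latin_square(items):
    """Return an n x n Latin square in which each row is a rotation of items."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


CONDITIONS = ["no TMS", "vertex", "left STG", "right STG"]
SENTENCE_LISTS = ["List 1", "List 2", "List 3", "List 4"]  # hypothetical labels
SQUARE = cyclic_latin_square(SENTENCE_LISTS)


def assignment(participant_index):
    """Map each condition to a sentence list for one participant (0-based index)."""
    row = SQUARE[participant_index % len(SQUARE)]
    return dict(zip(CONDITIONS, row))


# With 16 participants, every condition/list pairing occurs equally often.
pair_counts = Counter(
    pair for p in range(16) for pair in assignment(p).items()
)
print(assignment(1))
print(set(pair_counts.values()))  # {4}
```

Each list appears once in every row and every column of the square, so no list is systematically tied to one stimulation condition.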
Four lists of 30 sentences were created from a prerecorded set of the Institute of Electrical and Electronics Engineers sentences (IEEE, 1969). Some sentences were adapted from the original American English to suit the native British English sample of participants, for example, “The hogs were fed chopped corn and garbage” was adapted to “The pigs were fed chopped corn and rubbish.” One male speaker of standard southern British English read all sentences in a sound-attenuated room; the stimuli were originally recorded by Rosen, Souza, Ekelund, and Majeed (2013). Audio digitizing was performed at 44.1 kHz and 16 bits. The beginning and end of each sentence were trimmed to zero crossings as closely as possible to the onset/offset of the initial and final speech sounds. The sentences were then peak-normalized to 99% of maximum amplitude and scaled to 70 dB SPL (sound pressure level) using Praat (Boersma & Weenink, 2014). Sentences were presented in steady-state speech-shaped noise, the spectrum of which was derived from the 120 test sentences without amplitude modulation; on all trials, the noise masker started 500 msec before the onset of the sentence with the noise, sentence, and TMS offsets all occurring concurrently. All sentences were presented binaurally via Etymotic ER1 earphones using a custom-made MATLAB script (R2013a; The Mathworks, Inc.).
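Because the speech level was held constant and only the noise level varied, each target SNR corresponds to a gain applied to the noise. The sketch below shows one way to compute that gain, assuming that SNR is defined from the root-mean-square (RMS) amplitudes of the two signals; the text does not specify the level measure, and the function names and sample values are hypothetical. It is written in Python for illustration and is not the MATLAB script used in the study.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """SNR in dB, assuming it is defined from RMS amplitudes."""
    return 20 * math.log10(rms(speech) / rms(noise))


def noise_gain_for_snr(speech, noise, target_snr_db):
    """Gain to apply to the noise so that speech-to-noise equals target_snr_db.

    The speech samples are left untouched; only the noise is rescaled.
    """
    return rms(speech) / (rms(noise) * 10 ** (target_snr_db / 20))


# Toy signals standing in for a sentence and a noise segment.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.2, 0.3, -0.1]
gain = noise_gain_for_snr(speech, noise, -4.0)
scaled_noise = [gain * n for n in noise]
print(round(snr_db(speech, scaled_noise), 6))  # -4.0
```

Lowering the target SNR raises the gain on the noise, which is how the adaptive track makes a trial harder without changing the speech level.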
Transcranial Magnetic Stimulation
Stimulation was performed using a Magstim Rapid2 and a 70-mm figure-of-eight coil (Magstim). Pulses were delivered online (i.e., during sentence presentation) at a rate of 10 Hz for 2500 msec, starting 500 msec before each sentence began and continuing until the sentence had finished (25 pulses per trial). A 10-Hz stimulation has been shown to be effective at disrupting processing within superior temporal regions (Pitcher, 2014; Andoh & Paus, 2011; Bestelmeyer et al., 2011; Bueti, van Dongen, & Walsh, 2008). The longest sentence was 2500 msec in length, and thus, 25 pulses were chosen to ensure that TMS was applied throughout the entire length of all sentences. Stimulation intensity was set at 40% of maximum stimulator output and held constant across all participants. During a period of extensive pilot testing, 40% of maximum stimulator output was found to be of sufficient intensity to have an experimental effect without causing significant discomfort for the participants. Additionally, the 40% stimulation intensity was sufficiently low to prevent the coil from overheating, thus ensuring that we did not need to switch coils between conditions. Motor thresholds were not used as their applicability to nonmotor regions is yet to be fully established (Stokes et al., 2013; Stewart, Walsh, & Rothwell, 2001), and previous experiments employing a similar methodology to the present experiment have shown that the use of a single fixed stimulation intensity can be effective in the superior temporal region (Pitcher, 2014; Bueti et al., 2008). The TMS frequency, intensity, and duration were well within established international safety limits (Rossi, Hallett, Rossini, & Pascual-Leone, 2009; Wassermann, 1998).
Before the main experiment, all participants received three to four trains of pulses per site to ensure they were comfortable with the stimulation parameters. During this demonstration, all participants used an earplug (3M EAR, 36-dB attenuation) in the ear ipsilateral to the site of stimulation to attenuate the sound of the coil discharge and avoid damage to the ear (Counter, Borg, & Lofqvist, 1991). During the main experiment, magnetically shielded Etymotic ER1 earphones were used bilaterally to deliver the auditory stimuli and to attenuate the sound of coil discharge.
Test of Etymotic ER1 Earphones
Before the main experiment, a pilot test of the attenuation capabilities of the ER1 earphones was conducted to investigate whether the acoustic click of the TMS coil interferes with the main experimental task. A B&K 4157 coupler was used (Brüel & Kjær sound and vibration measurement), with the output connected to the left channel of a Scarlett 2i2 USB interface (Focusrite Audio Engineering Ltd). The Scarlett 2i2 USB interface was adjusted, such that with the ER1 not inserted into the coupler and the Magstim rapid2 (Magstim) module running at 10 Hz, 100% maximum intensity (i.e., the 4157 responding to the acoustic click from the TMS coil), the recorded level was about 6 dB below overload. The ER1 inputs were connected to 50 Ω terminators, and only the right channel ER1 was used for the measurements, which were recorded using cooledit 96 (Adobe Systems, Inc.) at a sampling rate of 44.1 kHz, 16 bit.
A 70-mm diameter figure-of-eight TMS coil was held approximately 30 cm above the ER1 shielded transducer box. With the ER1 not inserted into the coupler, the Magstim Rapid2 module was run at a rate of 10 Hz, 100% of maximum pulse strength. Under these conditions, the acoustic click associated with firing the TMS coil was recorded at a level of 81.9 dB SPL. Then, to assess the acoustic leakage through the foam insert of the ER1 earphones, with the TMS coil held in the same position, the ER1 was inserted into the B&K coupler, and the Rapid2 module was again run at 10 Hz, 100% maximum stimulator output. Under these conditions, the acoustic click of the TMS coil was recorded at 37.8 dB SPL, implying an attenuation of 44.1 dB (81.9 − 37.8 dB) and resulting in a level of background noise that was believed to be low enough to not impact on the main experimental task. This conclusion was confirmed anecdotally when all participants reported being able to comfortably hear the sentences over the noise of the TMS pulses with no noticeable difference compared with the vertex stimulation condition.
Participants came to the Birkbeck-UCL Centre for Neuroimaging to obtain a T1-weighted structural magnetic resonance imaging scan (FLASH sequence, repetition time = 12 msec, echo time = 5.6 msec, flip angle = 19°, resolution = 1 mm × 1 mm × 1 mm). Immediately after the scanning session, the individual MRI slices were processed to create one composite image and rotated to match the orientation of the MNI 152 template brain. During the TMS session, the structural scan was used in conjunction with BrainSight frameless stereotaxy (Rogue Research). BrainSight uses an infrared camera and tracking system and displays the specific location and orientation of the TMS coil in real time on the individual participant's MRI, ensuring accurate and consistent stimulation of the target and control site.
The experimental sites for this study were taken from Adank (2012), who conducted an activation likelihood estimation (ALE) meta-analysis of 57 fMRI and PET studies that contrasted intelligible with less intelligible or unintelligible speech stimuli. ALE is used to establish the degree of overlap between coordinates taken from different neuroimaging papers. Across all 57 studies, the site with the highest ALE score, and therefore the site with the most observed activation across studies, was the left anterior STS with MNI coordinates of x = −60, y = −12, z = −6. A less active homologous cluster was found in the right anterior STS (x = +62, y = −8, z = −10). These two sets of coordinates were used as guides for placement of the TMS coil. In some participants, these coordinates did not match up to the STS; therefore, small visually guided adjustments of the coordinates were made on a participant-by-participant basis to ensure that stimulation targeted the STG across all participants; in all cases, the smallest possible adjustment to the y-coordinate was made with the x- and z-coordinates held consistent with those adopted from Adank (2012). Average MNI coordinates of target sites within the final sample in Experiment 1 for the left STG were x = −60, y = −12, z = −6, and for the right STG were x = +61, y = −8, z = −10. Vertex was used as a control site and was identified as the highest point of the skull in the midsagittal plane.
The dependent variable for the speech reception task in all experiments is the average SNR level at which reversals occurred across the 30 test sentences per condition. A one-way repeated-measures ANOVA was conducted to investigate the effect of TMS condition on performance for each experiment separately. The within-participant factor is Stimulation Type (no TMS vs. vertex vs. left STG vs. right STG). In Experiment 1, all data were normally distributed according to Shapiro–Wilk test of normality (all ps > .2). In Experiment 2, two conditions were shown to be nonnormally distributed (left STG, p = .006 and no TMS, p = .004). Despite two of the conditions being nonnormally distributed in Experiment 2, for ease of comparison between experiments, the results of the one-way ANOVA are reported below. This is because this type of ANOVA is robust to deviations in normality (Lix, Keselman, & Keselman, 1996; Glass, Peckham, & Sanders, 1972), and the results of the nonparametric Friedman test were equivalent to the one-way ANOVA. All scores for both experiments fall within three standard deviations of the mean, and therefore, no score was considered to be an outlier. Bonferroni-corrected paired samples t tests were used for all follow-up analyses.
A repeated-measures ANOVA was conducted to investigate whether the application of TMS to different anatomical landmarks (no TMS, vertex, left STG, right STG) would produce differential effects on participants' ability to perceive speech in noise. Because of the functional relevance of bilateral STG in speech perception, overall thresholds were expected to be higher, representing poorer performance, after separate application of TMS to both the left and right STG conditions relative to the no TMS and vertex control conditions. A significant main effect of TMS Location was found, F(3, 45) = 10.47, p < .001, η2 = .41, indicating that TMS had a differential effect on speech perception ability depending on location of stimulation. Post hoc paired samples t tests confirmed that stimulation of the left STG (M = −1.64 ± 1.61 dB) and the right STG (M = −0.99 ± 1.81 dB) impaired perception of sentences presented in noise relative to both the no TMS (M = −2.96 ± 1.57 dB) and vertex (M = −2.81 ± 1.67 dB) stimulation conditions (see Table 1 for all relevant statistics). No difference was observed between either of the control conditions or between performance in the left and right STG stimulation conditions (see Figure 2). Therefore, TMS was more disruptive when stimulation was applied to either the left or right STG compared with the vertex or no stimulation conditions.
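The reported effect size can be checked against the F statistic and its degrees of freedom. The sketch below assumes that the reported η2 is partial eta squared, which the text does not state explicitly; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# One-way repeated-measures design: k = 4 conditions, n = 16 participants.
k, n = 4, 16
df_effect = k - 1             # 3
df_error = (k - 1) * (n - 1)  # 45

eta_p2 = partial_eta_squared(10.47, df_effect, df_error)
print(df_effect, df_error)  # 3 45
print(round(eta_p2, 2))     # 0.41
```

Under that assumption, the value recovered from F(3, 45) = 10.47 matches the reported η2 of .41.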
| Site A | Site B | t | p | Mean Difference (dB) | Confidence Interval of Mean Difference | Cohen's d |
|---|---|---|---|---|---|---|
| L STG | R STG | −1.30 | .213 | −0.64 | [−1.69, 0.4] | −0.325 |
| L STG | No TMS | 3.12 | .007 | 1.32 | [0.41, 2.22] | 0.78 |
| L STG | Vertex | 3.64 | .002 | 1.17 | [0.48, 1.85] | 0.91 |
| R STG | No TMS | 4.57 | <.001 | 1.96 | [1.04, 2.87] | 1.14 |
| R STG | Vertex | 4.19 | .001 | 1.81 | [0.89, 2.73] | 1.04 |
| No TMS | Vertex | −0.42 | .676 | −0.15 | [−0.9, 0.6] | −0.105 |
Bonferroni-corrected alpha level = (.05/6) = .008.
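The corrected threshold and the effect sizes in Table 1 can be reproduced from the tabulated values. The sketch below assumes that Cohen's d was computed as t divided by the square root of the sample size (n = 16); this is inferred from the reported numbers rather than stated in the text. The p value reported as "<.001" is entered as its upper bound of .001.

```python
import math

N_PARTICIPANTS = 16
N_COMPARISONS = 6
ALPHA = 0.05

# (site A, site B, t, p, reported d), copied from Table 1.
# The p value reported as "<.001" is entered as its upper bound.
ROWS = [
    ("L STG", "R STG", -1.30, 0.213, -0.325),
    ("L STG", "No TMS", 3.12, 0.007, 0.78),
    ("L STG", "Vertex", 3.64, 0.002, 0.91),
    ("R STG", "No TMS", 4.57, 0.001, 1.14),
    ("R STG", "Vertex", 4.19, 0.001, 1.04),
    ("No TMS", "Vertex", -0.42, 0.676, -0.105),
]

bonferroni_alpha = ALPHA / N_COMPARISONS


def cohens_d_from_t(t_value, n):
    """Paired-samples effect size, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n)


significant = [(a, b) for a, b, _, p, _ in ROWS if p < bonferroni_alpha]

print(round(bonferroni_alpha, 3))  # 0.008
for a, b, t, p, reported_d in ROWS:
    print(a, "vs", b, round(cohens_d_from_t(t, N_PARTICIPANTS), 3), reported_d)
print(significant)
```

Only the four comparisons between an STG site and a control condition fall below the corrected threshold; the left versus right STG and no TMS versus vertex comparisons do not.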
SRTs were found to be elevated, reflecting poorer performance as participants required a more favorable SNR to perform at an equivalent level, after application of online repetitive TMS to either the left or right STG compared with a no TMS control condition and the TMS control site (vertex). These results are important as the equal drop in performance across the left and right STG stimulation conditions supports accounts that propose bilateral processing in speech perception.
TMS research paradigms usually incorporate a stimulation control site, that is, a site that is stimulated despite its lack of functional relevance to the task/behavior under investigation. This is to ensure that any observed changes in behavior are caused by the intended disruption of cortical processing at the main experimental site and are not caused by general changes in attention/arousal caused by the TMS click and/or skin sensations that occur every time a TMS pulse is delivered. One alternative explanation for the results observed in Experiment 1 could be that the TMS coil was closer to the ears in both STG conditions compared with the vertex condition. As a result, the acoustic noise of discharging the coil (the click) was potentially more intense and therefore more disruptive, independent of the effect on the underlying neuroanatomy. This explanation is precluded, however, by the fact that, before Experiment 1, the Etymotic ER1 earphones were tested and showed a good level of attenuation of the TMS coil click, resulting in a level of background noise that was believed to be low enough to not impact on the main experimental task. Furthermore, after data collection, the anecdotal responses from participants indicated no difficulty in hearing the speech stimuli in either of the experimental conditions compared with the control conditions, thus supporting our view that the results observed in Experiment 1 are not confounded by potential differences in the acoustic intensity of the TMS coil click across conditions.
Acoustic noise, however, is only one nonspecific effect of stimulation; the validity of the current results could be questioned, as the application of TMS directly stimulated the temporalis muscles and thus caused a twitch in the canthus of the eye and the jaw of all participants. These twitches can at times be distracting and uncomfortable (Duecker & Sack, 2015). Because this facial twitching does not occur during vertex stimulation, it is possible that participants were simply more distracted by TMS in the left and right STG stimulation conditions during stimulus presentation compared with the control site stimulation. To test whether the results of the first experiment could be explained solely in terms of these nonspecific TMS distractions, a follow-up experiment was conducted where participants completed a visual discrimination threshold task under the same four TMS conditions used in Experiment 1 (left STG, right STG, vertex, no TMS). A visual discrimination threshold task was used because the bilateral STG are likely to be functionally irrelevant to this visual task. Therefore, if a disruptive effect of TMS location were found, it would suggest that the observed effects of Experiment 1 are due to the confounding side effects of TMS application (e.g., the distraction that arises from facial twitches). In contrast, if no effect of TMS were found in Experiment 2, it would support the notion that the results of Experiment 1 were evidence of bilateral STG involvement in speech recognition.
Seventeen individuals took part in this study, all of whom met the same eligibility criteria as outlined in Experiment 1 and were paid for their participation. In addition to the previously outlined eligibility criteria (i.e., native British English; right-handed; with no reported history of speech, language, neurological, or psychiatric disorder), the participants' visual acuity was assessed to establish whether it was within the normal range. All participants were assessed to have a binocular vision rating of less than 0.1 on the LogMAR scale, equating to greater than 0.8 on the decimal scale (Colenbrander, 2002), and on average, participants were capable of accurately verifying 80.46 ± 11.18 written sentences in 2 min, at an average of 1398 ± 193 msec per sentence (as assessed via the Speed and Capacity of Language Processing Test; Baddeley, Emslie, & Nimmo-Smith, 1992). One participant was excluded from the final analysis for not completing the visual discrimination task as instructed. This participant was observed to repeatedly press the response keys throughout testing, even at times when responses were not expected, that is, when no stimuli were present on the screen (final analyzed n = 16; mean age = 21.5 ± 2.07 years, range = 18–25 years; eight women).
During the visual discrimination threshold task, each trial began with a fixation cross displayed in the center of the screen for 500 msec, followed by a blank screen for 500 msec. Finally, two sets of letter strings and another fixation cross were presented on screen for 2000 msec. The letter strings appeared just above and below the fixation cross. After the 2000 msec expired, the screen again went blank until the next trial began (intertrial interval = 4000 msec; see Figure 3).
The stimuli consisted of scrambled written versions of three of the five keywords used per trial in Experiment 1. For example, the Experiment 1 keywords COOKED BEFORE BELL were visually presented in Experiment 2 as DCOEOK BEROEF LBLE. On an “identical trial,” participants would simply see this letter string presented concurrently above and below the central fixation cross. On a “different trial,” three of the middle letters were changed in one of the three nonsense words. The first and last letters of all nonsense words were always held constant on different trials so that matching could not rely solely on the initial and final letter. Additionally, all stimuli and fixation crosses were presented using Courier New in font size 60. Because this is a fixed-width font, both sets of letter strings occupied the same horizontal space, and thus, matching had to rely on more than simple length comparisons.
The study consisted of 120 trials divided into 30 trials per TMS condition. Of the 30 trials, 15 were identical and 15 were different. On the 15 “different trials,” the change occurred five times in each of the first, second, and third words. Letters were changed by replacing each of the three relevant letters with the next letter in the English alphabet; for example, DCOEOK BEROEF LBLE became DCOEOK BESPFF LBLE. Nonsense letter strings were used in place of real words to avoid ceiling effects, thus making an effect due to TMS modulation possible.
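The stimulus construction described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names are hypothetical, and two details are assumptions: which interior letters are changed (here, the last three interior positions, which reproduces the BEROEF example) and how a Z would be handled (here, it wraps to A). Words with fewer than three interior letters, such as LBLE, simply have all of their interior letters changed in this sketch.

```python
import random


def scramble(word, rng):
    """Return a random permutation of the letters in `word` (illustrative only)."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def shift_middle(word, count=3):
    """Replace up to `count` interior letters with the next letter of the alphabet.

    Assumption: the last `count` interior positions are changed; the first and
    last letters are never touched. Z wraps to A (also an assumption).
    """
    interior = list(range(1, len(word) - 1))  # excludes first and last letter
    chars = list(word)
    for i in interior[-count:]:
        chars[i] = "A" if chars[i] == "Z" else chr(ord(chars[i]) + 1)
    return "".join(chars)


def make_different(strings, word_index):
    """Build the mismatching letter-string set for a 'different' trial."""
    changed = list(strings)
    changed[word_index] = shift_middle(changed[word_index])
    return changed


identical = ["DCOEOK", "BEROEF", "LBLE"]
different = make_different(identical, word_index=1)
print(" ".join(different))  # DCOEOK BESPFF LBLE
```

Because only interior letters are shifted and the string length is unchanged, the two strings on a different trial share their first and last letters and occupy the same width in a fixed-width font.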
To make the visual discrimination threshold task as comparable as possible to the SRT task used in Experiment 1, a staircase procedure was again adopted. In the same way that the level of the speech-shaped background noise varied adaptively dependent on performance, in the current experiment, the contrast level between the background and foreground (i.e., the visually presented text) was varied adaptively akin to the Pelli–Robson contrast sensitivity chart (Pelli & Bex, 2013; Pelli & Robson, 1988). On all trials, the background was black with an RGB (red, green, blue) value of [0, 0, 0]; on the first trial, the letter strings appeared with an RGB value of [0.8, 0.8, 0.8] and therefore appeared as white text on a black background. Each response resulted in an initial contrast change of ±0.1. Specifically, correct discrimination resulted in a text RGB value of [0.7, 0.7, 0.7] on the subsequent trial, whereas incorrect discrimination would result in a text RGB value of [0.9, 0.9, 0.9]. This step size applied for the first 10 trials; contrast changes occurred in steps of 0.05 for Trials 11–16, steps of 0.025 for Trials 17–25, and steps of 0.001 for Trials 26–30.
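The adaptive rule described above can be sketched as follows. This is an illustrative reconstruction rather than the experiment code. The function names are hypothetical, and two details are assumptions: that the step size is determined by the number of the trial just completed, and that the gray level is clamped to the range 0 to 1.

```python
def step_size(trial):
    """Step size applied after the given (1-based) trial, per the schedule in the text."""
    if trial <= 10:
        return 0.1
    if trial <= 16:
        return 0.05
    if trial <= 25:
        return 0.025
    return 0.001


def next_level(level, trial, correct):
    """Gray level of the text on the next trial.

    A correct response lowers the level (less contrast against the black
    background, harder); an incorrect response raises it (easier).
    Clamping to [0, 1] is an assumption.
    """
    delta = step_size(trial)
    new_level = level - delta if correct else level + delta
    return min(1.0, max(0.0, new_level))


def run_staircase(responses, start=0.8):
    """Return the gray level shown on each trial for a sequence of responses."""
    levels = [start]
    for trial, correct in enumerate(responses, start=1):
        levels.append(next_level(levels[-1], trial, correct))
    return levels[:-1]  # drop the level computed after the final trial


# Hypothetical response sequence for one 30-trial block (not real data).
responses = [True, True, True, False, True, False] * 5
print(run_staircase(responses)[:4])
```

The same gray value would be applied to all three RGB channels, so a level of 0.7 corresponds to text drawn with an RGB value of [0.7, 0.7, 0.7].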
As with Experiment 1, participants' visual discrimination thresholds were computed by taking the mean RGB value of the letter strings for all trials where a reversal occurred (scores closer to zero represent better overall performance). Orders of stimuli list and stimulation sites were counterbalanced across participants. All stimuli lists were pseudorandomly ordered such that the order of presentation was different between participants, but each set of three nonsense letter string “sentences” only appeared once per participant. During pilot testing, it was found that this task incurred a fairly large learning effect; therefore, all participants completed 60 practice trials before starting the actual experimental session. No such practice effect was observed for the speech recognition threshold task in Experiment 1, as attested via a one-way ANOVA with factor Testing Order (first set of sentences presented vs. second vs. third vs. fourth) showing no significant difference in order of sentence presentation, F(3, 45) = 1.51, p = .224, η2 = .092. The TMS stimulator and procedure were identical to those used in Experiment 1, with 25 pulses administered per trial at a rate of 10 Hz.
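The reversal-based threshold described above can be sketched in Python. This is a minimal illustration with hypothetical function names, not the analysis code used in the study. It assumes that a reversal is a trial at which the direction of the staircase flips (from decreasing to increasing gray level, or the reverse) and that the value entered into the mean is the gray level shown on that turning-point trial.

```python
from statistics import mean


def reversal_levels(levels):
    """Return the gray levels at which the staircase changed direction."""
    reversals = []
    last_direction = 0  # 0 until the first change in level is seen
    for i in range(1, len(levels)):
        diff = levels[i] - levels[i - 1]
        if diff == 0:
            continue  # no change (e.g., clamped at a limit); direction unchanged
        direction = 1 if diff > 0 else -1
        if last_direction and direction != last_direction:
            reversals.append(levels[i - 1])  # the turning point
        last_direction = direction
    return reversals


def discrimination_threshold(levels):
    """Mean gray level across reversal trials; lower values mean better performance."""
    reversals = reversal_levels(levels)
    if not reversals:
        raise ValueError("no reversals in this staircase")
    return mean(reversals)


# Hypothetical staircase track (not real data).
track = [0.8, 0.7, 0.6, 0.7, 0.6, 0.5, 0.6]
print(reversal_levels(track))           # [0.6, 0.7, 0.5]
print(discrimination_threshold(track))  # approximately 0.6
```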
MNI-152 Structural Brain Scan
Individual MRI structural scans were not obtained for any participants; instead, the MNI-152 brain was used to guide placement of the TMS coil. The procedure changed here because a Brainsight software update provided a method for accurately positioning the coil without collecting individual structural scans. Compared with using frameless stereotaxy based on individual structural scans, the precision of localization using the average MNI structural brain was estimated to vary by less than 5 mm (Rogue Resolutions, personal communication), and thus, any inaccuracy in this localization technique was expected to be small, relative to the induced electrical field of a TMS pulse (Schönfeldt-Lecuona et al., 2005; Thielscher & Kammer, 2004). Both the participant-specific anatomical scans of Experiment 1 and the MNI template brain of Experiment 2 contain intrinsic spatial uncertainty from a combination of the spatial normalization procedure (i.e., use of an averaged MNI coordinate) and the unknowable functional and structural variation between participants. The only feature that differs between the two techniques is the ability to adjust for anatomical variations when using participant-specific MRI scans. Although such minor adjustments were made for individual anatomical variability in Experiment 1, these adjustments were on the order of millimeters (average change of 1 mm in the y-coordinate) and therefore fell within the spatial resolution of TMS (5–10 mm). Therefore, although such adjustments were not possible using the MNI structural scan in Experiment 2, it does not seem plausible that the use of two different localization techniques will have substantially affected our results, as the variations associated with either technique were likely below the spatial resolution of TMS.
Anecdotally, our experience suggests that the MNI template brain method is just as accurate as using an individual's structural scan for finding the hand region of the primary motor cortex, and therefore, it is reasonable to assume the results are comparable to the procedure used in Experiment 1 for stimulation of left or right STG (or vertex). In conjunction with Brainsight 2.3.5, the MNI-152 brain was adapted based on a minimum of five separate estimations of the front-, back-, top-, left-, and rightmost points on each participant's head, with the MNI brain adapted to meet the measured dimensions. TMS target locations were the same as used in Experiment 1: left STG (x = −60, y = −12, z = −6), right STG (x = +62, y = −8, z = −10), vertex (x = 0, y = 0, z = +90), and a no TMS baseline condition. As the MNI-152 brain was fit to the dimensions of each participant's cranium, there was no need to adjust the target coordinates on an individual participant basis (as with Experiment 1), and therefore, the average coordinates match the target coordinates.
A repeated-measures ANOVA was conducted to investigate whether the application of TMS to different anatomical landmarks (no TMS, vertex, left STG, right STG) produced differential effects on participants' ability to discriminate between two nonsense letter strings at varying degrees of visual contrast. No significant main effect of TMS Condition was found, F(3, 45) = 1.08, p = .367, η2 = .067, indicating that visual discrimination threshold performance did not differ regardless of location of stimulation. To ensure that no significant differences were masked by an overall nonsignificant main effect, follow-up post hoc analyses were conducted without any correction for multiple comparisons. All comparisons returned nonsignificant results (all ps > .06). Therefore, the current stimulation parameters had no significant effect on the visual discrimination threshold performance, suggesting that nonspecific disruption, such as muscle twitching, was not sufficient to impair performance on the visual discrimination threshold task. Note that these results contrast with those of Experiment 1, where stimulation of the left or right STG disrupted performance on the speech recognition threshold task (Figure 4).
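As a consistency check on the statistics reported above, the short Python sketch below derives the degrees of freedom for a one-way repeated-measures design and recovers an effect size from the F ratio. The function names are hypothetical, and the sketch assumes that the reported η2 is partial eta squared, which the text does not state explicitly.

```python
def rm_anova_dfs(n_participants, n_conditions):
    """Degrees of freedom for a one-way repeated-measures ANOVA (sphericity assumed)."""
    df_effect = n_conditions - 1
    df_error = (n_participants - 1) * (n_conditions - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# 16 analyzed participants, 4 TMS conditions (no TMS, vertex, left STG, right STG).
df_effect, df_error = rm_anova_dfs(16, 4)
print(df_effect, df_error)                                      # 3 45
print(round(partial_eta_squared(1.08, df_effect, df_error), 3)) # 0.067
```

Under that assumption, the values agree with the reported F(3, 45) = 1.08 and η2 = .067.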
The aim of the current experiments was to develop our understanding of the role of bilateral STG in processing speech in noise; in doing so, we aimed to evaluate the claims made by the unilateral model of Rauschecker and Scott (2009) and the bilateral model of Hickok and Poeppel (2000) by temporarily disrupting the left or right STG using repetitive TMS and measuring its impact on participants' ability to perceive speech in noise. The results show that TMS to either the left or right STG reduced participants' ability to recognize speech in noise and thus support neurobiological models of speech perception that hypothesize bilateral processing in speech perception. These results have ramifications for current and future neurobiological models of speech perception, which should acknowledge and subsequently seek to understand the important roles that both hemispheres play.
Although a significant effect of TMS was found in Experiment 1, no effect of TMS was found in Experiment 2. The second experiment used a visual discrimination task to assess whether TMS-induced direct innervation of the temporalis muscle (a common side effect of TMS) impaired participants' ability to focus on the SRT task used in Experiment 1. The lack of a significant effect in Experiment 2 is important as it precludes the possibility that the results of Experiment 1 are due to the nonspecific effects of TMS, the parameters for which were identical across experiments. The nonsignificant result in Experiment 2 strongly suggests that participants were able to maintain enough attention despite innervation of facial musculature to complete the task in a valid way, with changes in performance on the speech task being driven by the cortical modulations induced through TMS. This is critically important for the current study and future studies, as it highlights that online TMS designs are appropriate to investigate speech perception. The importance here is highlighted by the results of Andoh and Paus (2011), who have shown that the application of off-line TMS results in compensatory modulations in ipsi- and contralateral regions of the brain to an extent that the behavioral perturbations induced through TMS can be overcome. When investigating action selection with TMS and fMRI, O'Shea, Johansen-Berg, Trief, Göbel, and Rushworth (2007) found that these compensatory processes occur within the first 4 min after TMS-induced neural modulation. Therefore, by using an online, as opposed to an off-line, repetitive TMS paradigm, we were able to establish the immediate impact of the disruption before any (or at least before the majority of) cortical adaptation occurred. These findings most closely approximate the impact of immediate neural trauma to superior temporal regions and the associated effect on speech perception.
Although the difference in SRTs between bilateral STG and the control conditions in Experiment 1 was significant, the overall magnitude of the effect is small. The just noticeable difference refers to the minimum level by which a stimulus must change before the difference is noticeable. Although there is still some disagreement as to the exact just noticeable difference for speech embedded in noise, it is believed to be roughly 2–3 dB (McShefferty, Whitmer, & Akeroyd, 2015; Killion, 2004). This suggests that, for a listener to gain any benefit from noise reduction in an acoustic signal, the noise would have to be reduced by a minimum of 2 dB. In comparison, the observed difference found in Experiment 1 of 1–2 dB between the left and right STG conditions and the no TMS and vertex conditions could be considered minimal in a real-world setting. However, this should be considered a general limitation of TMS as a research technique as opposed to a limitation of the current results. Although the level of cortical modulation in TMS studies can be enough to impair performance, allowing causal inferences concerning the role of certain regions in a specific task, the impairment in performance is often reflected in very subtle changes, that is, hundreds of milliseconds delay in RTs or a few percentage points in accuracy (Silvanto & Muggleton, 2008). Therefore, an important point to consider is not the size of the effect in real-world circumstances but instead whether or not a significant effect occurs in the context of the experimental design (de Graaf & Sack, 2011). In Experiment 1, a significant effect of repetitive TMS was found when applied online to the left and right STG regions, and even though the effect is small in real-world terms, it is theoretically important and should be considered in the context of the null effect on all control sites (vertex and no TMS) and the control task (visual discrimination) used in Experiment 2.
Despite the equivalent level of disruption caused by the application of TMS to each hemisphere, it cannot necessarily be inferred that the processes being manipulated across the two hemispheres are equivalent. A symmetrical disruption does not in itself necessitate symmetrical functioning (Obleser et al., 2008; Scott et al., 2000), and several previous studies have provided support in favor of hemispheric asymmetries in speech-related auditory processing. In an fMRI study, Wong, Uppunda, Parrish, and Dhar (2008) found that speech embedded in noise resulted in increased activation in bilateral STG. However, the pattern of activation differed between hemispheres. In the left STG, activation continued to increase as the noise became more intense, whereas in the right hemisphere, activation increased from clear speech to the moderate SNR condition but did not increase any further as the noise became even more intense. Despite the selective nature of the right hemisphere change in activation, Wong et al. found that the degree of individual difference in the right hemisphere activation was positively correlated with performance on a behavioral task in the most extreme listening condition (participants with greater right hemisphere activation performed better on the behavioral task), with no correlation found between behavioral performance and left STG activation. When combined with the results of Wong et al., the results from Experiment 1 suggest that speech perception is a bilateral process with both the left and right hemispheres performing important roles in the process, but the nature of the involvement of each hemisphere is likely different.
In conclusion, the results of the experiments presented here showed a TMS-induced impairment in speech perception after stimulation of both left and right temporal lobes and thus support neurobiological models of speech perception that hypothesize bilateral processing in speech perception. Additionally, no effect of TMS was found under any experimental condition on a task requiring visual perception/discrimination, and any potential differences induced by the acoustic noise of discharging the coil were controlled for through use of specialized earphones, thus suggesting that the current results are due to the modulation of cortical processing as opposed to nonspecific effects of online rTMS. These results have ramifications for current and future neurobiological models of speech perception and indicate that such models need to acknowledge the importance of both hemispheres. Finally, these results provide a base upon which future TMS studies can be conducted to investigate the specific roles that each hemisphere plays during successful speech perception.
This work was funded by an internal grant awarded to D. K.-H. from University College London. We thank Stuart Rosen for providing the speech materials used in Experiment 1, Steve Nevard for invaluable technical support, and all the individuals who participated in the two experiments.
Reprint requests should be sent to Dan Kennedy-Higgins, Department of Psychology, King's College London, Guy's Campus, London SE1 1UL, United Kingdom, or via e-mail: firstname.lastname@example.org.