Abstract

Recent neuroimaging evidence suggests that the frequency of entrained oscillations in auditory cortices influences the perceived duration of speech segments, impacting word perception [Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. Neural entrainment determines the words we hear. Current Biology, 28, 2867–2875, 2018]. We further tested the causal influence of neural entrainment frequency during speech processing, by manipulating entrainment with continuous transcranial alternating current stimulation (tACS) at distinct oscillatory frequencies (3 and 5.5 Hz) above the auditory cortices. Dutch participants listened to speech and were asked to report their percept of a target Dutch word, which contained a vowel with an ambiguous duration. Target words were presented either in isolation (first experiment) or at the end of spoken sentences (second experiment). We predicted that the tACS frequency would influence neural entrainment and therewith how speech is perceptually sampled, leading to a perceptual overestimation or underestimation of the vowel's duration. Whereas results from Experiment 1 did not confirm this prediction, results from Experiment 2 suggested a small effect of tACS frequency on target word perception: Faster tACS leads to more long-vowel word percepts, in line with the previous neuroimaging findings. Importantly, the difference in word perception induced by the different tACS frequencies was significantly larger in Experiment 2 versus Experiment 1, suggesting that the impact of tACS is dependent on the sensory context. tACS may have a stronger effect on spoken word perception when the words are presented in continuous speech as compared to when they are isolated, potentially because prior (stimulus-induced) entrainment of brain oscillations might be a prerequisite for tACS to be effective.

INTRODUCTION

Noninvasive transcranial alternating current stimulation (tACS) is an increasingly popular technique in auditory and language research (Riecke & Zoefel, 2018; Zoefel & Davis, 2017; Heimrath, Fiene, Rufener, & Zaehle, 2016), with accumulating evidence showing that tACS efficiently affects sound processing and speech comprehension. Low-frequency tACS in the theta range (4 Hz) and alpha range (10 Hz) influences sound detection (Riecke, Formisano, Herrmann, & Sack, 2015; Riecke, Sack, & Schroeder, 2015; Neuling, Rach, Wagner, Wolters, & Herrmann, 2012), and high-frequency (40 Hz) tACS affects phoneme categorization (Rufener, Oechslin, Zaehle, & Meyer, 2016; Rufener, Zaehle, Oechslin, & Meyer, 2016). During continuous speech listening, tACS modifies auditory speech-evoked activity in the auditory cortex (Zoefel, Archer-Boyd, & Davis, 2018) and speech comprehension (Riecke, Formisano, Sorger, Başkent, & Gaudrain, 2018; Wilsch, Neuling, Obleser, & Herrmann, 2018).

The effects of tACS on auditory perception are thought to be mediated by oscillatory neural mechanisms that would be critical for auditory and linguistic processing (Zoefel, ten Oever, & Sack, 2018; Giraud & Poeppel, 2012; Peelle & Davis, 2012). Previous evidence shows that neural activity in the auditory cortices tracks the rhythmic structure of the speech signal. This neural tracking is linked to speech processing: Neural tracking is stronger when sentences are intelligible (Ding & Simon, 2013; Peelle, Gross, & Davis, 2013) and indicates how the speech signal is parsed in the brain (Kösem et al., 2018; Ding, Melloni, Zhang, Tian, & Poeppel, 2016; ten Oever & Sack, 2015). tACS is thought to influence neural tracking by modulating oscillatory activity of neural networks (Witkowski et al., 2016; Thut, Schyns, & Gross, 2011; Fröhlich & McCormick, 2010; but see Asamoah, Khatoun, & Mc Laughlin, 2019) and may provide a technique to test for a causal influence of neural tracking on the comprehension of spoken language.

So far, most tACS studies on speech have focused on the effects of tACS phase, that is, how the temporal alignment of the tACS current and the speech envelope affects speech comprehension. Here, we further investigated whether the frequency of tACS influences speech perception. Neural activity in the theta range (3–8 Hz) is known to flexibly follow the syllabic rate of ongoing speech (Kösem et al., 2018; Ahissar et al., 2001). The flexible tracking of speech could reflect neural entrainment mechanisms, that is, the endogenous adjustment of neural rhythms to sensory dynamics (Obleser & Kayser, 2019). Neural entrainment is thought to facilitate speech processing via temporal referencing and temporal prediction (Kösem & van Wassenhove, 2017; Kösem, Gramfort, & van Wassenhove, 2014). The frequency of entrained theta oscillations would then define the expected syllabic rate from a brain referential standpoint, and this would influence how syllabic units and their constitutive phonological segments are processed in time (Figure 1; Bosker & Ghitza, 2018; Kösem et al., 2018; Bosker, 2017; Bosker & Kösem, 2017; Kösem & van Wassenhove, 2017).

Figure 1. 

Experimental design and predictions. (A) Participants listened to Dutch words that contained an ambiguous vowel (short “a” (/α/) – long “aa” (/a:/) contrast). The two vowels are dissociable based on both duration and spectral properties (second formant frequency, F2). On the basis of the perceived vowel, the words could be perceived as two distinct Dutch words with different meanings (e.g., “zag,” saw [verb] vs. “zaag,” saw [noun]). While participants listened to these words in isolation (Experiment 1) or in a sentence with a 4-Hz syllabic rate (Experiment 2), we applied continuous tACS at different frequencies (3 and 5.5 Hz). (B) tACS was applied to target participants' auditory cortices. For this, two electrodes were placed over the temporal cortices (centered on positions T7 and T8), and two other electrodes were placed symmetrically to the left and right side of the midline (respectively) so that their long sides were centered on the vertex (position Cz) and bordering each other, as in Riecke, Formisano, et al. (2015). (C) Power spectrum of the speech envelope of the sentence token in Experiment 2, which shows a peak at 4 Hz falling in between the two tACS frequencies. (D) We predicted that tACS entrains oscillations that act as temporal references for speech parsing. The change in frequency would bias the perceived duration of the chunked syllabic units and their constitutive phonological segments. More specifically, it would bias the perceived duration of the ambiguous vowel (duration overestimation under fast tACS, evidenced by a greater proportion of long vowel percepts, and underestimation under slow tACS, with a lower proportion of long vowel percepts), leading to the perception of different words.

Recently, findings from a magnetoencephalography experiment by Kösem et al. (2018) provided support for this proposal. They showed that sentences produced at a fast speech rate induced entrainment at a higher frequency (compared to slower sentences) and that this faster entrainment was sustained for a few cycles after the driving stimulus had ceased. Moreover, this sustained entrainment was observed to influence behavioral categorization of subsequent ambiguous target words: Sustained entrainment at a higher frequency biased the perception of vowels ambiguous between Dutch short “a” vowel (/α/) and long “aa” vowel (/a:/) toward long /a:/ vowel percepts. This suggests that the neural tracking of the temporal dynamics of speech is a predictive mechanism that is involved in the processing of subsequent speech input and directly influences speech perception. In line with Kösem et al. (2018), we predicted in this study that modulating the frequency of entrained theta oscillations with tACS modifies the perceived duration of speech segments and affects the perception of words.

In two experiments, we asked Dutch participants to listen to Dutch words that contained a vowel that was ambiguous with regard to its duration (short “a”, /α/ – long “aa”, /a:/ contrast). The words could be perceived as two distinct Dutch words with radically different meanings (e.g., “tak,” branch; “taak,” task). While participants listened to speech, we applied continuous tACS above the auditory cortices at different frequencies (3 and 5.5 Hz; Figure 1A). We expected that tACS at different entrainment frequencies would entrain corresponding neural oscillations and that these oscillations would influence temporal predictions, as reflected in how the words are perceived. Specifically, we predicted that stimulating the brain at a tACS frequency faster than the speech syllabic rate would lead to an overestimation of the speech segments' duration (and, in particular, of the ambiguous vowel), inducing a greater proportion of long vowel percepts; conversely, stimulating at a slower tACS frequency would lead to underestimation of the vowel duration (Figure 1D) and fewer long vowel percepts.

EXPERIMENT 1

Methods

Participants

Twenty-five native Dutch participants (mean age = 23 years, 17 women) took part in the study. All participants were suited to undergo noninvasive brain stimulation as assessed by prior screening. They reported no history of neurological or hearing disorders and gave their written informed consent before taking part in the study. One participant was excluded during tACS preparation because of intolerance to the electric stimulation. Another participant's data were excluded because of a recording error. In total, data from 23 participants remained for analysis. The experimental procedure was approved by the local ethics committee (Ethical Review Committee, Psychology and Neuroscience, Maastricht University).

Auditory Stimuli

The speech stimuli were a subset of words previously used in Kösem et al. (2018). A female native speaker of Dutch produced nine Dutch word pairs that only differed in their vowel, for instance, “zag” (saw [verb]) versus “zaag” (saw [noun]). The vowels for each word were constructed by selecting one long vowel “a” (/a:/) and manipulating its spectral and temporal properties, because the Dutch “a” (/α/) – “aa” (/a:/) contrast is cued by both spectral and temporal characteristics (Audio S1 and S2; Kösem et al., 2018; Bosker, 2017). The temporal manipulation involved compressing the vowel to a duration of 140 msec using PSOLA in Praat (i.e., maintaining the original pitch contour; Boersma & Weenink, 2007). Spectral manipulations were based on Burg's Linear Predictive Coding method in Praat, with the source and filter models estimated automatically from the selected vowel. The formant values in the filter models were adjusted to result in a constant F1 value (740 Hz, ambiguous between “a” and “aa”) and 13 different F2 values (1100–1700 Hz in steps of 50 Hz). Then, the source and filter models were recombined, and the new vowels were adjusted to have the same overall amplitude as the original vowel. This manipulation procedure resulted in a vowel with an ambiguous duration, but with spectral properties spanning a continuum from “a” to “aa.” Finally, the manipulated vowel tokens were combined with one consonantal frame (e.g., /z_x/) for each of the nine word pairs.

tACS Settings

The tACS montage followed the montage used by Riecke, Formisano, et al. (2015) to stimulate the auditory cortices. Square rubber electrodes were attached to the scalp with conductive adhesive paste at positions defined by the International 10–20 system. Two electrodes were placed over the temporal cortices (centered on positions T7 and T8), and two other electrodes were placed symmetrically to the left and right side of the midline (respectively) so that their long sides were centered on the vertex (position Cz) and bordering each other. A sinusoidal current with fixed starting phase was applied to the circuit above each cerebral hemisphere using two battery-operated stimulator systems (Neuroconn). To create two approximately equivalent circuits, the skin was prepared so that the impedances of the left-lateralized and right-lateralized circuit were matched while keeping the net impedance below 10 kΩ (left: mean = 3.8, SD = 1.8 kΩ; right: mean = 3.7, SD = 1.8 kΩ). The sinusoidal current was presented at two frequencies: 3 and 5.5 Hz. The choice of these frequencies was based on the related previous MEG speech study (Kösem et al., 2018). Before the main experiment, tACS intensity was set individually by reducing the peak amplitude of the current simultaneously for both circuits in 0.1-mA steps from 1 mA to the point where participants reported feeling comfortable or uncertain about the presence of tACS under every electrode (on average, mean = 0.9 mA, SD = 0.1 mA, across participants).

For each tACS run of the experiment, tACS was continuously applied and its amplitude was ramped up over the first 10 sec of the run using raised-cosine ramps during which no trials were presented. For runs comprising sham stimulation, this onset ramp was followed by an additional offset ramp lasting 30 sec. Ramps at the end of the run were flipped; that is, they followed the reverse trajectory. Before the experiment, three waveforms were generated individually for each run (sampling rate: 16.5 kHz) that defined the acoustic stimulation, the electric stimulation, and the onsets of experimental trials (trial triggers) within the entire run, respectively. During the experiment, each of these waveforms was continuously fed in chunks into a separate channel of a digital-to-analog converter (National Instruments) operated by Datastreamer software (ten Oever et al., 2016). The outputs of the two “stimulation channels” were further split and fed into stimulation devices (stereo soundcard and two tACS systems; see previous two sections). The “trigger channel” output was fed into a PC on which Presentation software was running to control visual stimulation and button response acquisition.
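The ramped stimulation waveform described above can be illustrated with a minimal sketch. It assumes a starting phase of 0 and equal onset and offset ramp lengths, neither of which is specified in the text; the function names `ramp_gain` and `tacs_waveform` are hypothetical, and the sham waveform is not modeled.

```python
import math

FS_HZ = 16500  # sampling rate of the pre-generated waveforms (from the text)


def ramp_gain(i, n_total, n_ramp):
    """Raised-cosine gain for sample i: 0 -> 1 at onset, mirrored at offset."""
    if i < n_ramp:
        return 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
    if i >= n_total - n_ramp:
        # Offset ramp follows the reverse trajectory of the onset ramp.
        return 0.5 * (1.0 - math.cos(math.pi * (n_total - 1 - i) / n_ramp))
    return 1.0


def tacs_waveform(freq_hz, peak_ma, run_sec, ramp_sec=10.0, fs=FS_HZ):
    """Sinusoidal current (mA) with raised-cosine on/off ramps.

    Assumption: starting phase is 0 (the text only says it was fixed).
    """
    n_total = int(round(run_sec * fs))
    n_ramp = int(round(ramp_sec * fs))
    if n_total < 2 * n_ramp:
        raise ValueError("run is too short to hold both ramps")
    return [
        peak_ma * ramp_gain(i, n_total, n_ramp)
        * math.sin(2.0 * math.pi * freq_hz * i / fs)
        for i in range(n_total)
    ]
```

For example, `tacs_waveform(5.5, 0.9, run_sec=600.0)` would correspond to a 10-min fast-tACS run at the average intensity reported above.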

Procedure

Participants were first familiarized with the auditory stimuli and task. They were presented with a vowel categorization task to estimate individual perceptual boundaries between “a” and “aa.” This pretest involved the presentation of the target word “dat,” that, and “daad,” deed, in isolation with 13 different equidistant F2 values between 1100 and 1700 Hz (with nine repetitions of each F2 value). The F2 values were presented in random order. Participants were asked to listen to the spoken words while fixating a fixation cross on the screen with the two response options presented left and right (“a” or “aa”; position counterbalanced across participants) and to report what vowel they heard by pressing a button after each word presentation. On the basis of this pretest, individual psychometric functions were determined, and the three F2 values yielding the 25%, 50%, and 75% long vowel “aa” categorization points were selected for the main experiment. This meant that the vowels spanned an ambiguous range, potentially allowing for the largest biasing effects, while at the same time providing participants sufficient variation to make the categorization task feasible. Note that, although the pretest aimed to ensure proportions of long vowel percepts of 25%, 50%, and 75%, somewhat lower values were observed in the main experiment. This may reflect a criterion shift induced by the increased number of word tokens presented in the main experiment (whereas the pretest involved only a single word token).
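The selection of the three F2 values can be sketched as follows. The text does not state the functional form of the psychometric function, so a two-parameter logistic is assumed here, and the midpoint and slope in the usage note are hypothetical. Snapping the result to the nearest of the 13 pretest F2 steps is likewise an assumption.

```python
import math

# The 13 equidistant pretest F2 values, 1100-1700 Hz in 50-Hz steps.
F2_STEPS_HZ = [1100 + 50 * i for i in range(13)]


def f2_at_proportion(p, midpoint_hz, slope_per_hz):
    """Invert an assumed logistic p = 1 / (1 + exp(-slope * (F2 - midpoint)))."""
    return midpoint_hz + math.log(p / (1.0 - p)) / slope_per_hz


def nearest_step(f2_hz, steps=F2_STEPS_HZ):
    """Snap a continuous F2 value to the closest available stimulus step."""
    return min(steps, key=lambda s: abs(s - f2_hz))


def select_f2_values(midpoint_hz, slope_per_hz, targets=(0.25, 0.50, 0.75)):
    """F2 steps closest to the 25%, 50%, and 75% long-vowel points."""
    return [
        nearest_step(f2_at_proportion(p, midpoint_hz, slope_per_hz))
        for p in targets
    ]
```

With a hypothetical fitted midpoint of 1400 Hz and slope of 0.01 per Hz, `select_f2_values(1400.0, 0.01)` returns `[1300, 1400, 1500]`.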

The main experiment consisted of five 10-min-long runs (two runs with 3-Hz tACS, two runs with 5.5-Hz tACS, and one sham run) with short breaks in between. Each run contained 162 trials. Participants were asked to perform the same vowel categorization task as in the pretest, but this time, all word pairs were presented. Participants were blinded for stimulation conditions, and runs were presented in random order. The sham run was identical to the tACS runs, except that it involved no electric stimulation beyond the on/off ramps (see tACS Settings section). In the stimulation conditions, the onsets of the target words appeared at six different phases of the tACS current (30°, 90°, 150°, 210°, 270°, and 330°). During debriefing, participants were asked to provide a percentage for each run quantifying their confidence that they received electric stimulation. Participants' confidence reports did not significantly differ between stimulation runs versus sham runs, t(22) = −0.8, p = .42, suggesting that they were unaware of whether they received stimulation or sham stimulation.
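Presenting a target word at a given tACS phase amounts to choosing an onset time at which the sinusoid passes through that phase. The sketch below illustrates the arithmetic, assuming a current of the form sin(2πft) with starting phase 0; it is not the authors' scheduling code, and the helper names are hypothetical.

```python
import math

TARGET_PHASES_DEG = (30, 90, 150, 210, 270, 330)  # from the text


def next_onset(earliest_sec, freq_hz, phase_deg):
    """First time >= earliest_sec at which the tACS phase equals phase_deg."""
    period = 1.0 / freq_hz
    offset = (phase_deg / 360.0) * period  # time of that phase in cycle 0
    n_cycles = math.ceil((earliest_sec - offset) / period)
    return offset + n_cycles * period


def phase_at(time_sec, freq_hz):
    """tACS phase in degrees (0-360) at a given time, starting phase 0."""
    return (360.0 * freq_hz * time_sec) % 360.0
```

For a 3-Hz current, a target that may start no earlier than 12 sec into the run and should fall at 90° would be scheduled at about 12.083 sec.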

Data Analysis

We analyzed the effect of tACS condition (fast: 5.5 Hz; slow: 3 Hz; and sham stimulation) on the proportions of long vowel “aa” responses. Trials containing no button response (mean ± SD = 3.1 ± 8.5% of all trials, across participants) and trials presented during sham on-/off-ramps were discarded from the data analysis. Statistics were performed using generalized linear mixed models (GLMMs; Quené & van den Bergh 2008) with a logistic linking function as implemented in the lme4 library (Version 1.0.5; Bates, Mächler, Bolker, & Walker, 2015) in R (R Core Team, 2013).

Supplementary phase analyses were performed by reconstructing a time series (composed of the six tACS phases at which the target word was presented) for each stimulation condition. The phase that most effectively biases perception may vary across individuals because of individual differences in anatomy. To compensate for such possible interindividual variations, the maximum of the reconstructed series was aligned to the phase associated with the strongest long /a:/ vowel percepts (labeled as phase 90°).

Results

We expected that the rhythmic electric brain stimulation would entrain auditory cortices in a frequency-specific manner and hypothesized, based on Kösem et al. (2018), that this would influence the perceived duration of the words' vowels. We ran an analysis testing the fixed effects of tACS condition (fast, slow, sham) and Vowel F2 on the proportions of long vowel “aa” responses. We also included the interaction between Vowel F2 and tACS condition, because log-likelihood model comparison indicated that the model with the interaction term was a better fit to the data (log-likelihood: −8592.4 vs. −8548.6; χ2 = 86.657, p < .001). The model included random intercepts for Participants and Word pair, with by-participant and by-word pair random slopes for Vowel F2 (Barr, Levy, Scheepers, & Tily, 2013). More complex random effects structures failed to converge.

We expected to find a higher proportion of long “aa” responses in the fast 5.5-Hz tACS condition as compared to the slow 3-Hz tACS condition. However, against our expectations, no effect of tACS frequency was observed (fast vs. slow: p = .831; Figure 2). That is, the proportion of long vowel responses was not significantly larger for the fast tACS frequency condition versus the slow tACS frequency condition. These analyses also did not reveal evidence for differences between the two tACS frequency conditions and sham stimulation (sham vs. fast: p = .082; sham vs. slow: p = .054).

Figure 2. 

tACS frequency effects in Experiment 1. tACS frequency did not significantly influence word perception. (A) Proportion of long vowel word response during slow (3-Hz) tACS (blue, mean = 38.9%), fast (5.5-Hz) tACS (red, mean = 36.7%), and sham stimulation (black, mean = 35.6%) pooled across vowel F2s. No significant effect of stimulation frequency was found. (B) Difference in proportion of long vowel word response between slow tACS and sham conditions and between fast tACS and sham conditions. Each dot and dashed line represents one participant. The bold line represents the average. Bars denote SEM.

To control that participants paid attention to the task and relied on acoustic cues to provide their response, we presented vowels with three distinct F2 frequencies (one ambiguous F2 value, one F2 value biasing participant reports toward short /α/ responses, and one F2 value biasing participant reports toward long /a:/ responses). The results suggest that participants indeed paid attention to the stimuli as they relied on the spectral cues to categorize the vowels: The vowel F2 had an effect on target word perception (β = 0.029, SE = 0.007, z = 4.033, p < .001) indicating that vowels with higher F2 were more likely to be perceived as long “aa.”

We observed no significant effect of tACS phase on target word perception. Specifically, we analyzed perceptual reports for each tACS phase, after realignment to the phase associated with the strongest long “aa” vowel percepts (see Methods). Under the hypothesis that oscillatory phase modulates target word perception, we expected a bias toward long “aa” vowel percepts at phases neighboring the best phase, whereas a bias toward short vowel word percepts should be observable at opposing phases. To test this prediction, long “aa” categorization proportions were averaged across the hypothesized positive half-wave (phases 30° and 150°; excluding 90°, which trivially represented the maximum value because of the phase realignment) and across the hypothesized negative half-wave (phases 210°, 270°, and 330°), and then the two resulting averages were statistically compared. Similar GLMMs as reported above were used, with the predictor realigned oscillation half cycle (positive half cycle coded as +0.5, negative half cycle coded as −0.5), which yielded no significant effect of oscillation half cycle (p = .15).
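The phase realignment and half-wave averaging can be sketched descriptively as below. This only illustrates the realignment step on per-phase proportions; the statistical comparison in the text was run with GLMMs, which the sketch does not reproduce. Function names and the example proportions are hypothetical.

```python
PHASES_DEG = [30, 90, 150, 210, 270, 330]


def realign_to_90(proportions):
    """Circularly shift six per-phase proportions so the maximum sits at 90 deg."""
    n = len(proportions)
    best = proportions.index(max(proportions))
    target = PHASES_DEG.index(90)
    shift = (target - best) % n
    return [proportions[(i - shift) % n] for i in range(n)]


def half_wave_means(realigned):
    """Average the positive half-wave (30, 150; 90 excluded) and the negative one."""
    by_phase = dict(zip(PHASES_DEG, realigned))
    positive = (by_phase[30] + by_phase[150]) / 2.0
    negative = (by_phase[210] + by_phase[270] + by_phase[330]) / 3.0
    return positive, negative
```

For hypothetical proportions `[0.30, 0.35, 0.40, 0.50, 0.45, 0.32]`, the maximum at 210° is moved to 90°, and the two half-wave averages are then computed from the shifted series.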

In summary, the results from Experiment 1 showed no significant influence of tACS frequency (or phase) on the perception of ambiguous words presented in isolation. A potential explanation for this null result is that low-frequency tACS effects on speech perception may be more readily observable when target words are presented in a (quasi)rhythmic auditory context as in previous studies (Riecke et al., 2018; Wilsch et al., 2018; Riecke, Formisano, et al., 2015), potentially because tACS may more strongly affect neural rhythms that are already present (Reato, Rahman, Bikson, & Parra, 2010). In a second experiment, we tested if tACS frequency influences speech perception when prior auditory input has already brought auditory cortices into an entrainment regime, by presenting ambiguous target words at the end of spoken sentences.

EXPERIMENT 2

Methods

Participants

Thirty-one native Dutch participants (mean age = 23 years, 18 women) took part in the study. All participants performed prior screening as in Experiment 1. Two participants were excluded because of a bias in speech perception observed during the pretest (proportion of long “aa” words > 90%). One participant was excluded because of a recording error. In total, 28 participants remained for analysis.

Auditory Stimuli

As in Experiment 1, the same female native speaker of Dutch produced nine Dutch word pairs that only differed in their vowel. In Experiment 2, these words were produced at the end of the fixed sentence frame “Hij zegt het woord [target]” (He says the word [target]; Audio S3). Target words were excised and manipulated to be ambiguous in vowel duration and quality. First, the durations of the two vowels of each pair were set to the mean vowel duration of that pair (M = 136 msec). Then, using sample-by-sample linear interpolation, we mixed the weighted sounds of the pair (11-point continuum; Step 1 = 100% “a” + 0% “aa”; Step 6 = 50% “a” + 50% “aa”; Step 11 = 0% “a” + 100% “aa”; i.e., a step size of 10%) to create 11 different steps changing in vowel quality. We used this interpolation method because it resulted in more natural sounding output, although it also resulted in spectral vowel continua—similar to Experiment 1. These manipulated vowels were then spliced back into the consonantal frame from the “aa” member of each pair and concatenated onto one fixed token of the context sentence. This token of “Hij zegt het woord…” had a duration of 1100 msec and a pronounced peak at 4 Hz in its modulation spectrum, given the four monosyllabic words, falling in between the two tACS frequencies (Figure 1C).
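The sample-by-sample linear interpolation can be illustrated with a short sketch. It assumes the two vowel tokens have already been set to the same duration and are available as equal-length lists of samples; the function name `mix_continuum` is hypothetical.

```python
def mix_continuum(short_a, long_aa, n_steps=11):
    """Weighted sample-by-sample mix of two equal-length sounds.

    Step 1 is 100% "a", the middle step is a 50/50 mix, and the last
    step is 100% "aa" (10% increments for the default 11 steps).
    """
    if len(short_a) != len(long_aa):
        raise ValueError("both vowels must have the same number of samples")
    continuum = []
    for step in range(n_steps):
        w_aa = step / (n_steps - 1)  # weight of the long vowel
        continuum.append(
            [(1.0 - w_aa) * a + w_aa * aa for a, aa in zip(short_a, long_aa)]
        )
    return continuum
```

Because the mix is linear in the waveform, the two endpoints reproduce the original tokens exactly, and intermediate steps vary in vowel quality while duration stays fixed.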

tACS Settings

All tACS parameters were set as described for Experiment 1. The average impedances of the left-lateralized and right-lateralized circuit were 5.3 ± 2.2 and 5.4 ± 2.4 kΩ, respectively, and the average tACS intensity was 0.9 ± 0.1 mA as before (mean ± SD across participants).

Procedure

The second experiment consisted of two acquisition sessions because of the increased duration of trials in comparison to Experiment 1 (as full sentences were presented). In the first acquisition session, participants were familiarized with the stimuli with a vowel categorization task as in Experiment 1. Each session consisted of six 7.5-min-long runs (four stimulation runs and two sham runs) of 81 trials with short breaks in between. In each session, participants were blinded for stimulation conditions, and runs were presented in random order to counterbalance run order across participants. As in Experiment 1, participants were asked to listen to the sentences and report their perception of the last word. The onsets of the target words appeared at six different phases of the tACS current. Participants' confidence reports did not significantly differ between tACS runs versus sham runs, t(27) = 0.1, p = .92.

Data Analysis

Similar analyses were performed as in Experiment 1. Trials containing no button response (mean ± SD = 1.4 ± 4.0% of all trials, across participants) and sham trials presented during tACS on–off ramps were discarded from the data analysis. A GLMM was used to test for fixed effects of Vowel F2 and tACS condition (fast, slow, and sham). The models also included random intercepts for Participants and Word pair, with by-participant and by-word pair random slopes for Vowel F2.

Results

As in Experiment 1, the vowel F2 had an effect on target word perception (β = 0.115, SE = 0.014, z = 8.500, p < .001). In contrast with Experiment 1, and in line with our hypothesis, the difference between the fast and slow tACS frequency conditions was significant: 5.5-Hz tACS led to a small increase in the proportion of long vowel responses relative to 3-Hz tACS (fast vs. slow: β = −0.085, SE = 0.043, z = −1.980, p = .048; Figure 3). Adding the interaction term tACS Condition × Vowel F2 rendered the difference between tACS frequencies nonsignificant (p = .075). However, the term did not significantly improve model fit, as evidenced by log-likelihood model comparison (p = .195), and it was therefore excluded from the model. Contrasts with the sham condition yielded no significant result (fast vs. sham: p = .475; slow vs. sham: p = .298), and no effect of tACS phase was observed (i.e., after phase realignment, GLMMs with the predictor realigned oscillation half cycle showed no significant effect of half cycle; p = .536).

Figure 3. 

tACS frequency influenced word perception in Experiment 2. (A) Proportion of long vowel word response during slow (3-Hz) tACS (blue, mean = 42.9%), fast (5.5-Hz) tACS (red, mean = 43.9%), and sham stimulation (black, mean = 43.2%). Bars denote SEM. *p < .05. (B) Difference in proportion of long vowel word response between slow tACS and sham conditions and between fast tACS and sham conditions. Each dot and dashed line represents one participant. The magenta dashed line denotes the outlier data shown in Figure 4. The bold line represents the average. Bars denote SEM.

Furthermore, we tested whether the observed difference between the fast and slow stimulation frequencies could still be observed when controlling for individual differences in tACS intensity and the order of presentation of trials within a given block. Extending the confirmatory model reported above with the predictor tACS intensity (scaled to improve model fit) did not improve model fit as evidenced by log-likelihood model comparison (log-likelihood: −9038.8 vs. −9037.9; χ2 = 1.768, p = .184). Moreover, the effect between fast and slow tACS frequencies was still significant, even when controlling for tACS intensity (p = .048). The same held when extending the model with the predictor trial number: This also did not improve model fit (log-likelihood: −9038.8 vs. −9037.9; χ2 = 1.730, p = .188), and the effect between fast and slow tACS frequencies was still significant (p = .049).
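The log-likelihood model comparisons reported here follow the usual likelihood-ratio logic, which can be sketched as below. The sketch assumes that each added predictor contributes a single parameter (1 degree of freedom), which is consistent with the reported p values but is not stated explicitly in the text. The function names are hypothetical.

```python
import math


def p_from_chi2_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    if chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # For df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Chi-square statistic and p value for two nested models (assumed df = 1)."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)
    return chi2, p_from_chi2_df1(chi2)
```

Under this assumption, `p_from_chi2_df1(1.768)` and `p_from_chi2_df1(1.730)` round to .184 and .188, matching the values reported for tACS intensity and trial number. The rounded log-likelihoods (−9038.8 vs. −9037.9) give a statistic of about 1.8, so the small discrepancy with the reported χ2 values reflects rounding of the log-likelihoods.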

Figure 4 shows the distribution of the difference between fast and slow tACS frequencies for each participant in Experiments 1 (n = 23) and 2 (n = 28). One participant in Experiment 2 was identified as an outlier in terms of effect size (>2 SDs away from the mean). When excluding this participant's data from the analysis, the effect between fast and slow tACS frequencies remained in the same direction but failed to reach significance (p = .056).
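The outlier criterion (more than 2 SDs away from the mean) can be expressed as a short sketch. Whether the sample or the population standard deviation was used is not stated; the sample SD is assumed here, and the data in the usage note are invented for illustration only.

```python
import statistics


def outlier_indices(effects, n_sd=2.0):
    """Indices of values lying more than n_sd sample SDs from the mean."""
    mean = statistics.mean(effects)
    sd = statistics.stdev(effects)  # assumption: sample SD (n - 1 denominator)
    return [i for i, v in enumerate(effects) if abs(v - mean) > n_sd * sd]
```

For hypothetical per-participant fast-minus-slow differences such as `[0.01, -0.02, 0.00, 0.03, -0.01, 0.02, 0.00, 0.01, -0.03, 0.30]`, only the last value exceeds the criterion.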

Figure 4. 

tACS frequency influence on speech perception is different across experiments. Box plots represent the distribution of the difference between fast and slow tACS conditions in Experiments 1 (n = 23) and 2 (n = 28). Each dot represents one participant. tACS frequency has a larger effect on the perception of spoken words when they are presented in continuous speech (Experiment 2) versus in isolation (Experiment 1). The central mark of the box plot represents the median of the distribution; the edges of the box are the 25th and 75th percentiles; the whiskers extend to the most extreme, nonoutlier data points; and the cross represents an outlier. *p < .05.


To compare tACS frequency effects across experiments, we additionally ran an omnibus analysis on the complete data set from both experiments. This omnibus GLMM was identical to the GLMM reported above, except that it additionally contained the fixed effect experiment and an interaction term for tACS Frequency (fast, slow) × Experiment. Adding this interaction term significantly improved model fit, as evidenced by log-likelihood model comparison (χ2 = 18.953, p < .001), and the two-way interaction was indeed significant for the fast versus slow contrast (β = 0.208, SE = 0.060, z = 3.472, p < .001; Figure 4). Moreover, this interaction was still observed even when the data from the outlier participant in Experiment 2 were excluded (p < .001). These results show that the observed difference in perception between fast and slow tACS conditions was significantly larger in Experiment 2 compared to Experiment 1. Considering that single words were presented in Experiment 1, whereas full sentences were presented in Experiment 2, these results suggest that tACS frequency effects on speech perception may be more readily observable when target words are presented in a (quasi)rhythmic auditory context.
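
The relation between the reported estimate, standard error, z value, and p value for the interaction can be checked with a short sketch. It assumes a standard Wald test with a two-sided p value from the standard normal distribution. The function names are hypothetical.

```python
import math


def wald_z(beta: float, se: float) -> float:
    """Wald statistic: estimate divided by its standard error."""
    return beta / se


def two_sided_p(z: float) -> float:
    """Two-sided p value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Rounded estimate and SE as reported; rounding explains the small
# discrepancy with the reported z = 3.472.
z_from_rounded = wald_z(0.208, 0.060)

# p value implied by the reported z value.
p_interaction = two_sided_p(3.472)

print(round(z_from_rounded, 2), p_interaction < 0.001)
```

The rounded inputs give z of about 3.47, and the implied p value falls below .001, consistent with the reported result.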

GENERAL DISCUSSION

We tested the effect of tACS frequency (within the theta range) on the perception of speech content, following recent evidence suggesting that low-frequency neural entrainment to the speech envelope influences the categorization of phonemes and therefore the perception of words (Kösem et al., 2018; ten Oever & Sack, 2015). Our first experiment showed no significant effect of tACS frequency on word perception. On the basis of previous tACS studies on the perception of continuous speech (Riecke et al., 2018; Wilsch et al., 2018), we reasoned that this null result may reflect the use of isolated words. Therefore, we further hypothesized that tACS frequency effects on perceptual speech segmentation require the speech to be presented in a continuous (quasi)rhythmic auditory context. Our second experiment provides partial support for our hypotheses: We observed that tACS presented at a fast frequency elicits, on average, more long-vowel percepts than tACS presented at a slower frequency, consistent with the idea that entrainment of faster neural oscillations results in a denser sampling of speech input (Kösem et al., 2018). However, this effect was weak: When the data of the participant with an outlying tACS frequency effect were removed, the effect remained in the same direction but failed to reach significance. We also found a significant difference in the tACS frequency effect on speech segmentation across experiments, which was robust to the exclusion of the outlier data. In line with our secondary hypothesis, tACS frequency had a significantly larger influence on the segmentation of speech when the latter was presented in a continuous sentential context rather than as an isolated word.

These results suggest that tACS can have a small influence on the perception of speech sounds. We interpret the outcomes as an indication that tACS influenced neural entrainment, which reflects a neural mechanism by which the input speech signal is sampled at the appropriate temporal granularity (Giraud & Poeppel, 2012; Ghitza, 2011). We used a tACS montage targeting auditory cortices (Riecke, Formisano, et al., 2015; Riecke, Sack, et al., 2015), suggesting that the observed effect occurs in auditory cortical areas involved in speech processing. This notion is corroborated by findings showing that phonological information may be decoded from early auditory oscillatory activity (Di Liberto, O'Sullivan, & Lalor, 2015; ten Oever & Sack, 2015) and that behavioral perceptual biases induced by fast versus slow speech rhythms arise early in perception (Maslowski, Meyer, & Bosker, 2019) and independently from attention (Bosker, Reinisch, & Sjerps, 2017). Our results show no significant effect of tACS phase on vowel perception. Although not the focus of our study, this absence of a phase effect in the presence of a frequency effect is in line with previous results from a speech study that used auditory, instead of electric, stimulation to manipulate neural entrainment (Bosker & Kösem, 2017). It contrasts with the phase effects observed in a previous tACS study that investigated intelligibility of continuous speech in noise (Riecke et al., 2018), suggesting that such phase effects may arise during behavioral tasks that require processes related to auditory stream segregation.

The combined outcomes suggest that tACS may modulate the perceptual sampling of speech more effectively in the context of continuous speech than for single word presentations. A tentative interpretation for our results is that tACS may be more likely to have a modulatory influence on brain oscillations that have already been entrained by prior sensory input. That is, tACS at the relatively weak stimulation intensity used here (∼1.8 mA peak-to-peak) may be more effective in modulating a preexisting neural entrainment (induced by a given rhythmic sensory input) than in inducing neural entrainment in the absence of external sensory rhythms. Concurrent recordings of neural activity during transcranial stimulation show that low-intensity tACS may not induce neural oscillations when neural activity is not strongly rhythmic (Lafon et al., 2017) but could affect already present narrow-band neural rhythms (Reato et al., 2010). This could explain why low-frequency tACS is most effective at frequencies close to ongoing brain rhythms (Kanai, Chaieb, Antal, Walsh, & Paulus, 2008) and in sensory stimulus-induced entrainment settings (Riecke et al., 2018; Wilsch et al., 2018; Zoefel et al., 2018; Riecke, Formisano, et al., 2015). We speculate that, in Experiment 2, tACS at 3 and 5.5 Hz modulated the frequency of neural oscillations that were entrained to the envelope of the continuous speech stimuli, which fluctuated most strongly at 4 Hz. When words were presented in isolation, there was no rhythmic auditory stimulation to entrain neural oscillations, and as such, tACS probably had less influence on the brain processes that involve entrained oscillations, such as temporal predictions (Kösem et al., 2018; Stefanics et al., 2010).

Alternatively, tACS may have affected word perception differently across our two experiments because neural responses to the target word differed when it was presented in continuous speech as compared to when it was presented in isolation. Neural responses to a word are likely attenuated in continuous speech, considering that the response evoked by an acoustic input reduces when the input is preceded by a temporally regular sequence of stimuli (Todorovic & de Lange, 2012; Costa-Faidella, Baldeweg, Grimm, & Escera, 2011). Moreover, tACS-induced periodic alterations in neural excitability may affect sensory stimulus processing most effectively when the stimuli are near threshold. Therefore, tACS probably modulated neural activity in our two experiments in a similar fashion, but this modulation was stronger in Experiment 2 as neural responses evoked by the target word were weaker and thus more susceptible to tACS-induced modulations.

Although we interpret the difference in tACS condition effects between experiments as a consequence of the presence versus absence of a lead-in sentence, it should be noted that other factors differed between experiments and could have affected the results, such as the spectral manipulation of the target words. Note, however, that speech rate effects of preceding stimulus history have been observed with both spectral manipulation methods (Kaufeld, Ravenschlag, Meyer, Martin, & Bosker, 2019; Kösem et al., 2018; Bosker, 2017); therefore, the reported null results in Experiment 1 cannot solely be attributed to the spectral manipulation of the target words. Furthermore, the size of the effect in Experiment 2 was rather modest and failed to meet our statistical significance criterion when removing outlier data (although the statistical comparison across experiments remained significant). As such, the present outcomes do not warrant bold claims about the alleged “brain-hacking” potential of transcranial electrical brain stimulation. In fact, concerns have been expressed recently about the efficacy of transcranial direct current stimulation and tACS in directly modulating neural activity and behavior, in particular, when applied currents are weak (∼1–2 mA; Liu et al., 2018; Opitz et al., 2016). At this current strength, effects on neural activity are observable but may be restricted to temporal biasing of spikes and/or modulation of ongoing neural rhythms of similar frequency as the applied current (Krause, Vieira, Csorba, Pilly, & Pack, 2019; Liu et al., 2018). Our behavioral findings fit with these observations and point to an interesting role of sensory stimulation history in tACS efficacy, which should inspire further investigation into the constraints under which tACS modulates human behavior and speech comprehension in particular.

Acknowledgments

This study was supported by the Netherlands Organization for Scientific Research (NWO) Gravitation grant 024.001.006 awarded to the Language in Interaction Consortium, a Marie Sklodowska-Curie Individual Fellowship (grant 843088) to A. K., and a James S. McDonnell Foundation Understanding Human Cognition Collaborative Award (grant 220020448) and a Wellcome Trust Investigator Award in Science (grant 207550) to O. J. We thank Annelies van Wijngaarden for the recordings of her voice.

Reprint requests should be sent to Anne Kösem, Centre for Research in Neuroscience Lyon, Brain Dynamics and Cognition Team, INSERM U1028, CNRS UMR529, Lyon, Rhône-Alpes 69366, France, or via e-mail: anne.kosem@inserm.fr.

Note

1. The data reported in this paper, including all audio files, are available from the Donders Repository at http://hdl.handle.net/11633/aadgmptw.

REFERENCES

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., & Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences, U.S.A., 98, 13367–13372.
Asamoah, B., Khatoun, A., & Mc Laughlin, M. (2019). TACS motor system effects can be caused by transcutaneous stimulation of peripheral nerves. Nature Communications, 10, 266.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.
Boersma, P., & Weenink, D. (2007). Praat Ver. 4.06 [Software].
Bosker, H. R. (2017). Accounting for rate-dependent category boundary shifts in speech perception. Attention, Perception, & Psychophysics, 79, 333–343.
Bosker, H. R., & Kösem, A. (2017). An entrained rhythm's frequency, not phase, influences temporal sampling of speech. In Proceedings Interspeech 2017 (pp. 2416–2420).
Bosker, H. R., & Ghitza, O. (2018). Entrained theta oscillations guide perception of subsequent speech: Behavioural evidence from rate normalisation. Language, Cognition and Neuroscience, 33, 955–967.
Bosker, H. R., Reinisch, E., & Sjerps, M. J. (2017). Cognitive load makes speech sound fast, but does not modulate acoustic context effects. Journal of Memory and Language, 94, 166–176.
Costa-Faidella, J., Baldeweg, T., Grimm, S., & Escera, C. (2011). Interactions between ‘what’ and ‘when’ in the auditory system: Temporal predictability enhances repetition suppression. Journal of Neuroscience, 31, 18590–18597.
Di Liberto, G. M., O'Sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25, 2457–2465.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158–164.
Ding, N., & Simon, J. Z. (2013). Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. Journal of Neuroscience, 33, 5728–5735.
Fröhlich, F., & McCormick, D. A. (2010). Endogenous electric fields may guide neocortical network activity. Neuron, 67, 129–143.
Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology, 2, 130.
Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15, 511–517.
Heimrath, K., Fiene, M., Rufener, K. S., & Zaehle, T. (2016). Modulating human auditory processing by transcranial electrical stimulation. Frontiers in Cellular Neuroscience, 10, 53.
Kanai, R., Chaieb, L., Antal, A., Walsh, V., & Paulus, W. (2008). Frequency-dependent electrical stimulation of the visual cortex. Current Biology, 18, 1839–1843.
Kaufeld, G., Ravenschlag, A., Meyer, A. S., Martin, A. E., & Bosker, H. R. (2019). Knowledge-based and signal-based cues are weighted flexibly during spoken language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46, 549–562.
Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. (2018). Neural entrainment determines the words we hear. Current Biology, 28, 2867–2875.
Kösem, A., Gramfort, A., & van Wassenhove, V. (2014). Encoding of event timing in the phase of neural oscillations. Neuroimage, 92, 274–284.
Kösem, A., & van Wassenhove, V. (2017). Distinct contributions of low- and high-frequency neural oscillations to speech comprehension. Language, Cognition and Neuroscience, 32, 536–544.
Krause, M. R., Vieira, P. G., Csorba, B. A., Pilly, P. K., & Pack, C. C. (2019). Transcranial alternating current stimulation entrains single-neuron activity in the primate brain. Proceedings of the National Academy of Sciences, U.S.A., 116, 5747–5755.
Lafon, B., Henin, S., Huang, Y., Friedman, D., Melloni, L., Thesen, T., et al. (2017). Low frequency transcranial electrical stimulation does not entrain sleep rhythms measured by human intracranial recordings. Nature Communications, 8, 1199.
Liu, A., Vöröslakos, M., Kronberg, G., Henin, S., Krause, M. R., Huang, Y., et al. (2018). Immediate neurophysiological effects of transcranial electrical stimulation. Nature Communications, 9, 5092.
Maslowski, M., Meyer, A. S., & Bosker, H. R. (2019). Listeners normalize speech for contextual speech rate even without an explicit recognition task. Journal of the Acoustical Society of America, 146, 179–188.
Neuling, T., Rach, S., Wagner, S., Wolters, C. H., & Herrmann, C. S. (2012). Good vibrations: Oscillatory phase shapes perception. Neuroimage, 63, 771–778.
Obleser, J., & Kayser, C. (2019). Neural entrainment and attentional selection in the listening brain. Trends in Cognitive Sciences, 23, 913–926.
Opitz, A., Falchier, A., Yan, C. G., Yeagle, E. M., Linn, G. S., Megevand, P., et al. (2016). Spatiotemporal structure of intracranial electric fields induced by transcranial electric stimulation in humans and nonhuman primates. Scientific Reports, 6, 31236.
Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology, 3, 320.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex, 23, 1378–1387.
Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59, 413–425.
R Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/.
Reato, D., Rahman, A., Bikson, M., & Parra, L. C. (2010). Low-intensity electrical stimulation affects network dynamics by modulating population rate and spike timing. Journal of Neuroscience, 30, 1567–1579.
Riecke, L., Formisano, E., Herrmann, C. S., & Sack, A. T. (2015). 4-Hz transcranial alternating current stimulation phase modulates hearing. Brain Stimulation, 8, 777–783.
Riecke, L., Formisano, E., Sorger, B., Başkent, D., & Gaudrain, E. (2018). Neural entrainment to speech modulates speech intelligibility. Current Biology, 28, 161–169.
Riecke, L., Sack, A. T., & Schroeder, C. E. (2015). Endogenous delta/theta sound-brain phase entrainment accelerates the buildup of auditory streaming. Current Biology, 25, 3196–3201.
Riecke, L., & Zoefel, B. (2018). Conveying temporal information to the auditory system via transcranial current stimulation. Acta Acustica, 104, 883–886.
Rufener, K. S., Oechslin, M. S., Zaehle, T., & Meyer, M. (2016). Transcranial alternating current stimulation (tACS) differentially modulates speech perception in young and older adults. Brain Stimulation, 9, 560–565.
Rufener, K. S., Zaehle, T., Oechslin, M. S., & Meyer, M. (2016). 40 Hz-transcranial alternating current stimulation (tACS) selectively modulates speech perception. International Journal of Psychophysiology, 101, 18–24.
Stefanics, G., Hangya, B., Hernádi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of Neuroscience, 30, 13578–13585.
ten Oever, S., de Graaf, T. A., Bonnemayer, C., Ronner, J., Sack, A. T., & Riecke, L. (2016). Stimulus presentation at specific neuronal oscillatory phases experimentally controlled with TACS: Implementation and applications. Frontiers in Cellular Neuroscience, 10, 240.
ten Oever, S., & Sack, A. T. (2015). Oscillatory phase shapes syllable perception. Proceedings of the National Academy of Sciences, U.S.A., 112, 15833–15837.
Thut, G., Schyns, P. G., & Gross, J. (2011). Entrainment of perceptually relevant brain oscillations by non-invasive rhythmic stimulation of the human brain. Frontiers in Psychology, 2, 170.
Todorovic, A., & de Lange, F. P. (2012). Repetition suppression and expectation suppression are dissociable in time in early auditory evoked fields. Journal of Neuroscience, 32, 13389–13395.
Wilsch, A., Neuling, T., Obleser, J., & Herrmann, C. S. (2018). Transcranial alternating current stimulation with speech envelopes modulates speech comprehension. Neuroimage, 172, 766–774.
Witkowski, M., Garcia-Cossio, E., Chander, B. S., Braun, C., Birbaumer, N., Robinson, S. E., et al. (2016). Mapping entrained brain oscillations during transcranial alternating current stimulation (tACS). Neuroimage, 140, 89–98.
Zoefel, B., Archer-Boyd, A., & Davis, M. H. (2018). Phase entrainment of brain oscillations causally modulates neural responses to intelligible speech. Current Biology, 28, 401–408.
Zoefel, B., & Davis, M. H. (2017). Transcranial electric stimulation for the investigation of speech perception and comprehension. Language, Cognition and Neuroscience, 32, 910–923.
Zoefel, B., ten Oever, S., & Sack, A. T. (2018). The involvement of endogenous neural oscillations in the processing of rhythmic input: More than a regular repetition of evoked neural responses. Frontiers in Neuroscience, 12, 95.

Author notes

This article is part of a Special Focus deriving from a symposium at the 2018 annual meeting of Cognitive Neuroscience Society, entitled, “Hierarchical cortical rhythms and temporal predictions in auditory and speech perception.”