Abstract

Spoken sentence comprehension relies on rapid and effortless temporal integration of speech units occurring at different rates. Temporal integration refers to how chunks of information perceived at different time scales are linked together by the listener in mapping speech sounds onto meaning. The neural implementation of this integration remains unclear. This study explores the role of short and long windows of integration in accessing meaning from long samples of speech. In a cross-linguistic study, we explore the time course of oscillatory brain activity between 1 and 100 Hz, recorded using EEG, during the processing of native and foreign languages. We compare oscillatory responses in a group of Italian and Spanish native speakers while they attentively listen to Italian, Japanese, and Spanish utterances, played either forward or backward. The results show that both groups of participants display a significant increase in gamma band power (55–75 Hz) only when they listen to their native language played forward. The increase in gamma power starts around 1000 msec after the onset of the utterance and decreases by its end, resembling the time course of access to meaning during speech perception. In contrast, changes in low-frequency power show similar patterns for both native and foreign languages. We propose that gamma band power reflects a temporal binding phenomenon concerning the coordination of neural assemblies involved in accessing the meaning of long samples of speech.

INTRODUCTION

Spoken sentence comprehension relies on successful temporal binding of speech units, namely the integration of the segmental, suprasegmental, lexical, morphologic, syntactic, semantic, and contextual properties of the utterances that occur at different rates and overlap in time. It remains unknown how this integration is implemented in the human mind, and brain oscillations may offer a plausible mechanistic explanation.

Two separate cognitive operations, both involving temporal binding, are proposed to form the cognitive architecture of language comprehension, namely memory retrieval and semantic/syntactic unification operations (Hagoort, 2005; Jackendoff, 2002). The former refers to retrieval of phonological, syntactic, and semantic properties of words from long-term memory. The latter refers to combining information from individual words to create an overall representation of the utterance. Most evidence on the neural mechanisms underpinning temporal binding for speech comprehension has been provided by studies employing semantic or syntactic violation paradigms, written stimuli, and ERP or event-related field measurements (see Friederici & Weissenborn, 2007; Friederici, 2002). ERPs/event-related fields are time-locked responses to stimuli obtained with high temporal resolution neuroimaging technique recordings such as EEG and magnetoencephalography (MEG).

Previous ERP studies have emphasized the difficulty in identifying a clear onset and offset at which specific speech processes occur, suggesting that many linguistic processes, including phonologic, semantic, syntactic, and pragmatic ones, might take place in parallel from very early on after the onset of the utterances (see Molinaro, Barber, & Carreiras, 2011; Hagoort, 2008, for reviews). Thus, it is at present unclear how the distributed nodes of the speech network are bound together while sentence-level meaning emerges. We hypothesized that the time course of brain oscillations might disclose non-time-locked activity that complements ERP data and can provide new evidence on temporal binding for speech. Synchronous activation of distributed neuronal assemblies has been proposed as a general mechanism to form transient functional networks and to integrate local information (Singer, 1999; Singer & Gray, 1995). Oscillatory synchrony might thus serve the crucial role of integrating speech units taking place at different temporal scales. Studies of brain oscillations during the processing of well-formed long samples of speech are rare. Nevertheless, in written language, EEG/MEG data obtained using semantic/syntactic violation paradigms suggest that language-related memory retrieval operations are associated with power increases in the theta band (4–7 Hz) and power decreases in the alpha band (9–14 Hz; Bastiaansen, Oostenveld, Jensen, & Hagoort, 2008; Bastiaansen, Van der Linden, ter Keurs, Dijkstra, & Hagoort, 2005; Hagoort, Hald, Bastiaansen, & Petersson, 2004), whereas semantic/syntactic unification operations are linked to increases in beta (15–20 Hz) and gamma (>21 Hz) band activities (Bastiaansen, Magyari, & Hagoort, 2010; Penolazzi, Angrilli, & Job, 2009; Haarmann, Cameron, & Ruchkin, 2002; Braeutigam, Bailey, & Swithenby, 2001; Rohm, Klimesch, Haider, & Doppelmayr, 2001). 
In fact, beta power increases linearly in syntactically correct sentences and decreases after a syntactic violation (Bastiaansen et al., 2010). Moreover, unlike words that are semantically incongruent with their sentence context, semantically congruent words are accompanied by an increase in low (35–45 Hz; Hald, Bastiaansen, & Hagoort, 2006; Weiss & Mueller, 2003) and broad band (30–100 Hz; Penolazzi et al., 2009) gamma activity during reading.

Because spoken sentence comprehension builds up as the sentence unfolds in time, the study of the time course of the patterns of oscillatory activity may disclose mechanisms underpinning the unit-by-unit integration of the incoming information provided by the speech signal and context. According to the binding by synchrony hypothesis, sensory and cognitive integration is the product of neural synchrony in the gamma band between local (Singer, 2002) and distant (Varela, Lachaux, Rodriguez, & Martinerie, 2001; Rodriguez, Lachaux, Martinerie, Renault, & Varela, 1999) neural assemblies. Supporting this proposal, significant increases in gamma band oscillations have been found to be associated with the emergence of meaningful objects in visual (Tallon-Baudry, 2009; Melloni et al., 2007; Rodriguez et al., 1999) and audio-visual (Schneider, Debener, Oostenveld, & Engel, 2008; Widmann, Gruber, Kujala, Tervaniemi, & Schröger, 2007) studies, supporting the prominent role of high-frequency oscillations in unimodal and polymodal integration. Modulations in gamma band activity might thus reveal the implementation of fast windows of coupling/uncoupling activity between different neural networks engaged in the emergence of unified single object representations (Engel & Singer, 2001; Singer, 1999), including those emerging from speech.

Brain oscillations have also been reported as biological signatures of the on-line tracking and sampling of speech units. At the lexical level, brain activity resonates at the same frequency at which words regularly occur in a continuous speech stream (Buiatti, Peña, & Dehaene-Lambertz, 2009). At the sublexical level, the Asymmetric Sampling in Time (AST) theory proposes that different neural assemblies, asymmetrically distributed over the hemispheres, resonate with the occurrence of slow or rapid events, such as syllables or phonemes, respectively (Poeppel, 2003). In fact, theta oscillations over the right hemisphere would reflect syllable tracking, occurring every 200–300 msec, whereas low gamma frequencies over the left hemisphere would indicate phoneme tracking, taking place every 40–100 msec. Indeed, the theta and, to some extent, gamma frequency ranges match the average durations of syllables and phonemes across several languages (Greenberg, Carvey, Hitchcock, & Chang, 2003). Recent studies support the AST proposal, showing that theta band phase coherence over a group of right temporal sensors was significantly reduced when the spectral information of the spoken signal corresponding to the syllabic rate was removed by filtering, rendering speech unintelligible (Luo & Poeppel, 2007). Furthermore, the power spectrum recorded during silence over electrodes projected over left and right Heschl's gyrus showed that theta (3–6 Hz) band activity was stronger over the right hemisphere, whereas gamma (28–40 Hz) band activity was stronger over the left hemisphere (Giraud et al., 2007). Brain oscillations may thus reflect the functioning of tracking/sampling mechanisms underpinning the processing of speech units occurring at different temporal scales.

To shed light on the role of brain oscillations during the perception of well-formed spoken sentences, we carried out a cross-linguistic study with native speakers of Italian and Spanish. We hypothesized that speech units such as phonemes, syllables, syntactic structure, and some isolated lexical items would be accessed by both groups of participants when they listened to Italian and Spanish utterances, whereas semantic–syntactic–pragmatic integration, indispensable for speech comprehension, would only be possible when participants listened to their respective native language. We investigated the time course of brain oscillations in a broad frequency range (1–100 Hz) during attentive listening to utterances in the native language (Spanish for Spanish speakers and Italian for Italian speakers) and in foreign languages (Spanish and Japanese for Italian speakers; Italian and Japanese for Spanish speakers). All evaluated languages (Italian, Japanese, and Spanish) have similar phonemic repertoires (International Phonetic Association, 1999), whereas only Italian and Spanish have similar rhythmic (prosodic), syllabic, and syntactic structures and partially share their lexical repertoire (Ramus, Nespor, & Mehler, 1999; Nespor & Vogel, 1986). Overall, Italian and Spanish are linguistically close, and both have low linguistic similarity with Japanese. Participants had not previously been exposed to any of the tested foreign languages. To control for acoustic and articulatory factors, we also evaluated the brain responses to the utterances from the three languages played reversed in time (hereafter, backward speech). To ensure that participants attentively listened to all utterances, they were required to judge whether a short sound played after the end of each utterance was part of the sentence or not (see Figure 1A). 
In summary, in two independent groups (i.e., Italian and Spanish native speakers), we evaluated six experimental conditions obtained from the combination of three languages (Italian, Japanese and Spanish) and two types of playback (forward and backward).

Figure 1. 

(A) Trials started with a fixation cross followed by a 1500-msec silent period, after which an Italian, Japanese, or Spanish utterance, played either forward or backward, was presented. After a 500-msec silent interval, a 300-msec-long test sound was presented, and participants had to judge whether the test sound was part of the previous utterance. (B) A schematic proposal linking changes in oscillatory activity to the cognitive processes underpinning speech processing. Briefly, changes in the low-frequency spectrum (<20 Hz) would reflect the processing of meaningless speech units (e.g., phonemes and syllables) and general nonlinguistic cognitive processing, whereas the high-frequency spectrum would mirror the integration of meaningful units (e.g., words).

In the context of the current task, we predicted that the oscillatory responses could reflect (a) non-language-specific activity, that is, cognitive processing related to task performance such as attentive monitoring and STM, and (b) language-specific activity related to the processing of speech units (see Figure 1B).

Regarding non-language-specific oscillatory activity, we expected to observe alpha suppression after the cue indicating the trial onset. Alpha suppression has been reported for alerting when participants are exposed to a cue predicting the presentation of a target (Babiloni et al., 2004), during sustained attention (Yamagishi, Goda, Callan, Anderson, & Kawato, 2005), and after the abrupt onset of visual (Yantis & Jonides, 1984) and auditory stimuli (Shahin, Picton, & Miller, 2009). We expected alpha suppression to decrease as the sentence unfolds in time, remaining higher for attentionally more demanding experimental conditions.

Concerning language-specific oscillatory responses, we anticipated two possible, not mutually exclusive, scenarios reflecting different aspects of speech processing. The first scenario concerns sampling and tracking mechanisms for meaningless speech units (e.g., phonemes and syllables). On the basis of the AST theory, phoneme and syllable tracking should be accompanied by an increase in oscillatory power at the frequency at which the tracked units occur, that is, the theta band for syllables and the low gamma band for phonemes. Regarding the time course of this activity, the increase in both theta and low gamma band power should start as soon as phonemes and syllables are perceived and remain high until the utterance finishes. Furthermore, it should be similar in any experimental condition in which syllables and phonemes can be identified. Specifically, because the phoneme repertoire is highly similar across all evaluated languages, we predicted that low gamma band power would significantly increase for all forward utterances, reflecting the ability to track phonemes. We also predicted an increase in theta band activity for Italian and Spanish forward utterances, reflecting syllable tracking. Japanese is a special case, because its prosody is organized around the mora, a subsyllabic prosodic unit. However, adults who do not speak Japanese are likely to perceive the mora as a syllable, and thus theta band activity should also be high when participants listen to forward Japanese utterances. We anticipated a significantly smaller increase in theta and low gamma bands for backward speech, because the time reversal of utterances seriously distorts the acoustic–phonetic properties of some frequent phonemes and syllables, rendering the sampling and tracking of these linguistic units difficult. 
A second scenario will become evident when the first linguistic units conveying meaning, such as words or phrases, are recognized and must remain active until the sentence-level meaning is found. This processing may reflect the implementation of semantic/syntactic unification processes involving the updating of information provided by incoming words with that provided by their neighbors. This second step is unique to the native language and engages different neural networks involved in processing the speech signal and other analyses relevant to the emergence of a meaningful object from utterances. We predicted that high-frequency brain oscillations, specifically in the gamma range, should increase as soon as the first meaningful linguistic units were recognized and sentence comprehension started, and should remain high until the sentence-level meaning of the utterance could be anticipated, the moment when integration is no longer required. The last word of a sentence can be recognized as soon as its first phoneme is perceived (Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005; McQueen, Cutler, & Norris, 2003). Listener expectations allow the end of almost any utterance to be anticipated (Van Berkum, 2008), and gamma band power might thus decrease before the sentence ends. We did not predict increments in gamma band oscillations for forward utterances in a nonnative language or for backward utterances, because the meaningful units would not be integrated into a sentence-level meaning. Neither did we predict increases in gamma band activity for foreign utterances containing recognizable lexical items (such as /galleria/, which means gallery in both Italian and Spanish), because we anticipated that gamma band activity would reflect sentence-level meaning updating and building processes, but not isolated word memory retrieval. 
Concerning theta activity, previous studies have shown increases in theta band activity associated with increased verbal STM demands (Bastiaansen et al., 2010; Hagoort & van Berkum, 2007; Weiss et al., 2005). From this perspective, we also expected theta band activity to increase for utterances from the native and the close foreign language, associated with the recognition of ongoing lexical and phrasal items. Finally, regarding the beta band, we anticipated that if beta activity reflects semantic/syntactic unification processes (Bastiaansen et al., 2010), we should observe a linear increase in beta power exclusively for native utterances.

In summary, we expected to find patterns of oscillatory activity that were illustrative of the processes underpinning access to the meaning of well-formed spoken sentences in the native language, using a no-violation paradigm.

METHODS

Participants

Two groups of 24 adults each were evaluated. One group was composed of native Spanish speakers (from Chile), and the other of native Italian speakers (from Italy). Four participants from each group were excluded from the analysis because their EEG data contained artifacts in more than 50% of the trials in one or more experimental conditions (see Data Analysis). All participants were monolingual, aged 20–30 years (Spanish group: mean = 23.2 years, SD = 2.6 years; Italian group: mean = 24.6 years, SD = 2.9 years), right-handed, and reported normal hearing. There were 12 women in each group. The study received the approval of the regional ethical committee for biomedical research. Participants received monetary compensation and signed written informed consent to participate in the study.

Stimuli

Forward and Backward Utterances

We recorded speech samples from nine different female monolingual native speakers, three per language. Each speaker recited a series of 54 utterances in her native language using adult-directed speech. From this pool, we selected 18 utterances per speaker, creating a set of 54 different utterances per language. The sets of utterances had similar semantic interpretations across languages (see the list of sentences translated into English in Supplementary Text 1). All sentences were affirmative; were not prosodically, semantically, or syntactically related to each other; and were matched in the number of syllables and, as much as possible, in the number of function and content words. All sentences had an SVO structure in Spanish and Italian and an SOV structure in Japanese. No systematic acoustic or linguistic cues were present at any time during the utterances. The sets of utterances did not differ significantly across languages in mean energy (root mean square = 0.18, 0.19, and 0.18 Pa for Italian, Japanese, and Spanish, respectively), syllable number (15–18 syllables), or duration (2800–3000 msec). Supplementary Figure 1 illustrates the waveform and spectrogram of a single utterance in each of the three languages, in its forward and backward versions. A naive native speaker of each language verified that the selected utterances were well formed. A set of 54 backward utterances per language was created by reversing the forward utterances in time (see an example in Supplementary Sound 1 with the corresponding description in Supplementary Text 2). Critically, all participants were evaluated with identical stimuli, procedure, EEG system, and experimental design in their respective native countries.
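The energy matching described above can be checked with a short script. The sketch below is a generic illustration, not the authors' actual pipeline; the function names `rms_energy` and `backward_version` are ours.

```python
import numpy as np

def rms_energy(waveform):
    """Root-mean-square amplitude of a 1-D waveform, in the units of its samples."""
    waveform = np.asarray(waveform, dtype=float)
    return float(np.sqrt(np.mean(waveform ** 2)))

def backward_version(waveform):
    """Time-reversed copy of an utterance, as used for the backward conditions."""
    return np.asarray(waveform)[::-1]
```

Averaging `rms_energy` over the 54 utterances of each language set and comparing the means reproduces the kind of check reported above. Because reversal only reorders samples, forward and backward versions of an utterance are matched in energy by construction.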

Test Sounds

Test sounds were 300-msec-long chunks randomly extracted from any part of the utterances. Chunks consisted of segmental or suprasegmental units that did not match real words or verbal expressions in either native language. Test sounds belonged to the same experimental condition as the spoken sentence. For instance, after a Forward Italian utterance, the target could be a chunk extracted either from that same spoken sentence or from another Forward Italian sentence.

Task and Procedure

Participants were evaluated in soundproof Faraday rooms. The structure of the trials is illustrated in Figure 1A. Fifty-four trials per experimental condition were presented. Participants were seated in a chair placed 1.5 m away from a monitor and loudspeakers. Written instructions, presented on the monitor, informed participants that they were expected to attentively listen to a series of utterances from different languages delivered by the loudspeakers. Once a sentence finished, a short sound would be presented, and they should judge whether the sound was a chunk taken from the preceding utterance. In each trial, participants responded by pressing one of two buttons on a response pad. For half of the participants, the right button indicated “YES” and the left button “NO,” whereas for the other half the button assignment was reversed. The delayed match-to-sample task on acoustic chunks was used to ensure equivalent attentional demands during the processing of native and foreign languages. No practice or feedback was provided. We instructed participants to avoid body and eye movements and blinking; however, small movements and blinking were allowed during breaks that occurred every 12 min of the session. The presentation order of the experimental conditions was pseudorandomized across participants. Consecutive repetition of the same experimental condition, speaker, or spoken sentence was not allowed.

EEG Data Acquisition

EEG was continuously recorded with 64- and 128-channel EEG systems (EGI, Inc., Eugene, OR) for the Spanish and Italian participants, respectively. The EEG was digitized at a sampling rate of 1000 Hz (bandpass filter = 0.01–100 Hz). Electrodes were referenced to the vertex (Cz).

Data Analysis

Behavioral performance and electrophysiological activity were compared across groups and experimental conditions. The analysis of brain activity focused on the attentive listening period of each trial. We present here the results for correct trials; however, similar results were obtained when all trials were analyzed together.

Behavioral Data

Mean accuracy and RTs for correct responses were submitted to separate repeated-measures ANOVAs with Language (Italian, Japanese, and Spanish) and Type of Utterance (Forward and Backward) as within-subject factors and Group (Italian and Spanish) as a between-subject factor. The Greenhouse–Geisser correction was applied.

Time Resolved Spectral Power Computation

The raw EEG signal was segmented into a series of 5000-msec-long epochs starting 1500 msec before the onset of the utterances. The continuous 50-Hz (AC) line component (the same in Chile and Italy) was filtered out of each epoch while keeping the biological 50-Hz signal: the amplitude and phase of the AC signal were estimated and subtracted from the original signal, selectively eliminating the periodic part of the 50-Hz component (line frequency). Channels contaminated with eye movements, blinking, or motion artifacts, as well as epochs with more than seven contaminated channels (voltage fluctuations exceeding ±100 μV, transients exceeding ±70 μV, or electrooculogram activity exceeding ±70 μV), were excluded from the spectral power analysis. Each nonrejected epoch was analyzed by applying a sliding-window fast Fourier transform (Hamming window; window length, step, and overlap were 232 msec, 10 msec, and 90%, respectively, for frequencies from 11 to 100 Hz, and 500 msec, 10 msec, and 95%, respectively, for frequencies from 1 to 10 Hz). For every participant and every time and frequency bin, amplitude was computed following the procedure described in Melloni et al. (2007): signal windows (232 or 500 points) were zero-padded and fast Fourier transformed to obtain an interpolated frequency resolution of ∼1 Hz per frequency bin. The instantaneous amplitude was then computed by taking the real and imaginary Fourier coefficients (C(f,t)r and C(f,t)i), squaring and adding them, and taking the square root (i.e., for a given time window t and frequency bin f), as follows:
A(f, t) = √([C(f, t)r]² + [C(f, t)i]²)
This amplitude is equivalent to the magnitude of the observed oscillation at a given time and frequency point, and it was used to construct a time–frequency map per experimental condition. Each time–frequency map was normalized against the 1500-msec prestimulus baseline and averaged across all nonrejected trials and electrodes. The normalization involved subtracting the baseline average and dividing by the baseline standard deviation (SD) on a frequency-by-frequency basis. Letting S be the signal, μ the average of the signal during the baseline period, and σ the SD of the same baseline period, the normalized signal was computed as
Snorm(f, t) = (S(f, t) − μ(f)) / σ(f)
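The two computations just described, the zero-padded sliding-window amplitude and the frequency-wise baseline z-score, can be sketched as follows. This is a minimal illustration assuming a 1000-Hz sampling rate and an FFT length of 1024 points (≈1 Hz per interpolated bin); the function names are our own, not the authors' code.

```python
import numpy as np

def sliding_amplitude(signal, win_len, step, nfft):
    """Time-resolved spectral amplitude via a sliding Hamming-window FFT.

    Each window of `win_len` samples is zero-padded to `nfft` points;
    amplitude is sqrt(Re^2 + Im^2) of the Fourier coefficients.
    Returns an array of shape (n_freq_bins, n_windows).
    """
    window = np.hamming(win_len)
    amps = []
    for s in range(0, len(signal) - win_len + 1, step):
        seg = signal[s:s + win_len] * window
        coeffs = np.fft.rfft(seg, n=nfft)  # zero-padded FFT
        amps.append(np.sqrt(coeffs.real ** 2 + coeffs.imag ** 2))
    return np.array(amps).T

def baseline_normalize(tf_map, n_baseline):
    """Z-score each frequency row of a time-frequency map against its own
    prestimulus baseline (the first `n_baseline` time samples)."""
    base = tf_map[:, :n_baseline]
    mu = base.mean(axis=1, keepdims=True)
    sigma = base.std(axis=1, keepdims=True)
    return (tf_map - mu) / sigma
```

With a 1000-Hz sampling rate, a 232-sample window stepped by 10 samples matches the 232-msec window and 10-msec step reported above, and `nfft = 1024` yields the ∼1-Hz interpolated frequency resolution.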

We ran the spectral power analysis over 64 or 128 electrodes in the Spanish and Italian groups, respectively. Similar results were obtained in the Italian group when we restricted the analysis to 64 electrodes roughly matched in location to the 64 electrodes of the net used for Spanish speakers.
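The selective removal of the 50-Hz line component described earlier (estimating the amplitude and phase of the periodic AC signal and subtracting it) can be implemented as a least-squares fit of a sinusoid at the line frequency. The sketch below is our reconstruction of that idea, not the authors' code; because only the coherent sinusoid is subtracted, non-periodic (biological) activity at and around 50 Hz is left in the signal.

```python
import numpy as np

def remove_line_component(signal, fs, line_freq=50.0):
    """Estimate the amplitude and phase of the periodic line-frequency
    component by least squares and subtract it from the epoch."""
    t = np.arange(len(signal)) / fs
    # Design matrix: cosine and sine at the line frequency; their fitted
    # weights jointly encode the amplitude and phase of the AC component.
    X = np.column_stack([np.cos(2 * np.pi * line_freq * t),
                         np.sin(2 * np.pi * line_freq * t)])
    coefs, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return signal - X @ coefs
```

Applied to a 5000-msec epoch sampled at 1000 Hz, this removes the stationary mains interference while leaving the rest of the spectrum, including broadband gamma activity, untouched.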

Time–Frequency Windows for Statistical Comparisons

Time–frequency ROIs (TF-ROIs) were identified by evaluating significant task-related changes in oscillatory activity, blind to any effect of experimental conditions or groups. By selecting the time windows of interest in a way orthogonal to the research hypothesis (which was to evaluate the effect of native language), we alleviate the multiple comparison problem: instead of running multiple ANOVAs with Language, Type, and Group for every time and frequency bin, we ran ANOVAs only over the identified TF-ROIs. Independence between the selection procedure and the test of the experimental factors was achieved by assuring orthogonality between the selection contrast and the test contrast, a balanced design matrix between conditions, and the inclusion of equal numbers of trials per condition (Kriegeskorte, Simmons, Bellgowan, & Baker, 2009). For the selection contrast, we averaged activity across all experimental factors, that is, group, language, type, and electrodes, and contrasted those values against the prestimulus interval. This procedure identified the time windows in which power at each frequency bin significantly differed between the listening period and the corresponding baseline, without any a priori assumption about the effect of group, language, or type of utterance, and is orthogonal to the effect of experimental conditions or groups (Kriegeskorte et al., 2009). In particular, we first averaged the time–frequency responses across the six experimental conditions and channels for each participant. Then, the time–frequency averages from all participants were pooled together, regardless of their group. The mean power at each of the 100 frequency bins was submitted to a paired t test (alpha = 0.05; two-tailed) comparing each sample of the power during the listening period (0–3500 msec after the onset of the utterance) against its corresponding baseline. 
A single baseline value per frequency was obtained by averaging the power across the 1500 msec before utterance onset. As a result, we observed three TF-ROIs in which the power between 1 and 100 Hz was significantly different from baseline (Figure 2 and Supplementary Figure 2a and b). The first window involved a frequency range from 4 to 8 Hz (theta band) and extended from 100 to 3200 msec after sentence onset. The second window concerned frequencies from 9 to 14 Hz (alpha band) and extended from 1000 to 2800 msec. The last window implicated frequencies from 55 to 75 Hz (middle gamma band) and extended from 1000 to 2900 msec after sentence onset. No significant differences were observed for other frequencies, including the low gamma (21–40 Hz) and beta (15–20 Hz) bands. Similar TF-ROIs were identified when applying a false discovery rate correction (q < 0.05) for multiple comparisons to the matrix of p values used to identify the TF-ROIs (see Supplementary Figure 2). We report results based on the uncorrected TF-ROI windows, as the differences in window size did not affect the results reported below. The uncorrected TF-ROIs provide an estimate of the lower and upper bounds of the effects in time. In this context, it is important to note that the difference in the middle gamma band starting at 1000 msec after sentence onset is observed for the average activity and thus should be taken as the upper bound of the effect, that is, the point at which maximal overlap between sentences and subjects is reached for the first time, such that the increase in gamma becomes statistically significant.

Figure 2. 

The time course of the mean t (A) and p values (B, C) for the theta, alpha, and low and middle gamma bands from the onset to the end of the utterances is depicted. The t and p values for each frequency bin from 1 to 100 Hz were obtained by comparing the oscillatory activity, averaged across all utterances from all experimental conditions and all channels, regardless of group, against the baseline. The mean t and p values are plotted in red lines for theta (4–8 Hz), green lines for alpha (9–14 Hz), black lines for low gamma (21–40 Hz), and blue lines for middle gamma (55–75 Hz).

To explore the spatial distribution of the oscillatory response, spectral power in each TF-ROI was averaged over nine clusters of adjacent electrodes located over lateral (central, left, and right) and antero-posterior (anterior, middle, and posterior) regions of the scalp. All electrode clusters contained a similar number of electrodes in both groups of participants (see Figure 3).

Figure 3. 

Left and right panels show the nine clusters of adjacent electrodes over the 64- and 128-electrode net arrangements used in the Spanish and Italian groups, respectively. Each cluster comprises the electrodes numbered under the silver-colored area, whereas the dotted lines indicate the limits of the clusters. The names of the clusters are indicated in the left picture.

The mean spectral power observed in each TF-ROI for the nine groups of electrodes was submitted to separate repeated measures ANOVAs, one per TF-ROI, with Language (Italian, Japanese, and Spanish), Type of Utterance (forward and backward), Lateralization (central, left, and right), and Antero-posterior Location (anterior, middle, and posterior) as within-subject factors and Group (Italian and Spanish groups) as a between-subject factor. Greenhouse–Geisser correction was applied in all comparisons.
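The Greenhouse–Geisser correction rescales the ANOVA degrees of freedom by an epsilon estimated from the covariance of the repeated measures. A minimal sketch of the standard epsilon estimate for a single within-subject factor (not the authors' analysis pipeline; the data are hypothetical):

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    data : array (n_subjects, k_levels)
    Returns epsilon in [1/(k-1), 1]; both ANOVA dfs are multiplied
    by it before looking up the F distribution.
    """
    n, k = data.shape
    S = np.cov(data, rowvar=False)        # k x k covariance of the levels
    C = np.eye(k) - np.ones((k, k)) / k   # centering matrix
    Sd = C @ S @ C                        # double-centered covariance
    # epsilon = (sum of eigenvalues)^2 / ((k-1) * sum of squared eigenvalues)
    eps = np.trace(Sd) ** 2 / ((k - 1) * np.sum(Sd ** 2))
    return float(eps)

rng = np.random.default_rng(1)
x = rng.normal(size=(20, 3))   # 20 subjects, 3 levels (e.g., the 3 languages)
eps = gg_epsilon(x)
print(round(eps, 3))
```

Epsilon equals 1 under perfect sphericity and shrinks toward 1/(k−1) as sphericity is violated, making the corrected test more conservative.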

Finally, it should be noted that a main advantage of our cross-linguistic experimental design is that it allows ruling out concerns related to the stimulus material, the methodology, and/or the selection procedure of the TF-ROIs. This is because our results can be replicated in an independent population. In particular, the effect of the native language over another language can be evaluated for Spanish native speakers and replicated in Italian native speakers (see Results). Replication is the most rigorous test that can be applied and immediately eliminates spurious results (which are not likely to replicate).

Comparison with Previous Studies Showing Hemispheric Differences

Previous data reported hemispheric differences for theta and low-gamma activities during speech processing. In fact, theta phase coherence from 4 to 8 Hz is significantly greater over the right as compared with the left temporal MEG sensors during intelligible speech perception (Luo & Poeppel, 2007). Moreover, during silence (Giraud et al., 2007), the spontaneous oscillatory power over fronto-temporal electrodes is greater for theta (3–6 Hz) over the right as compared with the left hemisphere, whereas the contrary is observed for low gamma (28–40 Hz). To compare our results with these previous studies, we submitted the mean theta and low gamma power observed during the time–frequency windows and over the electrodes described in each of the referenced studies (see Supplementary Figure 3) to separate repeated measures ANOVAs, one for each study and each frequency range, with Language (Italian, Japanese, and Spanish), Type of Utterance (forward and backward), and Hemisphere (right and left) as within-subject factors and Group (Italian and Spanish groups) as a between-subject factor. Greenhouse–Geisser correction was applied in all comparisons with more than two levels.

ERP Analysis

To rule out that differences in induced response were attributable to differences in evoked responses, ERP and global field power were compared between experimental conditions and groups (for a detailed description of these analyses, see Supplementary Text 3).

RESULTS

Behavioral Results

Confirming the compliance with the task, in both groups the mean accuracy was significantly higher than chance for all experimental conditions (p < .001 for each comparison; Figure 4A).
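A comparison of accuracy against chance can be illustrated with an exact one-sided binomial test. The sketch below uses only the standard library, and the trial counts are hypothetical, not the study's actual numbers.

```python
from math import comb

def binom_p_greater(k, n, chance=0.5):
    """Exact one-sided p-value for observing at least k correct
    responses out of n trials when responding at chance level."""
    return sum(comb(n, i) * chance**i * (1 - chance)**(n - i)
               for i in range(k, n + 1))

# e.g., a hypothetical participant: 48 correct out of 60 two-choice trials
p = binom_p_greater(48, 60)
print(p < .001)   # True
```

Each participant and condition would yield such a p-value, all of which were below .001 in the study.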

Figure 4. 

Mean accuracy (A) and mean RT (B) of the correct responses for Italian and Spanish groups per experimental condition are depicted in the left and the right plots, respectively. Vertical lines indicate 1 SD of the mean.

Regarding the effect of the native language on mean accuracy, we found a significant triple interaction, Language × Type × Group [F(2, 76) = 4.3, p < .02]. Mean accuracy was significantly higher for forward native language as compared with each other experimental condition in both groups (p < .05 for each comparison). In addition, mean accuracy and RTs were significantly greater for forward as compared with backward sentences in both groups and languages [F(1, 38) = 11.4, p < .002; F(1, 38) = 4.4, p < .04; main effect of Type of Utterance in accuracy and RTs, respectively], suggesting that participants were more accurate but slower to process forward than backward utterances (Figure 4A and B).

At the end of the experiment, participants were asked to report whether they had recognized the foreign languages. All participants answered that the languages were Spanish, Italian, and other Asiatic or Slavic languages; some participants also recognized Japanese. Moreover, all participants reported that they recognized one or two isolated lexical items in 50–60% of the utterances from the rhythmically close language (i.e., Spanish for Italians and Italian for Spanish speakers; see Supplementary Table 1). Those words occurred at different moments of the utterances of the rhythmically close foreign language. Importantly, none of them recognized a complete sentence in any of the foreign languages. This set of isolated words was recalled at the end of the utterances, showing that the words were maintained in short-term verbal memory. Backward utterances were mostly reported as unknown foreign languages.

Oscillatory Brain Activity

Middle Gamma Band

We found a significant Language × Type of Utterance × Group interaction [F(1.9, 74.1) = 6.7, p < .002]. Post hoc comparisons revealed that both Italian and Spanish participants showed significantly higher middle gamma power for their own native language than for the other group's native language when played forward. In fact, mean middle gamma power was significantly higher for forward Spanish than for forward Italian utterances in Spanish speakers, whereas it was significantly higher for forward Italian than for forward Spanish utterances in Italian speakers [F(1, 38) = 11.53, p < .002] over each group of electrodes (see Table 1). The temporal evolution of oscillatory power per group, language, and type of utterance is depicted in Figure 5.

Table 1. 

Mean Middle Gamma Band Power Comparisons by Group of Electrodes, Restricted to Spanish and Italian Forward Utterances

Lateralization  Location   Language   Italian Speakers   Spanish Speakers   Language × Group
                                      Mean (SD)          Mean (SD)          F(1, 38)    p
Central         Anterior   Italian     0.046 (0.112)      0.027 (0.077)      5.7       .02
                           Spanish    −0.001 (0.082)      0.068 (0.189)
Central         Middle     Italian     0.085 (0.129)      0.038 (0.072)     12.7       .001
                           Spanish     0.010 (0.048)      0.097 (0.171)
Central         Posterior  Italian     0.088 (0.113)      0.034 (0.047)     10.7       .002
                           Spanish     0.037 (0.078)      0.081 (0.091)
Left            Anterior   Italian     0.119 (0.165)      0.042 (0.099)     10         .003
                           Spanish     0.016 (0.078)      0.119 (0.244)
Left            Middle     Italian     0.131 (0.163)      0.058 (0.105)     10.8       .002
                           Spanish     0.033 (0.056)      0.145 (0.247)
Left            Posterior  Italian     0.103 (0.119)      0.047 (0.067)     11.3       .002
                           Spanish     0.048 (0.072)      0.117 (0.155)
Right           Anterior   Italian     0.122 (0.211)      0.058 (0.112)     12.3       .001
                           Spanish     0.003 (0.091)      0.167 (0.296)
Right           Middle     Italian     0.138 (0.228)      0.055 (0.102)      3.9       .054
                           Spanish     0.027 (0.078)      0.174 (0.267)
Right           Posterior  Italian     0.104 (0.148)      0.040 (0.050)     11.2       .02
                           Spanish     0.035 (0.067)      0.104 (0.149)
Figure 5. 

The mean time–frequency power across all channels per group and experimental condition for backward (A) and forward (B) utterances is depicted. The color bar indicates the spectral power in SD units (σ).

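Figure 5 expresses spectral power in SD units (σ). A common way to obtain such units, sketched here under the assumption of a pre-onset baseline window (not necessarily the authors' exact pipeline), is to z-score each frequency bin by its own baseline mean and SD:

```python
import numpy as np

def zscore_to_baseline(tf_power, n_baseline):
    """Express time-frequency power in SD units of the baseline.

    tf_power   : array (n_freqs, n_times), single-trial or averaged power
    n_baseline : number of initial time samples forming the baseline
    """
    base = tf_power[:, :n_baseline]
    mu = base.mean(axis=1, keepdims=True)
    sd = base.std(axis=1, ddof=1, keepdims=True)
    return (tf_power - mu) / sd   # sigma units per frequency bin

# toy example: one frequency bin, 4 baseline samples, then 2 post-onset samples
tf = np.array([[1.0, 2.0, 1.0, 2.0, 5.0, 5.0]])
z = zscore_to_baseline(tf, n_baseline=4)
print(z[0, :4])   # baseline samples, mean 0 by construction
```

Post-onset values then read directly as "how many baseline SDs above or below baseline", which is what the color bars in Figures 5 and 6 report.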

In addition, gamma band power was higher over left (p < .001) and right (p < .004) groups of electrodes than over central electrodes [lateralization, F(1.7, 65.9) = 8.9, p < .001] and showed a posterior-to-anterior gradient, with higher amplitudes over middle (p < .009) and posterior (p < .03) electrodes than over anterior ones [antero-posterior location, F(1.3, 50.3) = 3.6, p < .03]. Scalp topographies of the effects are shown in Figure 6.

Figure 6. 

We plot the spatial distribution of the mean power in the Italian and Spanish groups for the time–frequency ROIs significantly different from the baseline, that is, theta band (4–8 Hz) from 100 to 3200 msec; alpha band (9–14 Hz) from 0 to 500 msec and from 2300 to 2800 msec; and middle gamma band (55–75 Hz) from 1000 to 2900 msec. Ears and nose indicate the orientation of the scalp maps. Color bars indicate power in SD units (σ).


These results support the proposal that gamma oscillations reflect the integration of meaningful units (i.e., words) during native language processing. Middle gamma power increased around 1000 msec, coincident with the time when two or more words have been recognized and integrated, and decreased when the comprehension of the sentence could be anticipated and, in the context of our task, word integration was no longer required. Further supporting our prediction concerning word integration, we did not observe significant increments in gamma oscillations for any other forward or backward experimental condition, although participants had access to phonemes, prosodic units, isolated words, morphemes, and, in some cases, even syntactic frameworks that they were unable to integrate into a comprehensible message.

The rather late onset is determined by the summation across trials. It is thus possible that some effects started earlier in time but reached stability only around 1 sec. Hence, the reported latency reflects an upper bound on the onset of the effects.
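The notion of an effect "reaching stability" around 1 sec can be operationalized as the first time point that opens a sustained run of significant samples. A minimal sketch with hypothetical p-values:

```python
def sustained_onset(p_values, alpha=0.05, min_run=3):
    """Index of the first sample starting a run of at least `min_run`
    consecutive p-values below `alpha`; None if no such run exists."""
    run_start, run_len = None, 0
    for i, p in enumerate(p_values):
        if p < alpha:
            if run_len == 0:
                run_start = i   # candidate onset
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0         # isolated dips do not count
    return None

p = [0.40, 0.03, 0.20, 0.04, 0.01, 0.02, 0.03, 0.01]
print(sustained_onset(p))   # 3  (the isolated dip at index 1 does not count)
```

Requiring a minimum run length is what makes the estimated onset an upper bound: brief, unstable early effects are ignored until the response stabilizes.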

Theta Band

Theta band power was significantly higher for forward than backward spoken sentences [type of utterance, F(1, 38) = 13.9, p < .001] and showed the scalp distribution opposite to that of middle gamma. That is, theta band power was significantly higher over anterior than middle (p < .005) and posterior (p < .001) regions [antero-posterior location, F(1.4, 51.9) = 13.3, p < .001]. Moreover, six significant interactions were observed: Type of Utterance × Lateralization [F(1.9, 51.9) = 13.6, p < .04], Type of Utterance × Language × Lateralization [F(3.4, 129) = 2.9, p < .03], Type of Utterance × Antero-posterior Location [F(1.2, 47) = 5.5, p < .02], Language × Antero-posterior Location [F(2.7, 102) = 2.9, p < .04], Lateralization × Antero-posterior Location [F(1.8, 70) = 12.7, p < .001], and Type of Utterance × Lateralization × Antero-posterior Location [F(2.9, 112) = 4.3, p < .006]. Post hoc analysis showed that, across both groups and languages, theta band power was significantly higher for forward utterances over the central anterior group of electrodes as compared with the response over any other group of electrodes (p < .05 for each comparison). Furthermore, mean theta power was significantly lower for forward Japanese than for forward Spanish utterances over the right posterior as compared with the central anterior group of electrodes (p < .05).

Overall, our results support the role of theta oscillations in syllabic tracking. First, as predicted, theta band power was significantly higher for forward than backward languages. In backward speech, syllabic tracking could be periodically reduced, mainly as a consequence of the distortion of syllables containing stop phonemes. Second, the time course of theta band power was consistent with syllable tracking: It started after 100 msec and remained high until the end of the utterance. Third, the fact that both Italian and Spanish participants exhibited similar increases in theta power for all forward languages, including Japanese, may be explained by participants perceiving Japanese morae as syllables.

Alpha Band

Alpha power was lower over posterior as compared with anterior (p < .001) and middle (p < .001) groups of electrodes in the period from 0 to 500 msec after sentence onset [antero-posterior location, F(1.4, 54.6) = 25.6, p < .001]. Moreover, a Lateralization × Antero-posterior Location interaction was observed [F(3.3, 126) = 13.4, p < .001]. Over anterior groups of electrodes, alpha power was greater for central and right as compared with left electrodes (p < .04 and p < .02, respectively), whereas over posterior groups of electrodes, alpha power was significantly greater for central as compared with left and right electrodes (p < .002 and p < .004, respectively). Across all experimental conditions and groups, alpha power significantly increased as a function of time, as revealed by higher alpha power around the end of the sentence (2.3–2.8 sec after sentence onset) as compared with its beginning (0–0.5 sec) [F(1, 38) = 17.3, p < .001]. In addition, around the sentence ending (2.3–2.8 sec), mean alpha power was lower for backward as compared with forward utterances [type of utterance, F(1, 38) = 5.3, p < .03].

These results suggest that participants deployed a similar amount of attentional resources to process the onset of the sentences in all experimental conditions. Around the end of the sentences, however, alpha suppression persisted more for backward utterances, suggesting that backward speech demanded more attention.

Interhemispheric Comparisons

We did not observe significant hemispheric differences for either theta or low gamma band power when we restricted our comparisons to the sensors and frequency windows previously used (Giraud et al., 2007; Luo & Poeppel, 2007; see Supplementary Table 2). However, although we did not find hemispheric differences, we did observe regional differences in theta band power, with significantly higher responses over central anterior electrodes as compared with middle or posterior electrodes (see above).

ERPs

We did not find significant differences either by Group, Language, or Type of Utterance for the ERP from 0 to 500 or from 0 to 2900 msec after the sentence onset (see Supplementary Figure 4), suggesting that the evoked activity to speech was similar in both groups and across all experimental conditions. The same holds true when analyzing the global field power over the entire period, comparing forward versus backward utterances or forward native language against the closest language in each group of subjects (Spanish vs. Italian for the Italian- and Spanish-speaking group).

DISCUSSION

Our results show that changes in brain oscillations may reflect different steps of speech processing, including syllable tracking/sampling and sentence-level meaning processing (see Figure 1A and B). In fact, the early-starting and sustained increase in theta power observed for the three evaluated languages played forward supports the AST proposal about syllable tracking (Poeppel, 2003). Moreover, the time course of the increase in middle gamma band power, observed for the native language only, fits well with predictions from psycholinguistic models about semantic/syntactic integration for speech (see below).

Gamma Power Increase for the Native Language

The only significant difference in brain oscillations related to native language processing was an increase in middle gamma band power. The middle gamma band increase became significant 1000 msec after sentence onset, remained high for several hundred milliseconds, and decreased around the end of the sentence. The time course of middle gamma band observed for the native language fits well with the time course predicted by current psycholinguistic models for the semantic/syntactic unification process. Both parallel and serial models acknowledge that sentence level comprehension relies on a temporal binding of the speech units that evolves while speech unravels (e.g., Hagoort & van Berkum, 2007; Culicover & Jackendoff, 2006; Gorrell, 1998; McClelland, St. John, & Taraban, 1989; Frazier, 1987; Marslen-Wilson & Tyler, 1980). The semantic and/or syntactic interpretation of any utterance cannot be guessed from the perception and even recognition of its initial word; however, it can be anticipated before the utterance's end. Unification thus should take place in between, involving several processes (e.g., temporal binding of prelexical and lexical aspects as well as their semantic–syntactic–pragmatic unification) that likely entail dynamic coordination of a series of close and distant linguistic and nonlinguistic neural networks.

The fact that we did not observe significant increments in gamma oscillations for the close language, although participants had access to phonemes, prosodic units, syllables and, to some extent, lexical and morphosyntactic structures that they could not integrate either with neighboring words or in the context of the message, further supports our claim that middle gamma band activity reflects linguistic processes of semantic integration of several words into a comprehensible meaning. Note that Spanish and Italian participants reported the recognition of isolated words from the close foreign language based on lexical similarities with their native language. However, despite its salience and memorization, isolated lexical recognition did not lead to a sustained increase in middle gamma band power for foreign languages. Middle gamma band power thus cannot be associated with the salience or verbal memory of isolated words, lending further support to our proposal that it reflects integration while the utterance unravels in time. Also, differences in performance between conditions, in particular, the fact that higher performance was observed for the native language sentences, are unlikely to explain the middle gamma band results, as we did not observe any significant correlation between performance and the amplitude of the gamma oscillation in any of the experimental conditions (data not shown). Previous studies have shown that low gamma band activity positively correlates with attentional demands (Palva & Palva, 2007) and/or cognitive effort (Simos, Papanikolaou, Sakkalis, & Micheloyannis, 2002). Our results, however, cannot be attributed to these factors, as the most attention-demanding conditions (such as backward speech, which had lower performance as compared with forward conditions) were not associated with a significant increase in middle gamma power. 
Subvocal activation associated with the programming of tongue movements has been suggested to explain the increase in low gamma power from 30 to 35 Hz (Giraud et al., 2007). Similarly, to rehearse the utterances, native speakers could have silently repeated the spoken sentences until their end. However, we did not observe significant differences at this frequency range in any experimental condition. Increases in low and middle gamma band activity have also been reported during tasks requiring verbal episodic memory (Schack & Weiss, 2005) and spatial working memory of auditory objects (Lutzenberger, Ripper, Busse, Birbaumer, & Kaiser, 2002). The fact that gamma band power increased only for the native language rules out working memory maintenance related to the auditory matching-to-sample task as an explanation. Crucially, our results also do not resemble the topography, time course, or frequency content of artifactual gamma band activity due to microsaccadic artifacts (Yuval-Greenberg, Tomer, Keren, Nelken, & Deouell, 2008; see also Melloni, Schwiedrzik, Rodriguez, & Singer, 2009).

In our study, middle gamma band activity decreased shortly before the end of the utterance, although interpretations of a given spoken sentence can go beyond its end and may involve further integrative processing of speech. We believe that the task imposed, which did not ask for the analysis of the semantic, syntactic, or pragmatic content of the utterances but for the detection of meaningless chunks, may have restricted the interpretation of the sentence to the anticipated and most straightforward one. Nonetheless, in natural language, brain oscillations may reflect the manifold interpretations that any given sentence can have, particularly in a discursive context, and could possibly extend beyond its end.

Mechanistic models addressing how information provided by phonemes, words, and context is integrated into a single linguistic meaning from normal speech are rare. Moreover, previous studies on how sentence level meaning is accessed from speech have mostly relied on violation paradigms, reading protocols, and the analysis of ERPs, preventing a direct comparison with our study. However, our results fit well with those predicted by the binding by synchrony hypothesis, originally proposed for the visual system. In this proposal, gamma band synchronization serves as a neural mechanism for the integration of signals, separated in space and time, yielding a unified sensory experience of a meaningful object (Fries, 2005; Varela et al., 2001; Singer, 1999; Singer & Gray, 1995). Increases in gamma band synchronization have been observed for the representation of coherent objects within (Tallon-Baudry & Bertrand, 1999) and between (Schneider et al., 2008; Widmann et al., 2007) sensory modalities, denoting the prominent role of gamma oscillations in unimodal and multimodal representations. Previous studies on language comprehension from reading report that low gamma band responses increase when semantic unification is possible (Penolazzi et al., 2009; Hald et al., 2006) and when pragmatic information, that is, world knowledge, is integrated (Hagoort et al., 2004). These results speak for a unification process that takes several sources of information into account (e.g., word meaning, world knowledge, listener's expectation). Gamma band synchronization for speech is widely observed over the scalp, suggesting that different neuronal assemblies involved in the processing of the different features of speech objects interact during speech temporal binding. In our study, this type of integration was only possible when listeners accessed meaningful units from their native language.

Binding for sentence comprehension involves two components: a spatial and a temporal one. Binding in space refers to the integration of distributed neuronal ensembles coding for the meaning of individual words. Binding in time refers to the fact that language unravels sequentially, and to understand the meaning of a sentence, information needs to be bound in time. Thus, the distributed neural ensembles have to be integrated over space and time for sentence level meaning to emerge. There is evidence that gamma oscillations bind information in space across both short and long distances (Varela et al., 2001). For the integration in time, we propose that the brain dynamically integrates the incoming words in the context of the previous ones by exploiting predictions. In particular, we hypothesize that, as words unravel, they preactivate the semantic network most closely related to them. Preactivations reflecting predictions have been proposed to take place in the gamma frequency band (Engel, Fries, & Singer, 2001) in the form of subthreshold oscillations. Predictions serve the purpose of facilitating information processing (Melloni, Schwiedrzik, Müller, Rodriguez, & Singer, 2011). Thus, when the incoming information matches the prediction, resonance occurs; hence, information is selected, and an increase in gamma oscillation is observed (Grossberg, 2009). When unpredictable words appear, they do not resonate, which is seen as a decrease in gamma oscillations. In summary, we propose that, as information arrives sequentially, the new information joins the already synchronously oscillating assembly, which now represents the compound and current meaning. 
Previous studies on reading have shown that already at 90 msec predictable words differ from unpredictable ones (Dambacher, Rolfs, Göllner, Kliegl, & Jacobs, 2009), further confirming that language comprehension takes place at a remarkable speed, which in turn necessitates neuronal mechanisms that support these temporal dynamics. Gamma oscillations could serve that purpose because they represent fast integration windows and, by adjusting the phase within and between populations of neurons, allow for effective integration or routing of information (Fries, 2009).

Contrary to previous studies, we did not find significant differences in low gamma band (i.e., 20–50 Hz) related to speech processing (Bastiaansen et al., 2010; Penolazzi et al., 2009; Giraud et al., 2007). However, low, middle, and high gamma subbands may exhibit different patterns of responses and have different neural origins (for a review, see Roopun et al., 2008). Although the differences between our study and previous reports may be due to the specifics of the task and study design (spoken vs. written stimuli, no violation vs. violation paradigms, and cross-linguistic vs. only native language), it is important to note that the physical properties of the active neural assemblies (such as size and geometry; von Stein & Sarnthein, 2000) or the cortical layers mostly involved (Roopun et al., 2006; Cunningham et al., 2004) can also affect the frequency of the network oscillations.

Theta, Alpha, and Beta Range Oscillations

We observed a significant increase in theta power in all experimental conditions, which was greater for forward than backward speech in both groups. The theta results support the AST proposal for syllable sampling (Poeppel, 2003), although we did not find right hemispheric dominance. The increase in theta band observed for Japanese utterances might be ascribed to the possibility that Japanese morae were perceived as syllables. In fact, a mora is hardly processed by nonnative Japanese speakers (Menning, Imaizumi, Zwitserlood, & Pantev, 2002). Greater increases in theta band for forward as compared with backward utterances suggest that syllabic structure is sampled with more difficulty but is not absent in backward speech.

Previous studies have interpreted increases in theta power as a biological marker of lexical-semantic access. For example, during the comprehension of a written story displayed word by word, open-class (e.g., nouns, verbs, and adjectives) and closed-class (e.g., articles, determiners, and prepositions) words elicit different patterns of increases in theta band power (Bastiaansen et al., 2005). In contrast, our results showed an increase in theta band power not only for the experimental condition in which open-class words were entirely accessible, that is, forward native language, but also for conditions where lexical access was completely unfeasible.

Theta band activity has also been associated with working memory load. In fact, theta power (Bastiaansen, van Berkum, & Hagoort, 2002a) and theta coherence (Weiss & Mueller, 2003) linearly increase over time when well-formed written or auditory sentences are perceived. Furthermore, during the sequential reading of sentences, theta coherence increases after the detection of a syntactic (Bastiaansen, van Berkum, & Hagoort, 2002b) or semantic (Hald et al., 2006; Hagoort et al., 2004) violation. Likewise, during spoken sentence processing, theta coherence is significantly greater during and after the perception of subject–subject than subject–object relative clauses (Weiss et al., 2005), supporting the idea that more complex sentences require more verbal working memory resources. In our study, we observed a sustained increase in theta band from the beginning to the end of the utterances, without significant differences across either experimental conditions or groups. In contrast, in the conditions in which working memory demands were highest, that is, backward speech, the increase in theta activity was smaller, suggesting that theta oscillations did not reflect verbal working memory effort in our study.

Concerning the alpha band, we found significantly higher power near the end as compared with the onset of the utterances in all experimental conditions, suggesting that attention demands were initially greater and then decreased by the end of any utterance. This pattern of initial alpha suppression may reflect the functioning of a general attention mechanism for auditory stimuli. The tendency to present longer alpha suppression in backward as compared with forward utterances supports the idea that backward speech involved higher attention demands. Our results agree with previous reports showing that alpha activity is sensitive to general task demands such as attentional processes and that it decreases when the task requires more attentional resources (Jensen, Gelfand, Kounios, & Lisman, 2002; Klimesch, 1999). Alternatively, the increment in alpha oscillations could also reflect active inhibition of areas unrelated to the current task (Jensen & Mazaheri, 2010).

We did not find significant differences in beta band power. It has been reported that beta power linearly increases while sentences evolve in time and that this increase is disrupted by syntactic violations (Bastiaansen et al., 2010). Such increases in beta band activity have been interpreted as reflecting a step of semantic/syntactic unification (Rohm et al., 2001) and also an increment in semantic working memory demands (Haarmann et al., 2002) during sentence processing. Our data did not support these proposals; however, given that we used a different experimental paradigm, further studies are necessary to clarify the nature of this discrepancy.

In summary, our study contributes to a better understanding of the temporal binding problem for speech and may have important consequences for current neurocognitive models of speech processing. First, we report a plausible neuronal mechanism for accessing sentence level meaning from spoken sentences, that is, gamma band synchrony for the native language. This has neither been observed nor implicated in previous speech studies. Second, our results provide empirical evidence of a double dissociation in middle gamma band power: In fact, when using the same set of utterances, we found that middle gamma band power increased for Italian speakers when they heard forward Italian, but not forward Spanish, whereas the opposite was found for Spanish speakers. We interpret this switch in the gamma band patterns as reflecting the fact that only when native speakers listen to their native language can they integrate words into a meaningful message. In contrast, both groups of participants showed similar time courses in lower frequencies, that is, the theta band, when listening to both Italian and Spanish. We interpret this similar response in low frequencies as reflecting the processing of speech units such as syllables that lack meaning and are accessible from the foreign languages we used (i.e., Spanish and Italian largely share their phonemic and syllabic repertoire). Third, our results support the extension of the binding by synchrony hypothesis to speech comprehension. The cross-linguistic nature of this study provides strong evidence for an integrative mechanism underpinning access to the plain meaning of spoken sentences. Thus, our study is the first to directly show an increase in middle gamma band synchrony as a neural mechanism for the processing and comprehension of spoken sentences in the native language.

Acknowledgments

We thank Eugenio Rodriguez for providing analysis tools, Caspar M. Schwiedrzik for helping in editing the manuscript, and five anonymous reviewers for their insightful comments during the revision of the manuscript. This work was supported by grants Fondecyt 1090662, PIA-Conicyt-CIE-05, and PBCT-PSD72 to M. P.

Reprint requests should be sent to Marcela Peña, Cognitive Neuroscience Sector, Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, 34136 Trieste, Italy, or via e-mail: pena@sissa.it.

REFERENCES

Babiloni, C., Miniussi, C., Babiloni, F., Carducci, F., Cincotti, F., Del Percio, C., et al. (2004). Sub-second "temporal attention" modulates alpha rhythms. A high-resolution EEG study. Brain Research, Cognitive Brain Research, 19, 259–268.

Bastiaansen, M., Magyari, L., & Hagoort, P. (2010). Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. Journal of Cognitive Neuroscience, 22, 1333–1347.

Bastiaansen, M., Oostenveld, R., Jensen, O., & Hagoort, P. (2008). I see what you mean: Theta power increases are involved in the retrieval of lexical semantic information. Brain and Language, 106, 15–28.

Bastiaansen, M., van Berkum, J. J., & Hagoort, P. (2002a). Event-related theta power increases in the human EEG during online sentence processing. Neuroscience Letters, 323, 13–16.

Bastiaansen, M., van Berkum, J. J., & Hagoort, P. (2002b). Syntactic processing modulates the theta rhythm of the human EEG. Neuroimage, 17, 1479–1492.

Bastiaansen, M., Van der Linden, M., ter Keurs, M., Dijkstra, T., & Hagoort, P. (2005). Theta responses are involved in lexico-semantic retrieval during language processing. Journal of Cognitive Neuroscience, 17, 530–541.

Braeutigam, S., Bailey, A. J., & Swithenby, S. J. (2001). Phase-locked gamma band responses to semantic violation stimuli. Brain Research, Cognitive Brain Research, 10, 365–377.

Buiatti, M., Peña, M., & Dehaene-Lambertz, G. (2009). Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. Neuroimage, 44, 509–519.

Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in Cognitive Sciences, 10, 413–418.

Cunningham, M. O., Whittington, M. A., Bibbig, A., Roopun, A., LeBeau, F. E., Vogt, A., et al. (2004). A role for fast rhythmic bursting neurons in cortical gamma oscillations in vitro. Proceedings of the National Academy of Sciences, U.S.A., 101, 7152–7157.

Dambacher, M., Rolfs, M., Göllner, K., Kliegl, R., & Jacobs, A. M. (2009). Event-related potentials reveal rapid verification of predicted visual input. PLoS One, 4, e5047.

Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top–down processing. Nature Reviews Neuroscience, 2, 704–716.

Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5, 16–25.

Frazier, L. (1987). Theories of sentence processing. In J. Garfield (Ed.), Modularity in knowledge representation and natural-language processing (pp. 291–307). Cambridge, MA: MIT Press.

Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84.

Friederici, A. D., & Weissenborn, J. (2007). Mapping sentence form onto meaning: The syntax-semantic interface. Brain Research, 1146, 50–58.

Fries, P. (2005). A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9, 474–480.

Fries, P. (2009). Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annual Review of Neuroscience, 32, 209–224.

Giraud, A., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S. J., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron, 56, 1127–1134.

Gorrell, P. (1998). Syntactic analysis and reanalysis in sentence processing. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 201–245). Dordrecht, the Netherlands: Kluwer Academic Publishers.

Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties of spontaneous speech: A syllable-centric perspective. Journal of Phonetics, 31, 465–485.

Grossberg, S. (2009). Cortical and subcortical predictive dynamics and learning during perception, cognition, emotion and action. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 364, 1223–1234.

Haarmann, H. J., Cameron, K. A., & Ruchkin, D. S. (2002). Neural synchronization mediates on-line sentence processing: EEG coherence evidence from filler-gap constructions. Psychophysiology, 39, 820–825.

Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9, 416–423.

Hagoort, P. (2008). The fractionation of spoken language understanding by measuring electrical and magnetic brain signals. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363, 1055–1106.

Hagoort, P., Hald, L., Bastiaansen, M., & Petersson, K. M. (2004). Integration of word meaning and world knowledge in language comprehension. Science, 304, 438–441.

Hagoort, P., & van Berkum, J. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 362, 801–811.

Hald, L. A., Bastiaansen, M. C., & Hagoort, P. (2006). EEG theta and gamma responses to semantic violations in online sentence processing. Brain and Language, 96, 90–105.

International Phonetic Association. (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge, UK: Cambridge University Press.

Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford, UK: Oxford University Press.

Jensen, O., Gelfand, J., Kounios, J., & Lisman, J. E. (2002). Oscillations in the alpha band (9–12 Hz) increase with memory load during retention in a short-term memory task. Cerebral Cortex, 12, 877–882.

Jensen, O., & Mazaheri, A. (2010). Shaping functional architecture by oscillatory alpha activity: Gating by inhibition. Frontiers in Human Neuroscience, 4, 1–8.

Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29, 169–195.

Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12, 535–540.

Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54, 1001–1010.

Lutzenberger, W., Ripper, B., Busse, L., Birbaumer, N., & Kaiser, J. (2002). Dynamics of gamma-band activity during an audiospatial working memory task in humans. Journal of Neuroscience, 22, 5630–5638.

Marslen-Wilson, W. D., & Tyler, L. K. (1980). The temporal structure of spoken language understanding. Cognition, 8, 1–71.

McClelland, J. L., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed processing approach. Language and Cognitive Processes, 4, 287–336.

McQueen, J. M., Cutler, A., & Norris, D. (2003). Flow of information in the spoken word recognition system. Speech Communication, 41, 257–270.

Melloni, L., Molina, C., Peña, M., Torres, D., Singer, W., & Rodriguez, E. (2007). Synchronization of neural activity across cortical areas correlates with conscious perception. Journal of Neuroscience, 27, 2858–2865.

Melloni, L., Schwiedrzik, C. M., Müller, N., Rodriguez, E., & Singer, W. (2011). Expectations change the signatures and timing of electrophysiological correlates of perceptual awareness. Journal of Neuroscience, 31, 1386–1396.

Melloni, L., Schwiedrzik, C. M., Rodriguez, E., & Singer, W. (2009). (Micro)saccades, corollary activity and cortical oscillations. Trends in Cognitive Sciences, 13, 239–245.

Menning, H., Imaizumi, S., Zwitserlood, P., & Pantev, C. (2002). Plasticity of the human auditory cortex induced by discrimination learning of non-native, mora-timed contrasts of the Japanese language. Learning & Memory, 9, 253–267.

Molinaro, N., Barber, H. A., & Carreiras, M. (2011). Grammatical agreement processing in reading: ERP findings and future directions. Cortex, 47, 908–930.

Nespor, M., & Vogel, I. (1986). Prosodic phonology. Dordrecht, the Netherlands: Foris Publications.

Palva, S., & Palva, J. M. (2007). New vistas for alpha-frequency band oscillations. Trends in Neurosciences, 30, 150–158.

Penolazzi, B., Angrilli, A., & Job, R. (2009). Gamma EEG activity induced by semantic violation during sentence reading. Neuroscience Letters, 465, 74–78.

Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as "asymmetric sampling in time". Speech Communication, 41, 245–255.

Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73, 265–292.

Rodriguez, E., Lachaux, J. P., Martinerie, J., Renault, B., & Varela, F. J. (1999). Perception's shadow: Long-distance synchronization of human brain activity. Nature, 397, 430–433.

Rohm, D., Klimesch, W., Haider, H., & Doppelmayr, M. (2001). The role of theta and alpha oscillations for language comprehension in the human electroencephalogram. Neuroscience Letters, 310, 137–140.

Roopun, A. K., Kramer, M. A., Carracedo, L. M., Kaiser, M., Davies, C. H., Traub, R. D., et al. (2008). Temporal interactions between cortical rhythms. Frontiers in Neuroscience, 2, 145–154.

Roopun, A. K., Middleton, S. J., Cunningham, M. O., LeBeau, F. E., Bibbig, A., Whittington, M. A., et al. (2006). A beta2-frequency (20–30 Hz) oscillation in nonsynaptic networks of somatosensory cortex. Proceedings of the National Academy of Sciences, U.S.A., 103, 15646–15650.

Schack, B., & Weiss, S. (2005). Quantification of phase synchronization phenomena and their importance for verbal memory processes. Biological Cybernetics, 92, 275–287.

Schneider, T. R., Debener, S., Oostenveld, R., & Engel, A. K. (2008). Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming. Neuroimage, 42, 1244–1254.

Shahin, A. J., Picton, T. W., & Miller, L. M. (2009). Brain oscillations during semantic evaluation of speech. Brain and Cognition, 70, 259–266.

Simos, P. G., Papanikolaou, E., Sakkalis, E., & Micheloyannis, S. (2002). Modulation of gamma-band spectral power by cognitive task complexity. Brain Topography, 14, 191–196.

Singer, W. (1999). Neuronal synchrony: A versatile code for the definition of relations? Neuron, 24, 49–65.

Singer, W. (2002). Cognition, gamma oscillations and neuronal synchrony. In R. C. Reisin, M. R. Nuwer, M. Hallett, & C. Medina (Eds.), Advances in clinical neurophysiology (pp. 3–22). Amsterdam: Elsevier.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555–586.

Tallon-Baudry, C. (2009). The roles of gamma-band oscillatory synchrony in human visual cognition. Frontiers in Bioscience, 14, 321–332.

Tallon-Baudry, C., & Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3, 151–162.

Van Berkum, J. J. (2008). Understanding sentences in context: What brain waves can tell us. Current Directions in Psychological Science, 17, 376–380.

Van Berkum, J. J. A., Brown, C. M., Zwitserlood, P., Kooijman, V., & Hagoort, P. (2005). Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 443–467.

Varela, F., Lachaux, J. P., Rodriguez, E., & Martinerie, J. (2001). The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2, 229–239.

von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. International Journal of Psychophysiology, 38, 301–313.

Weiss, S., & Mueller, H. M. (2003). The contribution of EEG coherence to the investigation of language. Brain and Language, 85, 325–343.

Weiss, S., Mueller, H. M., Schack, B., King, J. W., Kutas, M., & Rappelsberger, P. (2005). Increased neuronal communication accompanying sentence comprehension. International Journal of Psychophysiology, 57, 129–141.

Widmann, A., Gruber, T., Kujala, T., Tervaniemi, M., & Schröger, E. (2007). Binding symbols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral Cortex, 17, 2696–2702.

Yamagishi, N., Goda, N., Callan, D. E., Anderson, S. J., & Kawato, M. (2005). Attentional shifts towards an expected visual target alter the level of alpha-band oscillatory activity in the human calcarine cortex. Brain Research, Cognitive Brain Research, 25, 799–809.

Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601–621.

Yuval-Greenberg, S., Tomer, O., Keren, A. S., Nelken, I., & Deouell, L. Y. (2008). Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron, 58, 429–441.