Spoken word recognition is achieved via competition between activated lexical candidates that match the incoming speech input. The competition is modulated by prelexical cues that are important for segmenting the auditory speech stream into linguistic units. One such prelexical cue that listeners rely on in spoken word recognition is phonotactics. Phonotactics defines possible combinations of phonemes within syllables or words in a given language. The present study aimed at investigating both temporal and topographical aspects of the neuronal correlates of phonotactic processing by simultaneously applying ERPs and functional near-infrared spectroscopy (fNIRS). Pseudowords, either phonotactically legal or illegal with respect to the participants' native language, were acoustically presented to passively listening adult native German speakers. ERPs showed a larger N400 effect for phonotactically legal compared to illegal pseudowords, suggesting stronger lexical activation mechanisms in phonotactically legal material. fNIRS revealed a left hemispheric network including fronto-temporal regions with greater response to phonotactically legal pseudowords than to illegal pseudowords. This confirms earlier hypotheses on a left hemispheric dominance of phonotactic processing most likely due to the fact that phonotactics is related to phonological processing and represents a segmental feature of language comprehension. These segmental linguistic properties of a stimulus are predominantly processed in the left hemisphere. Thus, our study provides first insights into temporal and topographical characteristics of phonotactic processing mechanisms in a passive listening task. Differential brain responses between known and unknown phonotactic rules thus supply evidence for an implicit use of phonotactic cues to guide lexical activation mechanisms.
When we listen to an auditory speech input, we have to segment it into linguistic units such as words. The core mechanisms of spoken word recognition are the activation and competition of word candidates within the mental lexicon (Norris, 1994). That means that as the speech signal unfolds, several words that match the incoming input are activated and compete for recognition. Beyond semantic context, the activation and competition processes are modulated by various information sources that listeners exploit on the prelexical level. A large body of literature has shown that listeners rely on cues such as stress, allophonic details, and phonotactic constraints to segment the incoming speech stream and improve word recognition (see McQueen, 2007, for an overview). In this context, phonotactic constraints describe the possible ordering of phonetic segments within morphemes, syllables, and words in a specific language (Trask, 1996). For example, /fl/ is a possible (i.e., legal) combination at the onset of a German word (e.g., Flasche, engl. bottle), whereas /tl/ is not (i.e., it is illegal). In other languages, for example, in Slovak, /tl/ can form a legal word onset (e.g., tlak, engl. pressure). Similarly in Dutch, the consecutive consonants /mr/ are phonotactically illegal both at syllable onset and offset, and therefore, a syllable boundary is required between /m/ and /r/. Accordingly, McQueen (1998) found that real words (e.g., the Dutch content word rok, engl. skirt) were detected more easily in nonsense sequences (i.e., pseudowords) such as [fimrok], in which they were aligned (e.g., in [fim.rok], period indicates a boundary) than when they were misaligned with a phonotactic boundary (e.g., in [fi.drok], where /dr/ is legal at syllable onset). 
These results suggest that the language-specific knowledge of the sequence constraints can be used to signal likely word boundaries in the speech stream and, as such, lead to a stronger activation of the lexical entry [/rok/]. In sum, phonotactic rules constitute an essential prelexical source of information for the detection of word boundaries as well as of smaller units such as syllables and morphemes. Thus, they aid rapid and efficient language comprehension.
As mentioned above, phonotactic rules are language-specific. Hence, they are important during both first-language acquisition in infancy (e.g., Friederici, 2005; Jusczyk, 1999) and second-language acquisition (e.g., Weber & Cutler, 2006). Several behavioral studies have investigated how phonotactic rules are perceived by infants (e.g., Mattys & Jusczyk, 2001; Friederici & Wessels, 1993). Converging evidence indicates that 9-month-old infants prefer listening to sound sequences that are phonotactically legal with respect to their native language (Mattys & Jusczyk, 2001; Friederici & Wessels, 1993). In adult listeners, behavioral studies on native and nonnative phonotactic processing found an influence of phonotactic rules of the native language (L1) on the processing of a second language (L2) (Weber & Cutler, 2006).
Apart from behavioral evidence, research on the neurophysiological underpinnings of phonotactic processing has been conducted over the last decades. Most of these studies, though, focused on the timing of phonotactic processing by monitoring fast on-line processing mechanisms by means of the electroencephalogram (EEG). In particular, ERPs have been shown to be suitable for the investigation of lexical and prelexical processing. One ERP component identified in relation to lexico-semantic processes is the N400 component, a negative shift occurring at about 400 msec after stimulus onset (for review, see Lau, Phillips, & Poeppel, 2008; Kutas & Federmeier, 2000). This electrophysiological component has been found to have larger amplitudes for pseudowords compared to real words (Chwilla, Brown, & Hagoort, 1995; Soares, Collet, & Duclaux, 1991; Bentin, McCarthy, & Wood, 1985). An increased N400 is assumed to reflect the search for a lexico-semantic categorization rather than its success. Therefore, an increased amplitude is seen for pseudowords, indicating a more laborious and unsuccessful lexico-semantic search for a possible candidate in the lexicon. Hence, pseudowords that are illegal at the prelexical level because they contain illegal phonotactic cues can be expected to elicit a smaller N400, because they are classified as nonwords already at the prelexical level. Indeed, one ERP study investigating the specific prelexical cue of phonotactics found a stronger N400 effect for phonotactically legal (with respect to the participants' native language) compared to illegal pseudowords in adult participants (Friedrich & Friederici, 2005). This finding indicates that in a context of legal and illegal phonotactic rules, lexical search mechanisms can be better established for those pseudowords which follow native phonotactic rules.
Although retrieval of a lexical entry from the lexicon will be unsuccessful for both kinds of pseudowords, the prelexical cue of phonotactic legality more strongly initiates search for a lexical entry.
So far, the brain regions involved in phonotactic processing remain largely unknown. However, the “Dynamic Dual Pathway Model” (Friederici & Alter, 2004) allows for a general prediction of how phonotactic information is processed based on its fundamental assumptions about the hemispheric specialization of segmental and suprasegmental linguistic information. Segmental aspects refer to phonological, syntactic, lexical, and semantic information. For this kind of information, the model postulates that processing occurs predominantly in the left hemisphere. On the other hand, suprasegmental information, including prosodic features, is assumed to be processed preferentially in the right hemisphere. This dissociation is supported by a considerable body of consistent neuroimaging evidence. Functional imaging studies investigating phonological cues in adults found activations in the left inferior frontal gyrus (IFG) and in the left superior temporal gyrus (STG), suggesting phonological rule processing as well as identification and storage of phonological features, respectively, and a left hemispheric network for processing semantics (for review, see Lau et al., 2008; Vigneau et al., 2006; Bookheimer, 2002). Vigneau et al. (2006) conducted a meta-analysis on 45 imaging studies on phonological processing and 67 studies on semantic processing and found the anterior part of the left STG to be involved in both auditory phonological and semantic processing. By contrast, prosodic features such as intonation were found to be predominantly processed by the right anterior STG and IFG (Zhao et al., 2008; Hesling, Clément, Bordessoules, & Allard, 2005; Meyer, Alter, Friederici, Lohmann, & von Cramon, 2002).
Another interesting aspect in relation to lexical processing refers to the localization of mechanisms associated with the processing of pseudowords in contrast to real words. Neuroimaging studies investigating this specific issue found greater blood oxygen level dependent (BOLD) signal changes for pseudowords in contrast to real words in the left IFG (Xiao et al., 2005; Mechelli, Gorno-Tempini, & Price, 2003) as well as in left STG (Kotz, Cappa, von Cramon, & Friederici, 2002).
Although neuroimaging evidence is available for lexical and some aspects of prelexical processing (Raettig & Kotz, 2008), no study specifically addressed the cerebral correlates of phonotactic processing. For that reason, our study aimed to examine both timing and lateralization of the neuronal underpinnings of phonotactic processing. To this end, we combined two neurophysiological methods, namely, EEG and functional near-infrared spectroscopy (fNIRS). Methodologically, the simultaneous application of these two methods is advantageous and undemanding because, as opposed to combined EEG and fMRI measurements, there is no interference between the optical and the electrophysiological parameters. Additionally, fNIRS, like EEG, is a silent method and allows for fine-grained auditory presentation in a comparatively “natural” auditory environment. The integration of these two methods combines an electrophysiological signal with excellent temporal resolution with a vascular signal characterized by good spatial resolution. Despite these advantages of fNIRS, the methodology has some relevant limitations. Compared especially to fMRI, the much coarser spatial resolution, in the range of centimeters, clearly limits the topographical ascription of the cortical activations found. Beyond the lateral resolution in the range of 1–2 cm, near-infrared light penetrates biological tissue only up to 2–3 cm. Thus, the sensitivity of the methodology is limited to the surface of the brain. A fine-grained differentiation between adjacent cortical areas is not possible with standard procedures. Similar to EEG, spatial resolution can only be partially enhanced by larger or denser probe arrays (but see Zeff, White, Dehghani, Schlaggar, & Culver, 2007).
In research targeting subtle auditory and phonemic contrasts, however, the advantages of fNIRS are apparent and are successfully used by an increasing number of groups (Lloyd-Fox, Blasi, & Elwell, 2010).
The stimuli that were presented to native German adult participants were all monosyllabic pseudowords, and each pseudoword onset varied with respect to German phonotactic legality. Unlike the study by Friedrich and Friederici (2005), the illegal pseudowords in this study were controlled for their legality in a different language, namely, Slovak (a Slavic language). Participants listened passively to the stimuli and did not have to perform any task. Thus, the processing of phonotactic cues is assumed to occur implicitly. Based on the abovementioned literature, we expected a stronger N400 effect for phonotactically legal compared to illegal pseudowords for the EEG (Friedrich & Friederici, 2005). According to the Dynamic Dual Pathway Model and neuroimaging studies on phonological processing, phonotactic cues representing segmental features are assumed to be processed predominantly by the left hemisphere. Furthermore, phonotactic rules trigger (pre)lexical processing or at least the initiation of lexical processing, even for pseudowords. Therefore, we hypothesized a stronger left hemispheric activation of language areas in response to phonotactic legality.
Additionally, all pseudowords were presented in two different speech modes: adult-directed speech (ADS) and infant-directed speech (IDS). IDS, which is most often used to investigate language acquisition, is characterized by an exaggerated pitch, longer duration, and high phonological clarity (Soderstrom, 2007). IDS represents a common speech mode adopted by parents talking to their children. The present study included IDS because its pronounced suprasegmental (prosodic) features may lead to a lateralization different from that for ADS. Although this speech mode is predominantly used for language acquisition in infants, it may also be relevant for phonetic training in adults (Zhang et al., 2009). The prominent suprasegmental information in IDS might influence phonotactic processing mechanisms and thereby affect lateralization.
Twenty-five participants took part in the study. Three of them were excluded from further analyses because they were ambidextrous (2 participants) or had some knowledge of Slavic languages (1 participant). The remaining 22 participants (12 women) were all native German speakers aged 24.5 years on average (range: 19 to 30 years) and right-handed (97.3%, assessed according to Oldfield, 1971). None of these participants had any knowledge of Slavic languages. Participants were university students who were paid for participating and had no known neurological or hearing deficits. All 22 participants were included in the EEG analysis. For the fNIRS analysis, another seven participants were excluded due to a low signal-to-noise ratio. Thus, 15 participants (8 women; 24.1 years on average, range: 19–30 years) entered the fNIRS analysis.
The stimuli comprised a total of 216 monosyllabic pseudowords. One hundred eight (108) pseudowords were constructed so that the first two consonants at the onset of the pseudowords were phonotactically legal with respect to German (e.g., /brop/), the native language of the participants. The rhymes of the pseudowords (i.e., the vowel and the coda consonant) were well formed in German. One hundred eight (108) pseudowords were phonotactically illegal with respect to German (e.g., /bzop/). Phonotactic illegality was controlled in such a way that the onset consonant clusters were legal with respect to one other language, namely, the Slovak language. This was done in order to prevent any mixing of phonotactic rules of other languages. Slovak is an Indo-European language belonging to the West Slavic language group and is characterized by a high number of consonant cluster possibilities at syllable onset (Hanulíková, 2009). It thus offers a greater variety of possible phoneme combinations in the onset position than does German.
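The legality manipulation can be illustrated with a minimal Python sketch. The two onset inventories are toy sets (an assumption) that contain only the example clusters named in this article; they are not the stimulus list, and the function name `classify_onset` is hypothetical. The sketch also assumes an ASCII transcription with one character per consonant.

```python
# Toy onset inventories (assumption): only the example clusters named in
# this article, not a complete phonotactic description of either language.
GERMAN_LEGAL_ONSETS = {"fl", "br"}
GERMAN_ILLEGAL_SLOVAK_LEGAL_ONSETS = {"tl", "bz"}


def classify_onset(pseudoword: str) -> str:
    """Label a pseudoword by the two consonants at its onset."""
    onset = pseudoword[:2]
    if onset in GERMAN_LEGAL_ONSETS:
        return "legal"
    if onset in GERMAN_ILLEGAL_SLOVAK_LEGAL_ONSETS:
        return "illegal"  # illegal in German, attested in Slovak
    return "unknown"      # outside this toy inventory


# The two example pseudowords given in the text.
labels = {word: classify_onset(word) for word in ("brop", "bzop")}
```

Only the onset cluster decides the label here, which mirrors the design in which the rhymes of all pseudowords were well formed in German.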
The frequency of the consonant clusters was controlled for both legal (German) and illegal (Slovak) pseudowords by assessing the lemma frequency per million using the German Celex corpus (Baayen, Piepenbrock, & van Rijn, 1995) and the Slovak National Corpus (http://korpus.juls.savba.sk/index.en.html), respectively. In both languages, low and high frequent consonant clusters were equally distributed (German consonant clusters: mean frequency per million: 752.27, range: 104.39–2500.19; Slovak consonant clusters: mean frequency per million: 738.72, range: 0.42–8335.14). A paired t test between the frequency of German and Slovak consonant clusters did not show any significant difference [t(17) = 0.028, p = .978].
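For readers less familiar with the statistic, a paired t test can be sketched in a few lines of Python. The function name `paired_t` and the input values are hypothetical and are not the cluster frequencies used in the study; the sketch returns only the t value and the degrees of freedom, not the p value.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t, df) for a paired t test on two equal-length samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the pairwise differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical values, not the frequencies used in the study.
t_value, df = paired_t([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Because the degrees of freedom equal the number of pairs minus one, a reported t(17) corresponds to 18 paired values.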
All pseudowords were spoken by a German/Slovak early bilingual female speaker (with no foreign accent in either of the two languages) in a soundproof booth and recorded digitally with 16 bits at a sampling rate of 44 kHz. Each of the 216 pseudowords was spoken in two different modes of speech presentation: ADS and IDS. This manipulation was introduced to test whether IDS and ADS may influence phonotactic processing mechanisms and thereby affect lateralization. IDS is characterized by an exaggerated pitch, longer duration, and high phonological clarity (for details, see Soderstrom, 2007). Recordings of IDS were performed from “imagined” interaction with infants. The rationale was that pseudoword stimuli (without any semantic context) may be subject to multiple confounds in a real interaction with infants. Although some studies found no clear acoustic differences between ADS and IDS when recordings were derived from imaginary instead of real interactions (Knoll & Scharrer, 2008), our stimuli differed substantially with respect to several acoustic parameters, confirming that the known differences between IDS and ADS were present in our material. A paired t test on the mean pitch over the whole pseudoword between pseudowords spoken in ADS and IDS revealed a statistically significant difference [t(107) = −70.78, p < .001], suggesting an increased average pitch for IDS in contrast to ADS. Furthermore, the pitch range (i.e., minimum and maximum pitch) over the whole pseudoword between ADS and IDS showed a statistically significant difference for minimum pitch [t(107) = −6.54, p < .001] as well as for maximum pitch [t(107) = −35.90, p < .001]. This analysis corroborates the finding of an increased pitch for IDS in contrast to ADS in our material.
Because hyperarticulation of phonemes (most prominently with regard to the vowel space) rather than pitch per se is responsible for enhanced didactic information and greater phonological clarity in IDS in contrast to ADS, we additionally performed formant analyses. F1 and F2 formants largely define the F1/F2 vowel space. Hence, we extracted the F1 and F2 formants at the steady-state part of the vowels /a/, /i/, and /u/ (the “corner” vowels which occur in all the world's languages; cf. Kuhl et al., 1997) and plotted them in a vowel triangle (Uther, Knoll, & Burnham, 2007; Burnham, Kitamura, & Vollmer-Conna, 2002; Kuhl et al., 1997). Figure 1 clearly shows the expansion of the vowel triangle for IDS (area: 444,917 Hz²) in contrast to ADS (area: 171,981 Hz²), indicating the expected larger vowel space for IDS when compared to ADS.
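The area of a vowel triangle follows from the three (F1, F2) corner points via the shoelace formula. The sketch below is one way to compute it; the formant values are hypothetical placeholders chosen for illustration and do not reproduce the areas reported for Figure 1.

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle from three (F1, F2) points, shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


# Hypothetical mean (F1, F2) values in Hz for the corner vowels.
ads_vowels = {"a": (800.0, 1300.0), "i": (300.0, 2300.0), "u": (350.0, 800.0)}
ids_vowels = {"a": (950.0, 1400.0), "i": (270.0, 2700.0), "u": (300.0, 700.0)}

ads_area = triangle_area(ads_vowels["a"], ads_vowels["i"], ads_vowels["u"])
ids_area = triangle_area(ids_vowels["a"], ids_vowels["i"], ids_vowels["u"])
```

Since both axes are in Hz, the resulting area is expressed in Hz², and a larger value indicates a more expanded vowel space.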
A paired t test was performed for the formants F1 and F2 for the vowels /a/, /i/, and /u/ comparing ADS and IDS. These analyses showed for the vowel /a/ a statistically significant difference between ADS and IDS for the formant F1 [t(21) = −6.92, p < .001] and F2 [t(21) = −3.27, p = .004], suggesting an increased F1 and F2 for IDS in contrast to ADS. For the vowel /i/, analyses showed a significant difference between ADS and IDS for the formant F1 [t(21) = 11.87, p < .001] and F2 [t(21) = −4.63, p < .001], suggesting a decreased F1 and an increased F2 for IDS in contrast to ADS. For the vowel /u/, formant analyses also revealed a significant difference between ADS and IDS for the formant F1 [t(20) = 4.85, p < .001] and F2 [t(20) = 6.18, p < .001], suggesting a decreased F1 and F2 for IDS in contrast to ADS.
The mean duration of the pseudowords differed between ADS and IDS [t(107) = −40.66, p < .001], with longer durations for pseudowords presented in IDS.
To exclude sequencing effects, all pseudowords were pseudorandomized in 10 different randomization versions.
To ensure that legal and illegal consonant clusters were perceived by native German speakers as familiar or as unfamiliar (i.e., potentially belonging to German or to an unknown language), a rating was performed prior to the actual neurophysiological measurements. Twenty participants (10 women, average age = 28.9 years, range: 23–55 years, none of whom participated in the topography study) had to rate the 108 phonotactically legal and 108 phonotactically illegal pseudowords as to whether they might form a potentially existing German word or not. Paired t tests between legal and illegal pseudowords that were rated as potentially being German words [t(19) = 13.30, p < .001] or non-German words [t(19) = −13.29, p < .001] reached significance. These results indicate that phonotactically legal pseudowords were correctly identified as potentially belonging to German and that phonotactically illegal pseudowords were correctly identified as non-German. This provides evidence that the adopted material is suitable for the investigation of native and nonnative phonotactic rules.
Participants taking part in the neurophysiological experiment first filled out a questionnaire which, along with demographical and medical questions, also contained questions on language history and handedness (Oldfield, 1971). Before starting the experiment, participants were instructed to minimize head and body movements during the acoustic presentation of the pseudowords. To minimize eye movements, participants were instructed to fixate a central fixation cross. The stimuli were pseudorandomized and acoustically presented in an event-related fashion via stereo loudspeakers at a distance of 1 m in front of the participants and at a sound intensity of 70 dB. The length of the silent pauses in between the pseudowords was jittered by means of the optseq software (http://surfer.nmr.mgh.harvard.edu/optseq/) to allow for a deconvolution of the hemodynamic response function (Birn, Cox, & Bandettini, 2002). In our study, the silent interstimulus intervals ranged from 2 to 28 sec (mean = 5.6 sec; Figure 2).
Participants were instructed to listen passively to the stimuli. No response was required. The presentation of the stimuli was organized in two consecutive blocks lasting 21 min each separated by a small pause.
Functional Near-infrared Spectroscopy
Cortical oxygenation changes were measured by means of fNIRS. Due to the relative transparency of biological tissue to light in the near-infrared (600–900 nm), spectroscopic measurement of cortical concentration changes in oxygenated [oxy-Hb] and deoxygenated hemoglobin [deoxy-Hb] can be assessed from a depth of several centimeters when applied on the adult head. These oxygenation changes lead to a change in tissue absorption at different wavelengths, similar to the visible difference in color of venous and arterial blood. Concentration changes in the two chromophores are therefore calculated from changes in optical densities at 690 nm and 830 nm applying the modified Lambert–Beer law (Cope & Delpy, 1988; for more details about the methodology, see Obrig & Villringer, 2003).
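The conversion from attenuation changes at two wavelengths to concentration changes of the two chromophores amounts to solving a 2 × 2 linear system. The following Python sketch shows this under stated assumptions: the extinction coefficients are illustrative placeholders in arbitrary units (they only preserve the qualitative ordering that deoxy-Hb absorbs more strongly at 690 nm and oxy-Hb at 830 nm), the differential pathlength factors are hypothetical, and the function names are not taken from any analysis package.

```python
# Illustrative placeholder coefficients (assumption), in arbitrary units.
EPSILON = {690: {"oxy": 0.35, "deoxy": 2.10},
           830: {"oxy": 1.00, "deoxy": 0.78}}
DISTANCE_CM = 2.5            # interprobe distance reported in the text
DPF = {690: 6.0, 830: 6.0}   # hypothetical differential pathlength factors


def delta_od(d_oxy, d_deoxy, wavelength):
    """Forward model: attenuation change for given concentration changes."""
    eps = EPSILON[wavelength]
    path = DISTANCE_CM * DPF[wavelength]
    return (eps["oxy"] * d_oxy + eps["deoxy"] * d_deoxy) * path


def concentration_changes(d_od_690, d_od_830):
    """Invert the 2x2 system with Cramer's rule; returns (d_oxy, d_deoxy)."""
    p1 = DISTANCE_CM * DPF[690]
    p2 = DISTANCE_CM * DPF[830]
    a, b = EPSILON[690]["oxy"] * p1, EPSILON[690]["deoxy"] * p1
    c, d = EPSILON[830]["oxy"] * p2, EPSILON[830]["deoxy"] * p2
    det = a * d - b * c
    d_oxy = (d_od_690 * d - b * d_od_830) / det
    d_deoxy = (a * d_od_830 - c * d_od_690) / det
    return d_oxy, d_deoxy


# Round trip with a typical response shape: oxy-Hb up, deoxy-Hb down.
recovered = concentration_changes(delta_od(1.0, -0.4, 690),
                                  delta_od(1.0, -0.4, 830))
```

A real analysis would use tabulated extinction coefficients and wavelength-specific pathlength factors; the round trip only demonstrates that two wavelengths suffice to separate the two chromophores.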
The near-infrared spectroscopy system (ISS, Omniat Tissue Oxymeter; Champaign, IL, USA) used in the present study included eight light emitters and four light detectors. The interprobe distance between light emitters and detectors was set to 2.5 cm. This emitter–detector configuration resulted in six measured positions over each hemisphere, distributed over fronto-temporal (Positions 1 and 2), temporal (Positions 3 and 4), and temporo-parietal areas (Positions 5 and 6) (Figure 3). Fiber-optic bundles (emitter: 1 mm; detector: 3 mm in diameter) were fixed into a commercially available elastic EEG cap (EasyCap, Herrsching, Germany). Data were recorded at a sampling rate of 10 Hz.
The EEG was recorded with 59 Ag/AgCl electrodes (Brainproducts, Munich, Germany) placed into the EEG cap at the following positions: Fp1/2, Fpz, AF7/8, AF3/4, AFz, F9/10, F7/8, F5/6, F3/4, Fz, FT9/10, FT7/8, FC5/6, FC3/4, FCz, T7/8, C5/6, C3/4, Cz, TP9/10, TP7/8, CP5/6, CP3/4, CPz, P9/10, P7/8, P5/6, P3/4, Pz, PO7/8, PO3/4, POz, O1/2, and Oz (nomenclature based on Sharbrough et al., 1991) (Figure 3). The vertical electrooculogram was recorded from two electrodes placed above and below the right eye; the horizontal electrooculogram was recorded from two electrodes at the outer canthus of each eye. The EEG recording was referenced on-line to the left mastoid and re-referenced off-line to averaged bilateral mastoids. Electrode impedance was kept below 5 kΩ, and the EEG signal was digitized on-line with 1000 Hz, afterward down-sampled to 500 Hz, and amplified within a band pass from DC to 130 Hz.
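Off-line re-referencing to averaged mastoids reduces to a simple subtraction when the on-line reference is the left mastoid: half of the right-mastoid signal is subtracted from every channel. The Python sketch below illustrates this with made-up potentials; the function name is hypothetical and the values are not recorded data.

```python
def rereference_to_averaged_mastoids(channel, right_mastoid):
    """Both inputs are sample lists recorded against the left mastoid."""
    return [v - 0.5 * rm for v, rm in zip(channel, right_mastoid)]


# Made-up "true" potentials (arbitrary units) to check the arithmetic.
true_channel = [10.0, 12.0]
true_left = [2.0, 2.0]
true_right = [4.0, 6.0]

# What the amplifier records when the left mastoid is the on-line reference.
recorded_channel = [c - l for c, l in zip(true_channel, true_left)]
recorded_right = [r - l for r, l in zip(true_right, true_left)]

rereferenced = rereference_to_averaged_mastoids(recorded_channel,
                                                recorded_right)
# Expected: channel minus the mean of both mastoids.
expected = [c - (l + r) / 2.0
            for c, l, r in zip(true_channel, true_left, true_right)]
```

The left-mastoid potential cancels out, which is why the right-mastoid channel alone is sufficient for the correction.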
Functional Near-infrared Spectroscopy Data
Because fNIRS data can be affected by movement artifacts, data of individual participants were screened manually. Identified artifacts were removed by a linear interpolation approach. A 0.5-Hz low-pass filter (Butterworth, third order) was applied to attenuate high-frequency artifacts mainly arising from the heart beat. To correct drifts and slow fluctuations, an additional high-pass filter at 0.03 Hz was used. To assess the vascular response to the different conditions, the stimulation modeled as a boxcar was convolved with a canonical hemodynamic response function peaking at 6 sec. Data were then fed into a general linear model approach (similar to that adopted in Statistical Parametric Mapping) to obtain beta values for each condition and each of the two hemoglobins. Note that the typical oxygenation response consists of an increase in [oxy-Hb] and a decrease in [deoxy-Hb] (Obrig & Villringer, 2003), the latter physiologically tightly coupled to the BOLD contrast changes as assessed in fMRI studies. In brief, a decrease in [deoxy-Hb] can be explained by a faster washout during an increase in regional cerebral blood flow in an activated cortical area. The concentration of paramagnetic [deoxy-Hb] also represents the major determinant inversely predicting BOLD contrast changes (Kleinschmidt et al., 1996). Therefore, statistical analyses mainly focused on [deoxy-Hb] and were performed on the beta values.
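The regressor construction and beta estimation can be sketched in plain Python. This is a simplified illustration, not the analysis pipeline used here: it assumes a single-gamma response function peaking at 6 sec (SPM's canonical function is a double gamma), a single condition with hypothetical onsets and duration, and an ordinary least-squares fit with one regressor plus an intercept. All function names are hypothetical.

```python
import math

FS_HZ = 10.0  # fNIRS sampling rate reported in the text


def gamma_hrf(fs_hz, peak_s=6.0, length_s=30.0):
    """Single-gamma HRF t**k * exp(-t) with k = peak_s, scaled to sum 1."""
    n = int(length_s * fs_hz)
    h = [(i / fs_hz) ** peak_s * math.exp(-i / fs_hz) for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def boxcar(n_samples, onsets_s, duration_s, fs_hz):
    """1.0 while a stimulus is on, 0.0 otherwise."""
    box = [0.0] * n_samples
    for onset in onsets_s:
        start = int(round(onset * fs_hz))
        stop = min(n_samples, start + int(round(duration_s * fs_hz)))
        for i in range(start, stop):
            box[i] = 1.0
    return box


def convolve(signal, kernel):
    """Causal convolution truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


def ols_beta(x, y):
    """Slope of y on x in a model with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


hrf = gamma_hrf(FS_HZ)
# Hypothetical design: two 1-sec stimuli in a 60-sec recording.
regressor = convolve(boxcar(600, [5.0, 25.0], 1.0, FS_HZ), hrf)
# Synthetic deoxy-Hb trace: a decrease scaled by -0.8 plus an offset.
trace = [-0.8 * r + 0.1 for r in regressor]
beta = ols_beta(regressor, trace)
```

A negative beta for [deoxy-Hb] corresponds to the expected decrease during activation, which is why more negative values indicate a stronger response in the analyses reported below.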
All probe positions were subjected to statistical analyses and were grouped into six regions of interest (ROIs): left fronto-temporal (Positions 1 and 2 of the left hemisphere), right fronto-temporal (Positions 1 and 2 of the right hemisphere), left temporal (Positions 3 and 4 of the left hemisphere), right temporal (Positions 3 and 4 of the right hemisphere), left temporo-parietal (Positions 5 and 6 of the left hemisphere), and right temporo-parietal (Positions 5 and 6 of the right hemisphere) (see Figure 3 for probe assignments). For the main analysis, a repeated measures ANOVA was computed with the factors condition (legal vs. illegal phonotactics), speech mode (ADS vs. IDS), region (fronto-temporal vs. temporal vs. temporo-parietal), and hemisphere (left vs. right).
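The grouping of probe positions into ROIs can be written down as a small lookup. The sketch below assumes that the beta values of the two positions in each ROI are averaged, which the text does not state explicitly, and it uses made-up beta values.

```python
# Position-to-region assignment as described in the text.
ROI_POSITIONS = {
    "fronto-temporal": (1, 2),
    "temporal": (3, 4),
    "temporo-parietal": (5, 6),
}


def roi_means(betas):
    """Map (hemisphere, position) betas to (hemisphere, region) means."""
    out = {}
    for hemisphere in ("left", "right"):
        for region, positions in ROI_POSITIONS.items():
            values = [betas[(hemisphere, p)] for p in positions]
            out[(hemisphere, region)] = sum(values) / len(values)
    return out


# Made-up beta values for one participant and one condition.
example_betas = {("left", p): float(p) for p in range(1, 7)}
example_betas.update({("right", p): 10.0 * p for p in range(1, 7)})
rois = roi_means(example_betas)
```

The resulting six values per condition are what would enter the Region × Hemisphere part of the repeated measures design.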
All statistical analyses followed a hierarchical schema (cf. Rossi, Gugler, Friederici, & Hahne, 2006; Rossi, Gugler, Hahne, & Friederici, 2005; Hahne & Friederici, 1999). Whenever the main analyses revealed a significant interaction between either region or hemisphere with the factors condition and/or speech mode at p < .05, further post hoc ANOVAs were calculated on the next level. ROI analyses were calculated whenever both factors region and hemisphere interacted with the factors condition and/or speech mode. A correction according to Greenhouse and Geisser (1959) was applied and reported as the corrected significance.
Trials contaminated with artifacts were excluded from further analyses. Artifacts arising from vertical eye movements were corrected using the algorithm of Gratton, Coles, and Donchin (1983). On average, 12.2% of the trials were rejected (legal pseudowords–ADS: 12.5%; illegal pseudowords–ADS: 12.6%; legal pseudowords–IDS: 12.1%; illegal pseudowords–IDS: 11.5%). An ANOVA with the factor condition (legal pseudowords–ADS; illegal pseudowords–ADS; legal pseudowords–IDS; illegal pseudowords–IDS) was performed in order to assess whether the amount of excluded trials differed across conditions. The ANOVA did not reveal a significant main effect of condition [F(3, 63) < 1].
ERPs were computed for each participant and each experimental condition. A 1000-msec window after pseudoword onset was averaged relative to a 100-msec prestimulus baseline.
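Baseline correction and trial averaging can be sketched as follows. The sketch assumes that each epoch starts with the baseline samples and that the mean of the baseline interval is subtracted from the whole epoch; the function names and the toy epochs are hypothetical.

```python
FS_HZ = 500                       # sampling rate after down-sampling
N_BASELINE = FS_HZ * 100 // 1000  # 100-msec baseline = 50 samples


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples from the epoch."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]


def average_erp(epochs, n_baseline):
    """Baseline-correct each trial, then average sample by sample."""
    corrected = [baseline_correct(e, n_baseline) for e in epochs]
    n_trials = len(corrected)
    return [sum(samples) / n_trials for samples in zip(*corrected)]


# Toy example with a 2-sample baseline instead of the real 50 samples.
toy_erp = average_erp([[1.0, 1.0, 3.0, 5.0], [2.0, 2.0, 2.0, 4.0]], 2)
```

With real data the epochs would span 550 samples each (100 msec before and 1000 msec after onset at 500 Hz).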
All statistical analyses were performed on unfiltered data. Only for the presentation of the ERPs in Figure 5 was an 8-Hz low-pass Butterworth zero-phase filter (high cutoff: 8 Hz; slope: 12 dB/oct) applied. For statistical ERP analyses, the mean amplitudes in the time windows 150–250 msec (N100), 250–350 msec (P200), and 450–550 msec (N400) were chosen according to the literature and visual inspection of the averaged data. The following electrodes entered statistical analysis and were subdivided into six lateral ROIs: left anterior (F3, FC3), right anterior (F4, FC4), left central (C3, CP3), right central (C4, CP4), left posterior (P3, PO3), and right posterior (P4, PO4). In analogy to the fNIRS data, for the main analysis at lateral electrodes a repeated measures ANOVA was computed with the factors Condition (legal vs. illegal phonotactics), Speech mode (ADS vs. IDS), Region (anterior vs. central vs. posterior), and Hemisphere (left vs. right). For midline electrodes, a repeated measures ANOVA was performed with the factors Condition, Speech mode, and Electrode (Cz vs. CPz vs. Pz). Additionally, a peak amplitude analysis was conducted in the N400 time window 450–550 msec in order to better evaluate the strength of the N400 effect. The same factors as for the mean amplitude analyses entered the ANOVAs for lateral and midline electrodes.
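The two dependent measures, mean amplitude and peak amplitude within a time window, can be sketched as below. The sketch assumes an epoch that starts 100 msec before onset, a window that includes its start and excludes its end, and a peak defined as the most negative value (a natural choice for a negativity such as the N400, but not stated in the text). The function names and the synthetic ERP are hypothetical.

```python
def window_slice(erp, start_ms, end_ms, fs_hz=500, epoch_start_ms=-100):
    """Samples with start_ms <= t < end_ms relative to stimulus onset."""
    first = (start_ms - epoch_start_ms) * fs_hz // 1000
    last = (end_ms - epoch_start_ms) * fs_hz // 1000
    return erp[first:last]


def mean_amplitude(erp, start_ms, end_ms):
    """Average voltage within the window."""
    window = window_slice(erp, start_ms, end_ms)
    return sum(window) / len(window)


def negative_peak(erp, start_ms, end_ms):
    """Most negative voltage within the window."""
    return min(window_slice(erp, start_ms, end_ms))


# Synthetic ERP: flat except for a negativity between 450 and 550 msec.
synthetic_erp = [0.0] * 550
for i in range(275, 325):
    synthetic_erp[i] = -2.0
synthetic_erp[300] = -5.0

n400_mean = mean_amplitude(synthetic_erp, 450, 550)
n400_peak = negative_peak(synthetic_erp, 450, 550)
```

The example shows why the two measures can diverge: a brief deflection dominates the peak but contributes little to the window mean.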
To disentangle an effect of phonological neighborhood density as opposed to pure phonotactic processing (Vitevitch & Luce, 1998), we additionally performed an analysis between high- and low-frequency legal consonant clusters. This repeated measures ANOVA on the legal pseudowords included the factors Frequency (high- vs. low-frequency consonant clusters), Speech mode (adult-directed vs. infant-directed), and Electrode (Cz vs. CPz vs. Pz) in the time window 450–550 msec.
The same post hoc hierarchical schema that was used for the fNIRS data was also adopted for the ERP data. A correction according to Greenhouse and Geisser (1959) was applied and reported as the corrected significance.
The ANOVA for [deoxy-Hb] revealed a statistically significant interaction of Condition × Hemisphere [F(1, 14) = 5.25, p = .038]. Subsequent analyses on the left and right hemispheres showed a trend for left hemisphere activation [t(14) = −1.91, p = .077]. The right hemisphere, in contrast, showed no significant effect [t(14) = 1.60, p = .132]. To further identify single-probe positions contributing to the significant Condition × Hemisphere interaction, we conducted paired t tests on every position in both hemispheres. Two positions in the left hemisphere showed a significant difference between the two conditions, namely, Position 2 [fronto-temporal: t(14) = −3.14, p = .007] and Position 3 [temporal: t(14) = −2.20, p = .046]. These results indicate a greater decrease in [deoxy-Hb] for phonotactically legal compared to illegal pseudowords at fronto-temporal and temporal regions of the left hemisphere, irrespective of the speech mode in which pseudowords were presented. Note that a decrease in [deoxy-Hb] best reflects the BOLD signal change typically measured with fMRI.
No interaction with the factor speech mode reached statistical significance. This indicates that no differences are present between IDS and ADS.
The fNIRS data for [deoxy-Hb] comparing phonotactically legal to phonotactically illegal pseudowords are displayed in Figure 4.
Analyses on Mean Amplitudes
For the time window 150–250 msec, only a statistically significant main effect of speech mode was present for lateral [F(1, 21) = 6.42, p = .019] as well as for midline electrodes [F(1, 21) = 5.31, p = .032], suggesting a stronger N100 amplitude for IDS in contrast to ADS.
For the time window 250–350 msec, neither significant main effects of condition or speech mode nor significant interactions at lateral and midline regions were present, suggesting no differential P200 effect.
For the 450–550 msec time window, the ANOVA revealed neither significant main effects of condition or speech mode nor significant interactions at lateral regions. For midline electrodes, in contrast, the ANOVA resulted in a significant main effect of condition [F(1, 21) = 5.89, p = .024], suggesting a stronger negativity effect for phonotactically legal compared to illegal pseudowords, irrespective of the speech mode in which pseudowords were presented.¹ No interaction with the factor speech mode reached statistical significance. This indicates that no differences are present between IDS and ADS.
The ERP data for phonotactically legal versus phonotactically illegal pseudowords are displayed in Figure 5. The plots displayed are filtered with an 8-Hz low-pass filter for presentation purposes only.
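To make the mean amplitude measure concrete, the following sketch averages one averaged ERP waveform over a latency window such as 450–550 msec. The sampling rate (500 Hz), the epoch start (−200 msec), and the function names are assumptions chosen for illustration and are not taken from the recording setup of the study.

```python
def window_indices(start_ms, end_ms, epoch_start_ms, srate_hz):
    """Convert a latency window in msec to inclusive sample indices."""
    first = round((start_ms - epoch_start_ms) * srate_hz / 1000)
    last = round((end_ms - epoch_start_ms) * srate_hz / 1000)
    return first, last


def mean_amplitude(epoch, start_ms, end_ms, epoch_start_ms, srate_hz):
    """Mean voltage of one channel's waveform within [start_ms, end_ms]."""
    first, last = window_indices(start_ms, end_ms, epoch_start_ms, srate_hz)
    if first < 0 or last >= len(epoch) or first > last:
        raise ValueError("time window lies outside the epoch")
    segment = epoch[first:last + 1]
    return sum(segment) / len(segment)


# Synthetic waveform: -200 to 800 msec at 500 Hz (501 samples), with a
# -2 microvolt deflection between 450 and 550 msec. Values are hypothetical.
SRATE_HZ = 500
EPOCH_START_MS = -200
epoch = [0.0] * 501
for i in range(325, 376):
    epoch[i] = -2.0

n400 = mean_amplitude(epoch, 450, 550, EPOCH_START_MS, SRATE_HZ)
n100 = mean_amplitude(epoch, 150, 250, EPOCH_START_MS, SRATE_HZ)
print(n400, n100)
```

One such value per participant, condition, and electrode would then enter the repeated measures ANOVA.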
Analyses on Peak Amplitudes
Because analyses on mean amplitudes did not reveal any reliable lateralized effect in the N400 time window, we additionally performed peak amplitude analyses between 450 and 550 msec. For this time window, the ANOVA on peak amplitudes revealed a similar pattern of results. Neither significant main effects of condition or speech mode nor significant interactions were present at lateral regions. For midline electrodes, in contrast, the ANOVA resulted in a significant main effect of condition [F(1, 21) = 9.90, p = .005], suggesting a stronger negativity effect for phonotactically legal compared to illegal pseudowords, irrespective of the speech mode in which pseudowords were presented. Thus, the peak amplitude results mirror those obtained for mean amplitudes.
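A peak amplitude measure for a negative component such as the N400 can be sketched as the most negative sample within the latency window. The text does not specify how peaks were defined, so taking the minimum is an assumption, as are the sampling rate, the epoch start, and the function name `negative_peak`.

```python
def negative_peak(epoch, start_ms, end_ms, epoch_start_ms, srate_hz):
    """Return (amplitude, latency_ms) of the most negative sample in the window."""
    first = round((start_ms - epoch_start_ms) * srate_hz / 1000)
    last = round((end_ms - epoch_start_ms) * srate_hz / 1000)
    if first < 0 or last >= len(epoch) or first > last:
        raise ValueError("time window lies outside the epoch")
    segment = epoch[first:last + 1]
    # Index of the minimum within the window (first occurrence if tied).
    offset = min(range(len(segment)), key=segment.__getitem__)
    latency_ms = epoch_start_ms + (first + offset) * 1000 / srate_hz
    return segment[offset], latency_ms


# Synthetic waveform: -200 to 800 msec at 500 Hz; values are hypothetical.
epoch = [0.0] * 501
epoch[100] = -9.0   # large deflection at 0 msec, outside the window
epoch[350] = -3.0   # deflection at 500 msec, inside the window

amplitude, latency = negative_peak(epoch, 450, 550, -200, 500)
print(amplitude, latency)
```

Unlike the window mean, a single-sample peak is sensitive to high-frequency noise, which is one reason the two measures are often reported side by side.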
Analysis on Frequency of Consonant Clusters
The repeated measures ANOVA on the legal pseudowords in the time window 450–550 msec did not yield any significant interaction with the factor frequency (p > .05). This suggests that the effects observed in the present study represent phonotactic processes rather than effects arising from phonological neighborhood density (for a detailed interpretation of these results, please refer to the Discussion section).
Discussion
The present study investigated implicit phonotactic processing in adults using a combined electrophysiological and vascular approach. To this end, we compared the responses to pseudowords that were either legal or illegal with respect to the rules of German phonotactics. Because the adult participants listened passively to the stimuli, our results, which demonstrated a stronger brain response to legal than to illegal pseudowords, confirm the hypothesis that prelexical cues trigger lexical processing irrespective of task relevance. Our approach is novel in that it complements the exquisite temporal resolution inherent to ERP techniques with fNIRS, which yields a rough topographical localization of cortical oxygenation changes. These oxygenation changes stem from a focal increase in regional cerebral blood flow over an activated area and thus link electrophysiological data to vascular-based imaging such as BOLD contrast fMRI. As predicted by the Dynamic Dual Pathway Model (Friederici & Alter, 2004), we supply first experimental evidence that phonotactic segmental cues are preferentially processed in left hemispheric language networks. This is in line with recent assumptions of lateralized processing of fast segmental acoustic input including phonological and lexico-semantic representations (Hickok & Poeppel, 2007). By efficiently applying the complementary advantages of vascular and electrophysiological techniques in language research, our study also demonstrates that these silent methods are easy to combine in a comparatively “natural” experimental environment.
Two classes of auditory stimuli were presented to the participants: phonotactically legal and phonotactically illegal pseudowords. Legality was defined with respect to the phonotactic rules of German, the native language of the participants. To exclude confounding effects on the illegal pseudowords arising from a mixing of phonotactic rules from different languages, the illegal stimuli were controlled in the sense that they all conformed to the phonotactic rules of one existing language, namely, Slovak. Slovak contains a much richer variety of phoneme combinations in syllable onsets than German. In a pre-experiment, we demonstrated that native German speakers without previous knowledge of Slavic languages reliably classified the phonotactically illegal material. It is crucial to control for the phonotactic rules of the illegal material, as participants might have been exposed to a number of foreign languages, which could confound the comparison. Owing to the German educational system, expertise in Slavic languages is rare, and exposure through the media is sparse.
In line with our hypothesis, electrophysiological findings showed a larger N400 for phonotactically legal than for phonotactically illegal pseudowords. Thus, adult speakers use prelexical cues to segment the acoustic speech stream into single linguistic units and thereby trigger lexical search mechanisms.
The brain response generated was differential although no explicit judgment on the material was required and all participants were naive to the material. We therefore argue that stimuli that are legal with respect to the listener's native phonotactic rules preferentially initiate lexical processing mechanisms and do so even when no lexical context or explicit task is given. In this context, the N400 difference (legal vs. illegal pseudowords), although statistically significant, was small compared to previous work. This was confirmed by an analysis on the mean amplitudes as well as by a peak amplitude analysis, both yielding the same focal N400 effect at midline but not at lateral electrode sites. The attenuation and focal specificity of the N400 effect in a passive listening task is most likely caused by attentional factors. In contrast to the previous study on phonotactic processing in adults (Friedrich & Friederici, 2005), subjects did not have to respond to the stimuli. The attenuation of ERPs when attention to the stimuli is reduced is known from numerous studies. This has even been demonstrated for specific task instructions in auditory mismatch paradigms (Müller, Achenbach, Oades, Bender, & Schall, 2002). Furthermore, in our study, pseudowords were presented outside of any semantic context. The design included neither real words nor pictures, which have frequently been shown to elicit a semantic response (Kutas & Federmeier, 2000). This “purely phonotactic” context is putatively another factor in attenuating the strength of the N400 effect in our study. Beyond these quantitative differences, the ERP results of the present study qualitatively confirm previous findings, thus highlighting the importance of phonotactic cues for lexical activation (Friedrich & Friederici, 2005).
This is all the more notable in that the rather monotonous and highly controlled material spoken by the same speaker might conceivably completely abolish any effort for linguistic access to the material. We thus demonstrate the implicit nature of phonotactically driven lexical processing in auditorily presented pseudowords.
Although the N400 effect confirms the preferential processing of phonotactically legal material, the analysis of the ERPs did not yield a hemispheric lateralization of the effect (note that the N400 is described as a centro-parietal component; Kutas & Federmeier, 2000). However, the oxygenation response showed a clear lateralization, supplying the first experimental evidence for preferential left hemispheric recruitment for phonotactic processing independent of a lexico-semantic context. This leftward lateralization included both fronto-temporal and temporal areas. It might be argued that topographical localization with fNIRS is rough due to technical and physical limitations and that the ascription to specific gyral or sulcal anatomy is tentative (Okamoto & Dan, 2005). A more reliable “tomography” can also be obtained with fNIRS; however, such an approach requires a large number of probes positioned in close proximity to each other (or a direct comparison with each individual's structural MRI scan). Such a setting was not possible with the fNIRS machine used in the present study because of its limited number of input channels. We chose a limited set of fNIRS probes because we were primarily interested in the lateralization effects revealed by fNIRS. Further, we decided on a limited probe configuration because we are currently conducting the same study with infant participants and thus aimed at comparing the topography of the effects in infants with that in adults. Because head space is restricted in infants, it would not have been possible to apply a large number of channels to obtain a higher spatial resolution approximating that of fMRI.
Despite these limitations, the rough topographical ascription to fronto-temporal and temporal areas is in line with previous studies which found activations in the inferior frontal gyrus and superior temporal gyrus in relation to phonological and lexical processes (Lau et al., 2008; Vigneau et al., 2006; Friederici & Alter, 2004; Bookheimer, 2002; Kotz et al., 2002). The processing of phonotactics is partially related to phonemic perception. Many fMRI studies on phonemic perception contrasted native with nonnative phonemes such as /l/ and /r/ in native speakers of Japanese (Hara, Nakamura, Kuroki, Takayama, & Ogawa, 2007; but see also Wilson & Iacoboni, 2006; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005) and found differential activations especially in left temporal areas. Although the left fronto-temporal regions identified by fNIRS in the present study may suggest that phonotactic processing resembles phonemic perception, one crucial difference has to be stated here. In our study, all phonemes used were part of each participant's native (German) phonological repertoire. Only the combination of these native phonemes at the word onset constitutes a nonnative feature, because it violates a phonotactic rule. This is an essential difference, indicating that, although phonotactic processing and phoneme perception share some commonalities, differential processing mechanisms must also play a role. To topographically disentangle brain areas involved in phoneme perception from areas associated with phonotactic processing, future fMRI studies would be advisable.
The present study provides first evidence with respect to the topography and lateralization of phonotactic processing mechanisms and confirms hypotheses which assume a left hemispheric lateralization for processes involving phonotactics. This might be ascribed to the fact that (i) phonotactics is a prelexical cue which is part of phonology and (ii) phonotactics represents a segmental feature. Beyond the differential lateralization of the processing of legal as compared to illegal phonotactic rules, both methods (fNIRS and ERPs) confirm that familiar prelexical cues, such as legal phonotactics, successfully trigger lexical activation processes. Due to the much longer latency of the vascular response in the range of seconds, it might be argued that the lateralization of the NIRS response was caused by a process substantially later than the electrophysiologically recorded difference between 450 and 550 msec after stimulus onset. We cannot fully exclude this possibility; however, the choice of the “purely” phonotactic context selectively presenting highly controlled monosyllabic pseudoword material renders this explanation unlikely. Future studies using high-density EEG and fNIRS or EEG–fMRI arrays might supply more topographical specification to substantiate lateralization and topographical details of phonotactic processing.
Apart from phonotactic regularities which supply important prelexical cues in spoken word recognition, phonological neighborhood density also strongly guides word recognition. Phonological neighborhood density describes the number of words that are phonologically similar to a given word. It is known that phonotactic probability and neighborhood density are positively correlated (Vitevitch, Luce, Pisoni, & Auer, 1999; Landauer & Streeter, 1973). Thus, theoretically, the observed effects between phonotactically legal and illegal pseudowords in the present study might also reflect phonological neighborhood density as opposed to or in addition to phonotactic differences. This issue was addressed in a word/pseudoword auditory naming task in which Vitevitch and Luce (1998) found that subjects repeated real words with high phonotactic probability/neighborhood density more slowly than words with low phonotactic probability/neighborhood density. This effect may index an increased lexical competition among real words. Interestingly, pseudowords elicited the reverse pattern: Pseudowords with high phonotactic probability/high neighborhood density were produced faster than low phonotactic probability/low neighborhood density pseudowords. This was most likely due to the greater similarity of pseudowords with a high phonotactic probability/high neighborhood density to real words. The authors explain this apparent discrepancy by postulating that phonotactic probability has a prelexical locus, whereas phonological neighborhood density is driven by lexical processes. Because pseudowords do not have lexical representations, Vitevitch and Luce (1999) state that pseudowords were processed at a prelexical rather than a purely lexical level. Thus, the observed effect in Vitevitch and Luce (1998) for pseudowords can be attributed to phonotactic probability.
Note that studies focusing on phonological neighborhood density and phonotactic probability effects mostly adopted a production task (Vitevitch & Luce, 1998) or other experimental tasks such as speeded same–different judgment or lexical decision tasks (Vitevitch & Luce, 1999), and thus cannot be compared directly to passive listening as investigated in the present study. However, to further elucidate this issue, we additionally performed an analysis comparing high- and low-frequency legal consonant clusters. This repeated measures ANOVA on the legal pseudowords included the factors frequency (high- vs. low-frequency consonant clusters), speech mode (adult-directed vs. infant-directed), and electrode (Cz vs. CPz vs. Pz) in the time window 450–550 msec and was carried out for the ERP data only because of their temporal comparability to the reaction times used in Vitevitch and Luce (1998). Because frequency predominantly influences lexical rather than prelexical processes, a similar ERP pattern for high- and low-frequency consonant clusters would support a prelexical effect arising from pure phonotactic processes. Indeed, the results of this analysis did not yield any significant interaction with the factor frequency (p > .05). This suggests that the effects observed in the present study tap into prelexical processes and are therefore very likely to result from sensitivity to legal versus illegal phonotactic sequences rather than from phonological neighborhood density.
The present study also investigated potential effects arising from the presentation mode by contrasting ADS with IDS. We included this contrast because IDS, unlike ADS, contains specific suprasegmental features (a higher pitch contour, a longer duration, and high phonological clarity), and this manipulation might therefore lead to differential processing mechanisms. Furthermore, this variation allows for a comparison in future research on the development of phonotactic processing in young infants using the same material. Although it is known that infants at specific ages react differently to linguistic stimuli presented in IDS or ADS (Hayashi, Tamekawa, & Kiritani, 2001), the question whether presentation mode also leads to differential processing in adults has not yet been addressed. The results of the present study suggest neither differential temporal nor differential topographical characteristics in adult participants. The ERP and fNIRS results are also complementary with respect to this contrast, as both the N400 effect and the lateralization effect for phonotactically legal compared to illegal pseudowords show the same direction for ADS and IDS. Notably, the amplitude of the N100 component differed in response to IDS when compared to ADS. This held true irrespective of the phonotactic condition. Thus, the acoustic parameters differing between IDS and ADS were clearly perceived by the subjects, signaling a differential stimulus input (Näätänen & Picton, 1987). In contrast, the temporal neuronal processing of phonotactic rules reflected in the N400 component remained unaffected by the speech mode but showed a critical dependence on phonotactic legality.
In sum, the neurophysiological findings of the present study argue for differential lexical activation processes based on the phonotactic rules of the participants' native language. Furthermore, fNIRS revealed the recruitment of a specialized left hemispheric network which is crucial for processing language-related aspects including phonotactics. The results presented here have important implications for rule-based processing, as phonotactic rules represent prelexical cues involved in the segmentation of an acoustic speech signal in both first-language acquisition and second-language learning. It is still unclear which neuronal processes, either temporal or topographic, are involved while learning a language during early infancy and while acquiring a second language later on in childhood or even in adulthood.
The research was supported by the EU (NEST 012778, EFRE 20002006 2/6; nEUROpt 201076), and BMBF (BNIC, BCCN, German–Polish cooperation FK: 01GZ0710). S. R. was partially supported by the Center for Stroke Research, Charité University Medicine Berlin. I. W. is supported by the Stifterverband für die Deutsche Wissenschaft (Claussen-Simon-Stiftung). We thank Jörg Dreyer from the Zentrum für Allgemeine Sprachwissenschaft (ZAS) for help with stimulus recording, Mariana Patak for speaking the material, Niki K. Vavatzanidis for help during material construction, Stefan P. Koch and Jens Steinbrink for helping with fNIRS data analysis, an anonymous native English speaker for proofreading, and two anonymous reviewers for helpful comments.
Reprint requests should be sent to Sonja Rossi, Max Planck Institute for Human Cognitive and Brain Sciences, Department of Cognitive Neurology, Stephanstraße 1A, 04103 Leipzig, Germany, or via e-mail: firstname.lastname@example.org.
1. As suggested by a reviewer, we performed the same analysis on off-line filtered data. Using a 70-Hz low-pass filter as well as a 30-Hz low-pass filter, the same statistical results were obtained.
These authors contributed equally.