Processing of an obligatory phonotactic restriction outside the focus of the participants' attention was investigated with ERPs, using oddball blocks and their reversed counterparts. Dorsal fricative assimilation (DFA) is a phonotactic constraint of German grammar that is violated in *[ɛx] but not in [ɔx], [ɛ∫], or [ɔ∫]. These stimulus sequences engage the auditory deviance detection mechanism, as reflected by the mismatch negativity (MMN) component of the ERP. In Experiment 1 (n = 16), stimuli were contrasted pairwise such that they shared the initial vowel but differed in the fricative. Phonotactically ill-formed deviants elicited stronger MMN responses than well-formed deviants that differed acoustically from the standard in the same way but did not contain a phonotactic violation. In Experiment 2 (n = 16), stimuli were contrasted such that they differed in the vowel but shared the fricative. MMN was elicited by the vowel change. An additional, later MMN response was observed for the phonotactically ill-formed syllable only. This later MMN cannot be attributed to any phonetic or segmental difference between standard and deviant. These findings suggest that implicit phonotactic knowledge is activated and applied in preattentive speech processing.
The phonological knowledge of a native speaker includes the language-specific inventory of distinctive speech sounds (phonemes). Phonetic knowledge includes the specific articulatory implementation and acoustical properties of the speech sounds. Given a sequence of sounds in a word, phonetic knowledge also includes the degrees of coarticulation of the sounds and the factors regulating such coarticulation. Phonological knowledge in turn includes abstract principles that restrict possible sequences of speech sounds in words, that is, phonotactic restrictions. According to phonological theory, these aspects of phonological grammar are represented independently of the set of possible phonemes and are not included in the entries of the mental lexicon (Kenstowicz, 1994). Many of these phonotactic restrictions belong to one of three classes: requirements on the syllable structure of a language, requirements of similarity of certain (typically adjacent) sounds, as investigated in this study, and requirements of dissimilarity of certain (typically adjacent) sounds (De Lacy, 2007).
In speech processing, the cognitive system rapidly and efficiently accesses phonetic as well as phonological information. At the phonetic and segmental phonological processing level, the continuous and highly variable acoustic input is mapped onto discrete and abstract linguistic categories by means of phonetic cues. In this regard, phonetic and phonological processing provides the basis for all higher-order processes of structural and semantic analysis. The phoneme sequence can be evaluated for phonotactic well-formedness on the basis of the language-specific phonotactic constraints that are part of the phonological grammar. Phonotactic analysis differs from lexical processing, that is, the activation of appropriate entries in the mental lexicon, because phonotactic evaluation takes place even if no corresponding word form is found in the lexicon. That is, even pseudowords undergo language-specific phonotactic analysis and evaluation with regard to syllable structure, accent pattern, and combinations of adjacent phonemes.
Focussing on the processing of obligatory phonotactic restrictions, we investigated the involvement of implicit language-specific phonotactic knowledge in preattentive automatic speech processing, that is, when the acoustic stimulation is entirely outside the participants' focus of attention.
Electrophysiological Measure: Mismatch Negativity
As a tool in this investigation, we used the MMN component of the human ERP, an automatic brain response reflecting the operation of a preattentive, auditory sensory-memory-based deviance detection mechanism. In this mechanism, the auditory system extracts regularities from the repetitive auditory stimulation and temporarily stores them in auditory sensory memory. New incoming stimuli are compared with this representation of regularity. If a deviance is detected, the MMN is elicited. Deviations from various simple, complex, and even abstract auditory regularities elicit MMN (for a review, see Näätänen, Paavilainen, Rinne, & Alho, 2007). The MMN-generating process is not volitional; it does not require attentional selection, and MMN is elicited whether the sounds are attended or ignored. Thus, the MMN can be used to determine which auditory regularities are detected when sounds are not in the focus of attention. Additionally, by assessing the detected regularities via MMN, it is possible to gain insight into the kinds of auditory analyses performed on task-irrelevant sounds. That is, MMN can be used as a probe to investigate, with high temporal resolution and without interference from additional task-related processes, the characteristics and time course of the auditory processing that takes place before deviance detection. This rationale has been successfully applied in studies of segmental phonetic and phonological analysis (e.g., Sharma & Dorman, 2000; Dehaene-Lambertz, 1997; Näätänen et al., 1997; Winkler et al., 1999) and abstract phonological features (Eulitz & Lahiri, 2004; Phillips et al., 2000; for a review, see Näätänen et al., 2007; Näätänen, 2001).
Recent Neurophysiological Studies on Phonotactic Processing
So far, only a few studies have used electrophysiological methods to investigate the influence of phonotactics on speech processing. Although they investigated different kinds of phonotactic phenomena, they all demonstrate very early access to language-specific phonotactic knowledge in auditory speech processing.
Using a passive oddball protocol, Bonte, Mitterer, Zellagui, Poelmans, and Blomert (2005) investigated the processing effects of distributional probabilities of phoneme clusters by contrasting obstruent clusters that occur with high or low frequency in Dutch. Their results showed stronger MMN responses when the deviant stimulus was a frequently occurring phoneme cluster than when an infrequently occurring phoneme cluster served as deviant in the protocol.
Dehaene-Lambertz, Dupoux, and Gout (2000) investigated the influence of obligatory language-specific syllable structure rules by means of a cross-linguistic design. Their stimuli (such as igmo vs. igumo) were phonotactically well-formed in French, whereas the item igmo violated syllable structure restrictions in Japanese. Japanese speakers automatically compensated for the phonotactically ill-formed sequence *[gm] by inserting a vowel, thereby turning igmo into igumo. Whereas French native speakers showed brain responses similar to the MMN, indicating that they detected a difference between the two stimuli, no MMN was observed for Japanese-speaking participants. These results point to the involvement of very early processes of speech perception. However, no final conclusion can be drawn about whether phonotactics is processed preattentively, because the protocol used in this study required participants to focus their attention on the auditory stimulation.
Mitterer and Blomert (2003) used a passive oddball protocol to show that optional nasal place assimilation in Dutch is relevant to processing. Two analogously constructed stimulus pairs were presented: tuinbank (“garden bench”) versus assimilated tuimbank, and tuinstoel (“garden chair”) versus tuimstoel, where the same change from [n] to [m] is not motivated by the assimilation rule. For the second stimulus pair, an MMN was elicited, reflecting the processing of the difference between the contrasted stimuli, whereas no comparable response was found for the first pair, where the assimilation rule allows either [n] or [m] in different renditions of the same word.
In a magnetoencephalography study, Flagg, Cardy, and Roberts (2006) examined regressive nasal assimilation in English vowel–nasal sequences such as [an] versus [ãn]. The authors contrasted phonotactically appropriate sequences such as [aba] with sequences such as [ãba], which begin with a nasalized vowel and thereby create a misleading phonotactic expectation of a following nasal consonant. Compared with the phonotactically well-formed condition, the processing of stimuli containing an inappropriate nasalization of the vowel resulted in a delay in the neuromagnetic activity evoked by the following consonant.
Concerning preattentive processing of phonotactic phenomena, the present study differs from what has previously been reported in this domain in two major ways:
First, we investigate the role of abstract phonotactic constraints as part of the language-specific phonological grammar. In this regard, our study differs from the research of Flagg et al. (2006) and Mitterer and Blomert (2003), who investigated phenomena that are phonotactically relevant but belong to the domain of nonobligatory coarticulation and assimilation processes. The present study also differs from Bonte et al. (2005), whose research concerned the impact of distributional probabilities of sound sequences on speech processing.
Second, the present study focuses on the processing of ungrammatical speech material. We are thus interested in the specific processes of phonotactic evaluation that take place when the system cannot create a linguistically well-formed representation and is forced to deal with ungrammatical linguistic input. In this regard, our study differs from the approach of Dehaene-Lambertz et al. (2000), which examined the absence of differentiation between stimuli that are physically distinct but, because of an automatic phonological repair process, do not differ in a linguistically relevant manner.
Dorsal Fricative Assimilation
We investigate a fairly robust phonotactic phenomenon in German, namely, the distributional alternation of the palatal [ç] and the velar [x] dorsal fricatives. In phonological theory, these fricatives are not distinguished at the level of mental lexical entries but rather at the level of the abstract phonological representation (Noske, 1997; Merchant, 1996; Hall, 1989, 1992; MacFarland & Pierrehumbert, 1991). This means that [ç] and [x] are considered to be allophones. The choice between these fricatives is predictable because it depends on the preceding vowel. After front vowels, [ç] occurs, as in [ɛçt] (German echt, “real”). After back vowels, [x] occurs, which is itself articulated further back: [kɔx] (German Koch, “cook”). In other contexts, for example after consonants and word-initially, the palatal fricative [ç] occurs.
This complementary distribution of [x] and [ç] is based on progressive phonological assimilation of the dorsal fricative to the preceding vowel for a place feature specifying tongue backness, [±back] (Hall, 1989). It can be described in terms of a phonotactic restriction, DFA, which requires a vowel and a following dorsal fricative to agree in their phonological specification for backness (Féry, 2001). Sequences consisting of either a front vowel followed by the velar fricative, such as *[ɛx], or a back vowel followed by the palatal fricative, such as *[ɔç], violate DFA and are therefore ungrammatical. In the present study, we focus on violations of DFA resulting in an ill-formed combination of a lax front vowel and a following velar dorsal fricative.
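The agreement requirement can be stated as a simple check over [±back] values. The following Python sketch only illustrates the restriction as described above; it is not a model of how listeners compute it. The feature table BACK, the set DORSAL_FRICATIVES, and the function violates_dfa are hypothetical names introduced here, and only the segments discussed in the text are covered.

```python
# Hypothetical [±back] table for the segments discussed here (True = [+back]).
BACK = {"ɛ": False, "ɔ": True, "ç": False, "x": True}

# DFA only constrains the dorsal fricatives [ç] and [x].
DORSAL_FRICATIVES = {"ç", "x"}


def violates_dfa(vowel: str, consonant: str) -> bool:
    """True if a vowel + consonant sequence violates dorsal fricative assimilation.

    Sequences whose consonant is not a dorsal fricative (e.g. with the
    coronal sibilant, written "∫" in this text) fall outside DFA and are
    never flagged.
    """
    if consonant not in DORSAL_FRICATIVES:
        return False
    # The vowel and the dorsal fricative must agree in backness.
    return BACK[vowel] != BACK[consonant]


for vowel, consonant in (("ɛ", "x"), ("ɛ", "∫"), ("ɔ", "x"), ("ɔ", "∫"), ("ɔ", "ç")):
    status = "violates DFA" if violates_dfa(vowel, consonant) else "well-formed"
    print(vowel + consonant, status)
```

Of the four experimental syllables, only the front-vowel + velar combination is flagged, matching the description of *[ɛx]; *[ɔç] is included only as the mirror-image violation named in the text.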
DFA belongs to the implicit phonotactic knowledge of native German speakers. During phonotactic processing, the cognitive system accesses it to evaluate the phonotactic well-formedness of the incoming speech stream. Weber (2001) has already provided evidence that German-speaking listeners apply DFA in speech processing, in a cross-linguistic behavioural study with German and Dutch speakers. Using a phoneme monitoring task, she presented stimuli that were ill-formed only for the German-speaking participants because they violated DFA (for further behavioural evidence of the impact of phonotactic restrictions on speech processing, see, e.g., Hallé, Segui, Frauenfelder, & Meunier, 1998). With the present study, we aim to confirm and extend Weber's (2001) findings. Using ERPs, we can not only investigate the processing of DFA independently of participants' task performance but also test whether DFA is processed when the speech input is entirely outside the focus of the participants' attention, that is, preattentively.
This MMN study consists of two experiments that investigate the influence of DFA on preattentive speech processing. For this purpose, we use a passive oddball design. Stimuli are monosyllables, each composed of a vowel ([ɛ] or [ɔ]) and a fricative ([x] or [∫]) in a two-by-two design, as shown in Figure 1.
The syllables that contain the coronal sibilant, [ɛ∫] and [ɔ∫], are not affected by DFA and do not violate any other phonotactic constraints of German. In the syllables containing the velar fricative, *[ɛx] and [ɔx], DFA applies and is violated in *[ɛx].
In Experiment 1, contrasting stimulus pairs share the vowel and differ with regard to the fricative ([ɔx] vs. [ɔ∫]; *[ɛx] vs. [ɛ∫]). In Experiment 2, contrasting stimulus pairs share the fricative and differ with regard to the vowel ([ɛ∫] vs. [ɔ∫]; *[ɛx] vs. [ɔx]). In both experiments, the critical experimental condition contains the phonotactically ill-formed syllable *[ɛx]. The other analogously structured contrast consists of phonotactically well-formed syllables and serves as a control condition.
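The two-by-two design and the pairings used in the two experiments can be generated mechanically. In this illustrative Python sketch, the names contrast_pairs and oddball_conditions are hypothetical; each pair is also returned in reversed order because every contrast was run in both directions, with each syllable serving once as standard and once as deviant.

```python
from itertools import product

VOWELS = ("ɛ", "ɔ")
FRICATIVES = ("x", "∫")

# The 2 x 2 set of syllables: ɛx, ɛ∫, ɔx, ɔ∫
SYLLABLES = [v + f for v, f in product(VOWELS, FRICATIVES)]


def contrast_pairs(shared):
    """Syllable pairs that share one component and differ in the other.

    shared="vowel"     -> Experiment 1 pairs (the fricative changes)
    shared="fricative" -> Experiment 2 pairs (the vowel changes)
    """
    if shared == "vowel":
        return [(v + FRICATIVES[0], v + FRICATIVES[1]) for v in VOWELS]
    if shared == "fricative":
        return [(VOWELS[0] + f, VOWELS[1] + f) for f in FRICATIVES]
    raise ValueError("shared must be 'vowel' or 'fricative'")


def oddball_conditions(pairs):
    """Every pair as (standard, deviant), in both directions."""
    return [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]


for experiment, shared in (("Experiment 1", "vowel"), ("Experiment 2", "fricative")):
    pairs = contrast_pairs(shared)
    critical = [p for p in pairs if "ɛx" in p]  # contrast containing the ill-formed syllable
    print(experiment, pairs, "critical:", critical)
```

In both experiments exactly one of the two pairs contains *[ɛx] (the critical condition); the other pair serves as the control contrast.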
For both experiments, we expect a stronger MMN response when the deviant, in addition to acoustical and phoneme-related discrepancies, contains a phonotactic violation. If such a modulation of the MMN amplitude caused by a phonotactically ill-formed deviant is observed, we take this as evidence for the influence of the phonotactic constraint DFA on preattentive speech processing.
In addition, ERP responses to the syllables presented in 100% and 50% conditions were examined in both experiments to estimate possible effects of phonotactic processing per se and to investigate context influences on the processing without reliance on the memory-based deviance detection mechanism.
In Experiment 1, standard and deviant of each oddball contrast ([ɔx] vs. [ɔ∫], *[ɛx] vs. [ɛ∫]) shared the vowel and differed with respect to the fricative (see Figure 1). Thus, MMN is expected to be elicited by the change between the acoustically and phonologically different fricatives in both oddball contrasts. In addition, we hypothesized that the phonotactic violation would affect the deviance detection mechanism when *[ɛx] is presented as deviant among the standard [ɛ∫]. Because the detection of the DFA violation in *[ɛx] coincides with the recognition of the fricative, this additional effect is expected to enhance the MMN.
In a comparison across blocks, the amplitude of the MMN elicited by the phonotactically ill-formed stimulus *[ɛx] is expected to be greater than the MMN amplitude elicited by the analogously constructed well-formed stimulus [ɔx] from the control contrast. Because the syllables [ɛ∫] and [ɔ∫] are both phonotactically well-formed, we do not expect any comparable difference between the MMN amplitudes elicited by [ɛ∫] and [ɔ∫].
In a comparison within blocks, the difference between the MMN amplitudes elicited within the first oddball contrast (*[ɛx] vs. [ɛ∫] and vice versa) is not expected to parallel the difference between the MMN amplitudes elicited within the second oddball contrast ([ɔx] vs. [ɔ∫] and vice versa). Accordingly, a statistically significant interaction between the experimental factors Vowel and Fricative is expected.
Sixteen right-handed volunteers participated in Experiment 1 (eight women; median age = 27 years; range = 22–32), all of them native speakers of German. None of the participants reported any relevant experience with languages or varieties of German in which [ɛx] is a phonotactically well-formed syllable, such as Dutch or Swiss German. All participants reported normal hearing, normal or corrected-to-normal vision, and no neurological, psychiatric, or other medical problems. Handedness was assessed using an inventory adapted from Oldfield (1971). Participants gave informed consent and received monetary compensation.
Four vowel–consonant syllables were used: [ɛ∫], *[ɛx], [ɔ∫], and [ɔx]. None of these syllables has lexical meaning in German. The stimuli are phonotactically well-formed in German, except for the syllable *[ɛx], which violates DFA. Stimulus material was digitally recorded at a 48-kHz sampling rate. A professional female speaker articulated the syllables numerous times. To include acoustic variability in the stimulus material, we selected 10 different utterances of each syllable category, resulting in a set of 40 stimulus syllables in total (see Eulitz & Lahiri, 2004; Jacobsen, Schröger, & Alter, 2004). After low-pass filtering with a cutoff frequency of 10 kHz, the duration and pitch of each syllable exemplar were manipulated using the PSOLA tool of the Praat software (Boersma & Weenink, 2008). The duration of each stimulus was set to 280 msec, with the vowel portion set to 100 msec and the fricative portion to 180 msec (original vowel/fricative durations: *[ɛx] 112/194 msec, [ɛ∫] 110/230 msec, [ɔx] 105/201 msec, [ɔ∫] 111/215 msec; mean vowel duration = 109.5 msec; mean fricative duration = 189.5 msec). Measures of fricative onset are approximate because of the acoustic variation in the material. The pitch contour had to be manipulated because, in the raw material, it was confounded with syllable type. This was done by matching the pitch contours of two tokens of different syllable types at a time; for example, the first token of *[ɛx] was matched with the first token of [ɛ∫], and the fifth token of [ɔx] with the fifth token of [ɔ∫]. Intensities were normalized using the root mean square of the whole sound file.
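The segment-wise duration targets and the RMS normalization reduce to simple arithmetic. The sketch below does not reimplement PSOLA or any Praat command; it only computes the relative duration factors (target divided by original duration) that a duration manipulation would apply to each segment, and rescales a list of samples to a common RMS. The target RMS value is not reported in the text and is a free parameter here; all function names are hypothetical.

```python
import math

TARGET_VOWEL_MS = 100.0      # vowel portion after manipulation
TARGET_FRICATIVE_MS = 180.0  # fricative portion after manipulation


def duration_factors(vowel_ms, fricative_ms):
    """Relative duration factors (target / original) for the two segments.

    A factor below 1 shortens the segment; a factor above 1 lengthens it.
    """
    return TARGET_VOWEL_MS / vowel_ms, TARGET_FRICATIVE_MS / fricative_ms


def rms(samples):
    """Root mean square over a whole sound file."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale all samples by a single gain so that the file's RMS equals target_rms."""
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]


# Vowel/fricative durations listed above for *[ɛx]
vowel_factor, fricative_factor = duration_factors(112, 194)
print(round(vowel_factor, 3), round(fricative_factor, 3))
```

With the durations listed for *[ɛx], the vowel would be shortened by a factor of about 0.89 and the fricative by about 0.93; in practice the factors would be computed per token rather than from the per-syllable figures.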
Experimental Design and Procedure
In the experimental conditions (Figure 1), oddball stimulus sequences of 1400 trials in total were presented per condition. In each sequence, one syllable type served as standard (85% of the trials) and another as deviant (15%), delivered in a pseudo-randomized order with at least two standards between successive deviants. Each oddball condition was split into two blocks. Six additional blocks were run: the 10 exemplars of each syllable type were presented in separate blocks in pseudo-randomized order (four 100% blocks), and the exemplars of two syllable types were presented as they were contrasted in the oddball blocks but with equal probabilities (two 50% blocks). Each of these six blocks contained 210 trials per syllable type. The stimulus onset asynchrony varied randomly from 550 to 900 msec in steps of 10 msec. Altogether, 14 stimulus blocks were administered to each participant, and block order was counterbalanced across participants. Participants were seated comfortably in a sound-attenuated and electrically shielded experimental chamber and were instructed to ignore the auditory stimulation while watching a self-selected silent subtitled movie. Stimuli were presented binaurally through headphones at approximately 65 dB SPL (artificial head HMS III.0; HEAD acoustics). All participants reported that they were able to ignore the auditory stimulation. Informal questioning revealed that they had perceived all stimulus types as speech sounds. An experimental session lasted approximately 2 hr, including three breaks of about 5 min each, plus additional time for electrode application and removal.
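One way to satisfy the sequencing constraint is to lay out the deviants first and then distribute the standards over the gaps, reserving the minimum number of standards for every gap between two deviants. This Python sketch assumes that procedure; the text does not specify how the pseudo-randomization was implemented. The function names and the 700-trial block length (half of a 1400-trial condition) are assumptions.

```python
import random


def oddball_sequence(n_trials, p_deviant=0.15, min_gap=2, rng=None):
    """Pseudo-random standard ('S') / deviant ('D') sequence.

    Every pair of successive deviants is separated by at least min_gap standards.
    """
    rng = rng or random.Random()
    n_dev = round(n_trials * p_deviant)
    n_std = n_trials - n_dev
    if n_dev < 1:
        raise ValueError("the sequence needs at least one deviant")
    reserved = min_gap * (n_dev - 1)
    if reserved > n_std:
        raise ValueError("too few standards to respect min_gap")
    # Standards go into n_dev + 1 slots: before the first deviant, between
    # successive deviants (pre-filled with min_gap), and after the last one.
    slots = [0] + [min_gap] * (n_dev - 1) + [0]
    for _ in range(n_std - reserved):
        slots[rng.randrange(len(slots))] += 1
    sequence = []
    for i, n_standards in enumerate(slots):
        sequence.extend(["S"] * n_standards)
        if i < n_dev:
            sequence.append("D")
    return sequence


def random_soas(n, rng=None):
    """Stimulus onset asynchronies drawn from 550-900 msec in 10-msec steps."""
    rng = rng or random.Random()
    return [rng.randrange(550, 901, 10) for _ in range(n)]


rng = random.Random(1)
block = oddball_sequence(700, rng=rng)
soas = random_soas(len(block), rng=rng)
```

Because the extra standards are spread uniformly over the slots, the minimum-gap rule is guaranteed by construction rather than by rejection sampling, so generation never loops.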
The EEG (Ag/AgCl electrodes, Falk Minow Services, BrainAmp EEG amplifier; BrainAmp Products GmbH, Garching, Germany) was recorded continuously from 26 standard scalp locations according to the extended 10–20 system (American Electroencephalographic Society, 1991; FP1, FPz, FP2, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP1, CP2, P7, P3, Pz, P4, P8, O1, and O2) and the left and right mastoids. The reference electrode was placed on the tip of the nose and the ground electrode at the right cheekbone. Electroocular activity was recorded with two bipolar electrode pairs, the vertical EOG from the right eye by one supraorbital and one infraorbital electrode and the horizontal EOG from electrodes placed lateral to the outer canthi of both eyes. Impedances were kept below 5 kΩ. On-line filtering was carried out using a 0.1-Hz high-pass, a 250-Hz low-pass, and a 50-Hz notch filter. The signal was digitized with a 16-bit resolution and a sampling rate of 500 Hz.
Off-line signal processing was carried out using EEP 3.0. The raw EEG data were band-pass filtered with a 2501-point finite impulse response filter with cutoff frequencies of 1.5 Hz (high-pass) and 15 Hz (low-pass). EEG epochs of 650 msec, time-locked to stimulus onset and including a 100-msec prestimulus baseline, were extracted and averaged separately for each condition (syllable, standard, deviant, 100% condition, and 50% condition) and each participant. ERPs to the first five stimuli of each block and to standards immediately following deviants were excluded from the analysis. Epochs with an amplitude change exceeding 100 μV at any recording channel were rejected. Grand averages were then computed from the individual-subject averages.
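The trial-selection, epoching, rejection, and averaging steps can be sketched in plain Python on data stored as lists of channels. This is not EEP 3.0, and it omits the FIR band-pass filter, which is assumed to have been applied to the continuous data beforehand. Two details are assumptions: baseline correction by subtracting the mean of the 100-msec prestimulus interval, and reading the 100-μV criterion as the peak-to-peak range within an epoch. All names are hypothetical.

```python
FS = 500           # sampling rate in Hz
PRE_MS = 100       # prestimulus baseline length
EPOCH_MS = 650     # total epoch length, baseline included
REJECT_UV = 100.0  # rejection threshold in microvolts
N_SKIP = 5         # first trials of each block that are discarded


def ms_to_samples(ms):
    return round(ms * FS / 1000)


def usable_trials(labels):
    """Indices of trials kept from one block.

    labels lists 'S' (standard) or 'D' (deviant) in presentation order.
    The first N_SKIP trials and every standard directly after a deviant are dropped.
    """
    return [i for i in range(N_SKIP, len(labels))
            if not (labels[i] == "S" and labels[i - 1] == "D")]


def extract_epoch(data, onset):
    """Cut one baseline-corrected epoch from continuous data (channels x samples)."""
    n_pre = ms_to_samples(PRE_MS)
    start = onset - n_pre
    stop = start + ms_to_samples(EPOCH_MS)
    if start < 0 or stop > len(data[0]):
        raise ValueError("epoch extends beyond the recording")
    epoch = []
    for channel in data:
        segment = channel[start:stop]
        baseline = sum(segment[:n_pre]) / n_pre
        epoch.append([v - baseline for v in segment])
    return epoch


def is_artifact(epoch):
    """Assumed criterion: peak-to-peak range above REJECT_UV on any channel."""
    return any(max(ch) - min(ch) > REJECT_UV for ch in epoch)


def average(epochs):
    """Sample-wise mean over epochs, separately for each channel."""
    n = len(epochs)
    return [[sum(values) / n for values in zip(*channel_set)]
            for channel_set in zip(*epochs)]
```

At 500 Hz, a 650-msec epoch spans 325 samples, of which the first 50 form the baseline. In use, one would call usable_trials per block, extract epochs at the corresponding onset samples, drop those for which is_artifact returns True, and pass the rest to average separately for each condition.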
Deviant-related effects were examined with deviant-minus-standard difference waveforms that were calculated separately for each syllable (across oddball blocks) by subtracting the standard ERPs from the respective deviant ERPs, for example, *[ɛx] as deviant minus *[ɛx] as standard. This was done to exclude potential effects of physical differences between the stimuli from the MMN computation (see Eulitz & Lahiri, 2004; Jacobsen, Schröger, et al., 2004). To quantify the deviance-related effects, we measured amplitudes as the mean voltage in a fixed 40-msec time window, which was centered on the averaged peak latencies of the grand-average difference waves of all four syllables at F3, Fz, F4, C3, Cz, and C4 electrode sites. For quantification of the ERPs from the 100% and 50% condition, a 40-msec window was centered on the averaged peak latency of the grand-average ERP wave for the syllable *[ɛx] at C3, Cz, and C4 electrode sites.
To quantify the full MMN amplitude, we rereferenced the scalp ERPs to the averaged signal recorded from the electrodes positioned over the left and right mastoids. This computation results in an integrated measure of the total neural activity underlying the auditory MMN (e.g., Schröger, 1998).
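Because every channel, including both mastoids, was recorded against the nose tip, rereferencing to the averaged mastoids only requires subtracting the mean of the two mastoid channels from every channel at every sample; the nose signal cancels out. The sketch below assumes that the mastoid channels are labelled M1 and M2, which are hypothetical names.

```python
def rereference_to_mastoids(channels, left="M1", right="M2"):
    """Re-express nose-referenced channels against the mean of both mastoids.

    channels maps channel names to equally long lists of samples (in µV),
    all recorded against the nose tip.
    """
    reference = [(a + b) / 2 for a, b in zip(channels[left], channels[right])]
    return {name: [v - ref for v, ref in zip(samples, reference)]
            for name, samples in channels.items()}


nose_referenced = {"Fz": [-1.0, -2.0], "M1": [0.5, 0.5], "M2": [0.5, 1.5]}
mastoid_referenced = rereference_to_mastoids(nose_referenced)
```

In the first sample of the example, a frontal negativity of −1.0 μV combined with a mastoid positivity of +0.5 μV becomes −1.5 μV after rereferencing. This illustrates why the mastoid reference sums the MMN activity that appears with inverted polarity at the mastoids under a nose reference.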
Only effects significant at α = .05 that were relevant to our hypotheses are reported. Deviance-related effects, that is, the presence and amplitude of MMN responses, were analyzed on the basis of data from the Fz electrode, where the MMN is typically maximal (Schröger, 1998). To test for the presence of an MMN for each syllable separately, we compared the deviant responses with the standard responses to the physically identical syllables using dependent t tests. The sizes of the MMN responses were analyzed with a three-way repeated measures ANOVA with the factors Stimulus (standard, deviant), Vowel ([ɛ], [ɔ]), and Fricative ([x], [∫]). Finally, pairwise post hoc comparisons between syllable types were carried out using two-way repeated measures ANOVAs with the factors Stimulus (standard, deviant) and Syllable (the two syllables being compared). The Bonferroni-adjusted α level was set to .01.
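The test for the presence of an MMN compares, for each participant, the mean amplitude to a syllable presented as deviant with the mean amplitude to the same syllable presented as standard. A minimal sketch of the dependent-samples t statistic using only the Python standard library follows; for a two-tailed p value, scipy.stats.ttest_rel returns the same statistic together with its p value. The function name paired_t is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(deviant, standard):
    """Dependent-samples t test on per-participant mean amplitudes.

    Returns (t, df). A negative t means the response to the deviant is more
    negative than the response to the same syllable as standard, i.e. an MMN.
    """
    if len(deviant) != len(standard):
        raise ValueError("need one deviant and one standard value per participant")
    diffs = [d - s for d, s in zip(deviant, standard)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([-1.0, -2.0, -3.0, -2.0], [0.0, 0.0, 0.0, 0.0])
```

With 16 participants, df = 15, as in the t(15) values reported in the Results. Under the Bonferroni adjustment, post hoc p values are compared with .01 rather than .05.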
Effects on the ERPs from the 100% and 50% conditions were analyzed on the basis of the data collected at Cz electrode site, where the negative-going deflection of the grand-averaged ERP elicited by *[ɛx] was numerically maximal in the respective time window. Two-way repeated measures ANOVAs with the experimental factors Vowel ([ɛ], [ɔ]) and Fricative ([x], [∫]) were calculated separately for the 100% and 50% condition. Bonferroni-adjusted pairwise post hoc comparisons between *[ɛx] and the well-formed syllable types were drawn using dependent t tests. All statistical tests were also run on the basis of the nose referenced data and separately for the data recorded from the mastoid electrodes. Results, with regard to the hypotheses formulated in advance, did not differ depending on which type of data was used (Figure 2).
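With two levels per factor, every effect in a fully within-subject 2 × 2 ANOVA has one numerator degree of freedom, and its F value equals the squared one-sample t of a per-participant contrast score tested against zero, with df = (1, n − 1). The Python sketch below uses that equivalence to compute the Vowel, Fricative, and Vowel × Fricative effects. It illustrates the arithmetic only and is not the software used in the study; the names are hypothetical.

```python
import math
from statistics import mean, stdev

CELLS = ("ɛx", "ɛ∫", "ɔx", "ɔ∫")


def one_sample_t(scores):
    """t statistic of the scores tested against zero."""
    return mean(scores) / (stdev(scores) / math.sqrt(len(scores)))


def rm_anova_2x2(participants):
    """Vowel x Fricative repeated measures ANOVA on per-participant cell means.

    participants: list of dicts mapping each syllable in CELLS to a mean amplitude.
    Returns {effect: (F, df_effect, df_error)}.
    """
    contrasts = {
        "Vowel": lambda c: (c["ɛx"] + c["ɛ∫"]) - (c["ɔx"] + c["ɔ∫"]),
        "Fricative": lambda c: (c["ɛx"] + c["ɔx"]) - (c["ɛ∫"] + c["ɔ∫"]),
        "Vowel x Fricative": lambda c: (c["ɛx"] - c["ɛ∫"]) - (c["ɔx"] - c["ɔ∫"]),
    }
    df_error = len(participants) - 1
    return {
        effect: (one_sample_t([score(p) for p in participants]) ** 2, 1, df_error)
        for effect, score in contrasts.items()
    }


data = [dict(zip(CELLS, row)) for row in ([1, 2, 3, 4], [2, 2, 4, 5], [4, 3, 5, 7])]
results = rm_anova_2x2(data)
```

The same logic covers the MMN analyses. Because Stimulus has two levels, the Stimulus × Vowel × Fricative interaction equals the Vowel × Fricative interaction computed on deviant-minus-standard scores. Likewise, each post hoc Stimulus × Syllable interaction equals the squared paired t between the two syllables' MMN scores.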
Deviance-related Effects (MMN)
The time window for ERP quantification was set from 192 to 232 msec after stimulus onset, that is, 92–132 msec after the onset of the fricative. To test the presence of MMN for each syllable type separately, we compared the ERP amplitudes to the deviants and the ERP amplitudes to the standards elicited by the same syllable type by means of two-tailed dependent t tests. In the first oddball contrast, the phonotactically ill-formed syllable *[ɛx] elicited a significant MMN response (MMN peak latency = 214 msec after stimulus onset, peak amplitude = −1.516 μV, mean amplitude = −1.476 μV), t(15) = −8.9, p < .001, but the syllable [ɛ∫] did not (MMN peak latency = 204 msec, peak amplitude = −0.488 μV, mean amplitude = −0.463 μV), t(15) = −1.9, p = .081. In the second oddball contrast, [ɔx] did not evoke a significant MMN response (MMN peak latency = 220 msec, peak amplitude = −0.470 μV, mean amplitude = −0.412 μV), t(15) = −1.7, p = .102, whereas [ɔ∫] did (MMN peak latency = 208 msec, peak amplitude = −1.100 μV, mean amplitude = −1.018 μV), t(15) = −3.3, p = .005.
The three-way repeated measures ANOVA of the factors Stimulus, Vowel, and Fricative yielded a significant main effect of the factor Stimulus, F(1, 15) = 75.5, p < .001, and significant interactions Vowel × Fricative, F(1, 15) = 19.9, p < .001, and Stimulus × Vowel × Fricative, F(1, 15) = 9.4, p = .008. This latter interaction reflected an asymmetry with respect to the MMN amplitudes across the four syllable categories, as was expected in our hypotheses.
Bonferroni-adjusted pairwise post hoc comparisons between the MMN responses elicited by the syllables within each oddball contrast revealed a significant difference between *[ɛx] and [ɛ∫], indicated by a significant interaction Stimulus × ɛx_ɛ∫, F(1, 15) = 14.3, p = .002, whereas the MMN responses elicited by [ɔx] and [ɔ∫] did not differ significantly, Stimulus × ɔx_ɔ∫, F(1, 15) = 2.0, p = .183. Across the oddball contrasts, the MMN responses to syllables that shared the fricative were compared. The MMN responses to *[ɛx] and [ɔx] differed significantly, indicated by a significant interaction Stimulus × ɛx_ɔx, F(1, 15) = 16.2, p = .001, but there was no significant difference between [ɛ∫] and [ɔ∫], Stimulus × ɛ∫_ɔ∫, F(1, 15) = 1.5, p = .246. Finally, there was no significant difference between *[ɛx] and [ɔ∫], Stimulus × ɛx_ɔ∫, F(1, 15) = 1.3, p = .272, or between [ɔx] and [ɛ∫], Stimulus × ɔx_ɛ∫, F(1, 15) < 1, p = .872.
Results from the 100% Condition and the 50% Condition
For the 100% condition, the time window for ERP quantification was set from 242 to 282 msec after stimulus onset. The two-way ANOVA revealed only a significant main effect of the factor Fricative, F(1, 15) = 11.3, p = .004. Although the ERP elicited by *[ɛx] showed a numerically stronger negative-going deflection in the investigated time window as compared with the three well-formed syllables, the interaction between the experimental factors Vowel and Fricative did not reach significance. Post hoc comparisons showed significant differences between *[ɛx] and [ɛ∫], F(1, 15) = 15.4, p = .001, as well as between *[ɛx] and [ɔ∫], F(1, 15) = 11.7, p = .004, but no significant difference between *[ɛx] and [ɔx].
For the 50% condition, the analysis window was set from 236 to 276 msec. The two-way repeated measures ANOVA revealed a significant main effect only for Fricative, F(1, 15) = 7.6, p = .014. Bonferroni-adjusted post hoc comparisons did not show any significant differences between *[ɛx] and each of the well-formed syllable types (α level at p < .01).
The phonotactic constraint of DFA in German had an effect on the participants' processing of spoken syllables presented outside the focus of attention. We observed a deviance-related effect attributable to the violation of the restriction imposed by DFA. When presented as deviant, the phonotactically ill-formed syllable *[ɛx] elicited a significantly stronger MMN than the corresponding well-formed syllable [ɔx] from the control contrast. The MMN responses elicited by the well-formed deviant syllables [ɛ∫] and [ɔ∫], on the other hand, did not differ significantly from each other, although [ɔ∫] showed a numerically greater MMN amplitude than [ɛ∫]. In accord with our hypothesis, we found a statistically significant interaction between the experimental factors Vowel and Fricative, which is attributable to the phonotactic violation in the syllable *[ɛx].
When comparing the MMN responses within one oddball contrast (e.g., MMN to [ɔ∫] vs. MMN to [ɔx]), the acoustic differences between the fricatives have to be considered, because the size of the MMN reflects the specific differences between the fricatives [x] and [∫]. In the oddball contrast containing the well-formed syllables, the MMN amplitude was asymmetric depending on which syllable served as standard and which as deviant: [ɔ∫] elicited a significant MMN when serving as deviant, but [ɔx] did not. This asymmetry may be explained by the different spectral and amplitude properties of [x] and [∫] that result from their respective places of articulation. The sibilant [∫] is characterized by an energy concentration at higher frequencies and by a greater noise amplitude than the velar fricative (e.g., Gordon, Barthmaier, & Sands, 2002; Johnson, 2002; Jongman, Wayland, & Wong, 2000). If [ɔ∫] is acoustically more salient than [ɔx], it might be easier to detect as a deviant than the less salient deviant [ɔx] in the reversed oddball condition. A comparable pattern of results has been reported by Bishop, O'Reilly, and McArthur (2005), who found asymmetries in MMN responses when contrasting frequency-modulated tones with unmodulated ones: the more salient modulated tones elicited an MMN when presented as deviants among unmodulated standards, but not vice versa. In our data, MMN responses to the deviants [ɛ∫] and [ɔx] were not merely diminished relative to the MMN amplitude from the respective reversed oddball block but entirely missing (cf., e.g., Pettigrew et al., 2004).
Recent studies have shown that the deviance detection mechanism generating the MMN is affected by the listener's (language) familiarity with the stimuli. If the participant is familiar with the deviant stimulus, the corresponding MMN response is enhanced, whereas unfamiliar deviants elicit weaker MMN responses (e.g., Bonte et al., 2005; Jacobsen, Schröger, Winkler, & Horváth, 2005; Jacobsen, Schröger, et al., 2004; Pulvermüller et al., 2001; Sharma & Dorman, 2000; Winkler et al., 1999; Dehaene-Lambertz, 1997; Näätänen et al., 1997; for reviews, see Näätänen et al., 2007; Schröger, Tervaniemi, & Huotilainen, 2004). Our results, however, show the reverse pattern: the phonotactically ill-formed stimulus *[ɛx], the most unfamiliar deviant, with an occurrence probability of zero in German, elicited the strongest MMN response of all. The phonotactically well-formed stimuli, by contrast, occur in German words, for example, [ɛ∫] in fesch (smart), [ɔx] in Koch (cook), and [ɔ∫] in Frosch (frog). The present study differs from the familiarity studies cited above in one respect: in addition to familiarity, it varies the grammatical well-formedness of the stimuli. Our results suggest a categorical difference between the processing of grammatically ill-formed stimuli and the processing of stimuli that are grammatically well-formed but vary in their frequency of occurrence. We assume that the grammatical violation leads to additional processing.
Effects of the phonotactic violation in *[ɛx] on preattentive auditory processing were also obtained without the context of another stimulus (100% condition) and in an equal-probability context (50% condition). In both conditions, the phonotactically ill-formed stimulus *[ɛx] elicited the numerically largest negative-going ERP deflection, peaking between 200 and 300 msec, compared with the ERPs of the well-formed stimulus syllables. The observed negativities were maximal at central electrode sites and numerically smaller than in the corresponding deviant ERPs from the oddball blocks, which was also reflected in the less clear-cut pattern of the statistical analyses. Although the effects of the phonotactic violation were larger in the oddball blocks, we nonetheless regard these results as corroborating evidence for effects of phonotactic ungrammaticality on early auditory processing.
The goal of Experiment 2 was to separate the effect of the phonotactic violation in time from the acoustical and phoneme-related changes. The contrasted syllables (*[ɛx] vs. [ɔx], [ɛ∫] vs. [ɔ∫]) each differed in their vowel, whereas the fricatives matched (see Figure 1). The difference between the initial vowels was expected to elicit an early MMN response. The violation of DFA in *[ɛx], however, could be detected only later, at the onset of the fricative. Because the same fricative was present in both the standard and the deviant, any difference in processing observed from fricative onset on could be attributed to the phonotactic violation present in only one of the syllables. We therefore hypothesized the following: (1) Between 100 and 200 msec after stimulus onset, each deviant syllable elicits an MMN response, because standard and deviant differ in their initial vowel; MMN amplitudes to the four deviant syllable types do not differ in this time window. (2) The phonotactically ill-formed stimulus *[ɛx] elicits a second MMN response between 200 and 350 msec after stimulus onset, whereas the phonotactically well-formed syllables [ɛ∫], [ɔx], and [ɔ∫] do not. In this time window, we expect a statistically significant interaction between the experimental factors Vowel and Fricative on the mean amplitude of the MMN.
Sixteen volunteers (eight women; median age = 22 years; range = 19–27 years; two left-handed), all native German speakers without any relevant experience with Dutch or Swiss German, took part in Experiment 2. None of them had participated in Experiment 1. All participants reported normal auditory and normal or corrected-to-normal visual acuity and no neurological, psychiatric, or other medical problems. They gave informed consent and received monetary compensation.
We used the same stimulus material as in Experiment 1.
Experimental Design and Procedure
Oddball contrast 1 contained the stimulus pair *[ɛx] versus [ɔx], and Oddball contrast 2 consisted of the syllables [ɛ∫] versus [ɔ∫]. We used the same experimental setting as in Experiment 1.
The settings for the electrophysiological recordings were the same as in Experiment 1.
EEG data were analyzed in the same way as in Experiment 1.
In Experiment 2, two effects of deviance were expected. The MMN elicited by the change of the initial vowel was quantified in a 100-msec time window from 100 to 200 msec after stimulus onset. The effect related to the phonotactic violation was quantified in a 40-msec analysis window centered on the peak latency of the grand-average difference wave for the syllable *[ɛx], averaged across the F3, Fz, F4, C3, Cz, and C4 electrode sites.
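The window definition above can be sketched as follows. This is a minimal illustration, assuming that the difference wave is deviant minus standard, that the MMN peak is the most negative sample, and that the per-electrode peak latencies are averaged before the 40-msec window is centered on them. The function names are hypothetical, and the latencies in the usage example are invented placeholders (chosen to average 286 msec), not the study's measurements.

```python
from statistics import mean

# Electrode sites named in the text for defining the phonotactic window.
ROI = ("F3", "Fz", "F4", "C3", "Cz", "C4")


def difference_wave(deviant, standard):
    """Deviant-minus-standard ERP, sample by sample (same time grid assumed)."""
    if len(deviant) != len(standard):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant, standard)]


def peak_latency(times, wave):
    """Latency of the most negative sample (the MMN is a negativity)."""
    index = min(range(len(wave)), key=lambda i: wave[i])
    return times[index]


def centered_window(peak_latencies, width=40):
    """Window of `width` msec centered on the mean of the given peak latencies."""
    center = mean(peak_latencies)
    return center - width / 2, center + width / 2


if __name__ == "__main__":
    # Hypothetical per-electrode peak latencies (msec), invented for illustration.
    latencies = {"F3": 284, "Fz": 288, "F4": 286, "C3": 282, "Cz": 290, "C4": 286}
    print(centered_window([latencies[site] for site in ROI]))  # (266.0, 306.0)
```

In practice, each peak latency would come from applying `peak_latency` to the grand-average difference wave at one electrode within a search range; that search range is not specified here and is left open.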
Statistical analyses were performed in the same way as described for Experiment 1, with the exception that the effects of deviance related to the vowel change were not analyzed in detail.
Deviance-related Effects (MMN)
MMN to vowel change (first time window 100–200 msec)
The general presence of MMN was indicated by a significant main effect of the factor Stimulus, F(1, 15) = 6.3, p = .024. As expected, there were no interactions between the factor Stimulus and the experimental factors Vowel and Fricative.
MMN to phonotactic violation (second time window 266–306 msec)
Two-tailed dependent t tests between the respective standard and deviant ERPs of each syllable type showed that the phonotactically ill-formed syllable *[ɛx] elicited a significant MMN response in the investigated time window (MMN peak latency = 288 msec, peak amplitude = −0.992 μV, mean amplitude = −0.892 μV), t(15) = −4.0, p = .001, whereas the other syllables did not elicit significant MMN responses; the mean amplitudes of their difference waves were [ɛ∫] = 0.3058 μV, t(15) = 1.6, p = .134; [ɔx] = 0.0205 μV, t(15) = 0.1, p = .918; and [ɔ∫] = −0.1024 μV, t(15) = −0.5, p = .592 (Figure 3).
A three-way repeated measures ANOVA revealed no significant main effects but a significant Stimulus × Fricative interaction, F(1, 15) = 14.0, p = .002, and a significant Stimulus × Fricative × Vowel interaction, F(1, 15) = 11.5, p = .004. Bonferroni-adjusted pairwise post hoc comparisons revealed significant differences between the phonotactically ill-formed syllable *[ɛx] and the well-formed syllable [ɛ∫], Stimulus × ɛx_ɛ∫, F(1, 15) = 20.8, p < .001; between *[ɛx] and [ɔx], Stimulus × ɛx_ɔx, F(1, 15) = 10.1, p = .006; and between *[ɛx] and [ɔ∫], Stimulus × ɛx_ɔ∫, F(1, 15) = 8.7, p = .010; there were no significant differences among the syllables without a phonotactic violation.
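The dependent t tests above compare, across participants, each participant's deviant and standard mean amplitudes within the analysis window. A minimal standard-library sketch of the test statistic follows. It assumes one mean-amplitude value per participant and condition and returns only t and the degrees of freedom; the two-tailed p value would then be read from the t distribution, for example with `scipy.stats.ttest_rel`. The function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(deviant, standard):
    """Dependent-samples t statistic and degrees of freedom.

    deviant, standard: per-participant mean amplitudes (μV) in the same window,
    listed in the same participant order. A negative t means the deviant ERP
    is more negative than the standard ERP, as expected for an MMN.
    """
    if len(deviant) != len(standard):
        raise ValueError("need one deviant and one standard value per participant")
    diffs = [d - s for d, s in zip(deviant, standard)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two participants")
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

With 16 participants the function returns df = 15, matching the t(15) values reported above.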
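Because every factor in this design has two levels, each effect tested with F(1, 15) is equivalent to a one-sample t test on a per-participant contrast score, with F = t². The sketch below illustrates this for the Stimulus × Vowel × Fricative interaction, assuming that each participant's MMN for a syllable is the deviant-minus-standard mean amplitude, so that the interaction contrast is (ɛx − ɛ∫) − (ɔx − ɔ∫). The ASCII keys `ex`, `esh`, `ox`, `osh` and the function names are hypothetical labels. The pairwise comparisons such as Stimulus × ɛx_ɛ∫ follow the same logic with the contrast ɛx − ɛ∫.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores):
    """t statistic of the scores tested against zero."""
    return mean(scores) / (stdev(scores) / math.sqrt(len(scores)))


def interaction_f(mmn_by_participant):
    """F(1, n - 1) for Stimulus x Vowel x Fricative in a 2 x 2 x 2 within design.

    mmn_by_participant: list of dicts, one per participant, mapping the
    hypothetical keys 'ex', 'esh', 'ox', 'osh' to that participant's MMN
    (deviant minus standard mean amplitude, μV).
    """
    contrast = [(p["ex"] - p["esh"]) - (p["ox"] - p["osh"]) for p in mmn_by_participant]
    t = one_sample_t(contrast)
    return t * t, (1, len(contrast) - 1)


def pairwise_f(mmn_by_participant, a, b):
    """F(1, n - 1) for the Stimulus x (a vs. b) comparison, e.g., 'ex' vs. 'esh'."""
    contrast = [p[a] - p[b] for p in mmn_by_participant]
    t = one_sample_t(contrast)
    return t * t, (1, len(contrast) - 1)
```

Squaring t discards its sign, which is why the F values reported above do not show the direction of the effect; the direction is visible in the mean amplitudes of the difference waves.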
Results from the 100% Condition and the 50% Condition
For the 100% condition, the time window for ERP quantification was set from 228 to 268 msec after stimulus onset. The two-way repeated measures ANOVA revealed a significant main effect of the factor Vowel, F(1, 15) = 4.6, p = .049. Bonferroni-adjusted comparisons showed no significant differences between *[ɛx] and any of the well-formed syllable types.
The time window for the ERP quantification from the 50% condition was set from 230 to 270 msec. The two-way repeated measures ANOVA revealed significant main effects for Vowel, F(1, 15) = 9.5, p = .008, and Fricative, F(1, 15) = 24.1, p < .001. Although the ERP elicited by *[ɛx] showed a numerically stronger negative-going deflection in the investigated time window as compared with the three well-formed syllables, the interaction between the experimental factors Vowel and Fricative did not reach significance. Bonferroni-adjusted pairwise post hoc comparisons revealed significant differences between *[ɛx] and [ɛ∫], F(1, 15) = 15.8, p = .007, and between *[ɛx] and [ɔ∫], F(1, 15) = 23.0, p = .001, but not between *[ɛx] and [ɔx].
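The Bonferroni adjustment used for the post hoc comparisons multiplies each raw p value by the number of comparisons in the family and caps the result at 1. The sketch below assumes, for illustration only, that the family consists of all six pairwise comparisons among the four syllable types; the family size is not stated in the text, and the syllable labels and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical ASCII labels for *[ɛx], [ɛ∫], [ɔx], [ɔ∫].
SYLLABLES = ("ex", "esh", "ox", "osh")


def bonferroni(p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def adjusted_pairwise(raw_p):
    """Adjust a dict {(a, b): raw p} over all six syllable pairs."""
    pairs = list(combinations(SYLLABLES, 2))
    missing = [pair for pair in pairs if pair not in raw_p]
    if missing:
        raise KeyError(f"missing raw p values for {missing}")
    adjusted = bonferroni([raw_p[pair] for pair in pairs])
    return dict(zip(pairs, adjusted))
```

If the family were smaller (e.g., only the three comparisons involving *[ɛx]), the multiplier would change accordingly; the sketch makes that choice explicit through the pair list.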
In Experiment 2, the phonetic–phonological deviation, carried by the vowel, and the phonotactic deviation were separated in time. As predicted, the initial vowel change elicited MMN between 100 and 200 msec after stimulus onset for all syllables when serving as a deviant. No differences in amplitude of the vowel-related MMN were observed between syllables. Overall, this response had rather small amplitudes and a broad latency jitter, which we ascribe to the high acoustical variability of the stimulus material.
In Oddball contrast 1, containing the syllables *[ɛx] and [ɔx], an additional deviance-related effect was observed. The phonotactically ill-formed deviant *[ɛx], presented among realizations of the standard syllable [ɔx], evoked a negativity in the ERP between 250 and 350 msec after stimulus onset, that is, 150 to 250 msec after the phonotactic violation became detectable. By contrast, analogous negative-going deflections are absent from the ERPs of the phonotactically well-formed deviant syllables [ɛ∫], [ɔ∫], and [ɔx]. We take this negativity as an additional MMN response, elicited because the deviant *[ɛx] does not match the standard [ɔx] with regard to the abstract feature of phonotactic well-formedness. In this respect, our data support the assumption of an abstract phonotactic evaluation process that accesses implicit phonotactic knowledge and affects the deviance detection mechanism: If the deviant violates phonotactic constraints (and the standard is a well-formed syllable), a discrepancy in phonotactic well-formedness is detected between the sensory-memory representation of the deviant and the central sound representation of the standard, and the detection of this discrepancy elicits the MMN response. Furthermore, the phonotactic violation affected the processing of the ungrammatical syllable *[ɛx] even without involvement of the deviance detection mechanism: In the additional 100% and 50% conditions, *[ɛx] elicited a larger negativity between 200 and 300 msec than the phonotactically well-formed syllables.
However, the fricatives may have been coarticulated with the preceding vowels, which would lead to differences in their spectral properties. In principle, an MMN due to such coarticulatory differences between the fricatives in *[ɛx] and [ɔx] might occur around the time of the effect that we attribute to the phonotactic violation. For the oddball contrast containing the coronal fricative ([ɛ∫] vs. [ɔ∫]), by contrast, no systematic ERP differences attributable to such coarticulatory variation were observed. Hence, we argue that the ERP effect found in the oddball contrast containing the velar fricative (*[ɛx] vs. [ɔx]) is not, or at least not mainly, caused by acoustical or phonetic differences between the vowel-dependent fricative realizations of the two syllables. We tested this assumption by analyzing ERPs computed separately for every single token of the ill-formed syllable *[ɛx].
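The token-wise analysis mentioned above amounts to averaging epochs separately for each recorded token of *[ɛx] and then quantifying each token-specific ERP in the analysis window. A minimal sketch follows, assuming that epochs are stored as (token ID, samples) pairs on a common time grid; the data layout, token IDs, and function names are hypothetical and not taken from the study.

```python
def average_by_token(epochs):
    """Average epochs separately per token.

    epochs: iterable of (token_id, samples) pairs; all samples share one time grid.
    Returns {token_id: averaged samples}.
    """
    sums, counts = {}, {}
    for token, samples in epochs:
        if token not in sums:
            sums[token] = [0.0] * len(samples)
            counts[token] = 0
        elif len(samples) != len(sums[token]):
            raise ValueError(f"epoch length mismatch for token {token!r}")
        sums[token] = [acc + v for acc, v in zip(sums[token], samples)]
        counts[token] += 1
    return {tok: [v / counts[tok] for v in acc] for tok, acc in sums.items()}


def mean_amplitude(times, wave, start, end):
    """Mean of the samples whose latency lies in [start, end] msec."""
    window = [v for t, v in zip(times, wave) if start <= t <= end]
    if not window:
        raise ValueError("no samples in the analysis window")
    return sum(window) / len(window)
```

Comparing the token-specific mean amplitudes then shows whether the late negativity is carried by all tokens or only by some, which is the question the per-token analysis addresses.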
Effects of the phonotactic violation in *[ɛx] on preattentive auditory processing without the context of another stimulus (100% condition) and in an equal-probability context (50% condition) were also obtained in Experiment 2. Again, in both conditions, the phonotactically ill-formed stimulus *[ɛx] elicited the numerically largest negative-going ERP deflection, peaking between 200 and 300 msec, compared with the ERPs elicited by the well-formed syllables. The observed negativities were maximal at central electrode sites and numerically smaller than in the ERPs elicited by *[ɛx] as deviant, which, as in Experiment 1, was also reflected in a less clear-cut pattern of the statistical analyses. Although the effects of the violation of DFA were larger in the oddball blocks, we nonetheless regard these results as corroborating evidence for effects of phonotactic ungrammaticality on early auditory processing.
Our study addressed the question of whether, and to what extent, phonotactic constraints, as part of abstract and implicit phonological knowledge, are involved in automatic, preattentive speech processing. Our results support the assumption that abstract phonotactic information is activated and applied in preattentive speech processing. We provided evidence that a phonotactic violation contained in a deviant syllable modulates the MMN. The data of Experiment 1 showed an enhanced MMN amplitude when a phonotactic violation was present in the deviant in addition to other deviating features. Moreover, Experiment 2 provided evidence that a phonotactic violation elicits MMN even if no acoustical or segment-related phonological differences are present at the point in time when the violation occurs.
We interpreted the enhanced MMN amplitude to the phonotactically ill-formed deviant in Experiment 1, as well as the second negativity to the ill-formed deviant in Experiment 2, as the result of a comparison between the deviant and standard representations with regard to the abstract feature of phonotactic well-formedness. The data of the additional 100% and 50% conditions of both experiments provided a measure for assessing preattentive processing of a phonotactic violation without relying on MMN. The phonotactically ill-formed syllable *[ɛx] elicited a negative-going deflection between 200 and 300 msec with a slightly larger amplitude than the corresponding components in the ERPs elicited by the three well-formed syllables.
Latency Differences between Experiment 1 and Experiment 2
The data sets of Experiments 1 and 2 differ in the latency of the phonotactic MMN. In Experiment 1, the grand-averaged MMN to *[ɛx] is maximal at 214 msec after stimulus onset at Fz. Because neither the acoustical deviance nor the information about the phonotactic ill-formedness is available before fricative onset (100 msec after stimulus onset), the MMN latency relative to the onset of the deviance is 114 msec. In Experiment 2, by contrast, the respective peak occurred at 288 msec after stimulus onset at Fz. This difference may largely result from an overlap of two separate MMN responses in Experiment 1: one reflecting the acoustical and phonemic deviance due to the fricative change, the other occurring in response to the violation of DFA.
Because of the inherent acoustical variability of the unmanipulated stimulus material, acoustical transitions specific to each fricative were already present in the signal before the defined vowel offset. For this reason, a relatively early MMN response is to be expected. The phonotactic violation, however, cannot be processed until the fricative has been identified by means of a segmental phonological analysis. The assumption of such a two-phase response to the change of fricative in Experiment 1 is also supported by the morphology of the grand-averaged difference waves of *[ɛx] at parietal electrode positions (see Figure 4).
At Pz, the difference wave of *[ɛx] shows two negative peaks: an early maximum of −0.829 μV at 186 msec after stimulus onset (86 msec after fricative onset) and a later maximum of −0.679 μV at 258 msec after stimulus onset (158 msec after the phonotactic violation occurred). We regard this second negative peak as the equivalent of the negativity elicited by the deviant *[ɛx] at 288 msec in Experiment 2. The latency of the MMN elicited by the phonotactic violation allows us to draw conclusions about the time needed to fully analyze the phonological features of a sound: In our experiment, up to approximately 150 msec were available for identifying the velar fricative before the phonotactic constraint was evaluated.
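The overlap account can be illustrated with a toy model in which two negative-going components are summed: an earlier one for the acoustical and phonemic fricative change and a later one for the DFA violation. The Gaussian shape and all latencies, amplitudes, and widths in the sketch are hypothetical values chosen for illustration, not estimates from the data. The sketch only shows that the peak of the summed wave falls between the two component peaks, that is, earlier than the later component alone would peak.

```python
import math


def negativity(t, latency, amplitude, width):
    """Gaussian-shaped component: amplitude in μV (negative), width = SD in msec."""
    return amplitude * math.exp(-0.5 * ((t - latency) / width) ** 2)


def peak_of_sum(components, times):
    """Latency of the most negative point of the summed components."""
    summed = [sum(negativity(t, *c) for c in components) for t in times]
    return min(zip(summed, times))[1]


if __name__ == "__main__":
    times = range(0, 501)  # 1-msec steps
    early = (180, -1.0, 60)  # hypothetical acoustic/phonemic MMN
    late = (280, -0.8, 60)   # hypothetical phonotactic MMN
    print("late component alone peaks at", peak_of_sum([late], times))
    print("summed wave peaks at", peak_of_sum([early, late], times))
```

With narrower components, the summed wave instead shows two separate minima, which is the kind of two-peaked morphology described for Pz below; with broad, overlapping components, only a single, earlier-shifted peak remains.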
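The latencies relative to fricative onset quoted in this section follow from subtracting the fricative onset from the latency after stimulus onset. The sketch assumes a fricative onset of 100 msec after stimulus onset, the value implied by the reported pairs (214 → 114 msec, 186 → 86 msec); the constant and function names are hypothetical.

```python
# Assumed fricative onset (msec after stimulus onset), inferred from the
# reported pairs 214 -> 114 msec and 186 -> 86 msec.
FRICATIVE_ONSET_MS = 100


def after_fricative_onset(latency_ms, onset_ms=FRICATIVE_ONSET_MS):
    """Convert a latency from stimulus onset to a latency from fricative onset."""
    return latency_ms - onset_ms


if __name__ == "__main__":
    for label, latency in [("Exp. 1, Fz peak", 214), ("Exp. 1, Pz early", 186),
                           ("Exp. 1, Pz late", 258), ("Exp. 2, Fz peak", 288)]:
        print(f"{label}: {latency} msec -> {after_fricative_onset(latency)} msec")
```

If the fricative onset varied across tokens of the naturally spoken stimuli, the subtraction would have to be done per token rather than with a single constant.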
However, even if we take the above-described overlap in Experiment 1 into account, a latency difference between the data of Experiments 1 and 2 remains. As an explanation, we propose the following considerations: Activating and using abstract phonotactic knowledge take a certain amount of time. For a sound occurring in the standard of an oddball protocol, the repetition of the standard activates phonotactic knowledge that concerns possible segments following that sound. Phonotactic knowledge activated by the standard of the oddball protocol may be applied immediately to the deviant.
Applied to the experiments reported in the present study, this leads to the following implications: In Experiment 1 with standard [ɛ∫] and deviant *[ɛx], the occurrence of [ɛ] in the standard activates knowledge about possible following sounds. When *[ɛx] occurs as deviant, this already activated knowledge can be applied immediately to determine that *[ɛx] is ill-formed. In Experiment 2 with standard [ɔx] and deviant *[ɛx], phonotactic knowledge about sounds following [ɔ] would be activated but no phonotactic knowledge about sounds following [ɛ], which does not occur in the standard. When the deviant *[ɛx] is encountered, the phonotactic knowledge for determining its well-formedness is not activated and so would require additional time to be activated and applied.
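The proposed account can be expressed as a toy model. Syllables are (vowel, fricative) pairs; DFA is checked only as "no velar [x] after the front vowel [ɛ]", which covers the four syllables used here; and the repeated standard is assumed to keep active only the phonotactic knowledge about sounds following its own vowel. The activation cost is a free, hypothetical parameter (50 msec in the usage example is arbitrary), and the model makes no claim about its actual size.

```python
FRONT_VOWELS = frozenset({"ɛ"})  # only the vowels used here; [ɔ] is a back vowel


def violates_dfa(syllable):
    """True for a velar fricative [x] after a front vowel, as in *[ɛx]."""
    vowel, fricative = syllable
    return vowel in FRONT_VOWELS and fricative == "x"


def evaluation_delay(standard, deviant, activation_cost_ms):
    """Extra delay (msec) before the deviant's well-formedness can be evaluated.

    Toy assumption: repeating the standard pre-activates the phonotactic
    knowledge about sounds following the standard's vowel only.
    """
    primed_vowel = standard[0]
    return 0 if deviant[0] == primed_vowel else activation_cost_ms


if __name__ == "__main__":
    sh = "sh"  # ASCII stand-in for the postalveolar fricative [∫]
    experiment_1 = (("ɛ", sh), ("ɛ", "x"))   # standard [ɛ∫], deviant *[ɛx]
    experiment_2 = (("ɔ", "x"), ("ɛ", "x"))  # standard [ɔx], deviant *[ɛx]
    for name, (std, dev) in [("Experiment 1", experiment_1), ("Experiment 2", experiment_2)]:
        # Arbitrary, hypothetical activation cost of 50 msec.
        print(name, violates_dfa(dev), evaluation_delay(std, dev, activation_cost_ms=50))
```

In both experiments the deviant is flagged as ill-formed, but only in Experiment 2 does the model add an activation delay, mirroring the later phonotactic MMN observed there.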
Problems of Natural Spoken Stimulus Material
Using naturally spoken material has important advantages over synthetic material. The risk of obtaining incoherent brain responses because of misleading properties of the signal, inadequate technical manipulations, or a basic unnaturalness of the signal is quite small with natural speech material (e.g., Ikeda, Hayashi, Hashimoto, Otomo, & Kanno, 2002; Jaramillo et al., 2001). However, using naturally spoken stimuli raised the following problem: Our design requires the articulation of a sound sequence that is ungrammatical in German. This underlying paradox of combining articulatory naturalness with grammatically impossible linguistic phenomena cannot be fundamentally resolved. In our view, the chosen approach to stimulus design nevertheless constitutes the best methodological option.
In the present MMN study, we investigated whether, and to what extent, language-specific phonotactic knowledge is available and activated in preattentive speech processing. To this end, we focused on DFA, a phonotactic constraint in German grammar. Specifically, we asked whether, and to what extent, a violation of DFA affects preattentive speech processing by presenting phonotactically ill-formed stimuli. Our data indicate that the violation of DFA does influence auditory deviance detection, eliciting an additional MMN component in the ERP. These results suggest that phonotactic knowledge stored in long-term memory is activated and applied even in preattentive speech processing.
The authors are grateful to Ursula Kirmse and Anja Roye for technical help, to Mira Müller for technical help and for proofreading, and to four anonymous reviewers for very helpful comments on an earlier version of this article. We have compiled supplementary material for this study, which is available from the authors at www.hsu-hh.de/epu.
This work was supported by the DFG SPP 1234 grant JA1009/10-1 to T. J. and H. T.
Reprint requests should be sent to Johanna Steinberg, BioCog-Cognitive & Biological Psychology, Institute of Psychology I, University of Leipzig, Seeburgstrasse 14-20, 04103 Leipzig, Germany, or via e-mail: firstname.lastname@example.org.
The asterisk (*) represents grammatical ill-formedness/violation of grammatical principles such as DFA.