The human brain stores an immense repertoire of linguistic symbols (morphemes, words) and combines them into a virtually unlimited set of well-formed strings (phrases, sentences) that serve as efficient communicative tools. Communication is hampered, however, if strings include meaningless items (e.g., “pseudomorphemes”), or if the rules for combining string elements are violated. Prior research suggests that, when participants attentively process sentences in a linguistic task, syntactic processing can occur quite early, but lexicosemantic processing, or any interaction involving this factor, is manifest later in time (ca. 400 msec or later). In contrast, recent evidence from passive speech perception paradigms suggests early processing of both combinatorial (morphosyntactic) and storage-related (lexicosemantic) properties. A crucial question is whether these parallel processes might also interact early in processing. Using ERPs in an orthogonal design, we presented spoken word strings to participants while they were distracted from incoming speech to obtain information about automatic language processing mechanisms unaffected by task-related strategies. Stimuli were either (1) well-formed miniconstructions (short pronoun–verb sentences), (2) “unstored” strings containing a pseudomorpheme, (3) “ill-combined” strings violating subject–verb agreement rules, or (4) double violations including both types of errors. We found that by 70–210 msec after the onset of the phrase-final syllable that disambiguated the strings, interactions of lexicosemantic and morphosyntactic deviance were evident in the ERPs. These results argue against serial processing of lexical storage, morphosyntactic combination and their interaction, and in favor of early, simultaneous, and interactive processing of symbols and their combinatorial structures.
The human brain sets itself apart by its great ability to store a huge vocabulary of coherent pieces of information and by its capacity to combine these stored units flexibly to yield virtually endless sets of possible sequences. Such stored units and sequences are manifest in different domains, including those of overt body actions (e.g., reach and grasp) and of meaningful linguistic symbols (i.e., words functioning as parts of subject and predicate; Jackendoff, 2011; Rizzolatti & Arbib, 1998). An action sequence or symbol string is therefore held together by two fundamentally different mechanisms, respectively underpinning the retrieval of stored elements (i.e., basic action schemas; morphemes and words) and the combination of these elements. In linguistics, the traditional separation between a “lexicon” and a “grammar”/“morphosyntax” aims at capturing this important distinction, however with the obvious limitation that combinatorial mechanisms are at work even within “lexical” items (especially morphological and phonotactic rules) and a relevant subset of complex multiword utterances can be seen as stored, unitary whole forms (Langacker, 2008; Goldberg & Jackendoff, 2004). Pinker's “words and rules” framework takes this fact into account (Pinker, 1997), and in the construction grammar framework, a related distinction is sometimes offered between whole-form-stored unitary constructions, including whole idioms as well as elementary meaningful units, such as single words and even morphemes, and more flexible combinatorial schemas, for example, specific argument structure constructions (such as the “double object construction”; Goldberg, 2006). As the lexicon versus morphosyntax division lacks precision when it comes to the mechanisms of symbol retrieval versus combination, we prefer the latter distinction. 
These terms can also cover the analogous mechanisms of action schema retrieval and combination in the domain of general action (Pulvermüller, 2014; Jackendoff, 2007, 2011; Pulvermüller & Fadiga, 2010).
Because symbol retrieval and combination play a fundamental role in human language and action, a main strand of current cognitive and brain research focuses on the theoretical and experimental study of the retrieval of stored meaningful symbols (i.e., lexicosemantic units) and their combination. In much current work, this has led to investigations comparing the processing of words with that of meaningless novel “pseudowords” and the processing of grammatically correct strings with that of incorrect ones. Because language comprehension is a rapid process and symbol/lexical access and combinatorial/syntactic processes are extremely fast and emerge in close temporal succession, if not in parallel, precise measurement techniques are required to pin down their relative time course. Fortunately, a range of brain correlates exist that seem to distinguish lexicosemantic access and combinatorial morphosyntactic processing.
ERP Research on Stored Symbol Retrieval
Access to stored lexical items was first found to be reflected in the N400 ERP/F, whose earliest precursors sometimes yield significant neurophysiological responses as early as 250 msec after the onset of written meaningful words versus meaningless pseudowords. Meaningful familiar spoken and written words elicit smaller N400 responses compared with meaningless pseudowords (Holcomb & Neville, 1990), and the familiarity or “lexical frequency” of words is also inversely correlated with N400 size (Kutas & Federmeier, 2011). Earlier (<200 msec) effects of lexical access had been discovered outside the language modality, especially in face processing (Schweinberger, Pickering, Jentzsch, Burton, & Kaufmann, 2002; Bentin, Allison, Puce, Perez, & McCarthy, 1996), and linguistic studies confirmed consistent neurophysiological differences between words and pseudowords (Dien, 2009; Hinojosa, Martín-Loeches, & Rubia, 2001; Hinojosa, Martín-Loeches, et al., 2001; Pulvermüller et al., 2001; Rudell, 1991), between low- and high-frequency words (Hauk & Pulvermüller, 2004; Sereno, Rayner, & Posner, 1998), and between words of different linguistic and semantic types (Shtyrov, Butorina, Nikolaeva, & Stroganova, 2014; Kiefer, Sim, Herrnberger, Grothe, & Hoenig, 2008; Pulvermüller, Shtyrov, & Ilmoniemi, 2005; Pulvermüller, Lutzenberger, & Birbaumer, 1995). These results provide strong evidence for early, “pre-N400,” brain responses reflecting stored form access and meaning processing (Dien, 2009; Pulvermüller, Shtyrov, & Hauk, 2009). 
It has been suggested that the absence of early-latency indices of lexicosemantic (and morphosyntactic) processing in some previous studies may, in part, be related to the fact that sentences and critical words substantially vary with regard to their physical and psycholinguistic properties and to their word recognition dynamics in the spoken language modality, so that the unavoidable jitter in the brain responses brought about by such variance in the stimuli cancels short-lived, early, and focal responses, but not late long-lasting and widely distributed ones (Penolazzi, Hauk, & Pulvermüller, 2007).
ERP Research on Combinatorial Processing
An early ERP, emerging as early as 80–200 msec after appearance of the critical word, indicates whether this critical word fits into its syntactic sentence context or not (Hasting & Kotz, 2008; Pulvermüller & Shtyrov, 2003; Shtyrov, Pulvermüller, Näätänen, & Ilmoniemi, 2003; Deutsch & Bentin, 2001; Friederici, Hahne, & Mecklinger, 1996; Neville, Nicol, Barss, Forster, & Garrett, 1991). This ERP, first called the “N125,” was later relabeled the “early left anterior negativity” (ELAN) to highlight its earliness and the fact that its cortical topography typically shows a maximum at left anterior recording electrodes. This early response was first found to reflect violations of “phrase structure rules,” for example, in sentences such as “The man admired Don's *of sketch the landscape” (Neville et al., 1991) or “Das Eis wurde *im gegessen” (“The ice cream was *in-the eaten”; Hahne & Friederici, 2002), and is considered to be related to a similar, slightly later response, the “left anterior negativity” (LAN; 300–500 msec), which shows a broader sensitivity to various types of syntactic violations (“The plane took *we to paradise and back”; Coulson, King, & Kutas, 1998). The LAN emerges in the same time window as the N400, known to reflect semantic combinatorial violations (“He drinks his tea with sugar and *socks”; Kutas & Hillyard, 1984). At even longer latencies (500–700 msec), both syntactic and semantic combinatorial violations seem to be reflected in the P600 response (Gunter, Friederici, & Schriefers, 2000). It is in this late interval that the first interaction effects between semantic and syntactic processing indicators have frequently been reported (“Das Gewitter wurde im *gebügelt”–“The thunderstorm was in *ironed”; Friederici, 2002), thus motivating the postulate that syntax is processed in a modular fashion first and only later interacts with meaning processing (see, e.g., Friederici, 2002; Gunter et al., 2000). 
Furthermore, the presence of an early syntactic effect (ELAN) and concomitant absence of a semantic N400 in case of double syntactic–semantic violations was interpreted as further support for serial–modular models of sentence processing (Friederici, 2002).
However, other researchers have found syntax–semantics interaction effects earlier than this, in the N400 time range (Guajardo & Wicha, 2014; Palolahti, Leino, Jokela, Kopra, & Paavilainen, 2005; Hagoort, 2003). The neurocognitive argument about serial and modular “syntax-first” processing is further complicated by recent methodological concerns regarding the ELAN response, which may arguably be an artifact of spill-over and offset effects from context (Steinhauer & Drury, 2012), thus calling into question the main syntactic index preceding the (semantic) N400 and (syntactic) LAN. On the other hand, the late effects (including both P600 and N400/LAN), for which there is strong evidence across studies, are in danger of reflecting not the first symbol access and combinatorial processes but rather the secondary, potentially epiphenomenal late processes following the understanding of linguistic symbols and their context. Much psycholinguistic evidence supports early, almost instantaneous word comprehension and context integration (Marslen-Wilson & Tyler, 1975), and in single word comprehension, participants can already reliably press buttons to express semantic decisions at 400-msec latency, thus proving that brain responses at this latency occur at a postunderstanding stage (Hauk, Coutout, Holden, & Chen, 2012). Therefore, it is not clear whether brain indices of syntax and semantics at 400 msec and later can speak directly to the question of whole-form access and combinatorial processing.
Evidence for Early Effects of Linguistic Symbol Access and Combination: MMN Research
To investigate the earliest brain manifestations of stored form access and combination, as well as their relative time course and potential interaction, it is advantageous to use a brain response that has a history of successfully revealing early cognitive processes. The MMN (Näätänen, Paavilainen, Rinne, & Alho, 2007) has demonstrated this capability, showing sensitivity to differences between words and pseudowords, as well as between well- and ill-formed word combinations (Pulvermüller & Shtyrov, 2006). For recording the MMN, word strings are typically presented as rare “deviant” stimuli against a background of frequently repeated “standard” stimuli. The deviants, but not the standards, elicit the MMN, whose amplitude and distribution reflect a range of stimulus features in addition to the perceived acoustic differences between standard and deviant stimuli and, importantly, the status of the deviants as familiar elements matching a memory representation (see Näätänen et al., 2007, for an extensive discussion on the neurophysiology of the MMN). It is important for our present context that the MMNs elicited by linguistic symbols differ in size (and sometimes polarity and distribution) between words and pseudowords (Frangos, Ritter, & Friedman, 2005; Pettigrew et al., 2004; Korpilahti, Krause, Holopainen, & Lang, 2001; Pulvermüller et al., 2001) and equally between linguistic symbols placed in grammatical and ungrammatical contexts (Hanna, Shtyrov, Williams, & Pulvermüller, 2016; Hanna et al., 2014; Pulvermüller, Shtyrov, Hasting, & Carlyon, 2008; Hasting, Kotz, & Friederici, 2007; Pulvermüller & Assadollahi, 2007; Pulvermüller & Shtyrov, 2003; Shtyrov et al., 2003). Crucially, these MMN responses reflecting symbol storage and combination occur at similar early latencies, ca. 50–250 msec after the input signals first allow for unique stimulus identification. This finding suggests early access both to stored linguistic representations and to the rules flexibly combining them. 
No study, however, has yet attempted to test both phenomena in a single experiment, and therefore, a simultaneous investigation of storage- and combination-related effects in the same subjects is desirable.
Notably, because of meticulous methodological precision borrowed from the tradition of psychophysical research (Carlyon, 2004), in particular the use of orthogonal designs controlling for the influence of both context and critical MMN-eliciting stimuli, previously reported early “syntactic” and “lexical” MMN effects are not subject to concerns that have been raised against the ELAN and other very early language-related ERP components (see discussion above). MMN paradigms even allow the use of identical recordings of syllables for eliciting brain responses to well- and ill-combined strings. For example, Pulvermüller et al. (2001) used the critical syllables “kko” and “kku” presented, respectively, in the contexts of the syllables “pa” and “ta,” where they complete meaningful words, and the reverse combinations “pa-kku” and “ta-kko,” which are Finnish pseudowords, and found larger MMNs in meaningful word contexts. This orthogonality of context and critical stimuli to the investigated factor rules out issues previously claimed to confound traditional language responses, such as offset, context, and physical features of the critical stimuli. A further advantage of the MMN (apart from its earliness) is that it can be elicited without participants paying active attention to language stimuli or performing an overt task such as judgment of grammaticality or meaningfulness. Such tasks have frequently been applied when N400 and P600 responses were elicited, and it is therefore unclear to which degree these late components reflect task strategy-related processes or stored symbol and combinatorial processing per se. In contrast, paradigms in which participants are distracted from linguistic stimuli, and which nevertheless elicit linguistic MMNs indexing stored symbol retrieval and combination, may allow for conclusions on automatic language processes, or at least on processes that are independent from the participants' active attention to verbal materials. 
On the negative side, MMN designs require frequent repetitions of (the same or very similar) critical deviant stimuli (and even more frequently occurring standards), thus leading to a somewhat artificially repetitive use of language, which carries a risk that differences normally present in everyday language processing may fall victim to repetition effects. However, given that previous MMN paradigms yielded reliable and reproducible early linguistic effects and concordant results have been reported with more variable but still precisely controlled paradigms (MacGregor, Pulvermüller, van Casteren, & Shtyrov, 2012; Pulvermüller & Shtyrov, 2006), it appears to be a fruitful strategy to use this component for probing the earliest brain indices of linguistic symbol retrieval, combination, and the interaction of these factors. In addition, it cannot be denied that, to a degree, repetitive and “artificial” paradigms have paved the way to a better understanding of a range of cognitive processes, including memory (Fuster, 2015), attention (Näätänen et al., 2007; Mangun, 1995) and prediction (Schultz, 2008), and it is therefore not impossible that useful information can be obtained from such paradigms in the language domain as well.
Overview of the Present Study
Here, we used a variant of the classic oddball paradigm optimized to minimize the duration of complex experiments, the so-called “optimal” or “multifeature” paradigm (Pakarinen et al., 2009; Näätänen, Pakarinen, Rinne, & Takegata, 2004), to record deviant stimulus responses and MMNs to word and pseudoword/-morpheme combinations, which were either (1) well-formed phrases (such as “I jump”) or they included (2) a not-stored (or “unstored”) pseudoword and -morpheme (“I junt*”), (3) a combinatorial violation of syntactic agreement (“*He jump”), or (4) both (“*He junt*”; see also Table 1). In contrast to previous studies, we used a fully orthogonal design in which the same critical syllables appeared equally frequently in all four conditions, and all the items building up the contexts were also exactly balanced between conditions (see Table 1; Figure 1).
| Condition | Stimuli | Storage | Combination |
|---|---|---|---|
| Well-formed | ich leide (I suffer), ich zeige (I show), wir schweigen (we keep silent), wir scheiden (we separate) | + | + |
| Unstored | ich schweide*, ich scheige*, wir leigen*, wir zeiden* | − | + |
| Ill-combined | *wir leide, *wir zeige, *ich schweigen, *ich scheiden | + | − |
| Double violation | *wir schweide*, *wir scheige*, *ich leigen*, *ich zeiden* | − | − |
The stimuli were miniconstructions consisting of three syllables each: a pronoun, followed by a verb-initial syllable, and finally a verb-final syllable. The verb-final syllable was the “critical stimulus” (given in boldface), which, at the same point in time, disambiguated and specified the verb stem (or not-stored pseudostem) and the agreement of the verb-final inflectional affix (or combinatorial “ill-formedness”). Analogous English examples are provided in the Introduction.
On the basis of preexisting work, we expected early neurophysiological differences following critical syllable onset, with 50–250 msec latency between well-formed and unstored stimuli (“lexical” effect) and between well-formed and ill-combined strings (“morphosyntactic” effect). A serial modular-type model would predict interaction effects between storage and combination only at a longer latency, significantly later than the early storage-related and combinatorial effects, whereas simultaneous access and interactive processing accounts suggest a simultaneous interaction effect and divergence between well-formed, single-violation, and double violation conditions early-on.
Twenty-four (15 women; average age = 22.7 years, range = 19–35 years) right-handed (average laterality quotient = 87.4, range = 50–100, SE = 3.1; Oldfield, 1971) monolingual German native speakers with no history of neurological or psychiatric disease, normal hearing, and normal or corrected-to-normal vision participated in this study. Data from 17 participants (10 women; average age = 22.9 years, range = 19–35 years; average laterality quotient = 89, range = 50–100, SE = 3.4) were included in the final analysis (see EEG Data Processing). All participants gave written informed consent and were paid for their participation. The study was approved by the ethics committee of the Charité Universitätsmedizin Berlin, Campus Benjamin Franklin, and was conducted in compliance with the principles of human research.
As stimuli, we used 16 short sentences, each consisting of one of two German pronouns, ich (meaning I) or wir (we), followed either by one of four inflected German verbs, overtly marked by the suffix -e for first person singular, leide, zeige (suffer, show), or by the suffix -en for first person plural, schweigen, scheiden (keep silent, separate), or by one of four phonotactically legal pseudowords consisting of the same inflectional suffixes attached to a meaningless “pseudomorpheme” not part of the German lexicon, leigen*, schweide*, zeiden*, scheige* (see also English examples in the Introduction, study overview section). These stimuli were chosen because, in typical pronunciation, the acoustic information about the voiced stop consonant in the middle of each word/pseudoword and the information about its inflectional suffix, which was either an [ǝ] or the [n] (to which the unstressed syllable [ǝn] reduces in informal and fast discourse), become available at the same time. Note that, because of coarticulation effects, it is the earliest part of the formant transitions at the start of the last syllable that discloses both the identity of the stop consonant crucial for identifying the verb stem and that of the immediately adjacent and overlapping voiced vowel or nasal consonant critical for recognizing the inflectional affix. Therefore, this kind of word and pseudoword, with an amalgamated stop consonant and adjacent minimal affix, offers a unique opportunity to study the brain responses to strings in which stored verb stems are contrasted with unstored pseudomorphemes, whereas combinatorially regular miniconstructions (i.e., syntactically correct short sentences) are contrasted with morphosyntactically ill-formed strings containing agreement violations.
To avoid coarticulation effects between syllables that could reveal information about the identity of the word-final syllable already during the word-initial one, we produced all bisyllabic words and pseudowords by cross-splicing and combining syllables spoken in isolation. We recorded multiple repetitions of the two pronouns and of the eight syllables lei, schwei, zei, schei, de, gen, ge, and den, each spoken in isolation by a female native German speaker; among these, only one token of each pronoun and syllable was selected. Note again that the syllables gen and den were pronounced in reduced form, as [gn] and [dn], as is usual in connected speech, where the “schwa” sound [ǝ], which may appear in-between the consonants, is frequently omitted.
Isochronous perception of rhythmic speech is not related to regularity of the interval between syllable onsets, but to regularity of occurrence of a perceptual property immanent to syllable structure, the so-called “perceptual center” or p-center (Morton, Marcus, & Frankish, 1976; Allen, 1972). For this reason, and to match the rhythmic structure between our stimulus constructions, we aligned the first syllables of every (pseudo)verb according to the largest change in acoustic energy identifiable in the sound envelope; this resulted in a regular beat and isochronous presentation of stimulus phrases. A similar adjustment relative to the p-center was chosen for the first syllable of the phrase, that is, the pronoun, although the same pronoun was used throughout each block of the experiment. The phrase-final syllables were aligned by onset as their onsets and rising flanks of their acoustic envelopes were similar, so that no break in perceived rhythms was caused. Careful stimulus selection and slight adjustment using Adobe Audition CS5.5 software (Adobe Systems Inc., San Jose, CA) ascertained that corresponding items were matched pairwise for length (ich-wir were 365 msec long, lei-schwei 310 msec after p-center, schei-zei 366 msec after p-center, de-gen 275 msec, den-ge 230 msec), average sound pressure level, and fundamental frequency (F0). The matching for the first syllables in the (pseudo)verbs was relative to the p-center (see Figure 1).
In a fully balanced orthogonal design, 16 phrases each including a pronoun and an inflected verb varied in their combinatorial (well- vs. ill-combined) and storage-related (stored vs. unstored) status, resulting in the four conditions (see Table 1), which we call the “well-formed,” “unstored,” “ill-combined,” and “double violation” conditions. Note once again that each of the four conditions included all first words (pronouns) and first and second syllables of the second word in a balanced fashion so that any acoustic, physical, and psycholinguistic differences between contexts or critical syllables could not explain between-condition differences. In particular, the four construction-final syllables de, gen, ge, and den, to which neurophysiological responses were analyzed, appeared in each of the four conditions the same number of times. Therefore, the four condition and stimulus types varied with regard to the features [+/− symbolically stored] and [+/− regularly combined], that is, in their morphosyntactic agreement between the personal pronoun and the inflectional suffix (factor Combination) and in their inclusion of lexical elements (factor Storage), but not with regard to critical or context stimulus materials. As the voiced stop consonants [d] and [g] orthogonally distinguished stem morphemes from pseudomorphemes, the use of two different inflectional suffixes, [ǝ] and [n], equally contributed to defining combinatorial acceptability.
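The balance of the design can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it rebuilds the 16 phrases from the two pronouns and the eight (pseudo)verbs, classifies each phrase on the two factors, and counts how often each critical syllable occurs per condition. The names `classify`, `STORED_STEMS`, and `AGREEING_SUFFIX` are hypothetical and introduced only for illustration.

```python
from collections import Counter

# Agreement rule: "ich" takes the suffix -e, "wir" takes the suffix -en.
AGREEING_SUFFIX = {"ich": "e", "wir": "en"}
# Stems of the four real verbs (leide, zeige, schweigen, scheiden).
STORED_STEMS = {"leid", "zeig", "schweig", "scheid"}
# The eight (pseudo)verbs as (first syllable, critical final syllable).
VERB_FORMS = [
    ("lei", "de"), ("zei", "ge"), ("schwei", "gen"), ("schei", "den"),
    ("schwei", "de"), ("schei", "ge"), ("lei", "gen"), ("zei", "den"),
]

def classify(pronoun, first, final):
    """Assign a phrase to one of the four conditions."""
    stem = first + final[0]   # the stop consonant completes the (pseudo)stem
    suffix = final[1:]        # the remainder is the inflectional suffix
    stored = stem in STORED_STEMS
    combined = AGREEING_SUFFIX[pronoun] == suffix
    if stored and combined:
        return "well-formed"
    if combined:
        return "unstored"
    if stored:
        return "ill-combined"
    return "double violation"

# Map every phrase to its (condition, critical syllable) pair.
design = {
    (pronoun, first + final): (classify(pronoun, first, final), final)
    for pronoun in AGREEING_SUFFIX
    for first, final in VERB_FORMS
}
condition_counts = Counter(cond for cond, _ in design.values())
syllable_by_condition = Counter(design.values())
```

In this sketch every condition receives four phrases and every critical syllable occurs exactly once per condition, which is the balance property the design relies on.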
Each of our 16 two-word miniconstructions was presented 80 times during the experiment to allow for the calculation of ERPs with high SNRs. Each pronoun was considered a standard stimulus and the inflected verb or pseudoverb a deviant stimulus of a multifeature MMN design (Pakarinen et al., 2009; Näätänen et al., 2004). There were two blocks, each with one of the pronouns as standard stimulus. After each standard stimulus, one of the eight deviant words or pseudowords followed in random order. The SOA between subsequent standard and deviant stimulus words was ca. 650 msec. To ascertain that a constant rhythm was perceived by participants, standard and deviant words were timed relative to the p-center of their first syllable (see explanation in the “Stimuli” section above). The onset of the word and phrase disambiguating final syllable followed 650 msec upon the p-center of the deviant stimulus word's first syllable. Note that all ERPs were measured relative to the onset of the critical syllables [dǝ], [gǝ], [dn], and [gn]. The delay was therefore 650 msec between syllable p-centers. The SOA of subsequent trials was 1950 msec. Blocks lasted 20 min each, and block order was counterbalanced across participants; the sequence of stimuli in each block was pseudorandomized, not allowing direct succession of two copies of the same deviant.
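The timing figures given above can be related to each other with simple arithmetic. The sketch below assumes that each trial consists of three isochronous syllable slots of 650 msec (pronoun, verb-initial syllable, verb-final syllable) and that each block contains all eight deviants 80 times; both assumptions are inferred from the description rather than stated as such.

```python
N_DEVIANTS = 8               # (pseudo)verbs following the standard pronoun
N_REPETITIONS = 80           # presentations per miniconstruction
N_BLOCKS = 2                 # one block per pronoun
SYLLABLE_INTERVAL_MS = 650   # delay between successive syllable slots
SYLLABLES_PER_TRIAL = 3      # assumed: pronoun plus two verb syllables

trial_soa_ms = SYLLABLE_INTERVAL_MS * SYLLABLES_PER_TRIAL
trials_per_block = N_DEVIANTS * N_REPETITIONS
total_trials = trials_per_block * N_BLOCKS
block_minutes = trials_per_block * trial_soa_ms / 60_000
```

Under these assumptions the trial SOA is 1950 msec, each block holds 640 trials, and a block lasts 20.8 min, in line with the reported block duration of about 20 min.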
Stimuli were presented through headphones at a comfortable hearing level. Participants were seated in a dimly lit, electrically shielded, and acoustically isolated chamber. They watched a silent movie throughout the duration of the experiment and were instructed to ignore the acoustic stimuli and focus their attention on the movie.
The EEG was recorded (0.1–250 Hz band pass, 1000 Hz sampling rate) through a 128-channel EEG setup (BrainProducts, Gilching, Germany) using active electrodes mounted in an extended 10–05 system specific cap and a reference electrode on the tip of the nose of the participant. Three electrodes were mounted at the left supraorbital and infraorbital margins and the right outer canthus, respectively, to record the horizontal and vertical EOG.
EEG Data Processing
Data were down-sampled offline to 200 Hz. Signals from EOG electrodes were converted offline to bipolar vertical and horizontal EOG signals. Channels containing no signal or substantial artifacts were rejected after visual inspection in 9 of the 24 subjects (between one and seven channels per subject). Subsequently, an offline 30 Hz lowpass filter was applied. Epochs time-locked to the onset of the critical, construction-final syllable (time 0), starting 100 msec before and ending 600 msec after it, were then obtained. Independent component analysis was performed on zero-mean epoched data, decomposing them into 35 components. Components correlating with the EOG signal (r ≥ .3 or r ≤ −.3) were rejected (Hanna & Pulvermüller, 2014; Hanna et al., 2014; Groppe, Makeig, & Kutas, 2009). An additional analysis run with a stricter threshold (r ≥ .5 or r ≤ −.5) produced very similar results. Those bad EEG channels that had been rejected were then spherically interpolated. The processing steps to this point were carried out with the EEGLAB 11.5.4b suite (Delorme & Makeig, 2004) in the Matlab R2012b programming environment (The MathWorks, Natick, MA).
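The correlation criterion for ocular components can be illustrated as follows. This is a minimal NumPy sketch on synthetic data, not the EEGLAB routine used in the study; the function name `eog_correlated_components` and the simulated signals are hypothetical.

```python
import numpy as np

def eog_correlated_components(components, eog, threshold=0.3):
    """Return indices of components with |r| >= threshold for any EOG channel.

    components: array (n_components, n_samples) of component time courses
    eog:        array (n_eog_channels, n_samples) of bipolar EOG signals
    """
    n_comp = components.shape[0]
    # Joint correlation matrix; keep the component-by-EOG block only.
    r = np.corrcoef(np.vstack([components, eog]))[:n_comp, n_comp:]
    return np.flatnonzero((np.abs(r) >= threshold).any(axis=1))

# Synthetic demonstration (hypothetical data, not the recorded EEG).
rng = np.random.default_rng(0)
n_samples = 5000
veog = rng.standard_normal(n_samples)
heog = rng.standard_normal(n_samples)
components = rng.standard_normal((35, n_samples))
# Component 3 mimics blinks, component 10 mimics horizontal eye movements.
components[3] = 0.8 * veog + 0.2 * rng.standard_normal(n_samples)
components[10] = -0.7 * heog + 0.3 * rng.standard_normal(n_samples)

rejected = eog_correlated_components(components, np.vstack([veog, heog]))
```

Taking the absolute value of r covers both the positive and the negative criterion in one comparison; passing `threshold=0.5` corresponds to the stricter control analysis.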
The following steps were carried out in the SPM8 suite (Litvak et al., 2011) for Matlab R2012b. The mean voltage value in the interval from −100 to 0 msec was subtracted from the whole epoch. Epochs in which the voltage exceeded ±80 μV were then rejected, and only subjects with a rejection rate below 20% (Lopez-Calderon & Luck, 2014; Luck, 2014) were included in the final analysis (n = 17). The reason for the rejection of four data sets lay in poor quality of the recordings (because of line noise on signals and/or too low impedances during the recordings); three further rejections were due to excessive blink rates. The average trial rejection rate of the remaining 17 data sets included in the final analysis was ca. 6% with an even distribution across conditions. ERPs to the critical deviant stimuli (final syllables) were then obtained by averaging trials over each condition for each subject. The ERP to the standard was obtained by averaging together all the trials in response to both standard stimuli. The MMNs were obtained by subtracting the averaged standard response from the ERPs to the critical syllable of each of the four deviant conditions. Both the deviant responses and the MMNs were used for statistical analysis.
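The baseline correction, amplitude-based epoch rejection, and MMN subtraction described above can be sketched in a few lines. The code below is an illustrative NumPy version run on toy data, not the SPM8 implementation; the function names and the array layout (trials × channels × samples, in microvolts) are assumptions made for the example.

```python
import numpy as np

SFREQ_HZ = 200
EPOCH_START_MS, EPOCH_END_MS = -100, 600
# Number of samples before syllable onset (20 at 200 Hz).
N_BASELINE = int(-EPOCH_START_MS * SFREQ_HZ / 1000)

def baseline_correct(epochs):
    """Subtract each channel's mean pre-onset voltage from the whole epoch."""
    baseline = epochs[:, :, :N_BASELINE].mean(axis=2, keepdims=True)
    return epochs - baseline

def keep_clean(epochs, limit_uv=80.0):
    """Drop trials in which any sample exceeds +/- limit_uv.

    Returns the retained epochs and the proportion of rejected trials.
    """
    clean = np.abs(epochs).max(axis=(1, 2)) <= limit_uv
    return epochs[clean], 1.0 - clean.mean()

def mismatch_response(deviant_epochs, standard_epochs):
    """Average each set of trials, then subtract standard from deviant ERP."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)

# Toy data: 10 trials, 2 channels; trial 0 carries a 100 microvolt artifact.
n_samples = int((EPOCH_END_MS - EPOCH_START_MS) * SFREQ_HZ / 1000)
toy = np.full((10, 2, n_samples), 5.0)
toy[0, 1, 60] = 105.0
corrected = baseline_correct(toy)
clean, rejection_rate = keep_clean(corrected)
```

In this toy case one of ten trials is dropped, so the rejection rate of 10% would fall below the 20% inclusion criterion.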
All statistical analyses were performed using Statistica 12 software (Statsoft, Tulsa, OK). To target the well-known frontocentral maximum of auditory and speech-evoked ERP components (Kutas & Federmeier, 2011; Pulvermüller & Shtyrov, 2006; Näätänen, 1990), 13 left frontocentral electrodes (AF3, AFF1h, AFF5h, F3, F1, F5, FFC1h, FFC3h, FFC5h, FC1, FC3, FCC1h, FCC3h) and the 13 homotopic ones on the right were chosen. Additional analyses that also included the midline electrodes confirmed the reported results unless mentioned otherwise. To estimate the time course of neurophysiological activity elicited by the critical syllable stimuli, the root mean square (RMS) of the grand-averaged ERP signal across these channels and across all deviant conditions was calculated (Figure 2). The RMS showed one large early deflection (ca. 50–250 msec) and one smaller and later one (ca. 300–500 msec). The early time window for analysis (70–210 msec) was identified as the time range comprising 70% of the area under the largest deflection of the RMS curve. As a double peak emerged in this early time window (see Figure 2), this window was divided into two halves to allow for more fine-grained temporal analysis. A second analysis was performed on the conventional N400 window, 300–500 msec; this second window was also split into smaller parts, in this case four parts of 50 msec width each, for fine-grained temporal analysis. Average voltage values were calculated for each time window, hemisphere, and condition for each subject and submitted to two separate repeated-measures ANOVAs with the factors Time (2 or 4 levels) × Laterality (2) × Combination (2) × Storage (2). Significant interactions were further investigated using post hoc comparisons with the Bonferroni-corrected least significant difference t statistic, which is justified in case of absence of sphericity violations in the results (Hsu, 1996; Scheirs, 1992; Mitzel & Games, 1981). 
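The two quantities entering this analysis, the RMS across channels and the mean voltage within a time window, can be written down compactly. The following sketch is illustrative only: the function names are hypothetical, the time axis assumes the 200 Hz sampling rate and the −100 to 600 msec epochs described above, and treating windows as half-open intervals is an assumption.

```python
import numpy as np

def rms_across_channels(erp):
    """Root mean square over channels at each time point.

    erp: array (n_channels, n_samples)
    """
    return np.sqrt((erp ** 2).mean(axis=0))

def window_mean(erp, times_ms, start_ms, end_ms):
    """Mean voltage over channels and over samples with start <= t < end."""
    in_window = (times_ms >= start_ms) & (times_ms < end_ms)
    return erp[:, in_window].mean()

# Time axis of one epoch: 5 msec steps from -100 msec up to 595 msec.
times_ms = np.arange(-100, 600, 5)
# Two halves of the early window and four 50 msec parts of the late window.
early_halves = [(70, 140), (140, 210)]
late_parts = [(t, t + 50) for t in range(300, 500, 50)]
```

Calling `window_mean` once per subject, condition, hemisphere (channel subset), and window would yield the cell means that enter the repeated-measures ANOVAs.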
ANOVAs were carried out on both deviant responses and on MMN responses.
Early Time Window
As Figure 3A shows, a positive frontocentral ERP component dominated the time interval from ca. 50 to 250 msec, which was elicited by the construction-final syllable. This early positivity corresponded to the RMS response in the same time window (Figure 2), whose double peak was well captured by the time window between 70 and 210 msec. A similar positive ERP was elicited by the standard stimuli. The subtraction of the standard response from the deviant responses yielded MMNs for all four conditions, which also fell within the 70–210 msec time window (Figure 3B). Whereas the positivity elicited by the critical deviant stimuli (final syllables) showed a symmetrical frontocentral topography, the ERP to the standard was right-lateralized (Figure 3A) and thus the MMN topography was right-lateralized. Within the first half (70–140 msec) of the early time window (70–210 msec), deviant ERPs and MMNs to well-formed constructions tended to be less positive- and more negative-going than those to both single violation conditions and double violations. In the second half (140–210 msec), the double violation condition appeared to elicit a more positive-going (less negative) response than all other conditions.
Statistical analysis on ERPs confirmed significant early differences between conditions. A significant interaction effect of the factors Time, Storage, and Combination (F(1, 16) = 6.59, p = .02, ηp² = 0.29; Figure 4A) showed a nonadditive influence of these factors in the 70–210 msec window. The Bonferroni test (corrected for 16 comparisons) confirmed that the response to the well-formed stimuli was significantly different (more negative-going) from the ones to the unstored (p = .018) and double violation (p = .006) conditions in the first half (70–140 msec) of the early time window. In contrast, the second half-window (140–210 msec) showed less negative-going ERPs to the double violation condition as compared with those to well-formed (p < .001), unstored (p = .008) and ill-combined (p = .02) strings (Figure 4A and B). Comparing condition-specific differences in ERP amplitudes between the first and second halves of the early window revealed a significant increase only for the single-violation conditions (unstored: p < .001; ill-combined: p = .004; see Figure 4A). From the same analysis emerged a significant interaction of Time and Laterality (F(1, 16) = 6.24, p = .023, ηp² = 0.28). Repeating the same analysis without the Laterality factor and including the midline electrodes confirmed the interaction of Time, Storage, and Combination.
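As a plausibility check on the statistics above, the reported effect sizes can be recovered from the F ratios and their degrees of freedom with the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The short Python sketch below does this and also shows one way in which a family of 16 post hoc comparisons could arise. That breakdown (pairwise comparisons within each half-window plus each condition compared across halves) is an assumption made for illustration; the text does not spell it out, and it does not say whether the listed p values are shown before or after correction.

```python
from itertools import combinations

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def bonferroni(p_uncorrected, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_uncorrected * n_comparisons)

conditions = ["well-formed", "unstored", "ill-combined", "double violation"]
halves = ["70-140 msec", "140-210 msec"]

# Assumed comparison family (not stated in the text):
# all condition pairs within each half-window ...
within_half = [(h, a, b) for h in halves for a, b in combinations(conditions, 2)]
# ... plus each condition compared between the two half-windows.
across_halves = [(c, halves[0], halves[1]) for c in conditions]
n_comparisons = len(within_half) + len(across_halves)

print(round(partial_eta_squared(6.59, 1, 16), 2))  # Time x Storage x Combination
print(round(partial_eta_squared(6.24, 1, 16), 2))  # Time x Laterality
print(n_comparisons)
```

Under these assumptions the recomputed effect sizes agree with the reported values of 0.29 and 0.28, and the assumed family contains 12 + 4 = 16 comparisons.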
MMN responses obtained by subtracting ERPs to the standard stimuli from the critical deviant syllable ERPs confirmed the Time × Storage × Combination interaction, resulting in the same statistical results as the deviant ERP analysis (F(1, 16) = 6.59, p = .02, ηp² = 0.29), because the same standard response had been subtracted from all four deviant conditions. Post hoc tests also confirmed the differences revealed by the deviant responses. Likewise, the interaction of the factors Time and Laterality was confirmed (F(1, 16) = 6.24, p = .023, ηp² = 0.28), and a significant main effect of Time (F(1, 16) = 24.9, p < .001, ηp² = 0.6) emerged because of larger MMNs in the second subwindow. Note that the Time × Storage × Combination and the Time × Laterality interactions were present in the deviant responses already; therefore, these interaction effects can be attributed to the deviant responses.
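The reason why the interaction statistics are identical for deviant ERPs and MMNs can be made concrete with a small Python sketch. All amplitudes below are invented for illustration and are not data from the study; the function names are likewise hypothetical. Subtracting the same standard value from all four conditions within a sub-window leaves the Storage × Combination contrast, and hence its change over Time, untouched, whereas the main effect of Time shifts by the difference in the standard response between sub-windows.

```python
# Hypothetical mean amplitudes in microvolts (illustrative values only).
deviant = {
    "70-140": {"well-formed": 1.0, "unstored": 2.0,
               "ill-combined": 1.8, "double violation": 2.2},
    "140-210": {"well-formed": 1.5, "unstored": 1.6,
                "ill-combined": 1.4, "double violation": 3.0},
}
standard = {"70-140": 2.5, "140-210": 3.5}  # same value for all conditions

def subtract_standard(deviant, standard):
    """MMN = deviant minus standard, computed per sub-window and condition."""
    return {w: {c: v - standard[w] for c, v in cells.items()}
            for w, cells in deviant.items()}

def storage_by_combination(cells):
    """2 x 2 interaction contrast: (WF - unstored) - (ill-combined - double)."""
    return ((cells["well-formed"] - cells["unstored"])
            - (cells["ill-combined"] - cells["double violation"]))

def time_by_storage_by_combination(data):
    """Change of the 2 x 2 contrast from the first to the second sub-window."""
    return storage_by_combination(data["140-210"]) - storage_by_combination(data["70-140"])

def time_main_effect(data):
    """Difference between sub-windows of the mean over the four conditions."""
    def mean(cells):
        return sum(cells.values()) / len(cells)
    return mean(data["140-210"]) - mean(data["70-140"])

mmn = subtract_standard(deviant, standard)

# The three-way contrast is the same for deviant ERPs and MMNs ...
print(time_by_storage_by_combination(deviant), time_by_storage_by_combination(mmn))
# ... but the Time main effect differs by the change in the standard response.
print(time_main_effect(deviant), time_main_effect(mmn))
```

This mirrors the pattern in the text: the interaction effects are carried by the deviant responses, while a Time main effect can emerge only once the standard response has been subtracted.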
Late Time Window
The positive deflection elicited by the critical deviant syllables was followed by a negative ERP ranging from ca. 300 to 500 msec (Figure 3A), which corresponded to the second deflection of the RMS curve (Figure 2). In this time range typical for the N400, single violation conditions tended to show less positive-going (more negative-going) ERPs and MMNs. However, the ANOVA performed on the data from the late interval following the critical construction-final syllable did not confirm any between-condition differences.
In an orthogonal design balancing out any additive effects of the contexts and the critical disambiguating syllables, we found early MMN brain responses indexing processing differences between stored words versus “unstored” pseudowords, differences between grammatical phrases and strings violating combinatorial morphosyntactic agreement rules, and, crucially, an interaction effect of these storage- and combinatorially related properties. This interaction between Storage and Combination emerged between 70 and 210 msec after critical disambiguating syllable onset, where MMNs to well-formed phrases were larger than those to double violation strings, but unstored and asyntactic “single violation” conditions produced initially small MMNs, which significantly increased over time and paralleled those of well-formed phrases toward the end of the early time window. The differential dynamics across the four conditions were manifest in a significant interaction of the factors Storage, Combination, and Time. These results indicate that the brain computes combinatorially flexible parts of constructions and whole form-stored ones at similarly early latencies and within (one or more) interactive system(s). Phrased in more traditional terms, our results suggest that “lexical/symbol access” and “syntactic rule application” run in parallel, interacting with each other within the first ca. 200 msec after the onset of the disambiguating syllable. An anterior negativity dominating the time interval around 400 msec failed to reveal significant differences between conditions in the present “nonattend” task. The early interaction involving Storage and Combination factors may raise questions about psycholinguistic models claiming that modular syntax processing emerges first and interacts with other mechanisms only later. 
Below, we explain and discuss the present results in the context of previous research in the neurophysiology of language, also highlighting their potential implications for linguistic and brain theory.
Relationship to Preexisting Research
Our study employed a relatively new method for recording the MMN brain response, the so-called “optimal” or “multifeature” paradigm (Pakarinen et al., 2009; Näätänen et al., 2004). This paradigm is similar to a classic oddball experiment but allows investigation of brain responses to a range of deviant stimuli in one block, saving substantial time, especially if several different conditions (like our 16 different constructions) are in the focus. A number of previous language studies successfully used the multifeature paradigm to investigate early brain correlates of linguistic processes (Cappelle, Shtyrov, & Pulvermüller, 2010; Shtyrov, Kujala, & Pulvermüller, 2010; Garagnani, Shtyrov, & Pulvermüller, 2009). In the present setup, pronouns were used as standard stimuli alternating with deviant verb stimuli whose final and critical syllables simultaneously revealed information about the lexical status of the final word or pseudoword and its combinatorial syntactic agreement with the context pronoun. This information became manifest neurophysiologically in the first 210 msec after critical syllable onset.
Preexisting work had already shown neurophysiological dissociations between stored meaningful words and not-stored meaningless pseudowords at early latencies using a range of different brain responses (Hanna & Pulvermüller, 2014; MacGregor et al., 2012; Dien, 2009; Hinojosa, Martín-Loeches, Muñoz, Casado, & Pozo, 2004; Shtyrov & Pulvermüller, 2002; Hinojosa, Martín-Loeches, et al., 2001; Pulvermüller et al., 2001). The latencies at which these “lexical” or storage-related effects occurred varied to a degree across studies, ranging from “early” (150–200 msec; Pulvermüller et al., 2001) to “ultraearly” (even around 50 msec; MacGregor et al., 2012), but all of these effects tended to appear before the typical N400 time window of 250–500 msec where lexical and semantic effects have most frequently been reported³ (Kutas & Federmeier, 2011; Pylkkänen & Marantz, 2003). Similarly, early neurophysiological signs of grammaticality and syntactic violation have been reported in equally early time windows, in the so-called ELAN and syntactic MMN responses (see Introduction above; Hanna et al., 2014, 2016; Hasting & Kotz, 2008; Pulvermüller et al., 2008; Hasting et al., 2007; Pulvermüller & Assadollahi, 2007; Pulvermüller & Shtyrov, 2003; Shtyrov et al., 2003; Deutsch & Bentin, 2001; Friederici et al., 1996; Neville et al., 1991). The earliest of these syntactic responses were seen already within the first 100 msec after critical stimulus onset (Herrmann, Maess, Hasting, & Friederici, 2009; Pulvermüller et al., 2008); these combinatorial syntax-related responses were therefore of the same latency as the “ultraearly” lexicality responses related to storage. In the literature on the ELAN, the oldest and most established early syntax response, a range of methodological criticisms have been raised (see Introduction).
For example, it was argued that the ELAN may be an artifact of poorly controlled baseline differences or differences in the stimulus-eliciting verbal materials (see Introduction; Steinhauer & Drury, 2012). Notably, the orthogonal designs, in which several of the syntactic MMN responses were recorded (Hanna et al., 2014; Pulvermüller & Shtyrov, 2006; Shtyrov et al., 2003), controlled both context and critical stimulus effects by using the same contexts and critical stimuli in different combinations—the technique we also adopted here—thus ruling out any additive context or stimulus effects. Syntactic MMNs consistently appeared substantially earlier than the most commonly investigated index of syntax processing, the P600 (Osterhout, McLaughlin, & Bersick, 1997).
Our present study is to the best of our knowledge the first to find early neurophysiological effects of lexicality and morphosyntax within the same study and the same participants, presented in response to minisentence stimuli. Therefore, these data are consistent with findings about early lexical and syntactic responses. The conclusion suggested (but not proven) by the literature, namely that the early lexical and syntax effects are strictly simultaneous, therefore gains support from the present data set. In a time window between 70 and 210 msec after critical stimulus onset, the brain response reflected the storage-related and combinatorial features of the strings. In addition, these factors interacted with each other, which is consistent with not only simultaneous but also interactive processing of storage-related (including lexical) and combinatorial (including morphosyntactic) information.
The polarity of the recorded ERPs to standard and deviant stimuli was positive relative to the prestimulus baseline. This may seem unexpected in an MMN experiment, where negative frontocentral ERPs may be expected. However, several auditory MMN studies of linguistic materials using SOAs comparable with those in our present study have previously reported similarly positive deviant and standard responses (Truckenbrodt, Steinberg, Jacobsen, & Jacobsen, 2014; Jacobsen, Steinberg, Truckenbrodt, & Jacobsen, 2013; Steinberg, Truckenbrodt, & Jacobsen, 2011). Such positive shifts of early ERPs seem common when stimulus words are presented close to each other, with only short delays in between, as is common in everyday language usage (Shtyrov, 2011).
Statistical analyses were performed on deviant responses and, in a second step, on MMNs calculated from the critical parts of the deviants and the averaged standard responses. Results were the same because the same values were subtracted from all deviant condition responses, which in turn was the case because all of our conditions used the same contexts in a balanced fashion (cf. Table 1). We should note, however, that physically different deviant and standard stimuli entered each MMN subtraction; the subtracted MMN responses were therefore affected by the brain correlates of these physical stimulus differences, although only in an additive fashion that did not affect the statistics. The topography of the main early brain response (see Figure 3B) was a frontocentral negativity with posterior polarity inversion, approximately at the level of the mastoids, which matched the typical topography of an MMN and thus suggests that a true MMN was recorded. The laterality of the MMN responses was mainly due to right laterality of the subtracted response to standard stimuli (Figure 3A); it will therefore not be interpreted. A further reason for not interpreting the laterality of the MMN in the present context is the absence of any statistically reliable interaction effect of this variable with the factors of interest, Storage and Combination.
Advantages and Disadvantages of the MMN Paradigm
As mentioned above, the four conditions—“well-formed,” “unstored,” “ill-combined,” and “double violation”—all used the same, physically identical critical syllables and all used the same context syllables (see Figure 1; Table 1). This multiple orthogonal design (both storage-related and combinatorial features were orthogonalized, see Methods) ruled out the possibility that additive effects of physical differences of contexts or critical syllables could explain any main effects or interactions obtained from the present experiment. Therefore, an attribution of between-condition differences to Storage- and Combination-related cognitive processing is justified. The fact that the MMN offers such precise control of stimulus context and features can certainly be seen as an advantage over designs where a range of natural sentences with and without pseudowords and/or syntactic rule violations are presented, but either context, critical stimuli, or both differ between conditions. However, this advantage comes at the price of restricting the investigation to the processing of a small number of stereotypically occurring items, as in our present case of 16 miniconstructions. This limitation of the stimulus space may be seen as unnatural, although, on the other hand, it has a strong tradition in acoustic psychophysics (Carlyon, 2004). Instead of forcefully arguing against such possibly “unnatural” designs, we should rather suggest the possibility that cognitive phenomena otherwise difficult to access properly can be explored using such psychophysically inspired methods. It is certainly possible that a repetitive MMN design abolishes processes normally occurring in language comprehension because of the frequent repetitions; however, positive effects even present under such “boring” repeat conditions cannot easily be ignored and might shed light on the automatic nature of some linguistic processes. 
In case of the present results, it appears that the early simultaneous effects of stored symbol access, combination, and their possible interaction are phenomena worthy of further exploration. It may be that MMN paradigms, where participants are actively distracted from the incoming sounds and words, are well suited to reveal aspects of automatic language processing that are shared between innovative and stereotypical language use.
Implication for Language Theories
On first view, the result of a significant interaction between lexical, morphosyntactic, and temporal factors at early latencies (70–210 msec) contradicts predictions of so-called “syntax first” models (Friederici, 2002; Garrett, 1989), which are commonly understood as predicting a primacy of syntax effects compared with most other psycholinguistic processes. However, it is evident that syntax and combinatorial information processing requires phonological and lexical information to operate on, and looking more closely at actual models and their predictions, word form access is typically seen as a processing step preceding syntactic analysis in strictly serial “syntax first” models of language comprehension (Figure 1 in Friederici, 2002). Our present, early near-simultaneous lexicality/storage and syntax/combination effects seem consistent with this position. It is noteworthy, however, that syntax effects were seen with subject–verb agreement violation, a type of syntactic process that has been claimed to follow upon earlier stages of “phrase structure building.” Here we confirm earlier observations that syntactic agreement, or lack thereof, is neurophysiologically reflected at the earliest stages of the speech comprehension and combinatorial analysis process (Hasting & Kotz, 2008; Shtyrov et al., 2003; Deutsch & Bentin, 2001) rather than only at later stages. Our results suggest that interactions between syntactic and lexical (or, more precisely, as explained in the Introduction, combinatorial and storage-related) processes occur early-on, a feature difficult to explain in the modular tradition.
Previous studies on the interaction between combinatorial processes of syntactic nature and semantic processing had reported differential responses and interactions around 400 and 600 msec latency, as discussed in the Introduction (Guajardo & Wicha, 2014; Palolahti et al., 2005; Hagoort, 2003; Gunter et al., 2000). Our present finding of an interaction between lexicality and syntax (storage vs. combination mechanisms) may be seen as indicating interactive processing very early on. Whether the lexical storage feature monitored by the present MMN paradigm also taps into semantic processing interacting with syntax early-on needs to be addressed by future research.
Rather than interpreting our present findings in terms of lexical and syntactic processing, we suggest that our results may reveal interacting brain mechanisms for symbol retrieval and combination. We believe that this is a promising strategy, because recent research has shown that many aspects of constructions that have traditionally been described by combinatorial rules may in fact be related to whole-form storage and, vice versa, some aspects of “lexical storage” may be underpinned by combination. For example, there are several reasons to assume that idiomatic constructions (such as “catch some sun”) are, at least in part, whole form-stored although they are open to a description in terms of combinatorial syntactic rules (Langacker, 1991), and there are, without any doubt, multiple combinatorial affixation processes at work in morphology, which produce complex words that may be seen as entries of the lexicon (Marslen-Wilson, Tyler, Waksler, & Older, 1994). As already mentioned in the Introduction, these and other facts render the traditional distinction between lexicon and syntax insufficient for capturing the relevant neurocognitive processes. Part of what is traditionally seen as the “lexicon” is not whole form-stored but instead dynamic and combinatorial, whereas some “syntactic strings” are stored as whole forms, rather than combined (for discussion, see Goldberg, 2006; Pinker, 1997; Langacker, 1991). The storage/combination distinction therefore appears to discriminate between the relevant fundamental mechanisms, whereas both lexicon and syntax appear, to a degree, as “mixed bags.” One may still argue that, in our present context, the storage-related brain responses were elicited “only” by words rather than by longer stored constructions and the combinatorial ones by violations of rules of syntactic agreement between subject and verb, so that the two distinctions could be used interchangeably.
However, in the wider context of brain language research, the storage/combination distinction captures those findings that seemingly contradict the lexicon/syntax opposition (Hanna et al., 2014; Leminen, Leminen, Kujala, & Shtyrov, 2013; Cappelle et al., 2010). Also, in the construction grammar framework, single words are seen as one type of construction on par with more syntactically complex ones, for example, argument structure constructions. The agreement construction “pronoun verb-affix” would be an elementary, but in this case, flexible member of this family (Goldberg, 2003, 2006). Although construction grammar subsumes storage-based and combinatorial mechanisms under the unifying key term of “constructions,” there is room and need for distinguishing storage-related and combinatorial mechanisms. The present research result is consistent with a key idea immanent to most construction grammar frameworks, namely that “lexicon” and “grammar” are not modularly separated entities but dynamically interact from the earliest stages of linguistic understanding.
In a fully orthogonal MMN design, brain indices of early interactive effects of linguistic symbol retrieval and combination were manifest. Miniconstructions (i.e., mini-sentences), including stored words and not-stored pseudowords, as well as agreement between subject and verb or violation thereof, showed a modulation of the MMN response recorded while subjects tried to ignore acoustic target stimuli. These interactive effects of lexical and syntactic processing were present ca. 70–210 msec after onset of the critical syllable that first allowed subjects to detect the lexical/syntactic status of the strings. Our results are consistent with early interactive retrieval of whole linguistic symbols and processing of their combinations and sit comfortably with current theories in the construction grammar framework.
We thank Verena Büscher, Luigi Grisoni, and Sarah von Saldern for their help at different stages of this work and three anonymous referees for their comments that greatly helped us in improving this paper. This work was supported by the Freie Universität Berlin, the Deutsche Forschungsgemeinschaft (Pu 97/16-1, Excellence Cluster Languages of Emotion), the Medical Research Council (UK) (MC_US_A060_0034, U1055.04.003.00001.01 to F. P.), the Engineering and Physical Sciences Research Council and Behavioural and Brain Sciences Research Council (UK) (BABEL grant, EP/J004561/1), and the Deutscher Akademischer Austauschdienst (fellowship to G. L.).
Reprint requests should be sent to Prof. Friedemann Pulvermüller or Guglielmo Lucchese, Brain Language Laboratory, Department of Philosophy and Humanities, WE4, Freie Universität Berlin, Habelschwerdter Allee 45, 14195 Berlin, Germany.
We use double letters to indicate geminate consonants.
The -en suffix can also mark third person plural and infinitives in German, and the -e suffix can also index imperative; however, the context of first person pronouns biases their parsing toward the first person understanding.
We note that our latencies are calculated from the onset of the second, construction-disambiguating syllable but most previous N400 studies on spoken word processing calculated latencies from word onset. Calculated relative to word onset, our present results would have a latency of >600 msec. Note, however, that the first syllable of the phrase-final word did not include any information about lexical status or grammaticality of the constructions. This information was first manifest at the onset of the critical syllable and therefore the proper calculation of latencies should start at this “divergence point.”