An ongoing debate is whether and to what extent access to cortical representations is automatic or dependent on attentional processes. To address this, we modulated the level of attention on auditory input and recorded ERPs elicited by syllables completing acoustically matched words and pseudowords. Under nonattend conditions, the word-elicited response (peaking at ∼120 msec) was larger than that to pseudowords, confirming early activation of lexical memory traces. However, when attention was directed toward the auditory input, such word–pseudoword difference disappeared. Whereas responses to words seemed unchanged by attentional variation, early pseudoword responses were modulated significantly by attention. Later on, attention modulated a positive deflection at ∼230 msec and a second negativity at ∼370 msec for all stimuli. The data indicate that the earliest stages of word processing are not affected by attentional demands and may thus possess certain automaticity, with attention effects on lexical processing accumulating after 150–200 msec. We explain this by robustness of preexisting memory networks for words whose strong internal connections guarantee rapid full-scale activation irrespective of the attentional resources available. Conversely, the processing of pseudowords, which do not have such stimulus-specific cortical representations, appears to be strongly modulated by the availability of attentional resources, even at its earliest stages. Topography analysis and source reconstruction indicated that left peri-sylvian cortices mediate attention effects on memory trace activation.
It has been a matter of debate to what extent the distinctively human capacity to process language draws on attentional resources and whether it possesses a certain degree of automaticity. The ease with which we can perceive the entire complexity of incoming speech and seemingly instantaneously decode syntactic, morphological, semantic, and other information while doing something else at the same time prompted suggestions that linguistic activity may be performed by the human brain in a largely automatic fashion (Garrod & Pickering, 2004; Pickering & Garrod, 2004; Garrett, 1984; Fodor, 1983). Indeed, a body of studies pointed toward potential automaticity of access to different linguistic information types. These may include phonological analyses (Morsella & Miozzo, 2002), lexical access (Glaser & Glaser, 1989) and morphological parsing (Frost, Deutsch, Gilboa, Tannenbaum, & Marslen-Wilson, 2000). Furthermore, some automaticity has been posited even for “higher-level” language functions, such as semantic and syntactic processing (Menning et al., 2005; Hahne & Friederici, 1999) up to the level of dialogue (Pickering & Garrod, 2004). Still, the currently available neuroscientific data do not allow for firm conclusions on language automaticity. This is hindered, at least in part, by methodological issues. Typically, in language experiments, human subjects are asked to attend to presented words or sentences (i.e., listen or read). Often, the task is to assess properties of the stimulus material (e.g., familiar/unfamiliar, correct/incorrect) or even perform a specific linguistic task (e.g., lexical decision, grammar judgment). When such continuous active attention on the input is required, the automaticity of language processing cannot be established. Further still, in such tasks, the extent to which the registered responses are influenced by correlates of attention rather than by the language-related activity per se also remains obscure. 
Attention-related phenomena have been shown to modulate a variety of brain-evoked responses for a substantial time after stimulus onset and to involve a number of brain structures including those close to, or overlapping with, the core language areas (see e.g., Woods, Alho, & Algazi, 1993; Woods, Knight, & Scabini, 1993; Alho, 1992; Picton & Hillyard, 1974; see e.g., Hugdahl, Thomsen, Ersland, Rimol, & Niemi, 2003; Yamasaki, LaBar, & McCarthy, 2002; Yantis et al., 2002; Hugdahl et al., 2000; Escera, Alho, Winkler, & Näätänen, 1998; Tiitinen, May, & Näätänen, 1997). It is also likely that, in linguistic tasks, subjects pay more attention to unusual or incorrect stimuli (pseudowords, nonwords, nonsense sentences, and semantic or grammatical violations are among the most commonly used in language experiments) as they make futile attempts at understanding them, and that they may apply different strategies to process proper and malformed items (Pulvermüller & Shtyrov, 2006). These different stimulus-type-specific strategies and attention variation may find their reflection in the event-related measures, overlapping with true language-related activity.
To address the issue of language automaticity, it therefore seems essential to rule out such attention-related interferences. A number of successful attempts at this were done using the mismatch negativity (MMN) brain response. MMN is an evoked brain response elicited by rare (so-called deviant) acoustic stimuli occasionally presented in a sequence of frequent (standard) stimuli (Alho, 1995; Näätänen, 1995). Importantly, MMN can be elicited in the absence of the subject's attention to the auditory input (Schröger, 1996; Tiitinen, May, Reinikainen, & Näätänen, 1994). It therefore became considered to reflect the brain's automatic discrimination of changes in the auditory sensory input, and thus, to be a unique indicator of automatic cerebral processing of acoustic events (Näätänen, 1995). More recently, the MMN has been increasingly utilized for investigating the neural processing of speech and language (Pulvermüller & Shtyrov, 2006; Näätänen, 2001). First, increased left-hemispheric MMN responses to native language phonemes and syllables were found, suggesting that this specific MMN increase is produced by automatic activation of preexisting long-term memory traces for speech sounds (Shtyrov, Kujala, Palva, Ilmoniemi, & Näätänen, 2000; Alho et al., 1998; Dehaene-Lambertz, 1997; Näätänen et al., 1997). Furthermore, it was shown that MMN in response to individual words was greater than for comparable meaningless word-like (i.e., obeying phonological rules of the language) stimuli (Korpilahti, Krause, Holopainen, & Lang, 2001; Pulvermüller et al., 2001). In a series of studies on this topic, we presented subjects with sets of acoustically matched word and pseudoword stimuli and found an increased MMN response whenever the deviant stimulus was a meaningful word (Shtyrov, Pihko, & Pulvermüller, 2005; Pulvermüller, Shtyrov, Kujala, & Näätänen, 2004; Shtyrov & Pulvermüller, 2002b; Pulvermüller et al., 2001). 
This enhancement, typically peaking between 100 and 200 msec, took place when the subjects were distracted from the auditory input and were even specifically instructed to ignore it. The enhanced MMN response was explained by the activation of cortical memory traces for words realized as distributed strongly connected populations of neurons (Pulvermüller & Shtyrov, 2006). This lexical enhancement of the MMN was also demonstrated by other groups using various stimulus set-ups and languages (Endrass, Mohr, & Pulvermüller, 2004; Pettigrew et al., 2004; Sittiprapaporn, Chindaduangratn, Tervaniemi, & Khotchabhakdi, 2003; Kujala et al., 2002; Korpilahti et al., 2001), all of which indicated word-specific activations in the absence of focussed attention on the stimuli, thus suggesting automatic brain access to items in the mental lexicon. A number of experiments using this approach were able to provide a great level of detail on the spatio-temporal patterns of activation involved in early lexical access, semantic category-specific processing, and even in syntactic processes (Menning et al., 2005; Pulvermüller, Shtyrov, & Ilmoniemi, 2003, 2005; Pulvermüller et al., 2004; Shtyrov, Hauk, & Pulvermüller, 2004; Pulvermüller & Shtyrov, 2003; Shtyrov, Pulvermüller, Näätänen, & Ilmoniemi, 2003; Shtyrov & Pulvermüller, 2002a), all of which were shown to take place outside the focus of attention.
This previous research has been important in suggesting a great deal of independence of the cerebral language system from attentional control. It has, however, left at least two major issues open. First, the issue of experimental control over attention gives rise to one of the most frequent criticisms of MMN research. It traditionally uses passive distraction paradigms, in which the subjects are engaged in watching a video, playing a computer game, or reading a book, while the auditory stimuli are delivered; the subjects are explicitly instructed to ignore the auditory input and to concentrate on the primary distracter task. Although monotonous, repetitive auditory stimulation may not be difficult to ignore, such a passive task in itself does not allow attention effects on early language processing to be delineated unambiguously (Carlyon, 2004). For this, a more systematic modulation of the attentional resources allocated to the linguistic input would be more beneficial.
The second obvious drawback of the MMN research has been the inherently small selection of stimuli that can be used in the context of an MMN experiment. Typically, there is one standard and one deviant stimulus; in a few experimental blocks, the subject may therefore be exposed to a small number of such combinations due to time constraints. This approach has a number of benefits, for example, strict control over stimulus physical parameters or a possibility to delineate precisely activation processes at a single-word level (Pulvermüller & Shtyrov, 2006). It does, however, lead to inevitable difficulties in generalizing results obtained with very few tokens to the entire language. Replication of results using different stimuli as well as different languages and experimental set-ups helped to mitigate this concern to an extent. Still, a better way would be to use larger groups of stimuli, thus introducing certain stimulus variability typical of natural language settings. Using more auditory stimuli while still maintaining stringent control over their acoustic and psycholinguistic features is not, however, a trivial task, and has so far not been successfully implemented at a word level.
In the current study, we attempted to address the issue of interaction between attention and language systems using the MMN paradigm. We tried to improve on the previous research by introducing a small controlled group of linguistic stimuli. Crucially, we also varied attention on the linguistic input in a systematic fashion in order to delineate possible dependence of word processes on attention influence at different stages.
In line with recent neurocomputational models of language and attention, we expected a stronger attention dependence of brain responses to pseudowords (Garagnani, Wennekers, & Pulvermüller, 2008). Pseudowords have no memory representation as such but can be expected to partially activate a range of lexical representations belonging to the same cohort or phonological neighborhood (Marslen-Wilson, 1987). When attention is withdrawn from the linguistic input, this activation may be minimal; when, however, attentional resources are ample, the range of activated lexical neighbors may grow, leading to increased overall activity. On the other hand, early activation processes for words can be expected to be relatively immune to attentional modulation: As discrete memory circuits have been preformed for words, they may be robust enough to be automatically activated irrespective of the available attentional resources (Garagnani et al., 2008; Shtyrov, Osswald, & Pulvermüller, 2008).
We presented our subjects with an acoustically controlled group of minimally different word and pseudoword stimuli. We used a traditional nonattend condition as a benchmark and compared it with a condition where explicit attention to the stimulus material was required.
Seventeen healthy right-handed (handedness assessed according to Oldfield, 1971) native Finnish-speaking university students (10 women) with normal hearing and no record of neurological diseases were presented with spoken Finnish language stimuli in four separate experimental conditions (see Figure 1).
Complete stimulus design is presented in Figure 1. For stimulus presentation, we utilized the so-called optimum or multifeature paradigm (Kujala, Tervaniemi, & Schröger, 2007; Näätänen, Pakarinen, Rinne, & Takegata, 2004), which can accommodate up to five deviants in order to use experimental time efficiently. In this design, every second stimulus is recommended to be the standard sound, whereas the remaining stimulus locations are randomly distributed between the different types of deviants. We implemented this design in combination with a previously reported MMN paradigm (Pulvermüller et al., 2001), and used two Finnish syllables ([pa] and [ta]) as frequent standards in different sessions, whereas infrequent deviants were syllables [ko], [ku], [ke], and [ki] in all sessions. The syllable length was 100 msec and the offset-to-onset interval was set at 275 msec, which resulted in various combinations of standards and deviants: pakko, *pakku, pakki, and *pakke in two of the conditions, and *takko, takku, takki, and *takke in two further conditions (the double consonant in Finnish stands for a geminate stop signifying the extended silence before the [k], 275 msec in this case; pseudowords are marked with *). The cue for such a perceptual grouping of standard and deviant syllables was the stress pattern (always on the standard, making it the first syllable, as in Finnish the stress is always on the first syllable of a word); an additional, fifth type of deviant, nonspeech signal-correlated noise (SCN), further helped in resetting the perceptual grouping to start at the standard stimulus [pa]/[ta]. This way, we employed a small group of controlled stimuli with varying lexical properties (4 words, 4 pseudowords), an improvement on previous studies which usually used only one token of each type in a block.
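The stimulus set described above can be summarized in a few lines of code. The following is only an illustrative sketch: the names `combine`, `lexicality`, and `stimulus_table` are hypothetical, and the lexical status of each item simply follows the asterisk marking given in the text.

```python
# Standards and speech deviants as listed in the stimulus description.
STANDARDS = ["pa", "ta"]
DEVIANTS = ["ko", "ku", "ke", "ki"]

# Items not marked with * in the text are Finnish words.
WORDS = {"pakko", "pakki", "takku", "takki"}

SYLLABLE_MS = 100   # syllable duration
GAP_MS = 275        # offset-to-onset interval (silence before [k])


def combine(standard: str, deviant: str) -> str:
    """Join standard and deviant; the silent gap is written as a geminate 'kk'."""
    return standard + "k" + deviant


def lexicality(item: str) -> str:
    """Classify an item as 'word' or 'pseudoword' according to WORDS."""
    return "word" if item in WORDS else "pseudoword"


def stimulus_table() -> dict:
    """Map every standard-deviant combination to its lexical status."""
    return {combine(s, d): lexicality(combine(s, d))
            for s in STANDARDS for d in DEVIANTS}


# Onset-to-onset interval between consecutive syllables, in msec.
SOA_MS = SYLLABLE_MS + GAP_MS
```

With these values, the onset-to-onset interval is 375 msec, and the table contains four words and four pseudowords.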
Note that the stimulus combinations were minimally different in their acoustic features with the final consonant–vowel transition being sufficient to set the words apart from pseudowords as well as differentiate between different words and identify each item per se. This made sure that the time point when any possible lexical effects could commence was the same across all stimuli of interest. This is essential for analyzing auditory ERP recordings that are highly sensitive to temporal and other physical–acoustic features of the stimuli; in this design, we could time-lock responses to the same time point for all stimuli. These minimal word-final differences also meant that the stimuli within each block belonged to the same cohort, that is, had common lexical neighbors with similar word onsets. Effectively, the range of possible alternatives was restricted by the experimental settings to the set of stimulus pairs as no other completions were possible in each experimental block.
For stimulus production, we recorded multiple repetitions of these syllables uttered by a female native speaker of Finnish and, with great care, selected a combination of the six items whose vowels matched in their fundamental frequency (F0) as well as sound energy and overall duration. The sounds were normalized to have the same loudness by matching their root-mean-square (RMS) power; this was separately normalized for the standard stimuli [pa]/[ta] and for the deviant (“word-final”) syllables. The SCN deviant was produced by subjecting acoustic white noise to a fast Fourier transform filter, whose profile was modeled after the actual spoken stimuli; the filtered noise was then given a temporal envelope of a CV-syllable (Figure 1). For the analysis and production of the stimuli, we used the Cool Edit 2000 program (Syntrillium Software, Scottsdale, AZ).
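RMS matching of the kind described here amounts to scaling each waveform by a single gain factor. The sketch below illustrates the arithmetic only; it is not the procedure implemented in Cool Edit 2000, and the function names and the representation of a sound as a list of samples are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def match_rms(samples, target_rms):
    """Scale the waveform so that its RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [x * gain for x in samples]
```

In such a scheme, the standards would be scaled to one common target value and the deviant syllables to another, in line with the separate normalization described above.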
Standard stimuli [ta] and [pa] were presented in different blocks in 50% of trials, whereas the deviant stimuli [ko], [ku], [ke], [ki], and SCN, which always alternated with the standards to create word/pseudoword perception, were presented in both types of blocks equiprobably, with 9% probability each. A further 5% were occupied by a modified standard stimulus which was reduced in sound intensity by 25% and served as an infrequent target in some of the blocks. Following the recommendations for the optimal multifeature sequence (Näätänen et al., 2004), each consecutive group of 10 stimuli contained all five deviants randomized in different order within each group (except groups with rare targets, in which the target replaced one of the five deviants with equal probability).
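A sequence with these properties could be generated roughly as follows. This is a minimal sketch rather than the presentation script used in the study: the function names are hypothetical, the deviant labels are those of the stimulus design ([ko], [ku], [ke], [ki], SCN), and `target_rate=0.5` is an assumption chosen so that targets occupy about 5% of all stimuli, as the stated proportions imply.

```python
import random

DEVIANTS = ["ko", "ku", "ke", "ki", "SCN"]


def make_group(standard, rng, with_target=False):
    """Return 10 stimuli: standard at even positions, deviants at odd ones."""
    deviants = DEVIANTS[:]
    rng.shuffle(deviants)  # random deviant order within each group
    if with_target:
        # The target replaces one of the five deviants, chosen uniformly.
        deviants[rng.randrange(len(deviants))] = "target"
    group = []
    for dev in deviants:
        group.extend([standard, dev])
    return group


def make_sequence(standard, n_groups, seed=0, target_rate=0.5):
    """Concatenate n_groups groups; target_rate is an assumed parameter."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_groups):
        seq.extend(make_group(standard, rng, rng.random() < target_rate))
    return seq
```

With one target in every second group on average, standards make up 50% of the sequence, targets about 5%, and each deviant about 9%.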
The above auditory stimulation was always accompanied by continuous visual stimulation using a self-selected video film. In two conditions (with standards [ta] and [pa] respectively), the subjects were instructed to ignore the auditory stimulation and concentrate on the visual stream; this represents the typical passive design as used in previous studies. In two further conditions (using the same stimuli), the instructions for this dual task were changed: The subjects were instructed to pay close attention to the stimulation and respond to the infrequent targets by pressing a button while still watching the video. The left index finger was used for responding in order to avoid movement-related artifacts in the language-dominant left hemisphere. This way we manipulated the subjects' level of attention to the critical auditory stimulus material. From the methodological perspective, this enabled a direct comparison between the conventional design and a more attention-demanding condition.
Subjects were seated in an electrically and acoustically shielded chamber. During the stimulation, electric activity of the subjects' brain was continuously recorded (band pass 0.01–100 Hz, sampling rate 500 Hz) with a 64-channel EEG set-up (Compumedics Neuroscan, El Paso, TX), using Ag/AgCl electrodes mounted in an extended 10–20 system electrode cap (Virtanen, Rinne, Ilmoniemi, & Näätänen, 1996). The nasion was used as the reference electrode during the recording. To control for eye movement artifacts, horizontal and vertical eye movements were recorded using two bipolar EOG electrodes.
EEG Data Processing
The recordings were later filtered off-line (band pass 1–20 Hz). ERPs were obtained by averaging epochs, which started 50 msec before the stimulus onset and ended 500 msec thereafter; the −50 to 0 msec interval was used as a baseline. Epochs with voltage variation exceeding 100 μV at any EEG channel or at either of the two EOG electrodes were discarded. The MMN was obtained by subtracting the response to the standard from that to the deviant stimulus. For each experimental subject, the averaged responses contained at least 100 accepted deviant trials in each condition. All responses were recalculated off-line against average reference for further analysis.
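The averaging steps described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: epochs are represented as plain lists of voltages in microvolts, 25 baseline samples correspond to the −50 to 0 msec interval at the 500 Hz sampling rate, "voltage variation" is interpreted here as the peak-to-peak range within an epoch, and all function names are hypothetical.

```python
BASELINE_SAMPLES = 25   # -50..0 msec at 500 Hz (2 msec per sample)
REJECT_UV = 100.0       # rejection threshold in microvolts


def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    base = sum(epoch[:BASELINE_SAMPLES]) / BASELINE_SAMPLES
    return [v - base for v in epoch]


def is_clean(epoch_channels):
    """True if no channel's peak-to-peak range exceeds the threshold."""
    return all(max(ch) - min(ch) <= REJECT_UV for ch in epoch_channels)


def average(epochs):
    """Point-by-point mean across equally long single-channel epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_erp, standard_erp):
    """MMN-style difference curve: deviant minus standard."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]
```

Here `is_clean` would be applied to all EEG and EOG channels of a trial before that trial enters the average.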
Frontal (F), fronto-central (FC), and central (C) electrodes were used for the analysis because electric MMN is usually reported to be most prominent at the fronto-central sites (Picton, Alain, Otten, Ritter, & Achim, 2000; Näätänen, 1999); this was also the case in the present study. Response amplitudes were calculated by determining mean values over a 40-msec window around the absolute peaks in the averaged ERPs. These amplitudes were further subjected to statistical analysis which was carried out using ANOVA, with lexicality (word vs. pseudoword) and attention (attend vs. ignore conditions) as within-subject factors; where necessary, additional topography factors were used (posterior–anterior and left–right dimensions). Huynh–Feldt sphericity corrections were applied where appropriate.
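The amplitude measure described above (mean over a 40-msec window around a peak) might be computed along the following lines. This is a sketch only: the function name, the `search_ms` range used to locate the peak, and the `polarity` switch are assumptions introduced for illustration.

```python
def mean_amplitude_around_peak(erp, times_ms, search_ms, window_ms=40.0,
                               polarity=-1):
    """Return (peak latency, mean amplitude in a window centred on the peak)."""
    lo, hi = search_ms
    candidates = [i for i, t in enumerate(times_ms) if lo <= t <= hi]
    if not candidates:
        raise ValueError("search range contains no samples")
    # polarity=-1 finds the most negative point, +1 the most positive one.
    peak = max(candidates, key=lambda i: polarity * erp[i])
    half = window_ms / 2.0
    window = [erp[i] for i, t in enumerate(times_ms)
              if abs(t - times_ms[peak]) <= half]
    return times_ms[peak], sum(window) / len(window)
```

For a peak at 370 msec, for example, the window would span 350–390 msec.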
In an attempt to further delineate possible attention effects, their cortical sources were estimated using L2 minimum-norm current estimates (L2 MNE). The minimum-norm method provides a solution to the inverse problem of localizing neural activity in the brain from its external EEG or MEG recordings by revealing the unique constellation of active neuronal current elements that models the recorded potential or field distribution with the smallest amount of overall activity (Hämäläinen & Ilmoniemi, 1984, 1994; Ilmoniemi, 1993). The L2 MNE does this by determining the combination of sources that have a minimum total of squared current amplitudes. MNE solutions were calculated for grand-average responses rather than individual data; calculating solutions on grand-average data has a benefit of substantially reduced noise and therefore improved signal-to-noise ratio (SNR), which MNE solutions are highly sensitive to. A three-layer boundary element model with triangularized gray matter surface of a standardized brain (Montreal Neurological Institute) was used for computing MNE solutions in order to improve the reconstruction results: As EEG currents (unlike magnetic fields) travel and spread through multiple tissues, this makes simpler models (e.g., spherical single-shell) less suitable for EEG data. The solutions were restricted to the gray matter surface. CURRY 5.08 software (Compumedics Neuroscan, Hamburg, Germany) was used for these procedures.
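The minimum-norm criterion can be stated compactly. In the sketch below, the notation is ours rather than the authors': b is the vector of measured potentials, L is the lead-field matrix derived from the head model, and j is the vector of source current amplitudes; the closed form assumes that L has full row rank.

```latex
\hat{\mathbf{j}}
  = \arg\min_{\mathbf{j}} \lVert \mathbf{j} \rVert_2^2
  \quad \text{subject to} \quad \mathbf{L}\,\mathbf{j} = \mathbf{b},
\qquad
\hat{\mathbf{j}}
  = \mathbf{L}^{\top} \left( \mathbf{L}\,\mathbf{L}^{\top} \right)^{-1} \mathbf{b}.
```

With noisy data, a regularized variant, $\hat{\mathbf{j}} = \mathbf{L}^{\top}(\mathbf{L}\mathbf{L}^{\top} + \lambda \mathbf{I})^{-1}\mathbf{b}$ with $\lambda > 0$, is commonly used; the regularization applied in the present analysis is not specified in the text.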
All items elicited evoked responses (Figure 2), and MMNs were successfully calculated for the word and pseudoword stimuli in both attend and nonattend conditions (Figures 2–5). These negative responses had a biphasic nature: The first negative peak was present in all conditions with an average latency of 120 msec, whereas a second negative deflection, somewhat flattened in the nonattend blocks, had a clear peak at 370 msec in attended blocks. Between the two negative waves, a positive deflection was registered which was especially pronounced at 230 msec when attention to the stimuli was required. As is typical of the MMN, all effects were maximal at fronto-central recording sites and exhibited a reversal of polarity at mastoid electrodes (Figure 2). We will first cover the effects for the passive and attend conditions separately, and then investigate the attention effects per se by combining the data from the two types of tasks.
In nonattend conditions, the first peak was more pronounced for words than for pseudoword deviants [F(1, 16) = 7.83, p < .013; see also Figure 3], replicating the previous findings of lexical MMN enhancement for words (Shtyrov et al., 2005; Shtyrov & Pulvermüller, 2002b; Pulvermüller et al., 2001). This word advantage enhancement was most pronounced in the left hemisphere [F(1, 16) = 11.64, p < .004].
This lexical enhancement of the MMN for words had already disappeared some 250 msec later, in the second negative wave. In this time range (350–390 msec), where no main lexical effects were present, we found differential laterality for the two stimulus types [interaction Lexicality × Laterality significant: F(1, 16) = 3.59, p < .048]. Planned comparisons indicated that this was due to the response to pseudowords being marginally [F(1, 16) = 3.27, p < .09] left lateralized, with no such difference for words (p > .5), which appeared to have a more bilateral distribution.
Finally, the positive deflection between the two negative waves was not pronounced in the nonattend conditions and did not indicate any differences between words and pseudowords in its time range (∼210–250 msec).
When attention to the stimuli was required, the lexical advantage of words over pseudowords in the first negative peak at ∼120 msec was eliminated (p > .8; see Figure 3). This initial activation was not lateralized for any of the stimulus types.
At 370 msec in attended blocks, there was no significant word–pseudoword main effect. However, we found a significant Lexicality × Topography interaction [F(3, 48) = 3.29, ɛ = .478, p < .028]. This was due to a tendency for a more pronounced left laterality for pseudowords than for words [Lexicality × Laterality: F(1, 16) = 3.14, p < .096] and a tendency toward a pseudoword advantage (more negative-going response) at the left anterior sites, whereas the reverse was observed in the right hemisphere (planned comparison ns).
Again, the positive deflection, now clearly present in the attended blocks at ∼230 msec between the two negative waves, was not specific for any stimulus type and did not indicate any differences between words and pseudowords when attention was directed to the acoustic input.
The attention effect on the first negative peak was markedly different for the two types of stimuli (Figures 4 and 6): Whereas it was absent for the words taken separately (p > .25), the pseudoword responses were significantly increased by attention allocated to the acoustic input [F(1, 16) = 14.33, p < .002]. This differential influence of attention on the two lexical types was further substantiated by a Lexicality × Attention interaction [F(1, 16) = 7.19, p < .0164]. Disentangling this interaction, planned comparisons confirmed an attention effect for pseudowords [F(1, 16) = 5.19, p < .038] but not for words (p > .612). There was also a significant interaction Lexicality × Attention × Topography [F(3, 48) = 3.03, ɛ = 0.499, p < .038] that was due to attention effects for pseudowords at left [F(1, 16) = 4.65, p < .046] but not right (p > .545) anterior recording sites, and absence of attention effects for words over both left and right hemispheres (p > .448).
Attention also strongly increased the positive deflection causing a prominent peak at ∼230 msec [F(1, 16) = 13.27, p < .003]. At this time, however, the effect was pronounced for both words [F(1, 16) = 15.52, p < .002] and pseudowords [F(1, 16) = 5.26, p < .036] alike. The attention effect was more pronounced in the left than in the right hemisphere [Attention × Laterality interaction: F(3, 48) = 3.00, ɛ = 0.500, p < .04; see also Figure 6]. Planned comparisons indicated that this was primarily due to attention effect being most pronounced at the left frontal location [F3, F(1, 16) = 22.43, p < .001].
The response increase caused by attention was also significant for the later negative deflection at ∼370 msec for all conditions [F(1, 16) = 5.81, p < .028]. At this latency, the attention-related response enhancement appeared to have a more global effect and did not show any specificity with respect to stimulus type or recording sites.
In an attempt to localize the attention effects, L2 MNE were performed on difference waves between attend and nonattend conditions. These generally confirmed the attention-driven increase in activation specific to pseudowords at the early stages of activation and potentially mediated by left peri-sylvian areas (Figure 7). MNE analysis also indicated a tendency for involvement of left inferior frontal cortex in the attention-generated activity increase. Whereas focal peri-sylvian effects were seen early-on in the activation time course, more global source distribution was found for the later peak at ∼370 msec. As grand-average data were used in this analysis in order to improve the SNR for computing the solutions, these results could not be verified statistically and should therefore be treated with caution.
In the current study, we investigated the issue of interaction between attention and language processes using auditory-evoked responses to a group of acoustically controlled words and pseudowords. To achieve our goal, we systematically varied attention on the linguistic input by either only engaging the subjects in an unrelated visual task or, in addition, asking them to pay close attention to the stimuli by performing an acoustic stimulus detection task. All conditions showed a clear negative-going mismatch response with two fronto-central peaks (120 and 370 msec) and a positive deflection peaking at 230 msec. The word and pseudoword difference became apparent as early as 120 msec in nonattend conditions. Attention did not affect the early word response but significantly increased the early pseudoword activity. At later stages, both pseudoword and word responses were affected by attention. The attention effects seemed to be predominant over the left hemisphere (Figure 6) with involvement of left peri-sylvian areas, especially inferior frontal cortex (Figure 7). Let us now discuss these findings in more detail.
Early Word Advantage
In nonattend conditions, the first peak was significantly larger for word stimuli than for pseudowords (Figure 3). This replicates the previous findings which consistently indicated an increased MMN response to word deviants as opposed to acoustically similar pseudowords (Shtyrov et al., 2005; Endrass et al., 2004; Pettigrew et al., 2004; Pulvermüller et al., 2001, 2004; Sittiprapaporn et al., 2003; Kujala et al., 2002; Shtyrov & Pulvermüller, 2002b; Korpilahti et al., 2001). Although it may still be possible to explain some of the linguistic increase in MMN by phonological familiarity (Shtyrov, Kujala, Palva, et al., 2000; Shtyrov et al., 1998), those earlier studies that controlled precisely for phonological and psycholinguistic properties predominantly linked the word-elicited MMN enhancement to lexico-semantic properties of the stimuli. More specifically, this enhancement has been interpreted as a neurophysiological signature of long-term memory traces for words in the brain which become automatically activated whenever the word is presented even if it is not specifically attended to (Shtyrov & Pulvermüller, 2007; Pulvermüller & Shtyrov, 2006). Such lexical traces are formed as a consequence of frequent processing of words, which yields, by Hebbian learning, long-term memory circuits with strong internal connections (Pulvermüller, 1999; Braitenberg & Schüz, 1992; Hebb, 1949) that can support the circuit activity even under low-attention conditions. Our present data are fully concordant with this interpretation.
One important difference between this and the previous studies, however, is that here we introduced a small group of words and pseudowords (4 and 4) as opposed to the earlier studies which typically used a single deviant in each block (e.g., Shtyrov et al., 2005; Shtyrov & Pulvermüller, 2002b). This demonstrates that this lexical enhancement is not limited to highly artificial single-deviant settings, and that reliable linguistic MMN effects can be acquired in multifeature paradigms if the acoustic stimulus properties are carefully controlled. The present responses, however, were generally small in size. We take this as an indication that extreme caution should be exercised when using many deviants in nonattend designs; rigorous matching of acoustic properties seems to be vital to avoid smearing of early ERP effects due to stimulus variance (Pulvermüller & Shtyrov, 2006).
The early lexical enhancement of the auditory response was not present when active attention was paid to stimuli. This appears to be a result of increased pseudoword response under such conditions, which we will discuss below. Here, it is important to note that lexical enhancement appears to be specific to conditions where the subjects' attention is directed away from the stimuli. This implies that non-attend experiments may be of special benefit for investigating memory traces for language elements in the brain.
Another important feature of the current design is the intended perceptual grouping between the standard and deviant stimuli that generated word or nonword perception. Although a similar approach was used previously (Pulvermüller et al., 2001) and its surface features are in full agreement with the multifeature MMN paradigm (Näätänen et al., 2004), such standard–deviant grouping is relatively new in the literature. In spite of this, the responses exhibited a typical MMN pattern: increased fronto-central negativity at >100 msec latency for deviants as opposed to standards, and a clear reversal at mastoid electrode sites (Figure 2). Still, given this unconventional design capitalizing on binding of information between syllabic stimuli, we would like to refrain from interpreting the current results in relation to the MMN generation per se. Instead, we would like to stress the established position that linguistic oddball responses are different from the conventional acoustic MMNs in that they are more related to activation of long-term memory traces in the brain than to acoustic change identification or short-term memory processes (Näätänen, Paavilainen, Rinne, & Alho, 2007; Shtyrov & Pulvermüller, 2007; Pulvermüller & Shtyrov, 2006; Näätänen et al., 1997), and can therefore have other sources than the typical auditory MMNs. We would also like to note that such designs minimize the time required by normally rather lengthy MMN experiments as they only require single syllables rather than complete words to be used in standard and deviant positions.
Automaticity and Attention Modulation
The first phase of the word-elicited response was not significantly affected by attention modulation at least until 140 msec (Figure 4). This indicates that early processing of lexical items may not be dependent on the level of attention given to the auditory input or on the task performed. In this sense, it may have a certain degree of automaticity. Memory traces for individual words may be conceptualized as strongly interconnected networks of neurons distributed over core language areas and possibly outside them as well (Pulvermüller, 2001). Strong mutual connections between the subparts of such memory networks may be the reason why the entire circuit becomes quickly activated to its full capacity even when little or no attention is allocated to language. Whereas modulation of these connections by attention cannot be fully ruled out, it does not appear relevant for increasing the early network activity. As we have recently reported a similar independence from attention for early neurophysiological correlates of syntactic processing (Pulvermüller, Shtyrov, Hasting, & Carlyon, 2008), it may be that the initial stages of a variety of linguistic processes are not subject to influence of attention.
The pseudoword response, however, exhibited markedly different behavior. It increased significantly with attention to the auditory stream, to the extent that early word and pseudoword responses did not differ significantly in the attend condition.
Whereas it appears that pseudowords, phonological in nature and phonotactically legal, may activate language areas in the brain (Shtyrov et al., 2005, 2008; Newman & Twieg, 2001; Posner, Abdullaev, McCandliss, & Sereno, 1996; Price, Wise, & Frackowiak, 1996; Petersen, Fox, Snyder, & Raichle, 1990), the absence of long-term memory traces for such items implies that no activation of such traces per se is possible. Hence, they produce a reduced response in nonattend conditions, possibly reflecting acoustic and phonological aspects of their processing as well as partial activation of related lexical traces. The latter assumes a path of pseudoword perception compatible with cohort (and some other) models of speech processing (Marslen-Wilson, 1987). This implies that all lexical neighbors become partially activated as soon as the relevant information is available. However, as additional information arrives, they are progressively excluded until only one candidate is left. In the current study, the design ensures that this crucial information arrives at the same time for words and pseudowords: at the consonant–vowel transition in the second syllable. Thus, although the initial contact with competing representations may be made by both words and pseudowords, only word activation becomes fully fledged, within 100–150 msec after the complete information is present. By this time, apparently, the traces for lexical neighbors, activated partially and equally for words and pseudowords, have already been extinguished as they are ruled out (once the stimulus is identified), unless additional resources (attention) are available to develop and maintain their activity. The initial word activation seems to reach its maximum regardless of the level of attention because of the strong links within the network subserving it.
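The candidate-elimination logic described above can be illustrated with a minimal sketch. It is not the authors' model: the toy lexicon, the use of letters in place of phonemes, and the function name `cohort_trace` are all assumptions made purely for illustration.

```python
def cohort_trace(segments, lexicon):
    """Return the set of still-compatible lexical candidates after each segment."""
    candidates = set(lexicon)
    trace = []
    for position, segment in enumerate(segments):
        # Keep only entries that match the input at this position.
        candidates = {
            entry for entry in candidates
            if len(entry) > position and entry[position] == segment
        }
        trace.append(sorted(candidates))
    return trace


# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON = ["pipe", "pike", "pine", "bite", "bike"]

word_trace = cohort_trace("pipe", LEXICON)    # input that is in the lexicon
pseudo_trace = cohort_trace("bipe", LEXICON)  # input that is not in the lexicon

for label, trace in (("word", word_trace), ("pseudoword", pseudo_trace)):
    print(label, trace)
```

In this toy example, both inputs initially contact several neighbors, but only the word ends with a single surviving candidate; the pseudoword's cohort is emptied once the disambiguating segment arrives.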
In the associative Hebbian logic of memory trace formation (Pulvermüller, 1999; Braitenberg & Schüz, 1992; Hebb, 1949), the strength of synaptic connections between subparts of such networks depends on the degree of coactivation between these parts. Thus, frequent use of words should lead to representations robust enough to be activated by the respective stimulus regardless of the amount of attention paid to it. Pseudowords have no such representations and produce a reduced response in nonattend conditions. However, as more attentional resources become available, the early word activation remains at its maximum, whereas partial but multiple neighbor activations appear to benefit from these additional resources and can thus be maintained at a higher level, leading to the cancellation of the early word advantage effect (and possibly even to its reversal; cf. Garagnani, Shtyrov, & Pulvermüller, 2009).
This influence of attention and specific increase of pseudoword-elicited activation in the attend condition may be explained in terms of strength of cortical feedback connections. Following the biased competition model of attention (Duncan, Humphreys, & Ward, 1997; Desimone & Duncan, 1995; Duncan, 1980), it was suggested that low attention resources may be realized as strong feedback inhibition, whereas reduction in this inhibition leads to greater availability of attentional resources (Garagnani et al., 2008). Such global adjustment of cortical activity levels, a regulation system that can be viewed as a basis for attentional resources, will allow pseudowords to activate related cortical lexical traces only minimally when attentional resources are scarce. Conversely, in the situation of increased attention (reduced inhibition), stronger activation of previously inhibited traces becomes possible. The attentional modulation of pseudoword-induced responses may thus be traced to differential levels of partial but multiple activations of a cohort of words bearing similarity to the presented pseudoword (Marslen-Wilson, 1987). Discrete memory circuits for existing words, on the other hand, appear to be fully activated irrespective of exact attention level when the respective words are heard (Garagnani et al., 2008).
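A toy numerical sketch may help to make this account concrete. It is not the model of Garagnani et al. (2008): it assumes a single rate unit per memory circuit, a clipped linear update, and arbitrary illustrative values for input strength, recurrent weight, and inhibition. The function `settle` and all constants are hypothetical.

```python
def settle(external_input, inhibition, recurrent_weight=0.5, steps=50):
    """Iterate one rate unit toward its steady state, clipped to [0, 1]."""
    activation = 0.0
    for _ in range(steps):
        # Net drive: stimulus input plus self-excitation minus global inhibition.
        drive = external_input + recurrent_weight * activation - inhibition
        activation = min(1.0, max(0.0, drive))
    return activation


# All values below are illustrative assumptions, not fitted parameters.
FULL_INPUT = 1.0     # a word fully matching its own memory circuit
PARTIAL_INPUT = 0.4  # a pseudoword partially matching a neighbor's circuit
INHIBITION = {"nonattend": 0.4, "attend": 0.1}  # weaker inhibition = more attention

results = {}
for condition, inhibition in INHIBITION.items():
    results[condition] = {
        "word": settle(FULL_INPUT, inhibition),
        "neighbor": settle(PARTIAL_INPUT, inhibition),
    }
    print(condition, results[condition])
```

Under these assumed values, the fully driven word unit saturates in both conditions, whereas the partially driven neighbor unit stays silent under strong inhibition and settles at an intermediate level when inhibition is reduced. This reproduces only the qualitative pattern discussed here and makes no claim about actual response amplitudes.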
Whereas, as reviewed above, such early automaticity in lexical access was suggested by previous studies, they used only passive distraction from the auditory input, making the automaticity claim difficult to verify. Here, attention was modulated systematically by varying instructions in the dual task, in which the subjects were told to concentrate either on the visual input or on the auditory one, while both were continuously present. This modulation resulted in a number of effects on the ERPs but did not modify the early word response, pointing toward its independence from attentional influence.
Later stages of linguistic processing, which were reflected by a positive wave at ∼230 msec and a second negative shift in the sub-400 msec interval, were strongly affected by attention. Unlike the early response, these later effects were not differential for the two lexical types and involved words and pseudowords alike. This implies that the proposed language automaticity is limited to the very first stages of linguistic processing, whereas later steps (here after ∼150–200 msec) are affected by attention allocation. These later steps, which possibly reflect a more in-depth, secondary processing or reanalysis and repair of incoming speech (Friederici, 2002; Osterhout, Holcomb, & Swinney, 1994; Kutas & Hillyard, 1980), are thus dependent on the amount of resources allocated to language. Full processing of spoken language is therefore not possible without allocating attentional resources to it; this allocation, in itself, may be triggered by the putative early automatic stages in the first place.
The current study is still limited in that it cannot fully guarantee the withdrawal of attention from the lexical material in the nonattend block: The subjects were passively distracted by the visual task that, in itself, may not fully stop them from attending to the sounds. This is the conventional MMN approach used previously with little variation in a very large number of studies, and it was employed here in order to directly compare this traditional paradigm with more attentionally demanding conditions. Our overall design still modulates the level of attention between the different blocks, thereby delineating the attention effect per se. The results also suggest that although the previous studies may have been correct to suggest early processing automaticity, this may only be true up to a certain point in time (here ∼140 msec), and these claims need to be re-examined for later latencies. To fully elucidate the issue of automaticity and preattentive linguistic processing, a further study using a more stringent distraction from the auditory stimulation would be a necessary future step (Garagnani et al., 2009; cf. Pulvermüller et al., 2008).
Role of Left Peri-sylvian Areas in Attention–Language Interactions
Topographical analysis of the data indicated that the attention effects were strongest in the left hemisphere (see also Figure 6). More specifically, left anterior sites were found to be linked to the early attention-related effects. This was confirmed by the source analysis (Figure 7), which indicated a consistent tendency for involvement of left inferior frontal cortex in the attention-generated activity increase. As these solutions were computed on grand-average data in order to improve the SNR and thus could not be subject to statistical scrutiny, they must be treated with caution. Still, they correspond well to the statistically verified ERP results (particularly the left peri-sylvian foci) and may therefore be seen as indicative with respect to the effects' origin. Whereas such focal peri-sylvian effects were present at the early stages of the MMN response, a more distributed bilateral activation was found for the later peak at ∼370 msec.
The role of left peri-sylvian cortices in a variety of aspects of neural language processing is well known (e.g., Näätänen et al., 1997; Binder et al., 1995; Martin & Chao, 2001; Pulvermüller, 2001; Friederici, Wang, Herrmann, Maess, & Oertel, 2000; Price, 2000; Shtyrov, Kujala, Lyytinen, Ilmoniemi, & Näätänen, 2000; Helenius, Salmelin, Service, & Connolly, 1998). From the present work, however, it seems that these areas may also be involved in modulating the amount of attentional resources dedicated to language processing. More specifically, left frontal sites seemed to be consistently activated in the attention-demanding conditions. The role of frontal cortices in selective attention processes was repeatedly documented (Peers et al., 2005; Duncan & Owen, 2000; Fuster, 2000; Duncan, Emslie, Williams, Johnson, & Freer, 1996) and the current data are in agreement with this. Frontal cortex was also suggested to be involved in automatic reorientation of auditory attention triggered by a stimulus change (Alho, 1995; Näätänen & Alho, 1995), which is also supported by the present data. In the context of the current research, the early activation of frontal sources may be hypothesized as a possible mechanism for switching the auditory attention toward linguistic input.
In addition to the left peri-sylvian areas, we also noted a more distributed activation possibly involving parietal, temporal, and frontal cortices, especially in the later time interval (Figure 7). Although this could not be substantiated statistically, it is in line with previous research suggesting that attention may be mediated by a distributed network involving temporal, parietal, and frontal neocortex (Peers et al., 2005; Zatorre, Mondor, & Evans, 1999).
We investigated the interaction between attention and language systems using the brain's auditory event-related responses to a group of acoustically controlled words and pseudowords. We varied attention on the linguistic input by using a passive oddball presentation and an acoustic stimulus detection task. We found that:
Using multiple word and pseudoword stimuli, MMN responses could be elicited. They exhibited two fronto-central negative peaks (∼120 and 370 msec) and a positive deflection peaking at 230 msec.
The word-elicited response was stronger than that evoked by pseudowords already at ∼120 msec in nonattend conditions.
Attention variation did not affect the early word response but significantly increased the early pseudoword activity.
At later stages (after 150 msec), both pseudoword and word responses were affected by attention.
The results suggest that early word processing may be independent of attention or stimulus-related tasks, and may thus indeed possess a degree of automaticity and possibly be independent of other cognitive processes. Later stages of speech analysis, on the contrary, are affected by attentional control and may thus be dependent on it. The attention effects on linguistic processing seem to be predominant in the left hemisphere and appear to specifically involve left peri-sylvian cortical areas.
We wish to thank Pasi Piiparinen, Teemu Peltonen, Marja Riistama, Oleg Korzyukov, William Marslen-Wilson, and four anonymous referees for their contribution at different stages of this work. The work was supported by the Medical Research Council, UK (U.1055.04.003.00001.01), University of Helsinki, and Academy of Finland.
Reprint requests should be sent to Yury Shtyrov, Medical Research Council Cognition and Brain Sciences Unit, 15 Chaucer Road CB2 7EF, Cambridge, UK, or via e-mail: firstname.lastname@example.org.