Abstract
We used ERPs to investigate the time course of interactions between lexical semantic and sublexical visual word form processing during word recognition. Participants read sentence-embedded pseudowords that orthographically resembled a contextually supported real word (e.g., “She measured the flour so she could bake a ceke…”) or did not (e.g., “She measured the flour so she could bake a tont…”) along with nonword consonant strings (e.g., “She measured the flour so she could bake a srdt…”). Pseudowords that resembled a contextually supported real word (“ceke”) elicited an enhanced positivity at 130 msec (P130), relative to real words (e.g., “She measured the flour so she could bake a cake…”). Pseudowords that did not resemble a plausible real word (“tont”) enhanced the N170 component, as did nonword consonant strings (“srdt”). The effect pattern shows that the visual word recognition system is, perhaps counterintuitively, more rapidly sensitive to minor than to flagrant deviations from contextually predicted inputs. The findings are consistent with rapid interactions between lexical and sublexical representations during word recognition, in which rapid lexical access of a contextually supported word (CAKE) provides top–down excitation of form features (“cake”), highlighting the anomaly of an unexpected word “ceke.”
INTRODUCTION
Visual word recognition involves the rapid extraction of representations at multiple levels of analysis—including the extraction of sublexical word form features (e.g., letters and multiletter patterns), phonological, semantic, and grammatical representations—within approximately half a second after stimulus onset. Cognitive models of word recognition widely assume that this process involves mutually constraining recurrent interaction between the multiple levels of analysis (Grainger & Holcomb, 2009; Harm & Seidenberg, 2004; Stone, Vanhoy, & Van Orden, 1997; Plaut, McClelland, Seidenberg, & Patterson, 1996; McClelland & Rumelhart, 1981). Recurrent feedback provides an explanation for robust behavioral phenomena, such as the word superiority effect, in which behavioral responses to a target letter are facilitated when it occurs inside a word, relative to isolated presentation (McClelland & Rumelhart, 1981). The existence of recurrent connectivity is consistent with anatomical evidence of dense re-entrant connectivity from higher order to sensory cortex (Lamme & Roelfsema, 2000). Although recurrent interaction is widely assumed within models of word recognition, the time course and functional role of recurrent information flow remain underspecified and controversial. Recent studies indicate that top–down predictions about low-level form representations modulate very early stages of word recognition, within the initial ∼200 msec of processing the word. Here, we investigated the mechanisms that allow such recurrent processing effects to occur.
A number of neuroanatomical models of both word and object recognition do not directly address the role of recurrent interaction and instead focus on feedforward information flow within the ventral visual cortex, which extracts a progression of increasingly complex visual features, culminating in word (e.g., Dehaene, Cohen, Sigman, & Vinckier, 2005) or object (Riesenhuber & Poggio, 1999) recognition. One recent proposal posits that the ventral visual system implements a hierarchy of feature detectors providing sensitivity to low-level features (e.g., edges) in V1, letters in V4, and multicharacter n-grams and whole-word orthographic patterns in ventral occipital-temporal cortex (e.g., Vinckier et al., 2007; Dehaene et al., 2005; McCandliss, Cohen, & Dehaene, 2003). This view places a logical priority on extracting low-level features before retrieving lexical semantic representations, perhaps as part of a feedforward sweep that precedes recurrent interaction.1
Consistent with a feedforward, form-first perspective, some ERP and magnetoencephalography (MEG) studies find that brain responses before ∼200 msec poststimulus onset distinguish alphabetic character strings from nonalphabetic stimuli, suggesting sensitivity to visual word form features, but do not distinguish among alphabetic stimuli on the basis of semantic properties or lexical status (Mariol, Jacques, Schelstraete, & Rossion, 2008; Pylkkänen, Stringfellow, & Marantz, 2002; Bentin, Mouchetant-Rostaing, Giard, Echallier, & Pernier, 1999; Tarkiainen, Helenius, Hansen, Cornelissen, & Salmelin, 1999; Nobre, Allison, & McCarthy, 1994). Tarkiainen et al. (1999), for instance, found an enhanced MEG signal ∼150 msec after the onset of letter strings, relative to strings of nonlinguistic symbols, with estimated generators in ventral occipital-temporal cortex. In contrast to this early sensitivity to low-level form features, manipulations of lexical variables (e.g., semantic expectancy, lexical frequency, and lexical neighborhood size) robustly modulate the later N400 component (∼250–500 msec) of the ERP (e.g., Hagoort, Hald, Bastiaansen, & Petersson, 2004; Holcomb, Grainger, & O'Rourke, 2002; Bentin, McCarthy, & Wood, 1985; Kutas & Hillyard, 1980). Together, these findings are compatible with a feedforward architecture in which word form analysis occupies the initial ∼250 msec of word recognition, with lexical access occurring after that (e.g., Pylkkänen & Marantz, 2003; Tarkiainen et al., 1999).
Recent ERP and MEG studies, however, indicate substantially earlier contact (∼100–200 msec) between word form analysis and lexical semantic processing (e.g., Dikker & Pylkkänen, 2011; Dell'Acqua et al., 2010; Dambacher, Rolfs, Göllner, Kliegl, & Jacobs, 2009; Penolazzi, Hauk, & Pulvermüller, 2007; Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006; Sereno, Brewer, & O'Donnell, 2003) or lexical syntactic processing (Dikker, Rabagliati, Farmer, & Pylkkänen, 2010; Dikker, Rabagliati, & Pylkkänen, 2009). Sereno et al. (2003), for instance, found that sentence context modulated the ERP elicited by ambiguous words 132–192 msec after stimulus onset. Dikker et al. (2009) found enhanced MEG signal ∼100–130 msec following syntactically unexpected words, relative to expected controls.
These early-latency effects of lexical semantic and syntactic variables occur during sentence reading, as opposed to the more common usage of isolated word processing tasks (e.g., lexical decision) in the study of visual word recognition. Several accounts of these effects involve context-driven predictions about word forms, allowing rapid sensitivity to word form inputs that deviate from predictions (Dikker et al., 2009, 2010; Dambacher et al., 2009; Solomyak & Marantz, 2009). This recurrent processing view is consistent with growing evidence that language processing operates in an anticipatory manner, using context to predict the linguistic input at multiple levels (e.g., Altmann & Mirković, 2009; DeLong, Urbach, & Kutas, 2005).
The mechanisms that underlie prediction-based mismatch responses and their implications for the nature of visual word recognition remain poorly characterized. Dikker et al. (2010) report data suggesting that early mismatch responses are proportional to the degree of physical discrepancy between predicted and actual inputs. Such findings could reflect mechanisms that compare top–down predictions about low-level form features against the actual input (Dikker et al., 2010; Dambacher et al., 2009). We suggest here, however, that conflict between predicted and actual inputs may not always increase monotonically with the degree of physical discrepancy between predicted and actual inputs. Within interactive activation models (McClelland & Rumelhart, 1981), for instance, a stimulus that deviates from but resembles a predicted word (e.g., “ceke” in a context predicting “cake”) could drive access of the anticipated word's lexical representation (CAKE), which could then provide top–down excitation of lower-level representations that conflict with the input (e.g., “ca–” vs. “ce–”). This feedforward–feedback dynamic could, if rapid enough, yield early conflict to minor deviations from predictions (e.g., “ceke” vs. “cake”), which does not occur for more flagrant deviations (e.g., “tont” vs. “cake”).
Here, we investigated the impact of recurrent feedforward–feedback interaction on the earliest stages of word recognition by recording brain responses as participants read sentence-embedded real words (e.g., 1), pseudowords (word-like nonwords) that either were orthographically similar to contextually supported real words (“supported pseudoword,” e.g., 2) or deviated strongly from contextually driven expectations (“no-support pseudoword,” e.g., 3), and orthographically illegal consonant strings (“nonword,” e.g., 4).
(1) She measured the flour so she could bake a cake … CONTROL
(2) She measured the flour so she could bake a ceke … SUPPORTED PSEUDOWORD
(3) She measured the flour so she could bake a tont … NO-SUPPORT PSEUDOWORD
(4) She measured the flour so she could bake a srdt … NONWORD
METHODS
Participants
Twenty right-handed, native English-speaking University of Colorado students participated (12 men, mean age = 22 years) and gave informed consent. All had normal or corrected-to-normal vision.
Materials
Stimuli were sentence-embedded highly plausible control nouns (Control; 1), pseudowords that orthographically resembled a contextually supported real word (supported pseudoword; 2), pseudowords that resembled no plausible real word (no-support pseudoword; 3), and nonword consonant strings (nonwords; 4). Critical word length was constant within an experimental item (1–4) and was always either 4 or 5 characters (mean = 4.54), minimizing any confounding impact of word length on early sensory ERPs (Hauk & Pulvermüller, 2004). Supported pseudowords were derived from control words by replacing a single noninitial letter (e.g., 1 vs. 2). No-support pseudoword sentences were created by swapping the pseudowords from two supported pseudoword sentences (e.g., (2) with “The backpacker found a campsite and set up the tont before dark”). Predicted words within these sentence pairs (“cake,” “tent”) were matched in frequency (American National Corpus; Reppen & Ide, 2004). Four experimental lists were created, each containing 45 sentences in each of the four experimental conditions. Each stimulus sentence appeared once in an experimental list, with condition assignments rotated across lists in a Latin Square design. Each participant was randomly assigned to one list. Sentences were pseudorandomly ordered within a list.
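The list construction described above can be illustrated with a minimal sketch. It assumes a simple rotation rule (item index plus list index, modulo four) for the Latin Square; the source does not state the exact rotation used, and the condition labels and function name here are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the four conditions described in the text.
CONDITIONS = ["control", "supported_pseudoword", "no_support_pseudoword", "nonword"]
N_ITEMS = 180  # 45 sentences x 4 conditions per list


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one {item_index: condition} mapping per list (Latin Square rotation)."""
    n_cond = len(conditions)
    return [
        # Assumed rotation rule: shift the condition assignment by one per list.
        {item: conditions[(item + shift) % n_cond] for item in range(n_items)}
        for shift in range(n_cond)
    ]


lists = build_lists()
per_list_counts = [Counter(lst.values()) for lst in lists]
```

Under this rule, every list contains 45 sentences per condition, and every sentence frame appears in each condition exactly once across the four lists.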
The predictability of the control word for each sentence was estimated with a cloze test. A separate group of participants provided the first completion that came to mind after reading fragments of our experimental stimuli up to but excluding the critical words (e.g., “She measured the flour so she could bake a ______”). We divided the full set of 180 fragments randomly into two groups of 90 fragments and assigned each subgroup of items to separate groups of participants (n = 9 for each group) to minimize the number of ratings performed by each individual. The average cloze probability of our control nouns was 90% (range = 85–100%).2 Thus, pretarget contexts in our experimental stimuli generally afforded the prediction of a specific word.
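As an illustration of the cloze measure, the following sketch computes cloze probability as the proportion of completions matching the control word. The response data are invented for the example, and the case-insensitive exact-match scoring rule is an assumption; the source does not describe how responses were scored.

```python
def cloze_probability(responses, target):
    """Proportion of completions that match the target word (case-insensitive)."""
    if not responses:
        raise ValueError("need at least one response")
    normalized = [r.strip().lower() for r in responses]
    return normalized.count(target.lower()) / len(normalized)


# Hypothetical completions from nine raters for "... so she could bake a ______"
example_responses = ["cake", "cake", "Cake", "cake", "pie", "cake", "cake", "cake", "cake"]
example_cloze = cloze_probability(example_responses, "cake")  # 8 of 9 raters
```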
Procedure
Participants sat in a comfortable chair 105 cm from an LCD monitor in a soundproof, dimly lit experimental booth. Sentences were presented one word at a time, with each word appearing for 250 msec followed by a 300-msec blank screen. Critical stimuli subtended approximately 2° of visual angle. Participants were instructed to read normally and to try to understand the sentences, in spite of the presence of occasional nonwords. Participants answered yes–no comprehension questions following one third of the sentences that were pseudorandomly selected. The entire session lasted for approximately 45 min.
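For readers who wish to reproduce the viewing geometry, the sketch below converts the stated visual angle and viewing distance into an approximate on-screen width. It uses the standard visual angle relation; the function name is hypothetical and the result is only as precise as the "approximately 2°" reported above.

```python
import math


def stimulus_width_cm(visual_angle_deg, viewing_distance_cm):
    """Physical width that subtends the given visual angle at the given distance."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return 2.0 * viewing_distance_cm * math.tan(half_angle)


width_cm = stimulus_width_cm(2.0, 105.0)  # about 3.7 cm
```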
EEG Recording
Continuous EEG was recorded from 64 sintered Ag/Ag–Cl electrodes embedded in an elastic cap (Neuroscan QuikCaps) arranged according to the extended 10–20 system (Figure 1). Vertical eye movements and blinks were monitored with two electrodes placed above and below the left eye, and horizontal eye movements were monitored by electrodes placed at the outer canthi of each eye. EEG was also recorded over the left and right mastoid sites. Impedances were maintained below 10 kΩ. EEG was referenced on-line to a vertex electrode and later rereferenced to linked mastoids.
Figure 1. Sixty-four-channel scalp electrode array. Channels used in statistical analyses are highlighted with thick-lined circles.
EEG was amplified and digitized at 1000 Hz (Neuroscan Systems). After recording, data were down-sampled to 200 Hz and filtered with a bandpass of 0.1–30 Hz. Eyeblink artifact was corrected using a subject-specific regression-based algorithm (Semlitsch, Anderer, Schuster, & Presslich, 1986). Any remaining voltages exceeding ±100 μV were rejected. ERPs were averaged in epochs of activity spanning −200 to 700 msec relative to the onset of the target stimulus.
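The epoching and rejection steps can be sketched as follows for a single channel. This is a simplified illustration using the stated parameters (200 Hz after down-sampling, −200 to 700 msec epochs, ±100 μV threshold). The function names are hypothetical, the epoch is treated as half-open at 700 msec, and the source does not say whether rejection was applied per channel or per trial.

```python
SAMPLING_RATE_HZ = 200          # rate after down-sampling
EPOCH_START_MS, EPOCH_END_MS = -200, 700
REJECT_THRESHOLD_UV = 100.0


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in msec to a whole number of samples."""
    return round(duration_ms * rate_hz / 1000)


def extract_epoch(signal_uv, onset_sample):
    """Slice one channel from EPOCH_START_MS up to (not including) EPOCH_END_MS."""
    start = onset_sample + ms_to_samples(EPOCH_START_MS)
    stop = onset_sample + ms_to_samples(EPOCH_END_MS)
    if start < 0 or stop > len(signal_uv):
        raise ValueError("epoch extends beyond the recording")
    return signal_uv[start:stop]


def is_clean(epoch_uv, threshold_uv=REJECT_THRESHOLD_UV):
    """True if no sample exceeds +/- threshold_uv."""
    return all(abs(v) <= threshold_uv for v in epoch_uv)
```

At 200 Hz, each epoch spans 180 samples (40 before and 140 after stimulus onset).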
ERP Analyses
ERPs were quantified as mean amplitudes within four time windows, centered at the peaks of two prominent early posterior components and two later central–parietal components: P1 (125–145 msec), N170 (170–205 msec), N400 (300–500 msec), and P600 (500–700 msec). Each component was analyzed at three channels, which reflected the component's scalp distribution in our data as well as typical analysis sites for that component in the literature. The P1 was analyzed at three occipital channels (O1, OZ, and O2), the N170 at occipital and occipital–temporal channels (OZ, PO7, and PO8), and the N400 and P600 at three frontal-to-parietal midline channels (FZ, CZ, PZ). Figure 1 highlights channels used in analyses. Analyses were repeated measures ANOVAs with factors Condition (control, supported pseudoword, no-support pseudoword, and nonword) and Channel (three levels, depending on component; see above). Significant main effects of Condition were followed by pairwise comparisons between conditions. The Greenhouse–Geisser (1959) correction was applied to comparisons with more than 1 degree of freedom.
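The mean amplitude measure can be illustrated with a short sketch. It assumes an epoch sampled at 200 Hz beginning at −200 msec and treats window boundaries as inclusive; both the inclusive boundaries and the function name are assumptions for illustration, not details given in the source.

```python
def mean_amplitude(epoch_uv, window_ms, epoch_start_ms=-200, rate_hz=200):
    """Mean voltage over samples whose latency falls inside window_ms (inclusive)."""
    step_ms = 1000 / rate_hz
    lo, hi = window_ms
    values = [
        v for i, v in enumerate(epoch_uv)
        if lo <= epoch_start_ms + i * step_ms <= hi
    ]
    if not values:
        raise ValueError("window lies outside the epoch")
    return sum(values) / len(values)


# Analysis windows as listed in the paragraph above.
WINDOWS_MS = {"P1": (125, 145), "N170": (170, 205), "N400": (300, 500), "P600": (500, 700)}
```

With 5-msec sampling, the P1 window covers five samples (125, 130, 135, 140, and 145 msec).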
RESULTS
All experimental conditions elicited a positive-going occipital P1 peak at 120–130 msec followed by a negative-going N170 peak at 170–185 msec at occipital and occipital-temporal channels (Figure 2A and C). The critical finding we report here is that supported pseudowords enhanced the occipital P1 component around 130 msec after word onset (P130 effect; Figure 2A and B), whereas no-support pseudowords and nonwords enhanced the later N170 component (Figure 2C). Statistical analyses are reported below for the novel P130 effect as well as for effects on the N170 and on the later N400 and P600 components, which are more typical of ERP effects reported for sentence-embedded words. These later effects show that the P130 effect occurs in the context of normal later effects and is presumably part of normal language processing.
Figure 2. (A) Grand-averaged ERP waveforms illustrating the P130 effect at occipital channel Oz. Plotted are waveforms for the control (black), supported pseudoword (red), no-support pseudoword (blue), and nonword (green) conditions. Note that voltage is plotted positive down by convention. ERPs begin 200 msec before and end 300 msec after stimulus onset. The voltage-ticked vertical bar is aligned with stimulus onset. (B) Difference image showing the scalp distribution of the P130 effect by subtracting control voltages from supported pseudoword voltages within the 125–145 msec time window. (C) Grand-averaged ERP waveforms illustrating the N170 effect at occipital-temporal channels PO7 and PO8.
P130 (125–145 msec)
Supported pseudowords enhanced the occipital P1 component (P130 effect; Figure 2A and B), whereas no-support pseudowords and nonwords did not. This was confirmed by a significant main effect of Condition [F(3, 57) = 3.2, p < .05], which reflected more positive voltages for supported pseudowords than for controls [F(1, 19) = 6.93, p < .05], nonwords [F(1, 19) = 5.1, p < .05], and no-support pseudowords [F(1, 19) = 7.7, p < .05]. The control, nonword, and no-support pseudoword conditions did not differ from each other. The supported pseudoword P130 effect appeared larger over the left than the right hemisphere occipital channels (Figure 2B), but there was no Condition × Channel interaction.
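To make the form of the pairwise statistics concrete, the sketch below computes F(1, n − 1) for a two-level within-subject contrast, which equals the squared paired t statistic. It assumes one value per participant per condition (for example, the window mean averaged across the three analysis channels). The function name is hypothetical, and this is an illustration rather than the analysis code used in the study.

```python
from statistics import mean, variance


def paired_f(cond_a, cond_b):
    """F(1, n-1) for a two-level within-subject contrast (equals paired t squared)."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need matched per-subject values for at least two subjects")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), so F = t**2 = n * mean(d)**2 / var(d).
    return n * mean(diffs) ** 2 / variance(diffs), (1, n - 1)
```

With 20 participants, the degrees of freedom are (1, 19), matching the pairwise comparisons reported here.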
N170 (175–205 msec)
No-support pseudowords and nonwords enhanced the occipital–temporal N170 component (Figure 2C). This was confirmed by a main effect of Condition [F(3, 57) = 5.01, p < .005], which reflected more negative voltages for no-support pseudowords relative to controls [F(1, 19) = 4.54, p < .05] and for nonwords relative to controls [F(1, 19) = 6.76, p < .05]. Supported pseudowords did not differ from controls. There were no interactions between condition and channel, although visual inspection appeared to indicate hemispheric asymmetries (Figure 2C).
N400 (300–500 msec)
No-support pseudowords enhanced the central–parietal N400 (Figure 3), confirmed by a main effect of Condition [F(3, 57) = 12.23, p < .0001], which reflected more negative voltages for no-support pseudowords than controls [F(1, 19) = 13.1, p < .005]. Supported pseudowords and nonwords did not differ from controls. The N400 effect was larger at central–parietal channels (Cz and Pz) than at the frontal channel (Fz), as typical of the word-elicited N400 effect, reflected in a Condition × Channel interaction [F(6, 114) = 8.82, p < .001].
Figure 3. N400 and P600 effects. Grand-averaged ERPs are shown at a central–parietal channel (Pz) for the control (black), supported pseudoword (red), no-support pseudoword (blue), and nonword (green) conditions. Note that voltage is plotted positive down by convention. ERPs begin 200 msec before and end 700 msec after stimulus onset. The voltage-ticked vertical bar is aligned with stimulus onset.
P600 (500–700 msec)
Supported pseudowords and nonwords enhanced the central–parietal P600 (Figure 3), confirmed by a main effect of Condition [F(3, 57) = 10.96, p < .0001] and significantly more positive voltages for supported pseudowords relative to controls [F(1, 19) = 16.35, p < .001] and for nonwords relative to controls [F(1, 19) = 15.71, p < .001]. No-support pseudowords did not differ from controls. The P600 effect was larger at central–parietal channels (Cz, Pz) than at the frontal channel (Fz), as typical of the P600 effect, reflected in a Condition × Channel interaction [F(6, 114) = 26.23, p < .001].
DISCUSSION
Sentence-embedded pseudowords resembling a contextually supported real word (e.g., “… bake a ceke …”) enhanced the occipital P1 component around 130 msec after word onset (P130 effect). Pseudowords that did not resemble a contextually supported real word (no-support pseudowords) elicited a later enhancement of the N170 component, as did nonwords. At later points in the ERP, no-support pseudowords enhanced the centro-parietal N400, whereas the nonwords and supported pseudowords enhanced the P600. Note that this effect pattern cannot be attributed to stimulus item differences (e.g., “ceke” vs. “tont”), because stimuli were counterbalanced such that each pseudoword exemplar appeared in both supported pseudoword and no-support conditions across experimental lists. These results provide evidence of anticipatory processing effects on the earliest stages of word recognition. The findings furthermore show that mismatch responses to unexpected words are not always proportional to the degree of physical discrepancy between anticipated and actual inputs. The word recognition system is, at least under these circumstances, more rapidly sensitive to small deviations from contextual predictions than to flagrant deviations.
P130, Anticipatory Processing, and Similarity-based Conflict
The P130 effect to minor but not flagrant deviations from predictions indicates rapid similarity-based conflict between predicted and actual inputs. Several processing architectures are compatible with this effect pattern. We propose that the findings reflect a rapidly occurring combination of top–down and bottom–up processing, resulting in strong activation of lexical features (CAKE) and word form features (e.g., “ca–,” “–ak–”), which highlights the anomaly of the bottom–up input (“ceke” vs. “cake”). More specifically, we suggest that the P130 effect reflects the following recurrent processing events: First, before stimulus onset, context-driven anticipatory priming (Altmann & Mirković, 2009) drives partial activation of lexical features (e.g., CAKE) and constituent word form features (“ca–,” “–ak–”) for a contextually appropriate word. Second, the physical input “ceke” is partially consistent with and provides a feedforward activation boost to the already-primed CAKE. The highly active CAKE then drives top–down excitation of its already-primed constituent word form features, boosting them to high levels of activation. Finally, lateral inhibitory competition (O'Reilly, 1998; McClelland & Rumelhart, 1981) occurs between incompatible portions of the feedforward-activated and feedback-activated word form features (e.g., “ca–” vs. “ce–”). This inhibitory competition increases neural activity reflected in the P130 effect.
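The sequence of events proposed above can be illustrated with a deliberately simplified toy computation. It is not a fitted or published model: the priming level, bottom-up gain, and feedback threshold are arbitrary assumed values, and the nonmonotonic pattern follows from the assumption that a word unit sends feedback only once its activation passes a threshold.

```python
PRIME = 0.3        # assumed pre-stimulus activation of the predicted word unit
GAIN = 0.7         # assumed weight on bottom-up letter overlap
THRESHOLD = 0.5    # assumed activation a word unit needs before it sends feedback


def early_conflict(stimulus, predicted):
    """Toy conflict score between input letters and feedback-excited letters."""
    if len(stimulus) != len(predicted):
        raise ValueError("stimulus and predicted word must have equal length")
    matches = sum(s == p for s, p in zip(stimulus, predicted))
    # Contextual priming plus a feedforward boost from matching letters.
    word_activation = min(1.0, PRIME + GAIN * matches / len(predicted))
    # Only a strongly active word unit excites its own letter features.
    feedback = word_activation if word_activation >= THRESHOLD else 0.0
    # Competition arises at positions where input and feedback disagree.
    mismatches = len(predicted) - matches
    return feedback * mismatches


scores = {s: early_conflict(s, "cake") for s in ("cake", "ceke", "tont", "srdt")}
```

Under these assumed values, only “ceke” yields a nonzero conflict score: “cake” has no mismatching letters, and “tont” and “srdt” never push the word unit above the feedback threshold.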
Flagrant violations (e.g., “tont” or “srdt”) do not drive rapid activation of a lexical representation, resulting in less feedback-driven competition, and this explains their lack of early mismatch effects. The no-support pseudoword “tont” (3) does resemble real words (TENT, TINT, and FONT), but these are not contextually supported (primed) and, therefore, do not gain rapid activation. Without robust lexical activations, there is no top–down excitation of word form representations that conflict with the input (e.g., “tont” vs. “tent”). The illegal nonwords (e.g., “srdt”) resemble no real words, making lexical activation even less likely.
The interactive processing account above leaves open a question about whether the hypothesized feedforward–feedback interactions involve representations that are semantic in nature. Several other recent findings of rapid lexical semantic effects on word recognition (e.g., Penolazzi et al., 2007) support a conclusion that the representations are semantic. However, some models of word recognition preclude rapid feedforward semantic access even as they allow (perhaps slower-acting) top–down semantic effects on word form features (Solomyak & Marantz, 2009). Within such late access models, the feedforward–feedback interactions might involve orthographic representations, which are “presemantic” but correlate with lexical semantic variables (e.g., Grainger & Holcomb, 2009; Dehaene et al., 2005).
It is also possible that late access models can account for the similarity-based P130 effect by adding an assumption that word form representations compete with each other along lateral inhibitory connections that are tuned to be stronger between similar competitors (“ca–” vs. “ce–”) than between dissimilar competitors (“ca–” vs. “to–”). If this were true, then predicted word form features could conflict more strongly with mildly mismatching inputs than with flagrant mismatches, without mediation by higher-level representations. Such tuning could reflect the confusability of similar word form representations during learning; if “ca–” and “ce–” are frequently coactive because of similar bottom–up inputs, this may drive learned mutual suppression, because only one can be appropriate in a given situation.
N170 Effects
The N170 effect was sensitive to flagrant but not near-match violations, the reverse of the effect pattern at P130. We suggest that the N170 effect reflects visual word form processing difficulty that occurs when there is no rapid selection of a higher-level lexical representation. In the case of contextually supported control words (e.g., “… bake a cake …”), a lexical representation (CAKE) is rapidly selected and provides top–down feedback that supports settling of lower-level word form representations.3 In contrast, no-support pseudowords and nonwords fail to rapidly excite a lexical representation, resulting in a lack of supportive feedback to and delayed settling of word form processing. Contextually supported pseudowords (e.g., “ceke”) successfully activate a lexical representation. Although this activation initially causes conflict with a portion of the input word's form (P130, discussed above), by the N170 latency, top–down feedback supports rapid settling of word form processing, as in control words, minimizing the N170 effect.
Later Components
Our early component effects were followed by effects on later components, which are consistent with existing findings. Enhanced N400 for no-support pseudowords is similar to the effect in a number of studies of word-like nonwords and may reflect difficulty selecting a coherent lexical semantic representation for the meaningless but word-like stimulus (e.g., Laszlo & Federmeier, 2009; McLaughlin, Osterhout, & Kim, 2004; Holcomb, 1993). The P600 effect for nonwords and for supported pseudowords is a less widely reported pattern but is consistent with several studies in which misspellings and other orthographically illegal strings elicit P600 effects (Kim & Pitkänen, submitted; Vissers et al., 2006). Such effects have been attributed to orthographic structural processing difficulty, where nonwords violate the orthographic structural regularities of the language (“srdt” is illegal in English) and supported pseudowords are orthographically legal but violate a word-specific constraint on orthographic structure (the word CAKE is spelled “cake,” not “ceke”; Kim & Pitkänen, submitted). A related but distinct account is that such P600s reflect monitoring of the conflict between the orthographic input and contextually appropriate configurations: the supported pseudoword may be perceived as a misspelling, and this may recruit processing that verifies the accuracy of the initial percept (van de Meerendonk, Indefrey, Chwilla, & Kolk, 2011; Vissers et al., 2006).
Speed of Information Flow within Visual Cortex
The current results contribute to recent findings of lexical semantic influences on early brain responses, contradicting a widely held view within the ERP and MEG literature that such influences do not occur until after ∼200 msec poststimulus onset. We emphasize here that the newer findings are consistent with what is known about the speed of information flow within the visual system, based on physiological and anatomical findings (Foxe & Simpson, 2002; Lamme & Roelfsema, 2000). The feedforward–feedback proposal above assumes that, by ∼130 msec, information has flowed from sensory cortex, up to higher-level cortical areas representing lexical information, and back again, at least one iteration. For this to occur, the initial feedforward contact with lexical representations would need to occur substantially earlier, perhaps by ∼80–90 msec poststimulus onset. Foxe and Simpson (2002) report human ERP evidence that occipital cortex responds to visual stimuli by 56 msec and that frontal cortex is active by 80 msec. Monkey intracranial recordings show that feedforward information flow from V1 to the highest levels of the ventral visual system (inferotemporal cortex, IT) occurs in ∼23 msec (Schroeder, Mehta, & Foxe, 2001; Schroeder, Mehta, & Givre, 1998) and that robust selectivity for complex stimuli (e.g., faces) occurs at latencies of ∼100 msec (e.g., Rolls & Tovée, 1994). A number of studies indicate that transmission time for information flowing along a single synaptic distance is 10–15 msec, both between and within cortical regions (Tovée, 1994). These indications of fast information flow within cortex are compatible with our conclusions that feedforward–feedback interactions serving word recognition are underway by 130 msec. Furthermore, we see no empirical constraint against the conclusion that the high-level representations within these interactions are semantic in nature.
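The timing argument can be made concrete with a rough back-of-the-envelope calculation using only the latencies cited above. Counting synaptic steps from the 56-msec occipital onset is our simplifying assumption for illustration, and the function name is hypothetical.

```python
def synaptic_steps(start_ms, end_ms, step_ms):
    """Number of whole single-synapse transmissions that fit between two latencies."""
    if step_ms <= 0 or end_ms < start_ms:
        raise ValueError("need a positive step and end_ms >= start_ms")
    return int((end_ms - start_ms) // step_ms)


OCCIPITAL_ONSET_MS = 56   # Foxe & Simpson (2002), as cited above
P130_LATENCY_MS = 130
steps_fast = synaptic_steps(OCCIPITAL_ONSET_MS, P130_LATENCY_MS, 10)  # 7
steps_slow = synaptic_steps(OCCIPITAL_ONSET_MS, P130_LATENCY_MS, 15)  # 4
```

On these assumptions, the 74 msec between occipital onset and the P130 latency leave room for roughly four to seven single-synapse transmissions.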
Methodological Steps Enhance Sensitivity to Early Lexical Effects
In addition to our experimental design, several simple but critical methodological steps may contribute to our finding of early lexical-level effects and help explain the absence of such effects in prior studies, including some with designs similar to our own (e.g., Laszlo & Federmeier, 2009; Vissers et al., 2006; Sauseng, Bergmann, & Wimmer, 2004). First, many studies have not analyzed early-latency ERPs (before ∼200 msec) because of a priori hypotheses that semantic variables will modulate the later N400. Additionally, many studies did not sample extensively over occipital-temporal sites, where early visual processing effects are most pronounced (Maurer, Brandeis, & McCandliss, 2005).
The use of sentence-embedded word reading (here and in Dikker et al., 2009, 2010; Dambacher et al., 2009; Sereno et al., 2003) may engage semantic processing and anticipatory commitments more so than commonly used isolated word recognition tasks (e.g., lexical decision on single words). At least one study that failed to find very early effects for misspellings (Sauseng et al., 2004) may be affected by the difficulty of recognizing a misspelling as such without sentence context (e.g., “taksi” as a misspelling of “taxi”).
Finally, Hauk and Pulvermüller (2004; see also Penolazzi et al., 2007) demonstrated that effects of word length on early ERP components can obscure similar latency effects of lexical semantic variables. Word length variability may be problematic even when matched across conditions; for instance, if word length affects the latency of lexical semantic effects, then word length variability might blur lexical effects in time. This issue may be addressable through statistical modeling of length effects (Hauk & Pulvermüller, 2004) or by strictly constraining length, as we have done here.
Conclusions
Our results require anticipatory processing models in which the response to unmatched predictions is not proportional to the degree of dissimilarity between predicted and actual inputs. Instead, the earliest mismatch responses were stronger for stimuli that deviated mildly from predictions. We propose a model involving rapid interactions between sublexical and lexical semantic representations within the first ∼130 msec of visual word recognition. The data do not, however, rule out models involving anticipation but precluding early lexical semantic access (e.g., Solomyak & Marantz, 2009). The findings contradict a widespread assumption that processing during this time window belongs to early-stage processing within a hierarchical feedforward cascade of visual analysis (e.g., Tarkiainen et al., 1999). This standard view does not capture recurrent dynamics, which are increasingly understood as fundamental to visual processing. Anatomical findings indicate dense reentrant projections from higher-level cortex to lower-level sensory areas, consistent with feedback modulation of low-level processing (Lamme & Roelfsema, 2000). Neuroimaging studies of word recognition indicate top–down influences on low levels within the visual cortical hierarchy, although they do not determine the time course of such effects (Twomey, Kawabata Duncan, Price, & Devlin, 2011). Recent computational models emphasize the ability of small amounts of recurrent interaction to exert profound influences on visual object processing (Epshtein, Lifshitz, & Ullman, 2008). In the context of these indications of recurrent processing, it is plausible that anticipatory priming, in combination with relatively short periods of recurrent interaction, gives rise to sophisticated linguistic computations at very early latencies following stimulus onset. Such conclusions help unify models based on neurophysiological data with those based on behavioral data (e.g., eye movement latencies), where conclusions of lexical access within ∼200 msec have been more common (Sereno & Rayner, 2003).
Acknowledgments
We thank Tim Curran, Randy O'Reilly, Akira Miyake, and three anonymous reviewers for their helpful comments on this manuscript.
Reprint requests should be sent to Albert Kim, Institute of Cognitive Science, University of Colorado, Box 594 UCB, Boulder, CO 80309, or via e-mail: albert.kim@colorado.edu.
Notes
1. Dehaene et al. (2005) describe the extraction of such visual features as the “front end” of visual word recognition; the processes that generate orthographic representations are often not addressed explicitly by models of word recognition, which instead assume orthographic representations as bottom-level input.
2. Sixteen of the items (9%) in our actual lists were not included in the cloze test, because of edits to the stimuli that occurred after the cloze test.
3. We assume that cortical systems “settle” into locally stable states through a process of lateral inhibitory competition, which can be accelerated when top–down feedback boosts one representation and enhances its ability to inhibit its competitors (cf. O'Reilly, 1998).