Abstract

Spoken word recognition involves the activation of multiple word candidates on the basis of the initial speech input—the “cohort”—and selection among these competitors. Selection may be driven primarily by bottom–up acoustic–phonetic inputs or it may be modulated by other aspects of lexical representation, such as a word's meaning [Marslen-Wilson, W. D. Functional parallelism in spoken word-recognition. Cognition, 25, 71–102, 1987]. We examined these potential interactions in an fMRI study by presenting participants with words and pseudowords for lexical decision. In a factorial design, we manipulated (a) cohort competition (high/low competitive cohorts which vary the number of competing word candidates) and (b) the word's semantic properties (high/low imageability). A previous behavioral study [Tyler, L. K., Voice, J. K., & Moss, H. E. The interaction of meaning and sound in spoken word recognition. Psychonomic Bulletin & Review, 7, 320–326, 2000] showed that imageability facilitated word recognition but only for words in high competition cohorts. Here we found greater activity in the left inferior frontal gyrus (BA 45, 47) and the right inferior frontal gyrus (BA 47) with increased cohort competition, an imageability effect in the left posterior middle temporal gyrus/angular gyrus (BA 39), and a significant interaction between imageability and cohort competition in the left posterior superior temporal gyrus/middle temporal gyrus (BA 21, 22). In words with high competition cohorts, high imageability words generated stronger activity than low imageability words, indicating a facilitatory role of imageability in a highly competitive cohort context. For words in low competition cohorts, there was no effect of imageability. These results support the behavioral data in showing that selection processes do not rely solely on bottom–up acoustic–phonetic cues but rather that the semantic properties of candidate words facilitate discrimination between competitors.

INTRODUCTION

Cognitive models of spoken word recognition have established that recognizing spoken words involves the rapid, incremental interpretation of the continuously changing speech input and its mapping onto representations of lexical meaning (Luce & Pisoni, 1998; Gaskell & Marslen-Wilson, 1997; Norris, 1994; Marslen-Wilson, 1987; McClelland & Elman, 1986; Marslen-Wilson & Welsh, 1978). Although the neural basis of spoken word recognition has been extensively studied for over 100 years, neural models remain somewhat underspecified (Hickok & Poeppel, 2007; Boatman, 2004; Dronkers, Wilkins, Van Valin, Redfern, & Jaeger, 2004; Indefrey & Cutler, 2004; Scott & Wise, 2004) compared with cognitive models, which have attempted to specify the detailed structure of the word recognition process, including the architecture, representations, and processes involved. According to cognitive models of spoken word recognition, activating the meanings of words from spoken inputs is not a simple matter of recognizing the sounds of a word and mapping these onto stored representations of a word's meaning. Extensive psycholinguistic research has suggested that speech–meaning mapping involves continuous processes of activation and competition among multiple candidates (e.g., Gaskell & Marslen-Wilson, 1997, 2002; Allopenna, Magnuson, & Tanenhaus, 1998; Norris, 1994; Zwitserlood, 1989; Marslen-Wilson, 1987; McClelland & Elman, 1986).

An early example of this type of model—the cohort model (Marslen-Wilson, 1987; Marslen-Wilson & Welsh, 1978), a version of which was implemented in TRACE (McClelland & Elman, 1986)—claims that word-initial speech sounds (e.g., “/æl/” of “alligator”) simultaneously activate a cohort of word candidates (e.g., “alcohol,” “albatross,” “alligator”) sharing the same initial sound sequence (“/æl/”), which compete with each other for recognition. Word candidates continue to be evaluated against the incoming sensory input, and the activation levels of words that mismatch gradually decline (Marslen-Wilson, 1987; Tyler, 1984). Selection occurs when the evidence is sufficiently strong to support one of the candidate words. Although the initial activation of cohort competitors is claimed to be based purely on the sensory input, variants of this type of model take different approaches to the relationship between phonology and other levels of the system. For example, interactive activation models allow top–down feedback between different levels of analysis (e.g., TRACE; McClelland & Elman, 1986), whereas other models propose a strictly feedforward architecture in which different types of information are integrated over time in the absence of feedback between levels (e.g., Merge; Norris, McQueen, & Cutler, 2000). Still other models assume a distributed connectionist architecture in which the speech input is mapped directly onto distributed representations of multiple levels of representation, such as the word's phonology and its meaning (Gaskell & Marslen-Wilson, 1995, 1997). The connection weights in the network code information about both mappings such that the retrieval of a word involves the interaction of the two types of information. According to this kind of model, the interaction of form and meaning is an integral part of word recognition and contributes to the earliness with which a word can be recognized (Marslen-Wilson, 1987).

This claim for the interaction of form and meaning in the process of word recognition has been supported by a variety of studies in both written and spoken modalities (e.g., Huang & Pinker, 2010; Colombo, Pasini, & Balota, 2006; Hino & Lupker, 1996, 2000; Tyler, Voice, & Moss, 2000; Cortese, Simpson, & Woolsey, 1997; Hillis & Caramazza, 1995; Strain, Patterson, & Seidenberg, 1995; Miceli, Capasso, & Caramazza, 1994). Both Tyler et al. (2000) and Strain et al. (1995) found that the difficulty of discriminating between words on the basis of the bottom–up input (either their orthography or phonology) was reduced by semantic constraints. For example, Tyler et al. (2000) found that semantic information (measured by increased imageability) had a stronger facilitatory effect on spoken word recognition when words were members of large cohorts. When there was more competition among the set of activated word candidates, making it more difficult to differentiate a word from the other activated members of its cohort, semantic information aided the discrimination process.

This facilitatory effect was claimed to arise from the increased activation levels of highly imageable words, which increased their discriminability from their competitors. In contrast, words in small cohorts generate less competition and, thus, are easier to differentiate. Here there is less opportunity for imageability to boost activation and to influence the discrimination process. In this kind of model, therefore, both phonological and semantic information contribute evidence that differentiates between cohort members and supports the selection of a single candidate. Imageability has been defined in terms of the richness of a word's semantic features, with high imageability words having a greater number of semantic features than low imageability words (Tyler et al., 2000; Plaut & Shallice, 1993). Larger numbers of semantic features are associated with greater activation and, within the context of interactive activation and distributed models, this increased activity boosts the evidence for a specific word candidate, leading to faster recognition latencies (Pexman, Lupker, & Hino, 2002).

Neuroimaging and neuropsychological studies have identified specific neural regions involved in spoken word recognition, with the superior temporal gyrus (STG) being primarily involved in sound processing, whereas more ventral portions of the STG, the middle temporal gyrus (MTG), and angular gyrus (AG) are involved in accessing meaning representations (Humphries, Binder, Medler, & Liebenthal, 2006; Orfanidou, Marslen-Wilson, & Davis, 2006; Prabhakaran, Blumstein, Myers, Hutchison, & Britton, 2006; Spitsyna, Warren, Scott, Turkheimer, & Wise, 2006; Dronkers et al., 2004; Rissman, Eliassen, & Blumstein, 2003). Damage to these regions often leads to comprehension deficits (Dronkers et al., 2004; Hickok & Poeppel, 2004; Hillis et al., 2001; Dronkers, Redfern, & Knight, 2000). In addition, the left inferior frontal gyrus (LIFG) is claimed to be involved in those aspects of lexical processing that involve selection between competing alternatives (e.g., Bozic, Tyler, Ives, Randall, & Marslen-Wilson, 2010; Snyder, Feigenson, & Thompson-Schill, 2007; Kan, Kable, Van Scoyoc, Chatterjee, & Thompson-Schill, 2006). However, the differential contribution of these regions in the various processes involved in the mapping from sound onto meaning representations and their interactions remains unclear. Although many studies have investigated the neural underpinnings of spoken word comprehension, few have been conducted within the context of cognitive models of the processes and representations involved (see Prabhakaran et al., 2006, for an exception).

In the present fMRI study, we aimed to investigate the functional organization of the neural language system with respect to the contributions of phonology and semantics to spoken word recognition. We asked whether the neural language system is modulated by the joint contributions of phonology and semantics, as has been claimed in various cognitive models and observed in behavioral studies (Tyler et al., 2000). We presented listeners with a series of spoken words and nonwords for lexical decision, manipulating lexical (cohort competition) and semantic (imageability) variables in a factorial design similar to that used in Tyler et al. (2000), to test the hypothesis that imageability and cohort competition interact during spoken word processing and that this interaction is reflected in differential patterns of neural activity. We predicted that increased cohort competition should generate increased activation driven by the additional demands involved in selecting between many activated candidates. This variable should most plausibly involve inferior frontal cortex, especially BA 47, a region which has been associated with competition and selection processes (e.g., Wright, Randall, Marslen-Wilson, & Tyler, 2011; Thompson-Schill, D'Esposito, & Kan, 1999; Thompson-Schill, D'Esposito, Aguirre, & Farah, 1997) and with cognitive control mechanisms more generally (Wagner, Paré-Blagoev, Clark, & Poldrack, 2001). We expected to see effects of semantics in the STG/MTG, especially in its posterior extent, because these regions are typically activated in tasks that involve the mapping from sound to meaning (Davis & Johnsrude, 2003). In particular, we expected the left posterior MTG (LpMTG)/AG to respond to variations in imageability, with high imageability words generating greater activation than low imageability words (Binder, Westbury, McKiernan, Possing, & Medler, 2005). We also predicted that the interaction between cohort competition and semantics would modulate activity within the STG/MTG stream, particularly the posterior regions, on the hypothesis that these regions are associated with language-specific speech processing (Hickok & Poeppel, 2004; Binder et al., 1997, 2000).

METHODS

Participants

Fourteen healthy volunteers (seven men and seven women, with ages of 19–33 years) took part in this study. All were native English speakers and were right-handed (Edinburgh Handedness Inventory; Oldfield, 1971) with normal hearing. They gave informed consent and were compensated for their time. The study was approved by Cambridgeshire 3 Research Ethics Committee (United Kingdom).

Design and Materials

We manipulated cohort competition (high/low cohort competition) and imageability (high/low imageability) in a 2 × 2 factorial design with 90 real words in each of the four experimental conditions (see Table 1) and an equal number of nonwords (360) as fillers. The sets of target words (real words) were members of large/high competition or small/low competition cohorts from the CELEX database (Baayen, Piepenbrock, & Gulikers, 1995). Given the importance of a word's onset in spoken word recognition (Marslen-Wilson & Zwitserlood, 1989; Marslen-Wilson, 1987; Marslen-Wilson & Welsh, 1978), each target word's cohort was defined as the set of all word candidates sharing the same initial two phonemes as the target word. For example, when hearing the word “alligator,” words such as “alcohol” and “albatross,” which share the same initial phonemes with “alligator,” will initially be activated (as will other words which also share the same initial phonemes). All of these words constitute the initial cohort.

Table 1. 

Stimulus Characteristics for Each Experimental Condition with Mean Values (SDs)

Condition             High Imageability,   High Imageability,   Low Imageability,    Low Imageability,
                      High Competition     Low Competition      High Competition     Low Competition
Example               monkey               palace               welfare              crisis
Imageability          536 (82)             548 (71)             236 (69)             236 (76)
Cohort competition    0.4 (0.4)            6.9 (12.6)           0.3 (0.3)            9.6 (16.0)
Word frequency        0.81 (0.51)          1.51 (0.51)          0.76 (0.37)          1.64 (0.62)
Biphone frequency     0.020 (0.012)        0.015 (0.013)        0.021 (0.015)        0.016 (0.013)
Duration (msec)       529 (117)            527 (112)            587 (117)            548 (105)
Number of syllables   2.1 (0.7)            1.9 (0.8)            2.1 (0.7)            1.9 (0.7)
Number of phonemes    5.3 (1.4)            4.7 (1.5)            5.3 (1.6)            4.9 (1.4)

Cohort competition refers to the ratio of target word frequency to the summed frequency of all word candidates in the initial cohort.

We defined cohort competition as the ratio of the target word frequency to the summed frequency of all its cohort members, multiplied by 100; the smaller the ratio, the higher the cohort competition. This measure reflects the competitive relationship of the target word and its cohort neighbors as well as the relative weight of the target word within the cohort in terms of frequency (Tyler et al., 2000). Using competition ratios, we manipulated two levels of cohort competition: high competition (or large cohorts) and low competition (or small cohorts) conditions. In large/high competition cohorts, there are many competitors to the target word, and the frequency of the target word typically only accounts for a small proportion of the summed frequency of all cohort members. For example, the word “monkey” has 136 cohort neighbors sharing the same initial sounds (“/mʌ/”; e.g., “Monday, money, monk, month, mother, muscle”), and the frequency of “monkey” accounts for less than 1% of the summed frequency of cohort members. In contrast, the word “woman” falls into a low competition cohort with 51 neighbors sharing the same onset (“/wu/”; e.g., “wood, wolf, wool”) where its frequency accounts for 40% of the summed frequency of all cohort members. Word candidates in the noninitial cohort (e.g., “key” in “monkey”) were not included in these calculations of the cohort environment, because this study only focused on initial cohort activation. An ANOVA showed that cohort competition ratios did not differ within each of the high and low competition conditions (F(1, 356) = 1.73, p > .1) and that competition ratios were significantly lower (i.e., competition was higher) in the high cohort competition than in the low cohort competition conditions (F(1, 356) = 54.42, p < .001). The computation of cohort competition on this basis had the effect that the target words in the high and low competition sets tended to differ in word frequency (see Table 1), as well as in the size of their respective cohorts. Possibly related to this, there were also small differences across conditions in biphone frequency (as calculated using a Web-based phonotactic probability calculator; www.bncdnet.ku.edu/cgi-bin/DEEC/post_ppc.vi; see also Vitevitch & Luce, 2004).
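The competition ratio defined above can be illustrated with a minimal sketch. The function name and the toy frequencies are hypothetical and are not CELEX values; the sketch also assumes that the target word is itself counted among the cohort members when the frequencies are summed.

```python
from typing import Mapping


def cohort_competition_ratio(target: str, cohort_freqs: Mapping[str, float]) -> float:
    """Return 100 * freq(target) / summed frequency of all cohort members.

    `cohort_freqs` maps every word in the initial cohort (target included,
    by assumption) to its frequency. Smaller values mean higher competition.
    """
    total = sum(cohort_freqs.values())
    if total <= 0:
        raise ValueError("summed cohort frequency must be positive")
    return 100.0 * cohort_freqs[target] / total


# Hypothetical toy frequencies chosen only so that the target carries 40%
# of the summed frequency; they are NOT taken from CELEX.
toy_cohort = {"woman": 40.0, "wood": 30.0, "wolf": 20.0, "wool": 10.0}
ratio = cohort_competition_ratio("woman", toy_cohort)
```

On this toy cohort the target accounts for 40% of the summed frequency, which would place it in the low competition range under the definition given above.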

Imageability Pretest

We obtained imageability ratings for the target words by means of a pretest carried out in our laboratory using the same rating method used to collect norms in the MRC Psycholinguistic Database (Coltheart, 1981). Imageability was measured using a 7-point scale wherein “1” represented words that were not at all imageable and “7” represented words that arouse mental images most readily. Fifteen participants who did not take part in the imaging study performed the rating task. The mean imageability scores (multiplied by 100) for each of the four conditions are shown in Table 1, together with mean spoken word duration (msec) and cohort competition values. An ANOVA showed that imageability ratings for the two high imageability conditions (536, 548) and the two low imageability conditions (236, 236) did not significantly differ (F(1, 356) < 1), and there was a significant difference between high and low imageability conditions (F(1, 356) = 1507.61, p < .001).

Because word duration varied across the sets, with low imageability words being, on average, longer (42 msec) than high imageability words (t(358) = 3.53, p < .001), scores on this variable were partialled out in the analyses of the RT and neuroimaging data. These differences in physical duration are also reflected in small differences in the average number of syllables (mean = 1.9–2.1) and phonemes (mean = 4.7–5.3) across sets, also listed in Table 1.

Three hundred sixty nonwords from another experiment acted as fillers in this study. These nonwords varied in the timing of the nonword point, where the sequence could no longer potentially continue as a meaningful real word and was definitely identifiable as a nonword. There were 72 phonotactically illegal items (e.g., “kvint”) and 288 phonotactically legal ones (e.g., “sned”). These nonwords consisted of 72 monosyllabic, 144 disyllabic, and 144 trisyllabic items, which were roughly matched in syllable number to the real words, which in turn comprised 94 monosyllabic, 169 disyllabic, and 97 trisyllabic items. The mean phoneme number was not exactly matched across words (mean = 5.1, SD = 1.5) and nonwords (mean = 6.4, SD = 1.8).

Procedure

Both word and nonword stimuli were recorded by a female native speaker of British English at a sampling rate of 44,100 Hz and downsampled using CoolEdit software (22,050 Hz, 16-bit resolution, monochannel) for presentation with the experimental software. The mean duration of the real words was 548 msec (SD = 115 msec). The mean duration of the nonword fillers was 782 msec (SD = 159 msec). Stimuli were delivered to participants during scanning via Etymotic headphones and were preemphasized before presentation (www.mrc-cbu.cam.ac.uk/∼rhodri/headphonesim.htm). Preemphasis optimizes the headphone frequency range in the scanner environment, providing a better auditory quality. We also included 80 sequences of silence. Stimuli were presented using CAST experimental software (www.mrc-cbu.cam.ac.uk/∼maarten/CAST.htm).

Participants were instructed to respond to each stimulus by pressing a response key with their index finger for real words and middle finger for nonwords and to make no response to the baseline items. Items were divided into four sessions with items in each condition balanced across sessions. Within each session, the order of presentation of real words, nonwords, and silence was pseudorandomized, such that no more than four real words or nonwords followed one another. Session order was counterbalanced across participants. Each session consisted of 200 experimental trials with five lead-in dummy scans for MRI signal stabilization and two dummy scans at the end. Each session lasted 12 min, and participants had a brief rest between sessions. Before the first session, there was a short practice session of 12 items to familiarize participants with the procedure inside the scanner. During the experiment, both RTs and errors were recorded.

MRI Acquisition and Imaging Analysis

Scanning was performed on a 3-T Tim Trio (Siemens, Munich, Germany) at the MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom, using a gradient-echo EPI sequence with head coils. Each functional scan consisted of 32 oblique axial slices, 3-mm thick (0.75-mm gap between slices) with in-plane resolution of 3 mm, repetition time = 3.4 sec, acquisition time = 2 sec, echo time = 30 msec, flip angle = 78°, and field of view = 192 mm × 192 mm. MPRAGE T1-weighted scans were acquired for anatomical localization.

We used a fast sparse imaging protocol (Hall et al., 1999) in which speech sounds were presented in the 1.4 sec of silence between scans. There was a silent gap of 100 msec between the end of a scan and the onset of the subsequent stimulus, minimizing the influence of preceding scanning noise on the perception of the speech sounds, especially their onsets. The time between successive stimuli was jittered to increase the chance of sampling the peak of hemodynamic response.
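The timing constraints of the sparse protocol can be checked with a short arithmetic sketch. All timing values are taken from the text; the one-trial-per-volume assumption used for the session length, and the helper name, are assumptions introduced here for illustration.

```python
# Values reported in the text.
TR_S = 3.4             # repetition time
TA_S = 2.0             # acquisition time per volume
ONSET_GAP_S = 0.1      # silence between the end of a scan and stimulus onset
MEAN_WORD_S = 0.548    # mean real-word duration
MEAN_NONWORD_S = 0.782 # mean nonword duration

# Silent period available between successive acquisitions.
silent_window_s = TR_S - TA_S


def fits_in_silence(duration_s: float) -> bool:
    """True if a stimulus starting after the onset gap ends before the next scan."""
    return ONSET_GAP_S + duration_s <= silent_window_s


# Trial bookkeeping: 360 words + 360 nonwords + 80 silences over 4 sessions.
total_trials = 360 + 360 + 80
trials_per_session = total_trials // 4

# ASSUMPTION: one trial per volume, plus 5 lead-in and 2 trailing dummy scans.
session_minutes = (trials_per_session + 5 + 2) * TR_S / 60.0
```

Under these assumptions the silent window is 1.4 sec, stimuli of average duration fit inside it, and a session comes out just under the reported 12 min.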

Preprocessing and statistical analysis were carried out in SPM5 (Wellcome Institute of Cognitive Neurology, London, United Kingdom; www.fil.ion.ucl.ac.uk) under MATLAB (Mathworks Inc., Sherborn, MA). EPI images were realigned to the first EPI image (excluding the five initial lead-in images) to correct for head motion, then spatially normalized to a standard Montreal Neurological Institute (MNI) EPI template, using a cutoff of 25 mm for the discrete cosine transform functions. Statistical modeling was done in the context of the general linear model (Friston et al., 1995) as implemented in SPM5, using an 8-mm FWHM Gaussian smoothing kernel.

In the fixed effect analysis for each participant, we used a parametric modulation design (Henson, 2004; Buchel, Wise, Mummery, Poline, & Friston, 1996) to model the four experimental conditions. To keep the analysis model simple and efficient, we first analyzed the data using as few modulators as possible and then ran supplementary analyses using a variety of additional modulators to verify the main model. For the main model, in each of the four testing sessions, the design matrix consisted of 14 columns of variables—nonwords, real words, stimulus duration, the four experimental conditions [low imageability high competition (LIHC), low imageability low competition (LILC), high imageability high competition (HIHC), and high imageability low competition (HILC)], null events (baseline), and six movement parameters calculated during realignment. Among these variables, there were three independent events: nonwords, real words, and null events. For the real word events, there were five parametric modulators—duration and the four experimental conditions. Each condition was composed of a quarter of all real word items, which were labeled as “1” in the corresponding modulator column, with the remaining real words labeled as “0” in the same column. For example, the item “monkey” was labeled as “1” in the modulator column for the HIHC condition, and items from the other three experimental conditions were labeled as “0” in the same column. At the same time, the same item “monkey” was labeled as “0” for the modulator columns corresponding to the other three experimental conditions. The modulators were orthogonalized, and any shared variance among these modulators was removed. This had the effect of partialling out any differences in duration among these four experimental conditions.1
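The indicator coding of the condition modulators, and the removal of variance shared with duration, can be sketched as follows. This is a simplified illustration and not SPM5's own routine: it mean-centres each column (an assumption) and orthogonalizes each condition indicator against duration only, whereas SPM orthogonalizes modulators serially. The trial labels and durations are invented.

```python
import numpy as np

CONDITIONS = ("LIHC", "LILC", "HIHC", "HILC")


def build_modulators(labels, durations_ms):
    """Return mean-centred modulators with duration partialled out of each indicator."""
    durations = np.asarray(durations_ms, dtype=float)
    duration_c = durations - durations.mean()   # mean-centred duration modulator
    denom = duration_c @ duration_c
    columns = {"duration": duration_c}
    for cond in CONDITIONS:
        # 1 for items in this condition, 0 for all other real words.
        indicator = np.array([1.0 if lab == cond else 0.0 for lab in labels])
        indicator_c = indicator - indicator.mean()
        # Remove the component of the indicator that is explained by duration.
        beta = (duration_c @ indicator_c) / denom
        columns[cond] = indicator_c - beta * duration_c
    return columns


# Hypothetical trials: labels and durations are invented for illustration.
labels = ["HIHC", "LIHC", "HILC", "LILC", "HIHC", "LIHC", "HILC", "LILC"]
durations_ms = [520, 600, 530, 550, 540, 575, 525, 545]
mods = build_modulators(labels, durations_ms)
```

After this step each condition column is uncorrelated with duration, which is the sense in which duration differences among the conditions are partialled out.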

Trials were modeled using a canonical hemodynamic response function, and the onset of each stimulus was taken as the onset of the trial in the SPM analysis model. The data for each participant were analyzed using a fixed effects model and then combined into a group random effects analysis. Activations were thresholded at p < .001, uncorrected, at the voxel level, and significant clusters were reported only when they survived p < .05, cluster level corrected for multiple comparisons, unless otherwise stated. SPM coordinates were reported in MNI space. Regions were identified by using a nonlinear transformation (Brett, 2001) from MNI coordinates to Talairach space (Talairach & Tournoux, 1988) and confirmed using the AAL and Brodmann templates as implemented in MRicron (www.MRicro.com/MRicron).
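For readers who wish to reproduce the coordinate conversion, the sketch below shows a piecewise affine MNI-to-Talairach mapping of the kind attributed to Brett. The coefficients are those commonly distributed with the mni2tal routine and are stated here as an assumption; they should be checked against the original source before use. The function name is hypothetical.

```python
def mni_to_talairach(x: float, y: float, z: float) -> tuple:
    """Approximate Talairach coordinates for an MNI coordinate.

    ASSUMPTION: coefficients follow the widely circulated mni2tal routine,
    which applies one affine map at or above the AC plane (z >= 0) and
    another below it.
    """
    if z >= 0:
        return (0.9900 * x,
                0.9688 * y + 0.0460 * z,
                -0.0485 * y + 0.9189 * z)
    return (0.9900 * x,
            0.9688 * y + 0.0420 * z,
            -0.0485 * y + 0.8390 * z)


# Example input: a peak coordinate of the form reported in MNI space.
tal_x, tal_y, tal_z = mni_to_talairach(-56, -20, 12)
```

The mapping leaves the origin fixed and shrinks coordinates slightly, more so below the AC plane, which is why it is described as nonlinear overall.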

Because this study was aimed at investigating modulation within the language processing network, a bilateral fronto-temporal mask was applied to all the contrasts (unless otherwise stated) in group random effects analyses. This mask consisted of bilateral IFG (BA 44, 45, 47), anterior cingulate gyri, and the entire temporal lobes. The mask covered the typical left perisylvian language areas that are of theoretical interest and their right hemisphere (RH) homologues (e.g., Tyler & Marslen-Wilson, 2008; Hickok & Poeppel, 2004, 2007; Boatman, 2004; Dronkers et al., 2004; Indefrey & Cutler, 2004; Scott & Wise, 2004) while excluding regions not typically involved in language processing.

RESULTS

Behavioral Results

Only correct responses (96.0%) were included in the RT analyses. Stimulus duration was partialled out as a covariate in the analyses because it strongly correlated with RTs, r = 0.40, p < .001. Only item analyses (F2) were performed as duration could not be a covariate in subject analyses (F1). In a 2 (high and low cohort competition) × 2 (high and low imageability) ANCOVA with duration as a covariate (see Table 1), there was a significant main effect of Cohort Competition (F2(1, 353) = 4.06, p < .05), with words in high competition cohorts responded to more slowly than those in low competition cohorts (917 msec vs. 897 msec). There was also a significant Imageability effect (F2(1, 353) = 13.72, p < .001), with high imageability words responded to faster than low imageability words (888 msec vs. 926 msec). The interaction between Imageability and Cohort Competition was also significant (F2(1, 353) = 4.44, p < .05): RTs were faster to high than to low imageability words in the high cohort competition conditions (888 msec vs. 946 msec, F2(1, 175) = 17.06, p < .001), whereas there was no significant difference between high and low imageability words in low competition/small cohorts (889 msec vs. 905 msec, F2(1, 177) = 1.21, p > .1). This differentiated response pattern replicated the findings of Tyler et al. (2000), suggesting that there is a greater facilitatory effect of imageability on spoken word recognition when words fall into large cohorts, which have many competitors. For words in low competition cohorts, imageability plays a less significant role.
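The pattern of means reported above can be checked with simple arithmetic. The four cell means are taken from the text; the variable names are hypothetical, and the marginal means are computed here as unweighted averages of the cells, which is an assumption (the reported marginals may reflect rounding or covariate adjustment).

```python
# Cell means (msec) reported in the text, keyed by (imageability, competition).
rt_ms = {
    ("high_img", "high_comp"): 888,
    ("low_img", "high_comp"): 946,
    ("high_img", "low_comp"): 889,
    ("low_img", "low_comp"): 905,
}

# Imageability advantage (low minus high imageability RT) at each competition level.
img_effect_high_comp = rt_ms[("low_img", "high_comp")] - rt_ms[("high_img", "high_comp")]
img_effect_low_comp = rt_ms[("low_img", "low_comp")] - rt_ms[("high_img", "low_comp")]

# Difference of differences: how much larger the advantage is under high competition.
interaction_ms = img_effect_high_comp - img_effect_low_comp

# Unweighted marginal means (ASSUMPTION: equal cell weights).
high_comp_mean = (rt_ms[("high_img", "high_comp")] + rt_ms[("low_img", "high_comp")]) / 2
low_comp_mean = (rt_ms[("high_img", "low_comp")] + rt_ms[("low_img", "low_comp")]) / 2
```

The imageability advantage is 58 msec in high competition cohorts against 16 msec in low competition cohorts, and the unweighted competition marginals agree with the 917 msec and 897 msec reported for the main effect.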

Imaging Results

We first established that the task and stimuli elicited activation within those regions of the brain typically activated in spoken language tasks by directly contrasting the activation resulting from all real words against the silent baseline. This analysis produced significant activation in a bilateral fronto-temporal network, including bilateral STG (BA 41, 42, 21, 22, 38), MTG (BA 21, 22), anterior cingulate (BA 24), posterior ITG (BA 20, 37), and LIFG (BA 44, 45, 47; Figure 1 and Table 2), with greater activation in the left hemisphere (LH) than RH. These regions are typically activated in neuroimaging studies of spoken language (e.g., Tyler, Stamatakis, Post, Randall, & Marslen-Wilson, 2005; Davis & Johnsrude, 2003; Binder et al., 2000; Price et al., 1996).

Figure 1. 

Significant activation for the contrast of all real words minus silence shown rendered on the surface of a canonical brain image (above) and in an axial view (below). Color scale indicates t value of contrast. L = left hemisphere.

Table 2. 

Areas of Activity for the Contrast of All Real Words Minus Silence

Regions                              Cluster Level           Voxel Level          Coordinates
                                     pcorrected   Extent     pcorrected   Z       x y z
LSTG (BA 42)                         .000         4791       .000         6.82    −56 −20 12
  L MTG (BA 21)                                              .000         5.65    −60 −30
  LSTG (BA 22)                                               .002         5.31    −46 −26 14
R STG (BA 22)                        .000         2741       .001         5.51    66 −8 10
  R MTG (BA 21)                                              .001         5.41    44 −34 −4
  R hippocampus (BA 20)                                      .006         5.09    36 −24 −10
L anterior cingulate (BA 24)         .000         486        .005         5.14    −4 24 28
  R anterior cingulate (BA 24)                               .015         4.92    22 28
  L anterior cingulate (BA 32)                               .039         4.72    −10 12 26
R fusiform gyrus (BA 37)             .002         180        .006         5.09    40 −56 −24
  R inferior temporal gyrus (BA 37)                          .012         4.96    52 −54 −24
  R fusiform gyrus (BA 37)                                   .130         4.37    28 −38 −22
LIFG (BA 44)                         .000         434        .057         4.62    −46 24
  L precentral gyrus (BA 6)                                  .184         4.25    −60 22
  L rolandic opercularis (BA 6)                              .189         4.24    −50 16
LIFG (BA 45)                         .004         158        .295         4.08    −44 26 12

LIFG (BA 47) was significantly activated but not shown in any of these peak voxels. The activated LIFG (BA 47) is within the large LSTG cluster. L = left; R = right.

We then examined the main effects of cohort competition and imageability and the interaction between them. The effect of cohort competition was calculated by collapsing across imageability and comparing the high against low competition conditions. Words in high competition cohorts produced significantly greater activation than those in low competition cohorts in two left inferior frontal regions—LBA 45/47 (peak voxel: −42 28 2 in BA 45) and LBA 47 (Table 3 and Figure 2)—and one homologous RH region (BA 47, at cluster corrected p = .060, extending into BA 45 at the slightly lower threshold of .01 voxel level, corrected at the .05 cluster level). The mean activation values for each cluster were extracted and averaged in each of the four experimental conditions using MarsBaR (marsbar.sourceforge.net/). Plots in Figure 2 show greater activation for words in high compared with low competition conditions in all three activated regions.

Table 3. 

Areas of Activity for the Contrast of High Minus Low Cohort Competition Condition

Regions             Cluster Level           Voxel Level          Coordinates
                    pcorrected   Extent     pcorrected   Z       x y z
LIFG (BA 45)        .000         298        .346         3.99    −42 28 2
  LIFG (BA 47)                              .705         3.62    −52 34 −8
  LIFG (BA 45)                              .913         3.36    −38 28 12
LIFG (BA 47)        .010         141        .400         3.93    −34 20 −12
RIFG (BA 47)a       .060         83         .866         3.43    36 26 −10

a Activation at p < .001, voxel-level uncorrected, and p = .06, cluster-level corrected.

Figure 2. 

Significant clusters of activation for the contrast of high minus low cohort competition (top). RH cluster activation rendered at p < .001, voxel level uncorrected, p < .06, cluster level corrected. The same cohort competition effect as in the top shown on the axial view of the brain template; L = left hemisphere (middle). Plots showing the mean activation for high and low competition conditions in each of the three significant clusters (bottom).

The main effect of imageability was assessed by collapsing across cohort competition and comparing the high and low imageability sets. High imageability words generated more activity than low imageability words in LpMTG (BA 39; peak: −46 −78 26, 243 voxels, Z = 4.51), extending to the left AG (BA 39; see Figure 3), consistent with Binder et al. (2005). We tested the differential effects of imageability on cohort competition by subtracting the imageability effect (high vs. low imageability) for words in low competition cohorts from the imageability effect for words in high competition cohorts ([HIHC − LIHC] − [HILC − LILC]). This comparison produced significant activation in left posterior STG (LpSTG)/MTG (BA 21, 22; −62 −28 0, Z = 3.95), which included Wernicke's area (Figure 4A). Figure 4B shows the same effect at a slightly lower threshold.
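The three contrasts described so far can be written as weight vectors over the four word conditions. The following is a minimal sketch, not the authors' analysis code: the condition order (HIHC, LIHC, HILC, LILC) and the helper name `apply_contrast` are assumptions made for illustration.

```python
import numpy as np

# Assumed condition order for this sketch (not taken from the article):
# HIHC, LIHC, HILC, LILC  (HI/LI = high/low imageability, HC/LC = high/low competition)
CONDITIONS = ["HIHC", "LIHC", "HILC", "LILC"]

# Main effect of cohort competition: high minus low, collapsed across imageability.
competition = np.array([1, 1, -1, -1])

# Main effect of imageability: high minus low, collapsed across cohort competition.
imageability = np.array([1, -1, 1, -1])

# Interaction: [HIHC - LIHC] - [HILC - LILC].
interaction = np.array([1, -1, -1, 1])


def apply_contrast(weights, estimates):
    """Return the weighted sum of per-condition estimates (hypothetical helper)."""
    return float(np.dot(weights, estimates))
```

In a 2 × 2 design the interaction weights are the elementwise product of the two main-effect weight vectors, which is a quick check that the subtraction order matches the formula in the text.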

Figure 3. 

Significantly activated clusters for the contrast of high minus low imageability (left). Plots of the mean activation values for high and low imageability conditions in the significant cluster (right).

Figure 4. 

(A) Significant cluster of activity in the LpSTG/MTG (BA 21, 22) at p < .001, voxel level uncorrected, and p < .05, cluster level corrected for the interaction between cohort competition and imageability. (B) The same contrast rendered at a lower threshold of p < .01, voxel level uncorrected, and p < .05, cluster level corrected. (C) Plot showing the mean activation value for the imageability effect (high vs. low imageability words) in high and low competition cohorts in the significant cluster in A.

To further examine this effect, we selected the significant cluster (96 voxels) in the LpSTG/MTG (BA 21, 22) at a voxel threshold of p < .001, uncorrected, as an ROI and extracted the mean beta values for this region in the four experimental conditions. These values varied across conditions: high imageability words produced greater activity than low imageability words in high competition cohorts (t(13) = 2.91, p < .05), whereas low imageability words produced more activity than high imageability words when cohort competition was low (t(13) = 3.14, p < .01; Figure 4C). These results suggest that this region of STG/MTG plays a central role in the mapping from sound to meaning.
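The ROI comparisons above are within-subject comparisons of mean beta values. As a rough illustration, and assuming they were paired-samples t tests (the text reports only t(13), which would correspond to 14 participants), the statistic could be computed as follows. The function name `paired_t` is hypothetical and no values from the study are reproduced here.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom.

    x and y hold one value per participant (e.g., mean ROI betas for
    two conditions), in the same participant order.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two paired observations are required")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```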

We carried out a second analysis to investigate how the activation arising from the main variables of cohort competition and imageability was related to the lexical variables of target word frequency (and to a lesser extent biphone frequency) that it had not been possible to match across conditions, as noted earlier (see Table 1). This analysis showed no effect of biphone frequency but revealed significant effects of word frequency in LIFG (BA 44, 45, 47) and right IFG (RIFG; BA 45, 47) and LSTG/MTG (BA 21, 22) with increased activation in these regions as word frequency decreased. These effects largely overlapped with the cohort competition effects in bilateral inferior frontal regions. This is not surprising given the high correlation between cohort competition and word frequency (log-transformed; r = 0.74, p < .01). Because cohort competition is defined as the ratio of target word frequency to the summed frequency of the initial cohort members, higher frequency target words will tend to have higher cohort competition scores. This means that target word frequency may play a dual role in the systems under investigation here, influencing both the degree of competition between the target word and its cohort competitors as well as the processes involved in the initial access of the target from the speech input. In the current context, therefore, it is important to establish what the balance is between these two roles and whether we are genuinely measuring effects of competition in the on-line selection processes underlying spoken word recognition. In a further analysis, where target word frequency is partialled out, the key finding in this article—the interaction between competition and imageability—remains intact. Although this indeed suggests that the cohort competition and selection effect was only partially affected by word frequency, we also found that the residual cohort competition effect did not survive at the normal statistical threshold.
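The ratio definition of cohort competition given above can be made concrete with a small sketch. It assumes that the list of cohort frequencies includes the target word itself, which the text does not state; the function name `competition_ratio` is hypothetical.

```python
def competition_ratio(target_freq, cohort_freqs):
    """Ratio of target word frequency to the summed frequency of its cohort.

    Assumption for this sketch: cohort_freqs includes the target word.
    """
    total = sum(cohort_freqs)
    if total <= 0:
        raise ValueError("summed cohort frequency must be positive")
    return target_freq / total
```

With the other cohort members held constant, raising the target frequency raises the ratio, which is the dependence on word frequency that motivates the follow-up analyses.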

Given this weakness of the main cohort competition effect when word frequency was partialled out, we conducted a further analysis of cohort effects using a competition measure on the basis of simple cohort size (the raw number of cohort members)—in effect returning to one of the measures used successfully in the original Tyler et al. (2000) article. In this third analysis, we used three parametric modulators to model the same real word event: cohort size (log-transformed), imageability and their interaction (the mean corrected cohort measure multiplied by the mean corrected imageability measure), while partialling out the variables of duration, target word frequency and the summed frequency of the initial cohort members. We found that increasing cohort size significantly correlated with increasing neural activity in a cluster involving the LIFG (BA 47) and insula, extending into LIFG (BA 45) at a lower threshold of p < .01, voxel level uncorrected, and p < .05, cluster level corrected. This cluster largely overlapped with that seen for cohort competition in the main analysis, suggesting that measures of cohort competition and measures of cohort size both tap into the same underlying processes of competition and selection, detectable independent of target word frequency. This model also showed a positive effect of imageability in the LpMTG, whereas the interaction between cohort size and imageability produced activation in the LSTG, both of which overlapped with the corresponding results in the main analysis.
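The three parametric modulators in this analysis can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the natural logarithm is assumed for the log transform, "mean corrected" is read as mean-centered, and the function name `build_modulators` is hypothetical.

```python
import numpy as np


def build_modulators(cohort_size, imageability):
    """Return mean-centered log cohort size, mean-centered imageability,
    and their product (the interaction modulator), one value per word."""
    # Log-transform cohort size (natural log assumed), then mean-center.
    log_size = np.log(np.asarray(cohort_size, dtype=float))
    size_c = log_size - log_size.mean()

    # Mean-center imageability ratings.
    img = np.asarray(imageability, dtype=float)
    img_c = img - img.mean()

    # Interaction term: product of the two mean-centered measures.
    return size_c, img_c, size_c * img_c
```

Centering each measure before multiplying reduces the correlation between the interaction term and the two main-effect modulators.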

Finally, we asked whether RT differences across conditions affected the results. To test this, we carried out another analysis in which we partialled out RT. We found the same patterns of activation for cohort competition, imageability, and their interaction as that in the main analysis in which RT was not removed.

DISCUSSION

The aim of this experiment was to determine how the processes involved in the mapping from speech sounds to meaning representations are instantiated in the brain. To do this, we used a task that is widely used in studies of spoken word recognition (Wright et al., 2011; Orfanidou et al., 2006; Prabhakaran et al., 2006; Binder et al., 2005; Rissman et al., 2003), together with stimulus contrasts that enabled us to manipulate basic phonological and semantic properties of the inputs to the word recognition system. We found that different processes involved in spoken word recognition modulated activity within different regions of the neural language system. Bilateral inferior frontal cortices (BA 47/45) were maximally sensitive to cohort competition, whereas activity within the LpMTG/AG (BA 39) was modulated by imageability, and the interaction between imageability and cohort competition engaged the LpSTG/MTG (BA 21, 22), including Wernicke's area, showing that this region is differentially responsive to semantic constraints as a function of a word's phonologically defined lexical competitor environment. Imageability had a greater facilitatory effect on the recognition of words which occurred in high competition cohorts, where it is more difficult to differentiate the unique candidate from other cohort competitors, replicating the behavioral results. In contrast, semantics did not significantly facilitate the recognition of words occurring in low competition cohorts, because these are easier to discriminate from their cohort competitors on the basis of phonological information. These results suggest that word recognition is carried out within an interactive system involving processes of activation and competition (Gaskell & Marslen-Wilson, 1997, 2002).

We manipulated semantics by varying imageability, which has been defined in terms of the richness of a word's semantic features, with high imageability words having a greater number of semantic features than low imageability words (Tyler et al., 2000; Plaut & Shallice, 1993). Semantic features are attributes that capture the meaning of a word (e.g., Tyler & Moss, 2001; Tyler et al., 2000; McRae, de Sa, & Seidenberg, 1997), and they play an important role in the initial computation of a word's meaning (Randall, Moss, Rodd, Greer, & Tyler, 2004; Pexman et al., 2002; McRae et al., 1997). In the present study we observed faster lexical decision RTs and increased neural activation for high compared with low imageability words. Words with more semantic features generate greater activation than those with fewer semantic features, and this increased activation facilitates word recognition, as indexed by faster lexical decision latencies.

The interaction of semantic and phonological constraints in the process of recognizing a spoken word was revealed in the relationship between cohort competition and imageability in the LpSTG/MTG (BA 21, 22), where we found a greater imageability effect for words in high competition cohorts compared with those in low competition cohorts (see Note 2). These results suggest that word meanings are activated early in the recognition process, such that they modulate neural activity as a function of a word's competitor environment. When words occur in high competitor environments so that it is more difficult to differentiate the target word from its phonological competitors, words with many semantic features receive additional activation, thereby increasing the evidence for the presence of those words and functioning to differentiate them from other words in the cohort.

In contrast, when words occur in small cohorts and are, therefore, relatively easier to differentiate from their cohort competitors, there is less opportunity for semantics to play a role in the recognition process. In such cases, recognition is driven to a greater extent by bottom–up phonological constraints. As shown in Figure 4C, in low competition environments, there is greater activation for low than high imageability words. This reverse imageability effect is in line with previous findings that abstract words produce greater activation than concrete words in the LSTG/MTG (e.g., Sabsevitz, Medler, Seidenberg, & Binder, 2005), because abstract words might be retrieved by association with other words (Binder et al., 2005), and this association recruits extra phonological resources. From another standpoint, if the reverse imageability effect in low competition environments is taken as a baseline, the imageability effect in high competition cohorts becomes even greater, showing a stronger facilitatory role of imageability.

General processing of the phonological and semantic properties of spoken words seems to modulate activity along the posterior middle aspect of the LSTG/MTG and into the left AG. Processes of cohort competition, in contrast, seem to engage the IFG bilaterally. Although the role of various subregions within the inferior frontal cortices in language function is controversial (Makuuchi, Bahlmann, Anwander, & Friederici, 2009; Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006; Friederici, Fiebach, Schlesewsky, Bornkessel, & von Cramon, 2006; Badre, Poldrack, Pare-Blagoev, Insler, & Wagner, 2005), the debate typically focuses on the LH. In the current study, we found bilateral involvement of inferior frontal regions (BA 47 and 45) as a function of increasing competitive activity between activated members of the word-initial cohort.

Both BA 45 and 47 have been implicated in processes of selection and retrieval (Badre et al., 2005; Thompson-Schill et al., 1999) with processes of selecting between competitors involving BA 45 and controlled retrieval involving BA 47 (Badre et al., 2005). Although activity in these frontal regions, when confined to the LH and in conjunction with activity in functionally related regions (Wright et al., 2011; Badre et al., 2005), may reflect language-specific processes, bilateral inferior frontal activity in the absence of activity elsewhere in the neural language system is more suggestive of a domain-general process of competition (Bozic et al., 2010). Competition and selection among multiple alternatives is common to a range of cognitive functions and is typically associated with increased bilateral frontal activation (Miller & Cohen, 2001).

In summary, these results support a model of the fronto-temporal-parietal neural language system whose activity is modulated by different linguistic processes. The STG, MTG, and AG in the LH were primarily involved in phonological and semantic processing. Although there is some controversy as to whether the AG is a separate region or part of the STG (Rauschecker & Scott, 2009), its functional connections to the STG are not in doubt. The functional relationship between the AG and STG has been shown in many studies on spoken language comprehension (e.g., Binder et al., 1997), suggesting that the STG/MTG extending posteriorly into AG plays a crucial role in mapping from phonological form onto meaning representations. These regions may function as a communication hub where different types of intralexical information, such as semantics and phonology, converge and interact. This is similar to the integration function proposed for this region in relation to other aspects of speech processing, such as sensory and motor binding (Hickok, Okada, & Serences, 2009; Hickok & Poeppel, 2004). The left inferior frontal cortex in contrast seems to play a more general cognitive role of selecting among alternative cohort candidates when co-activating with the RIFG.

This study tested the neural implications of the properties of spoken word processing as instantiated in the class of cognitive models, which propose that mapping from sound to meaning involves continuous processes of activation and competition of multiple candidates (e.g., Allopenna et al., 1998; Gaskell & Marslen-Wilson, 1997; Norris, 1994; Zwitserlood, 1989; Marslen-Wilson, 1987; McClelland & Elman, 1986; Marslen-Wilson & Welsh, 1978). We found that the brain is sensitive both to competition effects and their modulation by semantics, supporting the heuristic value of functional models of spoken word recognition as a basis for developing neural models of these processes.

Acknowledgments

This research was supported by a Medical Research Council (UK) program grant to L. K. T. (grant G0500842), a grant to W. M. W. (U.1055.04.002.00001.01), and subsidies from Cambridge Overseas Trust and from KC Wong Education Foundation to J. Z. We thank Mrs. Marie Dixon for recording the spoken stimuli, the radiographers at MRC Cognition and Brain Sciences Unit for their help with scanning, and all participants involved in this study.

Reprint requests should be sent to Jie Zhuang, Centre for Speech, Language and the Brain, Department of Experimental Psychology, University of Cambridge, Cambridge, United Kingdom CB2 3EB, or via e-mail: jzhuang@csl.psychol.cam.ac.uk.

Notes

1. 

Note that we assume here that this applies to differences in duration whether reflected in physical duration, average number of syllables or average number of phonemes.

2. 

Related facilitatory effects for high imageability words have also been demonstrated (although in the somewhat different domain of reading aloud) in a neuroimaging study by Frost et al. (2005), showing that high imageability compensates for behavioral effects of low frequency and orthographic inconsistency while generating increased activity in the left MTG (BA 21).

REFERENCES

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38, 419–439.

Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database. Philadelphia, PA: Philadelphia Linguistic Data Consortium, University of Pennsylvania.

Badre, D., Poldrack, R. A., Pare-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47, 907–918.

Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., & Kaufman, J. N. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10, 512–528.

Binder, J. R., Frost, J. A., Hammeke, T. A., Cox, R. W., Rao, S. M., & Prieto, T. (1997). Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience, 17, 353–362.

Binder, J. R., Westbury, C. F., McKiernan, K. A., Possing, E. T., & Medler, D. A. (2005). Distinct brain systems for processing concrete and abstract concepts. Journal of Cognitive Neuroscience, 17, 905–917.

Boatman, D. (2004). Cortical bases of speech perception: Evidence from functional lesion studies. Cognition, 92, 47–65.

Bozic, M., Tyler, L. K., Ives, D. T., Randall, B., & Marslen-Wilson, W. D. (2010). Bihemispheric foundations for human speech comprehension. Proceedings of the National Academy of Sciences, U.S.A., 107, 17439–17444.

Brett, M. (2001). Using the Talairach atlas with the MNI template. Neuroimage, 13, S85.

Buchel, C., Wise, R. J. S., Mummery, C. J., Poline, J.-B., & Friston, K. J. (1996). Nonlinear regression in parametric activation studies. Neuroimage, 4, 60–66.

Colombo, L., Pasini, M., & Balota, D. A. (2006). Dissociating the influence of familiarity and meaningfulness from word frequency in naming and lexical decision performance. Memory & Cognition, 34, 1312–1324.

Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology, 33A, 497–505.

Cortese, M. J., Simpson, G. B., & Woolsey, S. (1997). Effects of association and imageability on phonological mapping. Psychonomic Bulletin & Review, 4, 226–231.

Davis, M. H., & Johnsrude, I. S. (2003). Hierarchical processing in spoken language comprehension. Journal of Neuroscience, 23, 3423–3431.

Dronkers, N. F., Redfern, B., & Knight, R. (2000). The neural architecture of language disorders. In M. Gazzaniga (Ed.), The new cognitive neurosciences (pp. 949–958). Cambridge, MA: MIT Press.

Dronkers, N. F., Wilkins, D. P., Van Valin, R. D., Jr., Redfern, B. B., & Jaeger, J. J. (2004). Lesion analysis of the brain areas involved in language comprehension. Cognition, 92, 145–177.

Friederici, A. D., Bahlmann, J., Heim, S., Schubotz, R. I., & Anwander, A. (2006). The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proceedings of the National Academy of Sciences, U.S.A., 103, 2458–2463.

Friederici, A. D., Fiebach, C. J., Schlesewsky, M., Bornkessel, I. D., & von Cramon, D. Y. (2006). Processing linguistic complexity and grammaticality in the left frontal cortex. Cerebral Cortex, 16, 1709–1717.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. J. (1995). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2, 189–210.

Frost, S. J., Mencl, W. E., Sandak, R., Moore, D. L., Rueckl, J. G., Katz, L., et al. (2005). A functional magnetic resonance imaging study of the tradeoff between semantics and phonology in reading aloud. NeuroReport, 16, 621–624.

Gaskell, M. G., & Marslen-Wilson, W. D. (1995). Modelling the perception of spoken words. In J. P. Moore & F. Lehman (Eds.), Proceedings of the 17th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates Inc.

Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes, 12, 613–656.

Gaskell, M. G., & Marslen-Wilson, W. D. (2002). Representation and competition in the perception of spoken words. Cognitive Psychology, 45, 220–266.

Hall, D. A., Haggard, M. P., Akeroyd, M. A., Palmer, A. R., Summerfield, A. Q., Elliott, M. R., et al. (1999). “Sparse” temporal sampling in auditory fMRI. Human Brain Mapping, 7, 213–223.

Henson, R. N. A. (2004). Analysis of fMRI timeseries: Linear time-invariant models, event-related fMRI and optimal experimental design. In R. S. J. Frackowiak, K. J. Friston, C. D. Frith, R. J. Dolan, & C. J. Price (Eds.), Human brain function (pp. 793–822). London: Elsevier.

Hickok, G., Okada, K., & Serences, J. T. (2009). Area Spt in the human planum temporale supports sensory-motor integration for speech processing. Journal of Neurophysiology, 101, 2725–2732.

Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.

Hillis, A. E., & Caramazza, A. (1995). Converging evidence for the interaction of semantic and sublexical phonological information in accessing lexical representations for spoken output. Cognitive Neuropsychology, 12, 187–227.

Hillis, A. E., Wityk, R. J., Tuffiash, E., Beauchamp, N. J., Jacobs, M. A., Barker, P. B., et al. (2001). Hypoperfusion of Wernicke's area predicts severity of semantic deficit in acute stroke. Annals of Neurology, 50, 561–566.

Hino, Y., & Lupker, S. J. (1996). Effects of polysemy in lexical decision and naming: An alternative to lexical access accounts. Journal of Experimental Psychology: Human Perception and Performance, 22, 1331–1356.

Hino, Y., & Lupker, S. J. (2000). Effects of word frequency and spelling-to-sound regularity in naming with and without preceding lexical decision. Journal of Experimental Psychology: Human Perception and Performance, 26, 166–183.

Huang, Y. T., & Pinker, S. (2010). Lexical semantics and irregular inflection. Language and Cognitive Processes, 25, 1411–1461.

Humphries, C., Binder, J. R., Medler, D. A., & Liebenthal, E. (2006). Syntactic and semantic modulation of neural activity during auditory sentence comprehension. Journal of Cognitive Neuroscience, 18, 665–679.

Indefrey, P., & Cutler, A. (2004). Pre-lexical and lexical processing in listening. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (3rd ed., pp. 759–774). Cambridge, MA: MIT Press.

Kan, I. P., Kable, J. W., Van Scoyoc, A., Chatterjee, A., & Thompson-Schill, S. L. (2006). Fractionating the left frontal response to tools: Dissociable effects of motor experience and lexical competition. Journal of Cognitive Neuroscience, 18, 267–277.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19, 1–36.

Makuuchi, M., Bahlmann, J., Anwander, A., & Friederici, A. D. (2009). Segregating the core computational faculty of human language from working memory. Proceedings of the National Academy of Sciences, U.S.A., 106, 8362–8367.

Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25, 71–102.

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word-recognition in continuous speech. Cognitive Psychology, 10, 29–63.

Marslen-Wilson, W. D., & Zwitserlood, P. (1989). Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology: Human Perception and Performance, 15, 576–585.

McClelland, J., & Elman, J. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.

McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology: General, 126, 99–130.

Miceli, G., Capasso, R., & Caramazza, A. (1994). The interaction of lexical and sublexical processes in reading, writing and repetition. Neuropsychologia, 32, 317–333.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.

Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52, 189–234.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23, 299–370.

Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.

Orfanidou, E., Marslen-Wilson, W. D., & Davis, M. H. (2006). Neural response suppression predicts repetition priming of spoken words and pseudowords. Journal of Cognitive Neuroscience, 18, 1237–1252.

Pexman, P. M., Lupker, S. J., & Hino, Y. (2002). The impact of feedback semantics in visual word recognition: Number-of-features effects in lexical decision and naming tasks. Psychonomic Bulletin & Review, 9, 542–549.

Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10, 377–500.

Prabhakaran, R., Blumstein, S. E., Myers, E. B., Hutchison, E., & Britton, B. (2006). An event-related fMRI investigation of phonological-lexical competition. Neuropsychologia, 44, 2209–2221.

Price, C. J., Wise, R. J. S., Warburton, E. A., Moore, C. J., Howard, D., Patterson, K., et al. (1996). Hearing and saying: The functional neuro-anatomy of auditory word processing. Brain, 119, 919–931.

Randall, B., Moss, H. E., Rodd, J. M., Greer, M., & Tyler, L. K. (2004). Distinctiveness and correlation in conceptual structure: Behavioral and computational studies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 393–406.

Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience, 12, 718–724.

Rissman, J., Eliassen, J. C., & Blumstein, S. E. (2003). An event-related fMRI investigation of implicit semantic priming. Journal of Cognitive Neuroscience, 15, 1160–1175.

Sabsevitz, D. S., Medler, D. A., Seidenberg, M., & Binder, J. R. (2005). Modulation of the semantic system by word imageability. Neuroimage, 27, 188–200.

Scott, S. K., & Wise, R. J. S. (2004). The functional neuroanatomy of prelexical processing in speech perception. Cognition, 92, 13–45.

Snyder, H. R., Feigenson, K., & Thompson-Schill, S. L. (2007). Prefrontal cortical response to conflict during semantic and phonological tasks. Journal of Cognitive Neuroscience, 19, 761–775.

Spitsyna, G., Warren, J. E., Scott, S. K., Turkheimer, F. E., & Wise, R. J. S. (2006). Converging language streams in the human temporal lobe. Journal of Neuroscience, 26, 7328–7336.

Strain, E., Patterson, K., & Seidenberg, M. (1995). Semantic effects in single word naming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1140–1154.

Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. Stuttgart: Georg Thieme Verlag.

Thompson-Schill, S. L., D'Esposito, M., Aguirre, G. K., & Farah, M. J. (1997). Role of left inferior prefrontal cortex in retrieval of semantic knowledge: A reevaluation. Proceedings of the National Academy of Sciences, U.S.A., 94, 14792–14797.

Thompson-Schill, S. L., D'Esposito, M., & Kan, I. P. (1999). Effects of repetition and competition on activity in left prefrontal cortex during word generation. Neuron, 23, 513–522.

Tyler, L. K. (1984). The structure of the initial cohort: Evidence from gating. Perception and Psychophysics, 36, 417–427.

Tyler, L. K., & Marslen-Wilson, W. D. (2008). Fronto-temporal brain systems supporting spoken language comprehension. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363, 1037–1054.

Tyler, L. K., & Moss, H. E. (2001). Towards a distributed account of conceptual knowledge. Trends in Cognitive Sciences, 5, 244–252.

Tyler, L. K., Stamatakis, E. A., Post, B., Randall, B., & Marslen-Wilson, W. D. (2005). Temporal and frontal systems in speech comprehension: An fMRI study of past tense processing. Neuropsychologia, 43, 1963–1974.

Tyler, L. K., Voice, J. K., & Moss, H. E. (2000). The interaction of meaning and sound in spoken word recognition. Psychonomic Bulletin & Review, 7, 320–326.

Vitevitch, M. S., & Luce, P. A. (2004). A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments, & Computers, 36, 481–487.

Wagner, A. D., Pare-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning: Left prefrontal cortex guides controlled semantic retrieval. Neuron, 31, 329–338.

Wright, P., Randall, B., Marslen-Wilson, W. D., & Tyler, L. K. (2011). Dissociating linguistic and task-related activity in LIFG. Journal of Cognitive Neuroscience, 23, 404–413.

Zwitserlood, P. (1989). The locus of the effects of sentential-semantic context in spoken word processing. Cognition, 32, 25–64.