Abstract
Human language is expressive because it is compositional: The meaning of a sentence (semantics) can be inferred from its structure (syntax). It is commonly believed that language syntax and semantics are processed by distinct brain regions. Here, we revisit this claim using precision fMRI methods to capture separation or overlap of function in the brains of individual participants. Contrary to prior claims, we find distributed sensitivity to both syntax and semantics throughout a broad frontotemporal brain network. Our results join a growing body of evidence for an integrated network for language in the human brain within which internal specialization is primarily a matter of degree rather than kind, in contrast with influential proposals that advocate distinct specialization of different brain areas for different types of linguistic functions.
INTRODUCTION
Human language is a powerful medium for communicating complex thoughts. This power comes from the compositional structure of language (Chomsky, 1965): Meaning is encoded not only by individual words but also by the form and sequential arrangement of those words. For example, the sentence There are octopuses inside the bathtub! is (probably) unfamiliar to the reader and also (probably) expresses a meaning with which the reader has no direct experience. Yet novel meanings are recoverable from novel sentences thanks to the systematic relationship between a sentence's form and its meaning. This principle even extends to unfamiliar words: When we read There are blickets inside the dax!, we can infer that the blickets and the dax are in a containment relationship and have certain other properties (e.g., a blicket is countable and a dax can contain something), even if we do not know the meanings of the words themselves. Thus, the expressive power of language derives from its factorization into meaning (semantics) versus form (the sentence's grammatical structure or syntax).
Many models of the neurobiology of language posit a similar factorization at the level of large-scale brain areas (e.g., posterior temporal vs. inferior frontal areas), such that some areas are “syntactic hubs” that selectively represent and process the grammatical structure of sentences, whereas others are “semantic hubs” that selectively represent and process the meanings of words and/or phrases/sentences, albeit with disagreement as to the precise locations of these functions in the brain (Friederici, 2017; Duffau, Moritz-Gasser, & Mandonnet, 2014; Bornkessel-Schlesewsky & Schlesewsky, 2013; Hickok & Poeppel, 2007; Hagoort, 2005; Frazier, 1987). If true, this division of linguistic labor would have fundamental implications for the organization and evolutionary origins of human cognition: Brain circuits that implement the abstract combinatorics needed for syntactic processing could be recruited in service of other cognitive functions that have similar hierarchical structure to language (e.g., mathematics, music, and action planning; Koechlin & Jubault, 2006; Patel, 2003; Lashley, 1951), and they may find their origins in changes to brain anatomy that enabled algebraic thought, including language (Dehaene, Al Roumi, Lakretz, Planton, & Sablé-Meyer, 2022; Dehaene, Meyniel, Wacongne, Wang, & Pallier, 2015). One important source of evidence in favor of the spatial separability of syntax and semantics has been a landmark study by Pallier, Devauchelle, and Dehaene (2011; henceforth PDD), who argued based on fMRI evidence for a dissociation between brain areas that selectively represent and process syntax and areas that selectively represent and process lexical (word-level) and combinatorial (phrase-level) semantics.
PDD's claims have informed theories of cognition, brain function, and evolution, especially those that posit neural circuits dedicated to abstract combinatorics (e.g., Dehaene et al., 2015, 2022; Bornkessel-Schlesewsky, Schlesewsky, Small, & Rauschecker, 2015; Bolhuis, Tattersall, Chomsky, & Berwick, 2014; Fitch, 2014; Petkov & Jarvis, 2012).
In PDD's paradigm (Figure 1A), participants read 12-word stimuli presented one word at a time. These stimuli were internally composed of “chunks” (our terminology) of locally coherent connected words. The chunks varied parametrically in length. At one extreme, a stimulus contained 12 concatenated (one-word) chunks (condition “c01” in Figure 1A), and at the other, a stimulus contained a single 12-word chunk (condition “c12” in Figure 1A). In the intermediate conditions, the stimuli contained concatenated chunks of different lengths: six 2-word chunks (c02), four 3-word chunks (c03), three 4-word chunks (c04), or two 6-word chunks (c06). The chunks in these conditions always formed valid syntactic constituents, that is, a complete phrase in a hierarchical representation of the sentence's grammatical structure (i.e., a parse tree; see Figure 1B). PDD hypothesized that language processing requires the comprehender to maintain an increasingly complex representation of structure (i.e., a unified syntactic and/or semantic representation of the word sequence) as each new word is processed, and that this increased representational complexity will correspond to an increase in overall neuronal activity in conditions with longer constituents (given that they express a more complex phrasal structure; see Figure 1B). To investigate the abstractness of syntactic representations, a “Jabberwocky” version of each condition (e.g., jab-c01, jab-c12) was created by replacing the content words (nouns, verbs, adjectives, and adverbs) with word-like nonwords (pseudowords), but preserving the syntactic “frame,” that is, function words like articles and auxiliaries, and functional morphological endings (e.g., higher and higher prices > hisker and hisker cleeces).
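The chunking scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus-generation code: the function names `chunk_layout` and `split_into_chunks` are hypothetical, and the placeholder words in the example stand in for real stimuli.

```python
from typing import Dict, List

STIMULUS_LENGTH = 12  # words per stimulus in PDD's paradigm, as stated in the text

# Condition labels from the text, mapped to their chunk lengths.
CONDITIONS: Dict[str, int] = {
    "c01": 1, "c02": 2, "c03": 3, "c04": 4, "c06": 6, "c12": 12,
}


def chunk_layout(chunk_length: int, stimulus_length: int = STIMULUS_LENGTH) -> List[int]:
    """Return the chunk sizes that make up one stimulus (hypothetical helper)."""
    if chunk_length < 1 or stimulus_length % chunk_length != 0:
        raise ValueError("chunk length must evenly divide the stimulus length")
    return [chunk_length] * (stimulus_length // chunk_length)


def split_into_chunks(words: List[str], chunk_length: int) -> List[List[str]]:
    """Group a word sequence into consecutive, equal-length chunks."""
    chunks, start = [], 0
    for size in chunk_layout(chunk_length, len(words)):
        chunks.append(words[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    for name, length in CONDITIONS.items():
        print(f"{name}: {len(chunk_layout(length))} chunk(s) of {length} word(s)")
    placeholder = [f"w{i}" for i in range(STIMULUS_LENGTH)]  # stand-in words
    print(split_into_chunks(placeholder, 4))
```

Every condition keeps the total stimulus length fixed at 12 words, so only the internal grouping changes across conditions.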
This design targets three potentially dissociable dimensions of linguistic representation, each of which could be either present or absent in a given brain region's response. First, lexical semantics (learned conventions about individual word meanings) is fully present only in the real-word conditions. Although Jabberwocky conditions may allow some aspects of lexical meaning to be inferred (rather than recalled)—for example, via pseudowords' form (Blasi, Wichmann, Hammarström, Stadler, & Christiansen, 2016) or context (Li, 1988)—the content of any such inferences should be consistent with a broader range of meanings on average than learned word meanings; otherwise, vocabulary learning would be unnecessary for language comprehension. Second, syntax (the grammatical structure of the sentence as reflected in the forms and sequential ordering of words) is present in proportion to chunk length. When chunk length is 1 (i.e., a word list), syntactic demands should be limited to the processing of word-internal features (e.g., suffixes marking tense or plurality), with no possibility of integrating words into larger structures (e.g., parse trees). As chunk length increases, the constituents formed by each chunk increase in complexity, which may impose additional processing demands related to syntactic tree construction (e.g., hypothesizing new nodes in the tree and new grammatical dependencies to preceding words in the chunk). Importantly, because the surface cues that are needed to construct abstract trees (prefixes, suffixes, and function words) are present in both the real-word and Jabberwocky conditions, the demands associated with tree construction should be similarly modulated by chunk length in both types of conditions. 
PDD also identified specific effects of phrasal constituency by including nonconstituent (nc) versions of the c03 and c04 conditions, that is, three- and four-word coherent sequences of real words that do not form complete syntactic phrases (e.g., over the floodlit; Figure 1B). Third, similar to lexical semantics, combinatorial semantics (the composite meaning denoted by the chunk) is fully present only in the real-word conditions. Although certain abstract properties of meaning are directly encoded by syntax (e.g., plurality in cleeces is both a syntactic and a semantic property) and thus present in the Jabberwocky conditions, these meanings are impoverished relative to real-word conditions in ways that go beyond the mere absence of lexical semantics. The combinatorial semantics of real-word items enable the construction of detailed mental models of meaning that can themselves inform other inferences not directly stated in language. Returning to the example above, the sentence There are octopuses inside the bathtub! may license the inference that there is also water in the bathtub with greater confidence than the sentence There are blickets inside the dax! may license the inference that there is also water in the dax, because the real-word sentence permits more specific connections to conceptual knowledge (e.g., that octopuses are aquatic animals, that bathtubs hold water). Furthermore, similar to syntax, combinatorial semantics is present in proportion to chunk length. When chunk length is 1, the meanings of nearby words cannot be unified into a larger whole. As chunk length increases, the mental representations of entities can be refined (e.g., via adjectives and prepositional phrases) and relations between entities can be hypothesized (e.g., by recognizing the subjects and objects of verbs).
These three linguistic dimensions (lexical semantics, syntax, and combinatorial semantics) coordinate to produce the compositional power of human language discussed above. Words with conventionalized meanings (lexical semantics) can be composed (i.e., unified into a single syntactic and/or semantic representation) by a shared system of rules (syntax) so as to lead systematically to shared representations of more complex meanings (combinatorial semantics) that may themselves be novel, or lack a single conventionalized expression. Differences in brain response to word sequences as a function of whether those sequences can be composed can thus shed light on the nature of composition during language comprehension (e.g., Vandenberghe, Nobre, & Price, 2002), including whether different components of composition are implemented by different brain areas. For convenience, our assumed definitions of key terms in the foregoing discussion are provided in Table 1.
| Term | Definition |
| --- | --- |
| Syntax | Grammatical properties that govern the form and arrangement of words in sentences. Within syntax, we include phrase structure (e.g., a determiner such as the followed by a noun such as cat can form a noun phrase the cat), grammatical relations (e.g., the phrase the cat can be the subject of a verb, e.g., sleeps), and affixation patterns that reflect these relations (e.g., the suffix -s of sleeps indicates that the subject is singular). |
| Lexical semantics | Meanings of words. Within lexical semantics, we include all learned information about the concepts, properties, affordances, and usage patterns (e.g., social register) associated with words in a language. We exclude any aspect of meaning that can be inferred from the form of the word alone (e.g., plurality, which is often grammatically marked in English). |
| Combinatorial semantics | Meanings of multiword phrases. Within combinatorial semantics, we include any aspect of meaning that is not directly conveyed by the words considered in isolation. This includes meanings inferred from the syntactic arrangement of words (e.g., merging the representations of the meanings of the cat and sleeps to yield a representation of the proposition the cat sleeps in some system of formal logic, e.g., Church, 1932) and the meanings of conventionalized collocations (e.g., let the cat out of the bag to mean “accidentally reveal a secret”). |
| Structure | Any property of multiword phrases that is not directly conveyed by the words considered in isolation, including both grammar and meaning. We thus use structure as a cover term for both syntax and combinatorial semantics. In our experiments, structure (more precisely, structural complexity) is modulated by chunk length. |
| Composition | Any mental process that infers (syntactic and/or conceptual) structure from sequences of words. |
| Constituent | The full word sequence dominated by a single node in a hierarchical representation of a sentence's phrase structure (see Figure 1B). |
| Chunk | A contiguous sequence of (pseudo)words that can be syntactically and/or semantically composed. |
| Length effect | An increase in a brain region's BOLD response as a function of chunk length. |
PDD's design therefore gives rise to the eight hypothetical response profiles depicted in Figure 2 (one for each combination of presence or absence of the three dimensions). For example, a selectively syntactic region (−Lex, +Syn, −CombSem) should respond identically across real-word and Jabberwocky conditions; a selectively combinatorial-semantic region (−Lex, −Syn, +CombSem) should show a length effect (stronger responses to longer chunks) only in the real-word conditions; and a combined lexical-semantic, syntactic, and combinatorial-semantic region (+Lex, +Syn, +CombSem) should show length effects in both real-word and Jabberwocky conditions, with a stronger length effect in the real-word conditions. The design thus permits empirical discrimination among the logically possible patterns of (in)sensitivity to lexical-semantic, syntactic, and combinatorial-semantic dimensions of language, with major implications for our understanding of the neural substrates that enable language comprehension.
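The logic of the eight profiles can be made concrete with a toy model. The sketch below assumes, purely for illustration, that the three sensitivities combine additively and that structure-related responses grow with the logarithm of chunk length; neither assumption comes from the study, and the unit weights are arbitrary. The function names are hypothetical.

```python
import math
from itertools import product

CHUNK_LENGTHS = [1, 2, 3, 4, 6, 12]  # chunk lengths used in the 12-word design


def toy_response(lex, syn, comb, chunk_length, real_words):
    """Toy response (arbitrary units) for one region and one condition.

    lex, syn, comb: booleans marking sensitivity to lexical semantics,
    syntax, and combinatorial semantics. The additive form, log-length
    growth, and unit weights are illustrative assumptions only.
    """
    growth = math.log(chunk_length)  # 0 for one-word chunks
    response = 0.0
    if lex and real_words:
        response += 1.0      # lexical meaning: present only with real words
    if syn:
        response += growth   # syntax: modulated by length in both stimulus types
    if comb and real_words:
        response += growth   # combinatorial meaning: length-modulated, real words only
    return response


def length_effect(lex, syn, comb, real_words):
    """Response to the longest chunks minus the response to one-word chunks."""
    longest = toy_response(lex, syn, comb, CHUNK_LENGTHS[-1], real_words)
    shortest = toy_response(lex, syn, comb, CHUNK_LENGTHS[0], real_words)
    return longest - shortest


def label(lex, syn, comb):
    """Format a profile name such as '-Lex, +Syn, -CombSem'."""
    sign = lambda flag: "+" if flag else "-"
    return f"{sign(lex)}Lex, {sign(syn)}Syn, {sign(comb)}CombSem"


if __name__ == "__main__":
    for lex, syn, comb in product([False, True], repeat=3):
        real = length_effect(lex, syn, comb, real_words=True)
        jab = length_effect(lex, syn, comb, real_words=False)
        print(f"{label(lex, syn, comb)}: real={real:.2f}, jabberwocky={jab:.2f}")
```

Under these assumptions, a purely syntactic profile yields identical responses for real-word and Jabberwocky stimuli, a purely combinatorial-semantic profile yields a length effect only for real words, and the fully sensitive profile yields a length effect in both with a larger effect for real words, matching the qualitative predictions in the text.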
PDD reported three key findings of relevance to the neural substrates of syntactic and semantic processing. Finding 1: Inferior frontal and posterior temporal language regions in the left hemisphere (LH) responded more strongly to longer constituents, even in the Jabberwocky conditions. Finding 2: The slope of this increase with chunk length was indistinguishable in these areas between the real-word and Jabberwocky conditions. The impact of this finding was plausibly enhanced by the additional apparent absence of a difference in intercept between conditions, such that the overall response profiles in these regions were nearly identical in the two types of conditions (similar to the selectively syntactic, −Lex, +Syn, –CombSem, profile in Figure 2). Finding 3: By contrast, in anterior temporal and temporoparietal language regions, activation increased with chunk length in the real-word conditions but not the Jabberwocky conditions, with a significant difference in slope between the two condition types (similar to the –Syn, +CombSem profiles in Figure 2). These findings have been reinforced by other studies showing syntactic/semantic dissociations with a similar topography to that reported by PDD (e.g., Matchin, Hammerly, & Lau, 2017; Goucha & Friederici, 2015).
In addition to their support for neurobiological effects of syntax in general (Finding 1), these findings have had a major influence on thinking about the division of labor within the human language system, which we group into two broad claims that were made directly by PDD or attributed to them by subsequent work.
- Syntactic Hubs: Finding 2 has been taken to support the existence of abstract-syntactic hubs in inferior frontal and posterior temporal cortex (Dehaene, 2019; Nelson et al., 2017; Hertrich, Dietrich, & Ackermann, 2016; Pattamadilok, Dehaene, & Pallier, 2016; Dehaene et al., 2015; Wang, Uhrig, Jarraya, & Dehaene, 2015; Pallier et al., 2011). Because PDD reported qualitatively identical response profiles in these regions for real-word and Jabberwocky conditions, prior invocations of this empirical finding are often ambiguous between a strong form in which these hubs exclusively encode abstract combinatorics, with no reference to lexical- or combinatorial-semantic content (Hertrich et al., 2016; Dehaene et al., 2015; Kempen, 2014; profile −Lex, +Syn, −CombSem in Figure 2), and a weaker form in which these hubs do not encode combinatorial semantics, but may nonetheless respond more strongly to real words than pseudowords overall (Matchin et al., 2017; profile +Lex, +Syn, −CombSem in Figure 2).
- Lexico-Semantic Hubs: Finding 3 has been taken to support a selective role for anterior temporal and temporoparietal areas in lexical- and combinatorial-semantic processing (Frankland & Greene, 2020; Zaccarella, Schell, & Friederici, 2017; Bautista & Wilson, 2016; Friston & Buzsáki, 2016; Skeide, Brauer, & Friederici, 2016; Bornkessel-Schlesewsky et al., 2015; Zaccarella & Friederici, 2015; Wilson et al., 2014; Bornkessel-Schlesewsky & Schlesewsky, 2013; Pallier et al., 2011; profile +Lex, −Syn, +CombSem in Figure 2). PDD use the term lexico-semantic to characterize these areas, and we follow this terminology when discussing PDD's (and related) claims. For elaboration on the ways in which PDD's study has influenced subsequent thinking about the neurobiology of language, see Appendix 1.
However, these claims now face empirical and methodological objections. Empirically, the existence of syntactic hubs (or, at least, the strong form of this claim) has been challenged by evidence of lexical processing in the inferior frontal and posterior temporal areas identified by PDD as abstract syntactic hubs (e.g., Matchin et al., 2017; Fedorenko, Duncan, & Kanwisher, 2012; Fedorenko, Nieto-Castañón, & Kanwisher, 2012; Fedorenko, Hsieh, Nieto-Castañón, Whitfield-Gabrieli, & Kanwisher, 2010; Rodd, Davis, & Johnsrude, 2005), and the existence of lexico-semantic hubs has been challenged by evidence of sensitivity to structure in Jabberwocky materials in anterior temporal regions argued by PDD to be insensitive to such effects (e.g., Fedorenko, Duncan, et al., 2012; Fedorenko, Nieto-Castañón, et al., 2012; Fedorenko et al., 2010; Rogalsky & Hickok, 2009; Humphries, Binder, Medler, & Liebenthal, 2006). These prior studies raise concerns about the robustness and replicability of PDD's reported pattern. Methodologically, some of the choices in PDD's design and analyses are problematic. First, PDD used a between-subjects design to compare the real-word and Jabberwocky conditions (thus simultaneously varying both the sample of participants and the condition), although this manipulation is feasible to perform in a within-subject design that avoids this confound. Because individuals and, by extension, groups of individuals vary along numerous trait and state dimensions that are known to affect neural responses (e.g., Chen, Saad, Britton, Pine, & Cox, 2013; Hariri, 2009; Holmes & Friston, 1998), the magnitudes of neural responses in two groups cannot be confidently attributed to differences/similarities between conditions. Second, PDD used the same data both to define the ROIs and to quantify their responses, introducing circularity (Kriegeskorte, Simmons, Bellgowan, & Baker, 2009). 
Third, PDD relied on traditional fMRI group analyses (Holmes & Friston, 1998), which assume voxel-wise correspondence across individual brains. Ample evidence now exists for substantial interindividual variability in the precise locations of functional areas in the association cortex (e.g., Vázquez-Rodríguez et al., 2019; Frost & Goebel, 2012; Tahmasebi et al., 2012), including in the language network (e.g., Mahowald & Fedorenko, 2016; Fedorenko et al., 2010). Given that some of PDD's claims rely on not finding certain effects in certain brain regions, the choice of traditional group analyses, which suffer from low sensitivity (Nieto-Castañón & Fedorenko, 2012), is suboptimal. We stress that PDD's approach and claims were reasonable for the time and that some of the concerns above arise from empirical findings or methodological insights that were contemporaneous or subsequent to PDD's publication date. However, because PDD's findings continue to exert substantial influence, it is important to consider them in light of subsequent developments.
Motivated by these concerns, and in line with current emphasis in the field on robustness and replicability (Gilmore, Diaz, Wyble, & Yarkoni, 2017; Yarkoni & Westfall, 2017; Open Science Collaboration, 2015; Makel & Plucker, 2014; Simons, 2014; Pashler & Wagenmakers, 2012), we conduct three fMRI experiments (across n = 75 participants) that constitute the closest effort to date to replicate PDD's original study while addressing the methodological issues above. First, we use a strictly within-subject design. Second, we use independent data to define the ROIs and to quantify their responses to the critical conditions. And third, we define ROIs functionally in individual brains (e.g., Fedorenko, 2021; Fedorenko et al., 2010; Saxe, Brett, & Kanwisher, 2006), which has been shown to yield higher sensitivity and higher functional resolution (e.g., Braga, DiNicola, Becker, & Buckner, 2020; Shashidhara, Spronkers, & Erez, 2020; Fedorenko, Duncan, et al., 2012; Nieto-Castañón & Fedorenko, 2012).
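The circularity concern, and the reason for defining ROIs on independent data, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study's localization procedure (which is described in Appendix 3): the voxel count, the top-10% selection rule, and the function name `simulate_selection_bias` are all hypothetical, and the "data" are pure noise with no true effect.

```python
import random
from statistics import mean


def simulate_selection_bias(n_voxels=10_000, top_fraction=0.10, seed=0):
    """Compare circular and independent ROI response estimates on pure noise.

    Returns (circular_estimate, independent_estimate). All parameters are
    illustrative assumptions, not values taken from the study.
    """
    rng = random.Random(seed)
    # Two independent halves of the data; every voxel's true effect is zero.
    half_a = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
    half_b = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

    # Define the ROI as the voxels with the largest estimated effect in half A.
    n_selected = int(n_voxels * top_fraction)
    ranked = sorted(range(n_voxels), key=lambda v: half_a[v], reverse=True)
    roi = ranked[:n_selected]

    # Circular: the same data define the ROI and quantify its response.
    circular = mean(half_a[v] for v in roi)
    # Independent: held-out data quantify the response of the same ROI.
    independent = mean(half_b[v] for v in roi)
    return circular, independent


if __name__ == "__main__":
    circular, independent = simulate_selection_bias()
    print(f"circular estimate:    {circular:.3f}")
    print(f"independent estimate: {independent:.3f}")
```

Because selection capitalizes on noise, the circular estimate is well above zero even though no effect exists, whereas the held-out estimate stays near zero. The same reasoning motivates using separate data to define ROIs and to measure responses to the critical conditions.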
To foreshadow our results, we strongly replicate PDD's key discovery of a basic chunk length effect in all experiments (see Giglio, Ostarek, Weber, & Hagoort, 2022, for another recent replication by another research group): Activity in multiple language areas increases parametrically with the increasing length of linguistic context, even in the absence of lexical content. However, our results challenge the existence of both syntactic and semantic hubs. In particular, (a) all regions of the language network (except the TPJ / angular gyrus language region) show a length effect in Jabberwocky conditions; (b) all language regions show an effect of “lexicality,” with real-word conditions eliciting stronger responses than Jabberwocky conditions; and (c) all language regions but the PostTemp language region show a length by lexicality interaction whereby the length effect is stronger in the real-word conditions compared with Jabberwocky conditions. We further show that these length effects do not critically depend on syntactic constituency per se but rather on the length of contiguous coherent text, which undermines PDD's claim that syntactic constituency critically drives the length effect.
These findings challenge a bifurcation of the language system into discrete syntactic and lexico-semantic components. Our results instead join a growing body of evidence for an integrated network for language in the human brain (Malik-Moraleda et al., 2022; Braga et al., 2020; Scott, Gallée, & Fedorenko, 2017; Fedorenko et al., 2010) within which internal specialization is primarily a matter of degree rather than kind (Blank & Fedorenko, 2020; Fedorenko & Blank, 2020; Fedorenko, Blank, Siegelman, & Mineroff, 2020; Fedorenko, Behr, & Kanwisher, 2011; Keller, Carpenter, & Just, 2001; see Fedorenko, Ivanova, & Regev, 2024, for review), in contrast with influential proposals that posit a sharp separation between different types of linguistic representations and processes (Friederici, 2017; Bornkessel-Schlesewsky & Schlesewsky, 2013; Hickok & Poeppel, 2007; Hagoort, 2005).
METHODS
This study consists of three experiments. Experiment 1 focuses on the real-word conditions from PDD and attempts to replicate the basic length effect in the language network's response. Experiment 2 additionally includes Jabberwocky conditions to test PDD's critical theoretical claim: that a subset of the language network implements abstract, content-independent, syntactic processing. Experiment 3 targets the centrality of syntactic constituency by investigating length effects using chunks that overwhelmingly do not form syntactic constituents.
Participants
Seventy-four unique individuals (age 18–38 years, 39 female participants) participated for payment (Experiment 1: n = 15; Experiment 2: n = 40, Experiment 3: n = 20; one individual participated in both Experiment 2 and Experiment 3, on separate days). The same number of participants (40) were included in our key replication of PDD (Experiment 2) as in PDD's original study. All but three participants were right-handed—as determined by the Edinburgh Handedness Inventory (Oldfield, 1971), or self-report. All participants were native (age of first exposure < 10 years old) or highly proficient (n = 3) speakers of English (see Malik-Moraleda et al., 2022, for evidence that the language system of highly proficient speakers is similar to that of native speakers). All participants gave informed written consent in accordance with the requirements of Massachusetts Institute of Technology's (MIT) Committee on the Use of Humans as Experimental Subjects. Each participant completed a language localizer task (Fedorenko et al., 2010) and a critical task.
Critical Task: Design
The design of Experiments 1 and 2 followed PDD but used English materials available at https://osf.io/fduve/ (the original experiments were carried out in French). In particular, participants were presented with same-length stimuli (sequences of 12 words/nonwords), and the internal composition of these stimuli varied across conditions. The conditions in Experiment 1 were similar to PDD's real-word conditions, except they did not include the three-word constituent condition. Experiment 2 included three types of experimental manipulation that directly follow PDD's original design: (a) six real-word conditions: a sequence of 12 unconnected words (i.e., constituents of length 1: c01; here and elsewhere, our condition name abbreviations are similar to those in PDD), six 2-word constituents (c02), four 3-word constituents (c03), three 4-word constituents (c04), two 6-word constituents (c06), and a 12-word sentence (c12); (b) three conditions that were a subset of the Jabberwocky conditions from PDD and were selected to span the range of constituent lengths: a list of 12 unconnected nonwords (jab-c01), three 4-word Jabberwocky constituents (jab-c04), and a 12-word Jabberwocky sentence (jab-c12); and (c) two nonconstituent conditions—four 3-word nonconstituent chunks (nc03) and three 4-word nonconstituent chunks (nc04). Sample stimuli are shown in Figure 1.
Like the materials in Experiments 1 and 2, the materials in Experiment 3 implicitly contained sequences of contiguous chunks of varying length drawn from English sentences. However, unlike Experiments 1 and 2, these chunks were not required to (and generally did not) form syntactic constituents in their source contexts. Thus, Experiment 3 allows us to investigate the extent to which constituency is critical to the relationship between implicit chunk length and the brain's response. The materials for Experiment 3 consisted of two sets, so as to span a large range of chunk lengths at a fine-grained level. Stimuli in Set 1 were 24 words long in total and fell into length conditions based on the divisors of 24 (i.e., c01, c02, c03, c04, c06, c08, and c12). Stimuli in Set 2 were 30 words long and fell into length conditions based on the divisors of 30 (i.e., c01, c02, c03, c05, c06, and c10).
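The relationship between stimulus length and the available chunk-length conditions in Experiment 3 can be checked with a short sketch. The condition lists are copied from the text; the helper names `proper_divisors` and `chunks_per_stimulus` are hypothetical.

```python
def proper_divisors(n):
    """Divisors of n that are smaller than n itself."""
    return [d for d in range(1, n) if n % d == 0]


SET1_WORDS, SET2_WORDS = 24, 30
# Chunk lengths listed in the text for each set.
SET1_LENGTHS = [1, 2, 3, 4, 6, 8, 12]
SET2_LENGTHS = [1, 2, 3, 5, 6, 10]


def chunks_per_stimulus(total_words, chunk_lengths):
    """Map each condition label (e.g., 'c05') to its number of chunks."""
    return {f"c{length:02d}": total_words // length for length in chunk_lengths}


if __name__ == "__main__":
    print("Set 1:", chunks_per_stimulus(SET1_WORDS, SET1_LENGTHS))
    print("Set 2:", chunks_per_stimulus(SET2_WORDS, SET2_LENGTHS))
    unused = sorted(set(proper_divisors(SET2_WORDS)) - set(SET2_LENGTHS))
    print("Proper divisors of 30 not used as conditions:", unused)
```

Set 1 uses every proper divisor of 24. Set 2 uses every proper divisor of 30 except 15, an omission for which the text gives no reason. Neither set includes a condition in which the whole stimulus is a single chunk.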
Full details about the materials and stimulus design are given in Appendix 4. Quantitative analyses of the linguistic features of these materials are given in Appendix 5.
Critical Task: Procedure
The procedure was similar for the three experiments and followed PDD: Participants saw the stimuli presented one word/nonword at a time in the center of the screen in all caps with no punctuation at the rate of 300 msec per word/nonword. In all experiments, participants were simply instructed to read attentively, based on prior evidence (Fedorenko et al., 2010) that responses to sentences, Jabberwocky sentences, word lists, and nonword lists do not appear to be affected by the presence of a task. In Experiment 1, the 150 trials (thirty 12-word stimuli × 5 conditions) were distributed across five runs, so that each run contained six trials per condition. In addition, each run included 108 sec of fixation, for a total run duration of 216 sec (3 min 36 sec). In Experiment 2, the 330 trials (thirty 12-word stimuli × 11 conditions) were distributed across 10 runs, so that each run contained three trials per condition. In addition, each run included 121.2 sec of fixation, for a total run duration of 240 sec (4 min). In both experiments, the order of conditions and the distribution of fixation periods in each run were determined with the optseq2 algorithm (Dale, Fischl, & Sereno, 1999). Experiment 3 used the same presentation format as Experiments 1 and 2, which means that the Set 1 (24-word) trials lasted 7.2 sec, and Set 2 (30-word) trials lasted 9 sec. The 156 trials of Experiment 3 (twelve 24-word stimuli × 7 conditions plus twelve 30-word stimuli × 6 conditions) were distributed across six runs, with each run containing 26 trials (fourteen 24-word trials, and twelve 30-word trials), two trials of each of the 13 conditions. Fixation periods were distributed as follows: 8 sec at the beginning of the run, 5.4 sec after each trial, and 8.2 sec at the end of the run. Condition order varied across runs and participants, with the constraint that trials of the same condition did not appear in a row.
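The timing figures above follow from the 300-msec presentation rate and can be verified with a few lines of arithmetic. The sketch below works in integer milliseconds to avoid floating-point rounding. The function names are hypothetical, and the Experiment 3 run total is a derived value that the text does not state; it assumes that a 5.4-sec fixation follows each of the 26 trials.

```python
WORD_MS = 300  # presentation time per word/nonword, from the text


def trial_ms(n_words):
    """Duration of one trial in milliseconds."""
    return n_words * WORD_MS


def run_duration_ms(trials_per_condition, n_conditions, n_words, fixation_ms):
    """Total run duration: all trials in the run plus total fixation time."""
    n_trials = trials_per_condition * n_conditions
    return n_trials * trial_ms(n_words) + fixation_ms


def experiment3_run_ms():
    """Derived Experiment 3 run duration (not stated in the text).

    Assumes 5.4 s of fixation after every one of the 26 trials, plus
    8 s at the start and 8.2 s at the end of the run.
    """
    trials = 14 * trial_ms(24) + 12 * trial_ms(30)
    fixation = 8_000 + 26 * 5_400 + 8_200
    return trials + fixation


if __name__ == "__main__":
    print("Exp 1 run (s):", run_duration_ms(6, 5, 12, 108_000) / 1000)
    print("Exp 2 run (s):", run_duration_ms(3, 11, 12, 121_200) / 1000)
    print("Exp 3 trials (s):", trial_ms(24) / 1000, "and", trial_ms(30) / 1000)
    print("Exp 3 run, derived (s):", experiment3_run_ms() / 1000)
```

The computed run durations for Experiments 1 and 2 (216 sec and 240 sec) and the Experiment 3 trial durations (7.2 sec and 9 sec) agree with the values reported in the text.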
fMRI Data Acquisition, Preprocessing, and Modeling; Functional Localization; and Data Analysis
fMRI data acquisition, preprocessing, and modeling are described in Appendix 2. Participant-specific functional localization is described in Appendix 3. Contrasts used in analyses of responses to the critical tasks are defined in Appendix 6. Statistical methods are described in Appendix 7.
Motivation for Functionally Localizing the Language Network
Here, we briefly motivate our assumption that there exists a core “language network” and our approach to identify this network at the individual-participant level using a validated language “localizer” paradigm (for further discussion of the importance of participant-specific localization of functional areas in the brain, see Kanwisher, 2010; Saxe et al., 2006; for a discussion of this issue in the domain of language specifically, see Braga et al., 2020; Fedorenko et al., 2010, 2011).
There Exists an Integrated Language Network in the (Typical, Mature) Human Brain
Several lines of evidence converge to support the view that parts of frontal and temporal cortex (among possibly other cortical, subcortical, and cerebellar areas; Lipkin et al., 2022; Fedorenko et al., 2010) constitute an integrated network that is implicated in language processing (for a recent review, see Fedorenko et al., 2024). First, within individuals, fMRI responses are stable (i.e., highly topographically similar) across diverse reading- and listening-based localizer contrasts that target high-level language processing, including coarse contrasts (such as sentences vs. nonword lists / consonant strings / acoustically degraded speech; Malik-Moraleda et al., 2022; Scott et al., 2017; Fedorenko et al., 2010) and finer contrasts (such as sentences vs. word lists, or word lists vs. nonword lists; Fedorenko et al., 2010). Second, within individuals, fMRI responses to localizer contrasts are stable over time (Mahowald & Fedorenko, 2016; Fedorenko et al., 2010). Third, within individuals, voxels within this network are highly correlated in their activity during naturalistic comprehension paradigms, and much more strongly with each other than with voxels outside the language network (Malik-Moraleda et al., 2022; Paunov et al., 2019; Blank et al., 2014); in fact, these correlations are so strong that the same network that is identified by language localizers can also be recovered from patterns of BOLD signal fluctuations during a task-free resting state paradigm (Braga et al., 2020). Fourth, the same network is found across diverse languages, both across speakers (Malik-Moraleda et al., 2022) and within bi/multilingual speakers (Malik-Moraleda et al., 2024).
Thus, despite the inherent complexity both of language itself and of its (undisputed) interactions with diverse perceptual, motor, cognitive, and affective functions, evidence supports the hypothesis that the network identified by language localizer tasks is a functionally meaningful unit of analysis in the brain whose existence is external to, for example, conceptual debates about the definition of “language.” The question thus becomes less a matter of how to define language to study it in the brain, and more a matter of what computations this network supports. As of this writing, several lines of evidence suggest that “language processing” is the best available construal of this network's function. First, the perceptual controls used in many localizer tasks (e.g., sentences vs. nonwords or intact vs. degraded speech) rule out a low-level perceptual function, given that this network responds differentially to perceptually similar signals as a function of the presence of linguistic meaning and/or structure. Similarly, during language production, this network responds more strongly when individuals produce meaningful and structured language stimuli (phrases and sentences) compared with stimuli where the articulation demands are similar but the higher level language processing demands are not (Hu et al., 2023), which suggests that low-level motor planning and execution cannot explain its responses. Second, this network is highly selective for language processing relative to diverse nonlinguistic inputs and tasks as measured with fMRI (Diachek, Blank, Siegelman, Affourtit, & Fedorenko, 2020; Deen, Koldewyn, Kanwisher, & Saxe, 2015; Monti, Parsons, & Osherson, 2012; Fedorenko et al., 2011; see Fedorenko et al., 2024, for review) and its damage impairs language production and comprehension but leaves intact diverse higher-order cognitive functions (Fedorenko & Varley, 2016). Third, this network is engaged by multiple levels of linguistic representation, from sublexical (Regev et al., 2024; Lopopolo et al., 2017), to lexical (Fedorenko et al., 2020; Rodd et al., 2005), to sentential (Shain et al., 2022; Caucheteux et al., 2021; Reddy & Wehbe, 2021) properties.
Therefore, the evidence for the existence of this integrated network is robust to many localizer-paradigm details and external to researcher assumptions about the definition of language (e.g., the network can be identified from resting state data). Although future research will likely continue to refine the precise computations that this network carries out, current evidence suggests that little precision is lost by calling this network the “language network,” as we have done.
The Precise Locations of the Language Areas Vary between Individual Brains
The language network's general topography is similar across individuals (e.g., falling consistently within inferior and middle frontal gyri and along the superior temporal sulcus and/or middle temporal gyrus), and its precise topography is stable within individuals over time. However, the exact locations, shapes, and sizes of language areas vary between individuals (Lipkin et al., 2022; Fedorenko et al., 2010, 2011). This variability poses a problem for group-level analyses that average individual maps voxel-wise and/or use group-level ROIs (such as those used by PDD), given that these analyses likely pool responses from voxels that are highly selective for language with responses from nearby voxels that are less so or even belong to distinct functional networks (see Fedorenko & Blank, 2020, for illustration of this issue in inferior frontal cortex), which reduces sensitivity and functional resolution and underestimates effect sizes (Nieto-Castañón & Fedorenko, 2012; Saxe et al., 2006). These issues can lead to failure to detect real effects (e.g., length effects in Jabberwocky conditions in anterior temporal areas). Thus, of relevance to the current study, although PDD's parcels likely cover many core language areas (Appendix 9), they are also likely less precise given that they do not take into account interindividual variability in the precise locations of language areas (Fedorenko et al., 2011). Follow-up analyses of our data (Appendix 10) indicate that effects are indeed much weaker using PDD's group-level ROIs than they are using participant-specific fROIs.
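The dilution argument above can be illustrated with a toy example. The sketch below is not the localization procedure described in Appendix 3; it is a noise-free illustration with made-up numbers (10 voxels per parcel, a 4-voxel selective patch, an effect size of 0.8) and hypothetical function names. It shows only that averaging over a fixed parcel shrinks an effect that is confined to participant-specific voxels, whereas selecting voxels per participant from separate localizer data recovers it.

```python
# Toy illustration (all numbers hypothetical): a fixed group-level ROI
# pools selective and non-selective voxels, which shrinks the estimate.


def mean(values):
    return sum(values) / len(values)


def simulate_participant(offset, n_voxels=10, patch_size=4):
    """Return per-voxel (localizer, critical) effects for one participant.

    Language-selective voxels form a contiguous patch whose position
    (offset) differs across participants; other voxels show no effect.
    """
    selective = [offset <= v < offset + patch_size for v in range(n_voxels)]
    localizer = [1.0 if s else 0.0 for s in selective]
    critical = [0.8 if s else 0.0 for s in selective]
    return localizer, critical


def group_roi_estimate(critical):
    # Average the critical effect over every voxel in the shared parcel.
    return mean(critical)


def froi_estimate(localizer, critical, top_k=4):
    # Choose the top_k voxels by localizer response, then average the
    # critical effect (from separate data) in those voxels only.
    ranked = sorted(range(len(localizer)), key=lambda v: localizer[v], reverse=True)
    return mean([critical[v] for v in ranked[:top_k]])


offsets = [0, 2, 3, 5, 6]  # patch position varies across participants
group_estimates, froi_estimates = [], []
for offset in offsets:
    loc, crit = simulate_participant(offset)
    group_estimates.append(group_roi_estimate(crit))
    froi_estimates.append(froi_estimate(loc, crit))

group_mean = mean(group_estimates)
froi_mean = mean(froi_estimates)
print(round(group_mean, 2), round(froi_mean, 2))
```

In this toy setting the group-ROI estimate is 40% of the true effect simply because 4 of the 10 parcel voxels are selective in each participant; real data add noise and require independent data for voxel selection and response estimation.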
The Use of Functional Localization Is Unlikely to Bias Our Results against Finding Separable Lexical, Syntactic, and Semantic Functions
One possible concern about our analysis design (raised by an anonymous reviewer) is that our localizer contrast (sentences > nonword lists) may be biased toward finding overlap between lexical semantic, syntactic, and combinatorial semantic processing demands, given that sentences and nonword lists differ in all three of these dimensions, and thus the areas that show the largest difference between them may also be those in which these demands overlap. This concern is partially addressed by Appendix 10, where we show that some of PDD's key findings (especially their reported syntax-selectivity of IFG) still fail to replicate when using their group ROIs instead of our functional ROIs. Moreover, the objection rests on three questionable assumptions.
The first assumption is that there exist areas that selectively respond only to some of the demands of interest (lexical semantic, syntactic, and/or combinatorial semantic). However, this possibility has already been extensively investigated using narrower localizer contrasts, including word lists > nonword lists (targeting lexical processing), Jabberwocky > nonword lists (targeting “pure” syntactic processing), and sentences > word lists (targeting syntactic and combinatorial semantic processing), and no evidence of any such areas has emerged (Blank et al., 2016; Fedorenko et al., 2010, 2011; see also Bautista & Wilson, 2016, for related evidence from a different approach). Instead, areas/sets of voxels that respond to any of these contrasts tend to also respond to the other contrasts (Fedorenko et al., 2010), even at a finer spatial scale as measured with human intracranial recordings (Fedorenko et al., 2016). In fact, one of the motivating goals behind the original four-condition Fedorenko and colleagues (2010) localizer design was to separate selectivity for different types of linguistic representation. After no evidence of such separation emerged in multiple studies, Fedorenko and colleagues began to simplify the localizer design to the sentences versus nonwords version used here.
The second assumption is that simultaneously manipulating the demands associated with lexical semantic, syntactic, and combinatorial semantic processing will produce a larger response in areas that support several of these functions compared with areas that support only one. However, even if there existed pure syntax-processing regions as suggested by PDD, it does not follow that their response to sentences would be smaller than that of mixed-function regions simply because they have narrower functional selectivity. Sentences are substantially more syntactically complex than nonword lists, and syntax-selective regions should strongly differentiate between them. In other words, although a large response to our localizer contrast does not entail syntax selectivity, syntax selectivity does entail a large response to our localizer contrast.
The third assumption is that the network identified by a language localizer is highly dependent on the particular contrast and that a different contrast (e.g., a more narrowly syntactic contrast) would select a substantially different set of voxels. If this assumption is false (i.e., if roughly the same set of voxels is picked out by many different contrasts), then the content of the particular localizer contrast becomes irrelevant, as long as it reliably identifies the same brain network as other approaches. As argued earlier in this section, we believe this is precisely the case for the language network. To re-emphasize one compelling piece of evidence, Braga and colleagues (2020) showed that the activation map for the sentences > nonword lists contrast tightly corresponds to a network that emerges from voxel timecourse correlations during task-free resting state. In other words, the contrast used in our study is simply an efficient way of identifying the network of interest (based on a few minutes of task data) that would emerge anyway if we had ∼1 hr or more of resting-state data per participant.
On the basis of the considerations above, we find it unlikely that participant-specific functional localization led to spurious findings of overlap in our study.
RESULTS
We revisit PDD's claims in three experiments. Experiment 1 seeks to replicate the finding of an overall increase in the BOLD response of language brain areas as a function of chunk length. Experiment 2 is a conceptual replication of PDD, including all of the original real-word conditions and a critical subset of the Jabberwocky conditions, as well as PDD's two additional “nonconstituent” conditions consisting of three- and four-word chunks that do not form valid syntactic constituents (e.g., over the floodlit; Figure 1B). Unlike PDD, in all experiments, we independently localize the language network in each participant and use a fully within-subject design. Experiment 3 more directly targets the centrality of constituency for obtaining the length effect (stronger responses to longer chunks) by presenting participants with 24-word and 30-word stimuli that are composed of chunks of varying length (taken from naturalistic texts), which overwhelmingly do not form constituents in their source sentences (86.5% of the time; because Experiment 3 was originally designed with a different research goal in mind, avoiding constituents entirely was not a consideration). For details about these experiments, see the Methods section. Results are visualized in Figure 3 (full significance testing details are given in Appendix 8).
Do the Language Regions Show Length Effects?
For the real-word conditions, all regions show the pattern reported by PDD: significantly increasing activation as a function of chunk length, including a smaller increase at larger lengths (e.g., c06 to c12) in all three experiments (Figure 3B–E).
Do Any Language Regions Behave Like “Syntactic Hubs,” Showing Identical Responses to the Real-word and Jabberwocky Conditions?
No language region shows the profile reported by PDD for inferior frontal and posterior temporal areas, namely, visually indistinguishable responses in the real-word and Jabberwocky conditions. Instead, all language regions' responses are modulated by lexicality, either in the overall response, in the slope of the length effect, or both (Figure 3C and E). Thus, no language region appears to be a hub for abstract (i.e., content-independent) combinatorics.
Do Anterior Temporal and Temporoparietal Language Regions Only Show Length Effects in the Real-word Conditions?
We find a significant length effect in the Jabberwocky conditions for the language network as a whole, as well as for each region within it except for the temporoparietal left angular gyrus (LAngG) language region. Contrary to PDD's claim that the anterior temporal language area (LAntTemp) is not responsive to chunk length in the absence of lexical content (Jabberwocky), we find this effect robustly (Figure 3C and E).
However, the LAngG language region (which corresponds to PDD's “TPJ” region; Figure A1) only shows a length effect in the real-word conditions (which is significantly larger than the length effect for Jabberwocky conditions), as PDD claimed, and in direct pairwise comparisons between regions, the length effect for Jabberwocky stimuli is significantly weaker in the LAngG functional ROI (fROI) than in all other language regions. Nonetheless, the length effect for real-word stimuli is also significantly weaker in LAngG than in all other language regions except for the LAntTemp fROI. Together with prior evidence (e.g., Shain, Paunov, Chen, Lipkin, & Fedorenko, 2023; Shain, Blank, Fedorenko, Gibson, & Schuler, 2022; Braga et al., 2020; Blank, Kanwisher, & Fedorenko, 2014), this qualitative difference in response suggests functional differentiation between the LAngG language region and the rest of the core language network (see Discussion section).
Are Inferior Frontal and Posterior Temporal Language Regions Insensitive to Combinatorial Semantics, over and above Syntax?
We find a significantly steeper slope for the length effect in the real-word conditions relative to the Jabberwocky conditions (Length × Stimulus Type interaction) in the language network as a whole, as well as in each region within it except for the left middle frontal gyrus (LMFG) and left posterior temporal (LPostTemp) language regions. The Length × Stimulus Type interaction in the LMFG region is positive and similar in magnitude to that of other regions, but it fails to reach significance. By contrast, the Length × Stimulus Type interaction in the LPostTemp region is numerically near zero. The significant interaction in the inferior frontal regions is contrary to PDD's claim that the inferior frontal language areas (left inferior frontal gyrus [LIFG] and its orbital part [LIFGorb]) are equally sensitive to chunk length in real-word and Jabberwocky conditions (Figure 3C and E). However, the LPostTemp language region shows highly similar length effects for real-word and Jabberwocky stimuli, as PDD claimed, and in direct comparisons, the difference in length effect between the real-word conditions and the Jabberwocky conditions is significantly weaker in the LPostTemp region relative to both the LIFGorb and the LIFG language regions, in spite of the fact that the left inferior frontal gyrus (IFG) has been classically associated with syntactic processing (e.g., Grodzinsky, Pieperhoff, & Thompson, 2021; Friederici, 2011; Hagoort, 2005; Caramazza & Zurif, 1976). This result supports PDD's claim that the LPostTemp region is insensitive to combinatorial semantics, showing equal sensitivity to syntactic structure, with or without lexical content. We return to this finding in the Discussion section.
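For readers less familiar with interaction terms, the logic of the Length × Stimulus Type interaction can be written as a generic linear model. This is only a schematic illustration; the statistical model actually used is specified in Appendix 7 and may differ (e.g., in how length is coded or in its random-effects structure). Here $y$ is the response of a region, $L$ is some numeric coding of chunk length, and $S$ is an indicator that is 1 for real-word and 0 for Jabberwocky stimuli; all symbols are introduced for illustration.

```latex
\begin{align}
y &= \beta_0 + \beta_1 L + \beta_2 S + \beta_3 (L \times S) + \varepsilon \\
\text{Jabberwocky slope } (S = 0) &: \quad \beta_1 \\
\text{real-word slope } (S = 1) &: \quad \beta_1 + \beta_3
\end{align}
```

Under this coding, a positive $\beta_3$ corresponds to a steeper length effect for real-word than for Jabberwocky stimuli (the pattern reported for most regions), whereas $\beta_3 \approx 0$ corresponds to equal length effects in the two stimulus types (the pattern reported for LPostTemp).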
Does Syntactic Constituency Critically Drive the Length Effect?
The length effect in Experiment 2 is at least as strong in the nonconstituent conditions as it is in the real-word constituent conditions, which undermines PDD's claim that length effects are driven primarily by syntactic constituency. This finding is reinforced by Experiment 3, which evaluates length effects in materials composed primarily (86.5%) of nonconstituents (Figure 3D). As shown, the length effect in response to these largely nonconstituent materials is qualitatively similar to the length effects reported in Experiments 1 and 2, and quantitatively, between-groups comparisons reveal no significant differences in any region, or in the language network as a whole, between the length effect in Experiment 3 and that in either Experiment 1 or Experiment 2. Thus, syntactic constituency does not critically drive the length effects in the language network.
Summary
Our results support a distributed burden of lexical-semantic, syntactic, and combinatorial-semantic processing throughout the language network (rather than the dissociation between syntactic and lexico-semantic subnetworks, as claimed by PDD) and challenge the claim that stronger responses to longer chunks are driven by syntactic constituency (given that these length effects are equally strong regardless of whether the chunks form constituents). The key similarities and differences between our findings and PDD's are summarized in Table 2.
PDD reported (a) one set of regions (inferior frontal and posterior temporal) that were sensitive to structure (chunk length) in real-word conditions and equally sensitive to structure in Jabberwocky conditions (supporting abstract syntactic processing in these regions), and (b) another set of regions (anterior temporal and TPJ) that were sensitive to lexical content and insensitive to structure in Jabberwocky conditions. Our study does not reproduce several of PDD's reported insensitivities (red minus signs) and challenges the purported double dissociation between semantic regions on the one hand (anterior temporal and temporoparietal areas) and syntactic regions on the other (inferior frontal and posterior temporal areas). Instead, we find more broadly distributed lexical semantic, syntactic, and combinatorial semantic effects throughout the language network, albeit with evidence (consistent with PDD's claims) that the temporoparietal area is only sensitive to structure in real-word conditions and that the posterior temporal language area is equally sensitive to structure in both real-word and Jabberwocky conditions.
See Appendix 9 for evidence that the results above hold when we use the masks from PDD to define the language areas, and see Appendix 10 for evidence that qualitatively similar patterns hold even when we average over PDD's entire ROI parcels (i.e., without functional localization), albeit with weaker overall effects. See Appendix 11 for evidence that the extremes of the length conditions—(jab-)c01 and (jab-)c12—replicate an established pattern of response in the language network. See Appendix 12 for exploratory analyses of the right-hemisphere homotopes of the left-hemisphere language areas.
DISCUSSION
Whether different brain areas specialize for different types of linguistic processing is a long-standing open question in the neurobiology of language. Perhaps the most frequently proposed pattern of specialization is a dissociation between some brain areas that selectively represent and process the syntax of sentences and others that selectively represent and process their semantics (Friederici, 2017; Dehaene et al., 2015; Bornkessel-Schlesewsky & Schlesewsky, 2013; Baggio & Hagoort, 2011). This perspective is inspired in part by the fact that some nonlinguistic domains (e.g., mathematics, action planning, and music) also exhibit a kind of “syntax” in that they obey similar principles to language of sequential, hierarchical, and symbolic representation (Lashley, 1951). If, as some have argued (Fitch & Martins, 2014; Koechlin & Jubault, 2006; Novick, Trueswell, & Thompson-Schill, 2005; Patel, 2003), abstract syntactic structure building is supported by a shared brain network with a key locus in the inferior frontal cortex, then the human capacity for language may be linked to a more general capacity for structured symbol manipulation, which may in turn have arisen from anatomical changes to pFC during human evolution (Dehaene et al., 2015, 2022). This position offers tantalizing continuities between language and other domains, along with explanatory links to evolutionary processes that might have set the stage (a) for the emergence of language or (b) for the increasing sophistication of human cognition following language evolution. 
However, the empirical literature that is used to support this position (from both neuroimaging and neuropsychology) has relied on analyses that average brains in a common space and assume that a given spatial coordinate implements the same function across individuals—an assumption that is known to be incorrect for the language system and to lead systematically both to (i) failure to discover functional selectivities that are present in individual brains and (ii) conflation of functions that are distinct in individual brains (Fedorenko et al., 2010, 2011; Fedorenko & Kanwisher, 2009; Saxe et al., 2006). This concern extends to the finding of distinct syntactic and lexico-semantic processing centers by PDD, whose results are additionally subject to concerns about (i) reliance on between-groups comparisons to substantiate the claim of abstract syntactic processing and (ii) using the same data to define the ROIs and to statistically examine their responses. Because PDD's results have informed much subsequent theorizing about the neural basis of language and the structure of mental representations for language (e.g., Bornkessel-Schlesewsky et al., 2015; Dehaene et al., 2015; Bolhuis et al., 2014; Fitch, 2014; Petkov & Jarvis, 2012) and because of a growing effort in the field to replicate influential findings (Gilmore et al., 2017; Yarkoni & Westfall, 2017; Open Science Collaboration, 2015; Makel & Plucker, 2014; Simons, 2014; Pashler & Wagenmakers, 2012), here, we revisit PDD's claims across three fMRI experiments that address these methodological concerns.
Our findings robustly replicate PDD's discovery of parametric sensitivity in the language areas to the amount of linguistic context (increasing activation for longer spans of coherent text), as well as their finding that this pattern continues to hold in several areas even when lexical content is removed. Not only do we find this pattern across multiple experiments and in a different language (English) than the originally used French, but the effects are statistically indistinguishable across multiple independent groups of participants, which suggests that PDD uncovered a stable population-level signature of language comprehension in the brain (Fedorenko et al., 2016). This signature constitutes compelling evidence both that the brain's response is modulated by linguistic complexity and that syntax contributes to this modulation independently of meaning. This finding from PDD (replicated here) is thus an important explanandum in any theory of the brain basis of language comprehension.
However, our findings do not accord with PDD's proposed division of labor within the language network, namely, a double dissociation between syntactic and lexico-semantic subnetworks. Instead, our results reveal a more distributed pattern of lexical-semantic, syntactic, and combinatorial-semantic processing than that proposed by PDD (key similarities and differences between our findings and PDD's are summarized in Table 2). First, our results challenge the notion of pure syntactic hubs (i.e., the claim that inferior frontal and posterior temporal language areas respond identically to syntactic complexity across real-word and Jabberwocky conditions). Instead, we find large and statistically significant increases in the language network's response, including in the inferior frontal and posterior temporal areas, to real-word relative to Jabberwocky stimuli. This finding aligns with several prior studies (fMRI: Fedorenko et al., 2010 [see Figure A5 for a direct comparison of the overlapping subset of conditions], and also Bedny, Pascual-Leone, Dodell-Feder, Fedorenko, & Saxe, 2011; magnetoencephalography: Matchin, Brodbeck, Hammerly, & Lau, 2019; intracranial recordings: Fedorenko et al., 2016) and with growing evidence for strong integration between syntax and semantics in the representations and computations that underlie language processing across fields and approaches, from linguistic theory (e.g., Goldberg, 2005; Pollard & Sag, 1994; Jackendoff, 1990; Kaplan & Bresnan, 1982), to psycholinguistics (e.g., Schuler & Wheeler, 2014; Pylkkänen & McElree, 2006; Kamide, Scheepers, & Altmann, 2003; MacDonald, Pearlmutter, & Seidenberg, 1994), to computational linguistics (e.g., Oh & Schuler, 2021; Dyer, Kuncoro, Ballesteros, & Smith, 2016; Mikolov, Chen, Corrado, & Dean, 2013; Manning & Schütze, 1999), to cognitive neuroscience (e.g., Kauf, Tuckute, Levy, Andreas, & Fedorenko, 2024; Merlin & Toneva, 2022; Anderson et al., 2021; Caucheteux, Gramfort, & King, 2021; Reddy & Wehbe, 2021; Fedorenko et al., 2016, 2020; Bautista & Wilson, 2016; Blank, Balewski, Mahowald, & Fedorenko, 2016; Fedorenko, Nieto-Castañón, et al., 2012; Keller, Gunasekharan, Mayo, & Corley, 2009).
Second, our results challenge the claim that inferior frontal areas are insensitive to semantic (as opposed to purely syntactic) composition. Instead, we find larger increases in response to chunk length in the real-word compared with the Jabberwocky conditions in both the LIFG and LIFGorb language areas.
Third, our results challenge the claim that anterior temporal areas only process combinatorial structure in the presence of lexical meaning. Instead, we find significant increases in response to chunk length in the Jabberwocky conditions (see also Brennan et al., 2012; Fedorenko et al., 2010; Figure A5).
Our results thus suggest greater spatial overlap in the brain among lexical-semantic, syntactic, and combinatorial-semantic processing than suggested by PDD, at least at the level of the macroanatomical areas (e.g., inferior frontal vs. anterior temporal vs. posterior temporal components of the language network).
Our results bear on linguistic composition in the general sense in which we have used this term (i.e., unifying word-level syntactic or semantic representations into phrase-level representations; Table 1). This general sense of composition should be distinguished from the narrower sense of composition as a transparent derivation of meaning via rule application (e.g., Montague, 1970), as opposed to, for example, the opaque conventionalized meanings of some multiword expressions (e.g., idioms like let the cat out of the bag). Our study does not attempt to distinguish pathways to phrasal meaning and therefore does not bear directly on the degree to which the brain relies on rules, surface statistics, and/or prior knowledge of the world to derive meaning from language (although this question has received substantial attention in the literature, e.g., Baggio, 2021). Our present concern is instead to characterize the effect of phrasal structure in brain areas responsible for inferring phrase-level syntactic and semantic representations, irrespective of how they do so.
We are by no means the first to express skepticism about the existence of sharp macroanatomical boundaries between syntax and semantics in the brain. Direct critiques of this idea have been raised both by our own group (e.g., Fedorenko et al., 2020) and by others (e.g., Aliko, Wang, Small, & Skipper, 2023; Rodd, Longe, Randall, & Tyler, 2010; Rodd, Vitello, Woollams, & Adank, 2015; Skipper, 2015; Wilson & Saygın, 2004). Indeed, several existing studies already support a broad distribution of syntactic (Shain et al., 2022; Shain, Blank, van Schijndel, Schuler, & Fedorenko, 2020) and semantic (Tang, LeBel, Jain, & Huth, 2023; Huth, de Heer, Griffiths, Theunissen, & Gallant, 2016) processing during language comprehension, as well as gradient (rather than categorical) differences in selectivity for syntax and semantics throughout much of perisylvian language cortex (Caucheteux et al., 2021; Reddy & Wehbe, 2021). Much of this countervailing evidence is based on naturalistic story listening data (although cf., e.g., Bautista & Wilson, 2016). Our study shows that even under experimental interventions that are highly similar to those used by PDD to produce some of the clearest evidence for a spatial dissociation between syntactic and semantic processing, results are most consistent with a distributed burden of lexical-semantic, syntactic, and combinatorial-semantic processing.
In addition to finding more distributed effects of both syntax and semantics than originally reported by PDD, our results also challenge the centrality to these length effects of syntactic constituency. A constituent is a complete phrase in a hierarchical representation of the sentence's grammatical structure (Figure 1B). PDD used constituents in the main conditions of interest and, based on a comparison with control conditions that used nonconstituents, argued that the effect of chunk length critically depended on constituency, as opposed to other kinds of syntactic and semantic relations that hold between words in contiguous spans of language. Using PDD's narrow contrast between three- and four-word chunks that do not form constituents, we find that the increase in brain activity from the three-word condition to the four-word condition is at least as large in the nonconstituent stimuli as it is in the constituent stimuli in all regions except the LAngG language region (see below for discussion of this region). Furthermore, in a separate experiment that explored a wider range of implicit chunk lengths and consisted overwhelmingly (> 86%) of nonconstituent chunks, we find qualitatively and quantitatively similar effects of chunk length to those found when using valid syntactic constituents, with no significant difference in the length effect in any region or in the language network as a whole. This result is therefore incompatible with PDD's claim that chunk-length effects are driven primarily by the memory demands associated with assembling phrasal constituents. Nonetheless, these length effects plausibly derive from linguistic complexity more broadly construed, and indeed, we find that multiple independently motivated measures of linguistic processing demand correlate with chunk length (Figure A2). Our results simply argue for an interpretation of length effects as driven by (perhaps diverse features of) richer linguistic contexts, rather than by phrasal constituency specifically. Other studies are needed to elucidate what those features are (see, e.g., Heilbron, Armeni, Schoffelen, Hagoort, & de Lange, 2022; Shain et al., 2020, 2022; Lopopolo, Frank, van den Bosch, & Willems, 2017; Lopopolo, van den Bosch, Petersson, & Willems, 2021; Brennan & Hale, 2019; Brennan et al., 2012; Brennan, Stabler, Van Wagenen, Luh, & Hale, 2016; Willems, Frank, Nijhof, Hagoort, & van den Bosch, 2016; Henderson, Choi, Luke, & Desai, 2015).
An alternative conceptualization of the length effects observed in our study and in PDD draws on the framework of “proper” and “actual” domains of specialized information processing systems (Barrett & Kurzban, 2006; Sperber, 1994), whereby the system's degree of engagement with an input can be modulated by the degree of fit between a given input and the target domain for which the system is adapted. Given the highly combinatory and contextualized nature of natural language, several words of contiguous context may be necessary to identify a stimulus as “proper” to the language network. As a consequence, shorter length conditions may fail by degrees to fully engage language processing mechanisms in the first place, thereby attenuating overall activation in the language network (see also Tuckute et al., 2024). Temporal receptive windows (TRWs), that is, the length of the preceding context that affects the processing of the current input (Lerner, Honey, Silbert, & Hasson, 2011; Hasson, Yang, Vallines, Heeger, & Rubin, 2008), could potentially serve as a filter for identifying domains proper to the language network, and indeed prior evidence supports the existence of TRWs for language on the order of a few words (Regev et al., 2023; Blank & Fedorenko, 2020; Nelson et al., 2017; Fedorenko et al., 2016; Lerner et al., 2011). However, the causes of these patterns of temporal receptivity are unknown, and they could derive from more basic kinds of linguistic processing (e.g., the degree to which nearby words can be composed into a syntactic parse may serve as a cue to whether an input is proper to the language network). In the absence of a deeper causal understanding of TRWs in the language network, viewing length effects as reflecting the distinction between proper and actual domains is not mutually incompatible with the interpretation whereby length effects reflect linguistic processing complexity.
Despite the lack of dissociation between syntactic and lexico-semantic processing centers and the broader distribution of diverse aspects of linguistic processing within the language network, our findings support two key functional asymmetries that were posited by PDD:
- (1) The LAngG/LTPJ language region differs functionally from the rest of the LH language network.
First, like PDD, we find that the temporoparietal (LAngG in our terminology, left TPJ [LTPJ] in PDD's) language region behaves differently from the rest of the language regions: The length effect for Jabberwocky stimuli is (i) not significant and (ii) significantly smaller than in all other language regions. Thus, the LAngG language region is indeed less responsive to chunk length than other language regions in the absence of lexical content.
This finding should be interpreted in the context of related evidence that speaks to the role of this temporoparietal area in language processing. Although this area was identified as a language area by PDD and included as part of the language network in early studies using the functional localization paradigm (given its stronger responses to sentences than lists of pseudowords; Pallier et al., 2011; Fedorenko et al., 2010), evidence has accumulated over the last decade that this area differs functionally from the rest of the language network. First, the LAngG/LTPJ language region shows systematically weaker correlations with other language regions during naturalistic cognition paradigms than those regions do with each other (Malik-Moraleda et al., 2022; Paunov, Blank, & Fedorenko, 2019; Blank et al., 2014). Furthermore, data-driven functional parcellation using dense individual-subject resting state data picks out the same core temporal and frontal areas examined here—but not the LAngG/LTPJ region—as an integrated network (Braga et al., 2020). Second, the LAngG/LTPJ language region shows substantially weaker evidence, compared with the other LH language regions, of core language processing operations like next-word prediction and syntactic structure building (Shain, Blank, et al., 2020, 2022; Blank et al., 2016). Third, the LAngG/LTPJ language region responds at least as strongly to pictures and videos of meaningful events as to sentences, and sometimes more strongly (Shain, Paunov, Chen, et al., 2023; Ivanova et al., 2021; Amit, Hoeflin, Hamzah, & Fedorenko, 2017), contra other language regions, which are selective for linguistic over pictorial inputs. 
In addition, this region often shows below-baseline responses during language conditions (e.g., both in this study and in PDD), which would be explained if this area is instead a node in the default mode network (Vincent et al., 2006; Greicius, Krasnow, Reiss, & Menon, 2003; Raichle et al., 2001), a brain network whose activity increases during rest, and which has been associated with high-level conceptual processing and/or episodic self-projection (Davey et al., 2016; Philippi, Tranel, Duff, & Rudrauf, 2015; Vincent et al., 2006; Greicius et al., 2003). Many have implicated the angular gyrus broadly (cf. the language-responsive portion of it) in heteromodal conceptual integration (Ivanova et al., 2021; Davis & Yee, 2019; Amit et al., 2017; Fernandino et al., 2016; Price, Bonner, Peelle, & Grossman, 2015; Price, Peelle, Bonner, Grossman, & Hamilton, 2016; Bonner, Peelle, Cook, & Grossman, 2013; Seghier, 2013; Binder, Desai, Graves, & Conant, 2009). This hypothesis could explain the greater response in the AngG/TPJ region to meaningful language stimuli, even in the absence of a selectively linguistic function. As a result of all this evidence, in recent work (e.g., Chen et al., 2023; Shain, Paunov, Chen, et al., 2023), we have begun excluding the LAngG language area from our definition of the language network.
- (2) The LPostTemp language region is sensitive to syntax and lexical semantics but not combinatorial semantics.
The claim from PDD that our results most strongly support is that the language-responsive area in the posterior temporal cortex is similarly sensitive to syntax, with or without lexical content. Although the overall response of the LPostTemp region to real-word stimuli is greater than its response to Jabberwocky stimuli, the difference in the length effect between real-word and Jabberwocky stimuli is virtually zero, as evidenced by similar slopes (Figure 3C and E)—the +Lex, +Synt, –CombSem profile in Figure 2. This result is inconsistent with PDD's strong characterization of LPostTemp as a pure syntactic hub (−Lex, +Synt, –CombSem), given that its response is strongly influenced by lexical content, independently of syntax. However, it does suggest that the burden of combinatorial processing in the LPostTemp region is unaffected by the meaningfulness of the resulting structure, which supports a lack of combinatorial-semantic processing over and above syntactic processing. This profile appears to be unique to the LPostTemp language region; the difference between the length effects in real-word versus Jabberwocky stimuli is nonsignificant and near zero in LPostTemp, significant in inferior frontal (LIFG and LIFGorb) language regions, and significantly larger in the frontal regions than in the LPostTemp region in direct comparisons, although LIFG has been classically associated with abstract syntax (Grodzinsky et al., 2021; Friederici, 2011; Hagoort, 2005; Caramazza & Zurif, 1976).
This result is important for two reasons. First, it lends support to the hypothesis that the posterior temporal language area plays a special role in processing hierarchical syntax, relative to other language areas that frequently co-activate with it during language processing (Matchin & Hickok, 2020; Bornkessel-Schlesewsky & Schlesewsky, 2013). Second, it is to our knowledge the first clear evidence of region-level (cf. Regev et al., 2023) functional differentiation within the human language network using functional localization methods—which account for interindividual variation in the precise locations of language areas—and appropriate Region × Condition interaction statistics (e.g., Nieuwenhuis, Forstmann, & Wagenmakers, 2011). These methods have so far yielded a highly distributed picture of linguistic (including sublexical, lexical semantic, syntactic, and combinatorial semantic) processing across the regions of the language network, with little evidence of network-internal structure (Regev et al., 2024; Shain et al., 2020, 2022; Mollica et al., 2020; Blank et al., 2016; Fedorenko et al., 2010; see Fedorenko et al., 2024, for review). Our current results support invariance in the LPostTemp language region to combinatorial semantics (over and above syntax, –CombSem in the terminology of Figure 2). This finding is noteworthy in light of evidence that damage to posterior temporal cortex is associated with more severe and longer-lasting aphasia compared with other parts of the language network (Wilson et al., 2023). This finding also aligns with prior proposals that posterior temporal cortex may serve a critical early stage of the comprehension process: receiving input from perceptual areas (e.g., speech perception areas (Overath, McDermott, Zarate, & Poeppel, 2015)), identifying grammatical categories and hierarchical phrasal relations, and relaying this syntactic information to downstream conceptual semantic areas (Matchin & Hickok, 2020). 
This hypothesis could account for the apparent absence of combinatory semantic effects in the LPostTemp region, given that this region may be upstream from these combinatorial semantic computations. However, invariance to combinatorial semantics is a weaker claim than the widespread interpretation of PDD as showing a selectively syntactic role for the posterior temporal language region (i.e., −Lex, +Synt, –CombSem; see Appendix 1).
Although we have improved on the methods and analyses used in some prior work on the neurobiology of natural language syntax and semantics, our study nonetheless has limitations. First, our findings of overlap between syntax and semantics pertain only to large-scale brain regions. We have focused on macroanatomy because this is the level at which most current neurobiological models posit functional dissociations within the language network (Friederici, 2017; Duffau et al., 2014; Bornkessel-Schlesewsky & Schlesewsky, 2013; Hagoort, 2005; Hickok & Poeppel, 2007). Of course, our current results are compatible with the existence of functional differentiation at smaller spatial scales (within the regions that we have used as units of analysis, within voxels, or within neural populations or individual cells, as can be measured with intracranial recordings; Regev et al., 2023). Second, our intended manipulation of the presence/absence of lexical semantics between our real-word and Jabberwocky materials may not be pure: Some semantic information may be inferred from pseudowords' form (Blasi et al., 2016) or context (Li, 1988), and syntactic information may be harder to recover in Jabberwocky sentences. Indeed, prominent theories of syntax assume that most syntactic information is stored alongside semantics in the mental lexicon (with only very abstract composition rules that assemble these syntactic fragments into larger structures), perhaps resulting in impoverished syntactic representations for pseudowords relative to real words (Goldberg, 2005; Steedman, 2001; Chomsky, 1995a; Pollard & Sag, 1994). If our Jabberwocky materials are simply harder to parse than our real-word materials (or than PDD's original Jabberwocky materials), this could explain the steeper length effects that we find in real-word versus Jabberwocky conditions. 
Although we cannot entirely rule out such a confound, worse parsing of Jabberwocky conditions is unlikely to be the primary explanation for our results: Syntactic structure is sufficiently available in our Jabberwocky materials to drive increases in processing demand throughout the language network, and even in one region (the LPostTemp language region) to the same extent as real-word materials. Third, our finding of chunk length effects leaves open a wide space of questions about the kinds of computations that drive these effects. Although chunk length effects are consistent with PDD's assumption that chunk length indexes the size of the neural assembly needed to represent a parse tree for the chunk, other interpretations are possible (as discussed above with respect to TRWs).
In conclusion, we find lexicality effects in inferior frontal and posterior temporal language regions, length effects for Jabberwocky stimuli in the anterior temporal region (as well as all other language regions except the one in the angular gyrus), and length by lexicality interactions in the inferior frontal language regions (and other language regions except the posterior temporal region). This pattern of findings is summarized in Table 2. These results collectively support a broad distribution of sensitivity to syntax and semantics throughout the human language network, challenging PDD's hypothesized dissociation between language regions that selectively process abstract syntax and language regions that selectively process lexical and/or combinatorial semantics. Our results instead converge with growing evidence that linguistic representations and computations over a range of levels of description (phonological, lexical, syntactic, and combinatorial-semantic) are largely distributed across the language network (Regev et al., 2024; Shain et al., 2022; Blank & Fedorenko, 2020; Fedorenko et al., 2020; Bautista & Wilson, 2016; Fedorenko, Nieto-Castañon, & Kanwisher, 2012). We do find evidence of one key invariance argued for by PDD: Although the posterior temporal language region is more responsive to materials with lexical content, it shows no increase in response to combinatorial semantics over and above syntax. This finding deserves further investigation, including with more temporally sensitive methods, to ask whether this brain region may support an earlier stage of comprehension that focuses on identifying the words and the grammatical relations among them, with inferences about, for example, logical semantics (entities, relations, quantifiers, entailments, etc.) subsequently taking place in other language areas. 
However, our results show that the burden of lexical-semantic, syntactic, and combinatorial-semantic processing is distributed across diverse cortical areas, and that no single area or set of areas constitutes the syntax hub claimed by PDD and related work.
APPENDIX
Appendix 1: Extended Discussion of the Impact of Pallier and colleagues (2011)
Here, we provide an extended discussion of how the research community has tended to interpret PDD with respect to the neurobiological bases of syntactic versus lexico-semantic processing.
PDD's finding of virtually identical parametric increases in inferior frontal and posterior temporal language areas' activation with chunk length across both real-word and Jabberwocky stimuli strongly suggests that these regions comprise an autonomous syntactic “module” (the –Lex, +Syn, –CombSem profile in the terminology of Figure 2 of the main article; Fodor, 1983). We believe this is the most straightforward interpretation of PDD's emphasis on “the relative independence of syntax from lexico-semantic features” (p. 2526, emphasis ours). Subsequent work by the authors has been more explicit about this interpretation: “Remarkably, when the stimuli were ‘delexicalized' by substituting all content words with meaningless pseudowords while maintaining all grammatical words and inflections, a core set of areas in left IFG and pSTS continued to respond identically, suggesting their central role in the construction of abstract syntactic trees” (Dehaene et al., 2015, p. 12, emphasis ours; see also Dehaene, 2019). This interpretation of PDD has been explicit in some studies (Kempen, 2014) and is at least implied by other studies citing PDD in support of a “modular” (Hertrich et al., 2016), “core” (Dehaene, Al Roumi, Lakretz, Planton, & Sablé-Meyer, 2022; Dehaene, 2019; Nelson et al., 2017; Pattamadilok et al., 2016; Wang et al., 2015), or “pure” (Hage & Nieder, 2016) syntax network. We therefore believe that an important component of PDD's influence has been the suggestion of an autonomous module or network for syntactic tree building that is insensitive to the content (meaning) of those trees and that therefore responds identically to both real and Jabberwocky constituents.
This strong position is difficult to sustain in the face of abundant evidence that inferior frontal and posterior temporal language areas respond more to real-word than Jabberwocky stimuli (+Lex in Figure 2, e.g., Matchin et al., 2017; Mahowald & Fedorenko, 2016; Fedorenko et al., 2010; Humphries et al., 2006; Mazoyer et al., 1993, inter alia). However, a weaker interpretation of PDD's inferior frontal and posterior temporal results focuses only on the absence of a difference in the slope of the parametric effect of constituent length between real-word and Jabberwocky stimuli, while allowing for a difference in overall response between the two condition types (the +Lex, +Syn, –CombSem profile in Figure 2). This position abandons the notion that these regions constitute an independent syntactic module (in PDD's words, the “independence of syntax from lexico-semantic features”), given that they are allowed to be sensitive not only to the demands of processing syntactic structures but also to the demands associated with processing real (but not Jabberwocky) words (e.g., retrieving and representing lexical meanings). Under this view, the key invariance in these regions that is supported by PDD's results is to combinatorial-semantic content, given that the increase in activation with syntactic complexity is not greater in the real-word conditions (which have combinatorial-semantic meaning) versus the Jabberwocky conditions (which arguably do not). Studies that cite PDD in favor of syntax selectivity in these regions must at minimum have this weak interpretation in mind (e.g., Nelson et al., 2017; Hage & Nieder, 2016; Hertrich et al., 2016; Fitch, 2014; Fitch & Martins, 2014; Berwick, Beckers, Okanoya, & Bolhuis, 2012; Cappa, 2012), although the distinction between the weak and strong claims above is rarely made explicit.
In addition, PDD's finding of a length effect in anterior temporal and temporoparietal language areas only in the real-word (but not the Jabberwocky) conditions has been taken to support a selectively semantic function for these areas (the +Lex, –Syn, +CombSem profile in Figure 2), by PDD themselves and by work building on their results (Frankland & Greene, 2020; Zaccarella et al., 2017; Bautista & Wilson, 2016; Friston & Buzsáki, 2016; Skeide et al., 2016; Bornkessel-Schlesewsky et al., 2015; Zaccarella & Friederici, 2015; Wilson et al., 2014; Bornkessel-Schlesewsky & Schlesewsky, 2013).
Appendix 2: Data Acquisition, Preprocessing, and First-level Modeling
Data Acquisition
All data were collected at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. Whole-brain structural and functional data were collected using one of two configurations. Data from participants in Experiment 1 and Experiment 2 who were scanned before 2021 (n = 40) were acquired on a whole-body 3 Tesla Siemens Trio scanner with a 32-channel head coil. For these participants, T1-weighted, magnetization prepared rapid gradient echo structural images were collected in 176 sagittal slices with 1-mm isotropic voxels (repetition time [TR] = 2530 msec, echo time [TE] = 3.48 msec, inversion time [TI] = 900 msec, flip = 8°). Functional BOLD data were acquired using an EPI sequence with a 90° flip angle and using generalized autocalibrating partially parallel acquisitions (GRAPPA) with an acceleration factor of 2, with the following parameters: thirty-three 4-mm thick near-axial slices acquired in an interleaved order (with 10% distance factor), with an in-plane resolution of 2.1 mm × 2.1 mm, field of view in the phase encoding (A >> P) direction 200 mm and matrix size 96 × 96, TR = 2000 msec and TE = 30 msec. The first 10 sec of each run were excluded to allow for steady-state magnetization.
Data from participants in Experiment 2 and Experiment 3 who were scanned in 2021 or later (n = 35) were collected on a whole-body 3 Tesla Siemens PRISMA scanner with a 32-channel head coil, also at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. For these participants, T1-weighted, magnetization prepared rapid gradient echo structural images were collected in 208 sagittal slices with 1-mm isotropic voxels (TR = 1800 msec, TE = 2.37 msec, TI = 900 msec, flip = 8°). Functional BOLD data were acquired using a simultaneous multi-slice EPI sequence with a 90° flip angle and using a slice acceleration factor of 2, with the following acquisition parameters: fifty-two 2-mm thick near-axial slices acquired in an interleaved order (with 10% distance factor), 2 mm × 2 mm in-plane resolution, field of view in the phase encoding (A >> P) direction 208 mm and matrix size 104 × 104, TR = 2000 msec, TE = 30 msec, and partial Fourier of 7/8. For both functional sequences, the first 10 sec of each run were excluded to allow for steady-state magnetization.
Preprocessing
fMRI data were analyzed using SPM12 (release 7487), the CONN EvLab module (release 19b), and other custom MATLAB scripts. Each participant's functional and structural data were converted from DICOM to NIFTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al., 1995). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in the CONN preprocessing pipeline (5 SDs above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castanon, 2020). Functional and structural data were independently normalized into a common space (the Montreal Neurological Institute template; IXI549Space) using the SPM12 unified segmentation and normalization procedure (Ashburner & Friston, 2005) with a reference functional image computed as the mean functional data after realignment across all timepoints omitting outlier scans. The output data were resampled to a common bounding box between Montreal Neurological Institute-space coordinates (−90, −126, −72) and (90, 90, 108), using 2-mm isotropic voxels and fourth order spline interpolation for the functional data, and 1-mm isotropic voxels and trilinear interpolation for the structural data. Last, the functional data were smoothed spatially using spatial convolution with a 4-mm FWHM Gaussian kernel.
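The two outlier criteria named above can be illustrated with a minimal Python sketch. It is not the CONN implementation: it assumes that per-scan global BOLD signal change and framewise displacement have already been computed, it standardizes signal change with a plain mean and SD, and the function name and toy inputs are hypothetical.

```python
from statistics import mean, pstdev


def flag_outlier_scans(global_signal_change, framewise_displacement,
                       sd_threshold=5.0, fd_threshold_mm=0.9):
    """Return indices of scans that exceed either threshold named in the text.

    Both inputs are per-scan lists of equal length (assumed precomputed).
    """
    mu = mean(global_signal_change)
    sd = pstdev(global_signal_change)
    flagged = []
    for i, (change, fd) in enumerate(zip(global_signal_change,
                                         framewise_displacement)):
        # Criterion 1: signal change more than `sd_threshold` SDs above the mean.
        too_variable = sd > 0 and (change - mu) / sd > sd_threshold
        # Criterion 2: framewise displacement above `fd_threshold_mm`.
        too_much_motion = fd > fd_threshold_mm
        if too_variable or too_much_motion:
            flagged.append(i)
    return flagged
```

Scans flagged in this way would then enter the first-level model as nuisance regressors rather than being deleted from the timeseries.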
First-level Modeling
For both the language localizer task and the critical task, effects were estimated using a general linear model in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (fixation was modeled implicitly, such that all timepoints that did not correspond to one of the conditions were assumed to correspond to a fixation period). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128-sec cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient α = .2) to approximate the observed covariance of the functional data in the context of restricted maximum likelihood estimation. In addition to main condition effects, other model parameters in the general linear model design included first-order temporal derivatives for each condition (included to model variability in the hemodynamic response function delays), as well as nuisance regressors controlling for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal. Resulting effect estimates reflect percent BOLD signal change (PSC).
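As a rough illustration of how a condition regressor of this kind can be built, the Python sketch below convolves a boxcar with a double-gamma response function and samples the result once per TR. The gamma shape parameters (6 and 16) and the 1/6 undershoot ratio are assumed values for a commonly used canonical form, and the function names are hypothetical; temporal derivatives, filtering, and whitening are not shown.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (seconds); parameter values are assumptions."""
    if t <= 0.0:
        return 0.0

    def gamma_pdf(x, shape):
        return x ** (shape - 1.0) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def condition_regressor(onsets_s, duration_s, n_scans, tr_s=2.0, dt=0.1):
    """Boxcar for one condition convolved with the HRF, sampled at each scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    # Boxcar on a fine time grid: 1 while the condition is on, 0 otherwise.
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    # HRF kernel covering 32 sec, scaled by dt to approximate the integral.
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(32.0 / dt)))]
    convolved = [0.0] * n_fine
    for i, value in enumerate(boxcar):
        if value:
            for k, h in enumerate(kernel):
                if i + k < n_fine:
                    convolved[i + k] += value * h
    # Downsample to one value per scan (start of each TR).
    step = int(round(tr_s / dt))
    return [convolved[s * step] for s in range(n_scans)]
```

For example, `condition_regressor([10.0], 6.0, 30)` yields a 30-scan regressor for a single 6-sec block starting 10 sec into the run, with timepoints outside the block left at zero (the implicit fixation baseline).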
Appendix 3: Participant-specific Functional Localization
Procedure
Sixty-five participants (out of 75 total) performed the localizer task in the same session as the critical task, and the remaining participants performed the localizer in a different session (for evidence that localizer activations are stable across scanning sessions, see Lipkin et al., 2022; Braga et al., 2020; Mahowald & Fedorenko, 2016). Most participants completed one or two additional tasks for unrelated studies. The entire scanning session lasted approximately 2 hr.
Localizer Task
The task used to localize the language network is described in detail in Fedorenko and colleagues (2010). Briefly, we used a reading task that contrasted sentences and lists of unconnected, pronounceable nonwords in a standard blocked design with a counterbalanced order across runs. This contrast targets higher-level aspects of language including, critically, lexical-semantic, syntactic, and compositional-semantic processing, to the exclusion of perceptual (speech or reading-related) and articulatory processes (see, e.g., Fedorenko & Thompson-Schill, 2014, for discussion). Stimuli were presented one word/nonword at a time. Participants were asked to read the materials attentively and to press a button at the end of each trial (included to help participants remain alert). Importantly, this localizer has been shown to generalize across different versions: the sentences > nonwords contrast, and similar contrasts between language and a degraded control condition, robustly activate the fronto-temporal language network regardless of the task, materials, modality of presentation, and particular language (Chen et al., 2023; Malik-Moraleda et al., 2022; Ivanova et al., 2020; Scott et al., 2017; Fedorenko et al., 2010). This includes generalization to both narrower contrasts (e.g., sentences > lists of unconnected words; Blank et al., 2016; Fedorenko et al., 2010) and broader contrasts (e.g., listening to passages > listening to acoustically degraded passages; in fact, this latter, auditory version of the localizer was used for two participants in Experiment 3 because of poor data quality in the visual language localizer, as described below; e.g., Malik-Moraleda et al., 2022; Scott et al., 2017). 
Furthermore, the same network robustly emerges from naturalistic-cognition paradigms (e.g., resting state, listening to stories, watching movies) using the data-driven functional correlation approach (Braga et al., 2020, see also Branco, Seixas, & Castro, 2020; Blank et al., 2014; Tie et al., 2014), suggesting that this network constitutes a natural kind in the brain, and our localizer contrast is simply a quick and efficient way to identify this network as needed for testing critical hypotheses about it.
The whole-brain maps for the language localizer are available at: https://osf.io/fduve/.
Definition and Validation of Language-responsive Functional Regions of Interest
For each participant (in each experiment), we defined a set of language-responsive fROIs using group-constrained, participant-specific localization (Fedorenko et al., 2010). In particular, each participant's map for the sentences > nonwords contrast from the language localizer task was intersected with a set of six binary masks (the maps for the language localizer are available at: https://osf.io/fduve/). These masks were derived from a probabilistic activation overlap map for the language localizer contrast in a large set of distinct participants (n = 220) using the watershed parcellation, as described in Fedorenko and colleagues (2010), and corresponded to relatively large areas within which most participants showed activity for the target contrast. These masks covered the fronto-temporal language network: three in the left frontal lobe falling within the IFG, its orbital portion, and the MFG, and three in the temporal and parietal cortex (Figure 3A of the main article). Within each mask, a participant-specific language fROI was defined as the top 10% of voxels with the highest t-values for the localizer contrast (see Lipkin et al., 2022 for evidence that the fROIs are similar when defined using a fixed statistical threshold).
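The selection step described above (top 10% of voxels by localizer t-value within each mask) can be sketched in Python as follows. The data layout (a dictionary from voxel identifiers to t-values and a set of voxel identifiers for the mask) and the function name are assumptions made for illustration, not the format used by our analysis scripts.

```python
def define_froi(t_values, mask, top_fraction=0.10):
    """Select the top fraction of in-mask voxels by localizer t-value.

    t_values: dict mapping voxel id -> t-value for sentences > nonwords.
    mask: set of voxel ids belonging to one group-level mask.
    """
    in_mask = [voxel for voxel in mask if voxel in t_values]
    # Keep at least one voxel so that very small masks still yield an fROI.
    n_keep = max(1, round(len(in_mask) * top_fraction))
    ranked = sorted(in_mask, key=lambda voxel: t_values[voxel], reverse=True)
    return set(ranked[:n_keep])
```

Because the criterion is a fixed proportion rather than a fixed statistical threshold, every participant contributes an fROI of the same relative size within each mask, regardless of overall activation strength.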
For two participants, the data quality for the standard version of the language localizer was low; however, both had completed an alternative version of the localizer based on listening to short passages versus acoustically degraded versions of those passages (see Malik-Moraleda et al., 2022; Scott et al., 2017 for evidence that this version of the localizer identifies the same areas as the standard, reading-based localizer). For two additional participants, one run of the language localizer showed some fMRI artifacts; as a result, we used just one run for these participants (which is sufficient for identifying the language network).
Before examining the data from the critical experiments, we ensured that the language fROIs show the expected signature response (i.e., a stronger response to sentences than nonwords). To do so, we used an across-runs cross-validation procedure (e.g., Nieto-Castañón & Fedorenko, 2012), where one run of the localizer is used to define the fROIs, and the other run to estimate the responses, ensuring independence (e.g., Kriegeskorte et al., 2009). As expected, and replicating prior work (e.g., Blank et al., 2016; Mahowald & Fedorenko, 2016; Fedorenko et al., 2010, 2011, inter alia), the language fROIs showed a robust sentences > nonwords effect across the 73 participants with cross-validated localizer contrast estimates (all t(72) > 7.26, p < 1e-9, Cohen's d > 0.85), correcting for the number of regions (six) using the false discovery rate (FDR) correction (Benjamini & Yekutieli, 2001). The localizer activation maps for the two additional participants with a single run of the localizer task (preventing across-runs cross validation) were evaluated by visual inspection and looked typical.
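The logic of across-runs cross-validation can be sketched in Python as follows, assuming that an fROI has already been defined separately from each localizer run. The input layout and function names are hypothetical; the point is only that the voxels are selected on one run and the response is read out from a different run.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cross_validated_response(frois_by_run, effects_by_run):
    """Average response in fROIs defined on one run and estimated on the others.

    frois_by_run[r]: set of voxel ids selected using run r only.
    effects_by_run[r]: dict voxel id -> effect estimate (e.g., sentences
    minus nonwords) computed from run r only.
    """
    fold_estimates = []
    for define_run, froi in enumerate(frois_by_run):
        for estimate_run, effects in enumerate(effects_by_run):
            if estimate_run == define_run:
                continue  # never estimate on the data used to define the fROI
            fold_estimates.append(mean(effects[voxel] for voxel in froi))
    return mean(fold_estimates)
```

With two localizer runs this gives two folds (define on run 1 and estimate on run 2, then the reverse), whose estimates are averaged per participant before group-level testing.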
Our masks show a close correspondence with the group-level ROIs used in PDD (Figure A1, see Appendix 9 for evidence that results replicate when using PDD's parcels as masks), with three exceptions: (i) PDD did not recover the language-responsive region in the MFG (because this region consistently emerges in both contrast-based and functional-correlation-based analyses as part of the language network—e.g., Braga et al., 2020; Glasser et al., 2016; Blank et al., 2014; Fedorenko et al., 2010—we chose to include it here); (ii) our AntTemp mask encompasses both of the anterior temporal ROIs in PDD (i.e., the anterior superior temporal sulcus ROI, and the temporal pole ROI; because PDD found similar functional profiles for these two ROIs, and for ease of comparisons with past work from our group, we chose not to split our mask into two parts); and (iii) our AngG mask only partially overlaps with PDD's TPJ mask (however, none of PDD's critical claims that we challenge in the current article pertain to this region; besides, our results for this region are similar to PDD's in spite of this difference in the masks, see Appendix 9).
Appendix 4: Materials Selection and Stimulus Design
Experiments 1 and 2
To create the materials for the real-word conditions, we extracted 180 two-word constituents (c02), 120 three-word constituents (c03), 90 four-word constituents (c04), 60 six-word constituents (c06), and 30 twelve-word constituents (c12) from the Penn-Treebank-parsed corpus (Marcus, Santorini, & Marcinkiewicz, 1993) and the Natural Stories corpus (Futrell et al., 2021). For each of the c02, c03, c04, and c06 conditions, the constituents were further manually concatenated into 30 twelve-word sequences, ensuring that syntactic or semantic dependencies would be unlikely to be formed across constituent boundaries. Finally, the c01 condition was created by selecting a set of 360 words from the full set of words in the Natural Stories corpus, and concatenating them into 30 twelve-word sequences, ensuring that adjacent words would be unlikely to combine syntactically or semantically.
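The counts above are chosen so that every condition yields the same number of twelve-word sequences. The short Python sketch below checks this arithmetic and shows a naive grouping of consecutive chunks; the actual concatenation was done manually to avoid cross-boundary dependencies, so the grouping function is only a hypothetical stand-in.

```python
# Number of extracted chunks per chunk length (in words), as given in the text.
CONSTITUENT_COUNTS = {1: 360, 2: 180, 3: 120, 4: 90, 6: 60, 12: 30}


def sequences_per_condition(counts, sequence_length=12):
    """Number of full sequences obtainable from each condition's chunks."""
    result = {}
    for chunk_len, n_chunks in counts.items():
        chunks_per_sequence, remainder = divmod(sequence_length, chunk_len)
        if remainder != 0:
            raise ValueError(f"{chunk_len} does not divide {sequence_length}")
        result[chunk_len] = n_chunks // chunks_per_sequence
    return result


def group_into_sequences(chunks, chunk_len, sequence_length=12):
    """Join consecutive chunks (strings) into sequences of the target length."""
    per_sequence = sequence_length // chunk_len
    return [" ".join(chunks[i:i + per_sequence])
            for i in range(0, len(chunks) - per_sequence + 1, per_sequence)]
```

Calling `sequences_per_condition(CONSTITUENT_COUNTS)` returns 30 for every chunk length, matching the 30 sequences per condition reported above.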
To create the materials for the Jabberwocky conditions (jab-c01, jab-c04, and jab-c12), we took the strings from the c01, c04, and c12 real-word conditions and replaced all content words with pronounceable nonwords using the Wuggy software (Keuleers & Brysbaert, 2010).
To construct the materials for the nonconstituent conditions, we initially tried sampling three- and four-word nonconstituent spans from the Natural Stories corpus (Futrell et al., 2021), which contains hand-corrected phrase structure annotations. However, most strings extracted this way could function as constituents in a different sentence context, especially given that many words in English can be used in multiple parts of speech. As a result, we hand-selected the nonconstituent chunks from a larger set of texts and manually concatenated them to ensure that syntactic or semantic dependencies were unlikely to be formed across boundaries. We used an online book recommendation app (available at recommendmeabook.com) to sample the first page of classic and recent best-selling fiction books (e.g., The Poisonwood Bible by Kingsolver). For every nonconstituent chunk, we extracted a nonconstituent of the appropriate length (three or four words long, depending on the condition) from a book and then manually searched for another nonconstituent that we believed would be unlikely to connect syntactically or semantically to the preceding one, and so on until the sequence (of four 3-word-long nonconstituents, or three 4-word-long nonconstituents) was complete. To protect against possible semantic dependencies, we often sampled nonconstituents from different books for the same sequence. Using this method, we created 30 twelve-word sequences for each nonconstituent condition (out of 120 three-word nonconstituents for the nc03 condition, and out of 90 four-word nonconstituents for the nc04 condition). Sample stimuli from Experiments 1 and 2 are shown in Figure 1 of the main article, and the full set of materials is available on the Open Science Framework (https://osf.io/fduve/).
Experiment 3
To create the materials for the (largely nonconstituent) conditions of Experiment 3, we used the English Web Treebank of the Universal Dependencies corpus (Nivre et al., 2016). First, we removed sentences that consisted of fewer than 17 words (to permit variability in the starting position of a chunk within the source sentence, even for the longest, 12-word, chunks), which resulted in a treebank of 6273 sentences (out of the original 16,622). These sentences were randomly assigned to conditions, and one chunk of the appropriate length was then extracted from each sentence, starting from a randomly chosen word index (among Positions 1 through 5) within the sentence. We additionally required that: (i) no token in a chunk could be a proper noun, a punctuation mark, a number (containing any digits), or a symbol; (ii) no token in a chunk could have a miscellaneous/non-identified part-of-speech tag; (iii) the first two characters of any word could not be capitalized (to avoid abbreviations); and (iv) the chunk could not already be in the set of extracted chunks. We oversampled the number of sequences needed for each condition by a factor of three, to allow for subsequent filtering. We filtered sequences to ensure that the sets of strings were matched across conditions (with a p value of .05 or higher for any given condition pair in an independent-samples t test) in terms of the following features: (a) the average starting index of the string; (b) the ratio of content to function words (where content words included nouns, verbs, adjectives, and adverbs); (c) the average unigram lexical frequency; and (d) the average word length (in letters). If any pair of conditions was not matched on one or more of these features, the “worst offender” chunks were removed, and the statistics were recomputed. This was repeated until all pairs of conditions were matched on all features. 
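The chunk-sampling constraints (i) through (iv) can be sketched as follows. This is a minimal illustration, not the authors' code: the token representation (pairs of word form and Universal Dependencies part-of-speech tag), the function names, and the reading of constraint (iii) as "both of the first two characters are uppercase" are all assumptions made for the example. The subsequent matching step across conditions is not shown.

```python
import random

# Assumed tag names, following Universal Dependencies conventions:
# proper noun, punctuation, numeral, symbol, and "other" (X).
EXCLUDED_TAGS = {"PROPN", "PUNCT", "NUM", "SYM", "X"}


def token_ok(form, upos):
    """Apply the token-level constraints (i)-(iii) to one token."""
    if upos in EXCLUDED_TAGS:
        return False
    # Any token containing a digit counts as a number.
    if any(ch.isdigit() for ch in form):
        return False
    # Assumed reading of (iii): reject forms such as "NASA" whose first
    # two characters are both uppercase.
    if len(form) >= 2 and form[0].isupper() and form[1].isupper():
        return False
    return True


def extract_chunk(sentence, length, seen, rng, max_start=5):
    """Try to extract one chunk of `length` tokens from `sentence`.

    `sentence` is a list of (form, upos) pairs. The start position is drawn
    uniformly from Positions 1..max_start (1-based). Returns a tuple of word
    forms, or None if the chunk violates a constraint or is already in
    `seen` (constraint iv).
    """
    start = rng.randint(1, max_start) - 1  # convert to a 0-based index
    tokens = sentence[start:start + length]
    if len(tokens) < length:
        return None
    if not all(token_ok(form, upos) for form, upos in tokens):
        return None
    chunk = tuple(form for form, _ in tokens)
    if chunk in seen:
        return None
    seen.add(chunk)
    return chunk
```

With sentences of at least 17 words, a 12-word chunk starting at any of Positions 1 through 5 always fits inside the sentence, which is why the length check above only matters for shorter inputs.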
The resulting set of chunks was then manually examined to remove chunks that straddled clausal boundaries or contained potentially sensitive, offensive, or highly culturally specific content. We then performed the matching described above one more time on the set of approved chunks, to ensure that no biases were introduced by the content filtering.
Having selected the set of candidate chunks, we developed an algorithm to concatenate them into 24-word and 30-word items. Following PDD, the key desideratum was that the boundaries between adjacent strings within an item be clearly detectable. A long short-term memory language model was trained on the English Web Treebank from which the chunks were sampled. Using this model, for all chunks of a given length, each possible chunk pair combination was assigned a cost calculated as , where “s1 + s2” denotes chunk concatenation, and is computed by the language model. These costs were accumulated in an adjacency matrix. The chunk order with the minimum cost was found by greedily solving an asymmetric travelling salesman problem to select a minimum cost path through the chunks. This procedure resulted in a set of concatenated chunks to be used in the experiments. To create the condition with chunks of Length 1, we used the words from the condition of chunk Length 2 because the chunk Length 1 condition should be most critically comparable to the next-length-up condition. However, because all conditions were well-matched for lexical properties, as described above, the words used in chunk Length 1 condition were automatically matched to all the other conditions, too.
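The ordering step can be illustrated with a nearest-neighbour heuristic over an asymmetric cost matrix. This is a sketch under stated assumptions: the source says only that the problem was solved greedily, so the specific heuristic, the fixed starting chunk, and the function names are illustrative choices, and the cost matrix is taken as given rather than computed from a language model.

```python
def greedy_order(cost, start=0):
    """Return a greedy path visiting every chunk exactly once.

    cost[i][j] is the cost of placing chunk j immediately after chunk i.
    The matrix may be asymmetric (cost[i][j] != cost[j][i]). At each step
    the cheapest not-yet-used successor of the current chunk is chosen;
    ties are broken by the lower index.
    """
    n = len(cost)
    if n == 0:
        return []
    path = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = path[-1]
        nxt = min(remaining, key=lambda k: (cost[last][k], k))
        path.append(nxt)
        remaining.remove(nxt)
    return path


def path_cost(cost, path):
    """Sum the transition costs along a path."""
    return sum(cost[a][b] for a, b in zip(path, path[1:]))
```

A greedy path is not guaranteed to be the global minimum; it trades optimality for speed, which matters because the number of possible orderings grows factorially with the number of chunks.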
Appendix 5: Linguistic Features
We analyzed the materials in our real-word conditions in Experiments 1–2 with respect to six linguistic features with independent empirical support (open nodes, node closings, storage cost, integration cost, 5-gram surprisal, and probabilistic context-free grammar (PCFG) surprisal; all measures elaborated below), to shed light on possible causes of the length effects originally reported by PDD and replicated in our study. Results are reported in Figure A2. As shown, many of these features are either positively or negatively correlated with constituent length in these materials, suggesting potential directions for research that attempts to ground these effects in theory-driven accounts of language processing. We expand on these findings below.
Measures Derived from Memory-based Accounts of Language Processing
Open nodes and node closings.
Nelson and colleagues (2017) elaborated on PDD's proposal—in the context of a follow-up study that used intracranial recordings—by hypothesizing a parsing mechanism that consumes more and more working memory until the constituent ends, permitting a merge operation (Chomsky, 1995b) whereby the memory allocated to representing that constituent is released and neural activation drops proportionately. Thus, PDD's pattern of stronger activity for sequences made up of longer constituents is hypothesized to derive from an accumulation of working memory demand by the parser over the course of constituent processing, with higher average demand for longer contiguous spans of text, because they can contain longer constituents. In Nelson and colleagues (2017), these “build-up” processes were encoded in the measure open nodes (a form of storage cost associated with maintaining items in working memory), and processes associated with merge and memory release were encoded in the measure node closings (a form of integration cost associated with retrieving and updating items in working memory).
We computed both of these measures as described in Nelson and colleagues (2017) from phrase structure trees in a generalized categorial grammar (Nguyen, van Schijndel, & Schuler, 2012) that were automatically generated for all stimuli using a probabilistic parser (van Schijndel, Exley, & Schuler, 2013) and hand-corrected by an expert annotator (parses and annotations available from the ModelBlocks repository: https://github.com/modelblocks/modelblocks-release). For these and other measures, each distinct constituent within an item was treated as independent by the model. For sequences made up of multiple constituents, the values were averaged across constituents to derive a single value for each sequence.
Although node closings has independent psycholinguistic support (e.g., Brennan et al., 2012, 2016; Hale, 2006), this predictor is anticorrelated with PDD's constituent-length manipulations and therefore cannot explain the effect (Figure A2, node closings). Open nodes is better correlated with PDD's expected pattern (Figure A2, open nodes).
Dependency locality theory storage and integration cost.
The dependency locality theory (DLT; Gibson, 2000) is one of many theories of working memory use in human sentence processing (see also, e.g., Rasmussen & Schuler, 2018; Lewis & Vasishth, 2005; Gordon, Hendrick, & Johnson, 2001). It was selected for analysis based on evidence that it best characterizes activity in the language network among a range of existing memory-based theories (Shain et al., 2022). DLT effects have also been reported in behavioral studies (Chen, Gibson, & Wolf, 2005; Grodner & Gibson, 2005). The DLT posits measures that are conceptually related to the open nodes (storage) and node closings (integration) predictors discussed above. In the DLT, storage cost (Figure A2, storage cost) tracks the number of incomplete syntactic dependencies that must be maintained in memory. Integration cost (Figure A2, integration cost) tracks the difficulty of constructing a syntactic dependency as a function of the number of intervening discourse referents. The measures of integration cost that we use here incorporate modifications described in Shain, van Schijndel, Futrell, Gibson, and Schuler (2016) that discount the cost of preceding modifiers and coordinate structures and increase the cost of verbs, following theoretical and empirical support described in Shain and colleagues (2022). DLT measures were computed automatically from the hand-corrected phrase structure trees described above.
The storage cost measure is correlated with PDD's constituent-length manipulation; the integration cost measure is anticorrelated with PDD's manipulation and therefore cannot explain the effect.
Measures Derived from Surprisal-based Accounts of Language Processing
An alternative class of accounts of language comprehension with broad empirical support (e.g., Shain et al., 2020; Goodkind & Bicknell, 2018; Willems et al., 2015; Smith & Levy, 2013; Frank & Bod, 2011) focuses on the predictability of incoming words in context (Levy, 2008; Hale, 2001). Here, we focus on two such measures, following Shain et al. (2020), who found support for both in neural responses of the language system during naturalistic story comprehension.
Five-gram surprisal.
The negative log probability of a word in context as computed by KenLM 5-gram language models (Heafield, Pouzyrevsky, Clark, & Koehn, 2013) from frequency counts in the Gigaword 3 corpus (Graff, Kong, Chen, & Maeda, 2007; Figure A2, 5-gram surprisal). Five-gram models condition the probability distribution over the upcoming word on the sequence of four words that precede it, using default interpolation and backoff settings as described in Heafield and colleagues (2013). Five-gram models capture local word co-occurrence statistics but struggle to capture effects of larger-scale syntactic structures (e.g., constituency, long-distance dependencies). Robust effects of 5-gram predictability (and related models) are consistently reported in both behavioral (Smith & Levy, 2013; Frank & Bod, 2011; Demberg & Keller, 2008) and neuroimaging (Shain et al., 2020; Lopopolo et al., 2017; Willems et al., 2015) studies.
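To make the definition concrete, the following toy example computes per-word surprisal from raw n-gram counts. It is deliberately simplified and is not KenLM: it uses unsmoothed maximum-likelihood estimates with no interpolation or backoff, natural logarithms (the log base is an assumption), and a hypothetical `<s>` padding symbol at the start of each sequence.

```python
import math
from collections import Counter


def ngram_surprisals(train_tokens, test_tokens, n=5):
    """Per-word surprisal, -log P(word | previous n-1 words), in nats.

    Probabilities are relative frequencies from `train_tokens`. Because no
    smoothing or backoff is applied, any n-gram unseen in training receives
    infinite surprisal.
    """
    ngrams = Counter()
    contexts = Counter()
    padded = ["<s>"] * (n - 1) + list(train_tokens)
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        ngrams[ctx + (padded[i],)] += 1
        contexts[ctx] += 1

    surprisals = []
    padded_test = ["<s>"] * (n - 1) + list(test_tokens)
    for i in range(n - 1, len(padded_test)):
        ctx = tuple(padded_test[i - n + 1:i])
        count = ngrams[ctx + (padded_test[i],)]
        if count == 0:
            surprisals.append(float("inf"))
        else:
            surprisals.append(-math.log(count / contexts[ctx]))
    return surprisals
```

The infinite values for unseen n-grams show why production n-gram models rely on smoothing and backoff: without them, any novel word sequence would be assigned zero probability.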
PCFG surprisal.
The negative log probability of a word in context as computed by the PCFG parser of van Schijndel and colleagues (2013), trained on trees from the Penn Treebank (Marcus et al., 1993) that were automatically reannotated into a generalized categorial grammar formalism (Nguyen et al., 2012; Figure A2, PCFG surprisal). PCFG models condition only on hypothesized syntactic analyses of sentences. They therefore excel at capturing syntactic influences on expectations but struggle to capture local word-to-word cooccurrence patterns. PCFG effects have been reported in both behavioral (van Schijndel & Schuler, 2015; Fossum & Levy, 2012) and neuroimaging (Shain et al., 2020; Brennan et al., 2016) studies.
Both surprisal measures are anticorrelated with PDD's constituent-length manipulation and therefore cannot explain the effect.
Appendix 6: Contrast Definition for the Critical Experiments
The first-level models estimate the response in PSC to each condition of the critical experiment (e.g., c02, jab-c12). However, our critical research questions aggregate over these conditions in different ways (e.g., Is the response to real-word stimuli larger than the response to Jabberwocky stimuli overall? Does activity increase with chunk length?). Thus, as was done by PDD, we derive our key measures from the condition-level estimates. The resulting aggregate contrasts (estimated within each participant) are used as dependent variables for statistical analysis.
To estimate the overall response to real-word, Jabberwocky, and nonconstituent conditions, we computed a by-participant average of the responses to the stimuli in each of these broader stimulus types. To estimate the difference in response between real-word and Jabberwocky conditions, we took the by-participant difference between the averages within those two stimulus types only for Lengths 1, 4, and 12, which were represented for both stimulus types.
To estimate the parametric change in BOLD response as a function of constituent length, we computed the slope by participant of the best-fit line relating constituent length values to their associated first-level PSC estimates. To do so, we followed PDD in treating conditions c01, c02, c03, c04, c06, and c12 as equidistant, based on their observation of a sublinear monotonic relationship between length (in words) and the BOLD response. To model length effects in Experiment 3, which includes conditions not present in PDD's original study (i.e., Lengths 5, 8, and 10), we interpolated linearly between the points in PDD's original continuum. For example, Length 5 (which was not used by PDD) was treated as lying halfway between Lengths 4 and 6 (both of which were used by PDD). To estimate the difference in sensitivity to constituent length between stimulus types (e.g., between real-word and Jabberwocky conditions), we took the by-participant difference in slope between the two stimulus types.
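The slope computation can be sketched as below. The sketch assumes that PDD's six lengths are coded as equally spaced positions 0 through 5 and that an intermediate length is placed proportionally between its two neighbouring PDD lengths; this is one reading of the procedure described above, and the function names are hypothetical.

```python
# Lengths used by PDD, treated as equally spaced positions 0..5.
PDD_LENGTHS = [1, 2, 3, 4, 6, 12]


def length_to_position(length):
    """Map a chunk length (in words) onto the equidistant continuum.

    Lengths not used by PDD are placed by linear interpolation between the
    neighbouring PDD lengths, e.g. Length 5 lies halfway between 4 and 6.
    """
    if length in PDD_LENGTHS:
        return float(PDD_LENGTHS.index(length))
    for k in range(len(PDD_LENGTHS) - 1):
        lo, hi = PDD_LENGTHS[k], PDD_LENGTHS[k + 1]
        if lo < length < hi:
            return k + (length - lo) / (hi - lo)
    raise ValueError("length outside the range covered by PDD's continuum")


def length_slope(psc_by_length):
    """Least-squares slope of PSC on continuum position for one participant.

    `psc_by_length` maps chunk length to that participant's first-level PSC
    estimate for the corresponding condition.
    """
    xs = [length_to_position(length) for length in psc_by_length]
    ys = list(psc_by_length.values())
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under this coding, the difference in length sensitivity between two stimulus types for a participant is simply the difference between the two slopes returned by `length_slope`.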
Appendix 7: Statistical Analysis
We modeled the contrast values (as defined above, e.g., the by-participant difference in constituent length effect between real-word and Jabberwocky stimuli) as dependent variables in linear mixed-effects models in lme4 (Bates, Mächler, Bolker, & Walker, 2015) when examining entire networks, with random effects for participant and fROI, or simple linear models when examining the fROIs separately (since fROI-level models contain one contrast estimate per participant, there is no by-participant hierarchical structure to model). When examining the fROIs separately, reported p values are adjusted for false discovery rate (Benjamini & Yekutieli, 2001) over the number of fROIs in the network.
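For reference, the Benjamini–Yekutieli adjustment over a set of fROI-level p values can be written in a few lines. This is a generic sketch of the standard step-up procedure (adjusted values capped at 1), not the authors' analysis code; the function name is hypothetical.

```python
def benjamini_yekutieli(pvals):
    """Return Benjamini-Yekutieli adjusted p values in the input order.

    Each sorted p value p_(k) is scaled by m * c(m) / k, where m is the
    number of tests and c(m) = 1 + 1/2 + ... + 1/m, and monotonicity is
    enforced by taking a running minimum from the largest p value down.
    """
    m = len(pvals)
    if m == 0:
        return []
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted
```

With six fROIs, c(m) is 2.45, so the smallest raw p value is multiplied by 14.7 before capping. This explains why adjusted values of 1.000 are common for weak effects in the tables.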
Regional contrast estimates were tested (against zero) using an unpaired t test. Pairwise differences in a contrast between two regions were tested in the same way, using the within-individual difference in the contrast between the two regions as the dependent variable, rather than the contrast itself.
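As an illustration, the test statistic for a regional contrast can be computed as a one-sample t statistic against zero over the by-participant contrast values, and the between-region comparison as the same statistic over within-participant differences. Treating the test as a one-sample test is an assumption of this sketch, as are the function names; p values and the FDR adjustment are omitted.

```python
import math


def one_sample_t(values):
    """Return (mean, standard error, t) for a test of the mean against zero."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(variance / n)
    return mean, se, mean / se


def region_difference_t(region_a, region_b):
    """Test the within-participant difference in a contrast between regions.

    `region_a` and `region_b` hold one contrast value per participant, in
    the same participant order.
    """
    diffs = [a - b for a, b in zip(region_a, region_b)]
    return one_sample_t(diffs)
```

The three returned values correspond to the β, σ(β), and t columns of the results tables, which is why t is approximately β divided by σ(β) in each row (up to rounding).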
Appendix 8: Full Statistical Results from the Main Article
Full statistical results from the main article are reported in Table A1.
| Contrast | Experiment | fROI | β | σ(β) | t | p |
|---|---|---|---|---|---|---|
| Constituent length for real-word conditions | 1 | Overall | 0.19 | 0.04 | 5.34 | < .001*** |
| | 1 | LIFGorb | 0.29 | 0.04 | 6.51 | < .001*** |
| | 1 | LIFG | 0.26 | 0.04 | 6.39 | < .001*** |
| | 1 | LMFG | 0.19 | 0.04 | 4.91 | < .001*** |
| | 1 | LAntTemp | 0.15 | 0.03 | 5.62 | < .001*** |
| | 1 | LPostTemp | 0.20 | 0.03 | 6.80 | < .001*** |
| | 1 | LAngG | 0.08 | 0.03 | 3.28 | .013* |
| Constituent length for real-word conditions | 2 | Overall | 0.18 | 0.03 | 5.98 | < .001*** |
| | 2 | LIFGorb | 0.23 | 0.03 | 7.37 | < .001*** |
| | 2 | LIFG | 0.23 | 0.04 | 6.56 | < .001*** |
| | 2 | LMFG | 0.25 | 0.03 | 8.20 | < .001*** |
| | 2 | LAntTemp | 0.14 | 0.02 | 8.18 | < .001*** |
| | 2 | LPostTemp | 0.16 | 0.02 | 7.72 | < .001*** |
| | 2 | LAngG | 0.09 | 0.02 | 4.69 | < .001*** |
| Constituent length for Jabberwocky conditions | 2 | Overall | 0.11 | 0.03 | 4.08 | .003** |
| | 2 | LIFGorb | 0.11 | 0.03 | 3.97 | < .001*** |
| | 2 | LIFG | 0.10 | 0.02 | 4.78 | < .001*** |
| | 2 | LMFG | 0.13 | 0.02 | 5.48 | < .001*** |
| | 2 | LAntTemp | 0.18 | 0.03 | 6.97 | < .001*** |
| | 2 | LPostTemp | 0.09 | 0.02 | 5.58 | < .001*** |
| | 2 | LAngG | 0.14 | 0.02 | 8.30 | 1.000 |
| Lexicality effect (real-word > Jabberwocky) | 2 | Overall | 0.75 | 0.10 | 7.49 | < .001*** |
| | 2 | LIFGorb | 0.67 | 0.09 | 7.43 | < .001*** |
| | 2 | LIFG | 0.68 | 0.13 | 5.29 | < .001*** |
| | 2 | LMFG | 0.91 | 0.14 | 6.68 | < .001*** |
| | 2 | LAntTemp | 0.78 | 0.07 | 11.21 | < .001*** |
| | 2 | LPostTemp | 0.94 | 0.09 | 9.94 | < .001*** |
| | 2 | LAngG | 0.49 | 0.09 | 5.51 | < .001*** |
| Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | Overall | 0.08 | 0.02 | 3.28 | .004** |
| | 2 | LIFGorb | 0.13 | 0.03 | 4.03 | .004** |
| | 2 | LIFG | 0.10 | 0.03 | 3.47 | .006** |
| | 2 | LMFG | 0.07 | 0.03 | 2.31 | .078 |
| | 2 | LAntTemp | 0.05 | 0.02 | 2.63 | .045* |
| | 2 | LPostTemp | 0.02 | 0.02 | 0.84 | 1.000 |
| | 2 | LAngG | 0.09 | 0.02 | 3.68 | .005** |
| Length effect in Experiment 3 (mostly nonconstituents) | 3 | Overall | 0.19 | 0.03 | 5.72 | < .001*** |
| | 3 | LIFGorb | 0.21 | 0.03 | 6.98 | < .001*** |
| | 3 | LIFG | 0.26 | 0.04 | 6.82 | < .001*** |
| | 3 | LMFG | 0.18 | 0.03 | 5.59 | < .001*** |
| | 3 | LAntTemp | 0.20 | 0.02 | 9.87 | < .001*** |
| | 3 | LPostTemp | 0.23 | 0.03 | 9.04 | < .001*** |
| | 3 | LAngG | 0.06 | 0.03 | 2.42 | .063 |
| Length effect in Experiment 1 (constituents) vs. Experiment 3 (mostly nonconstituents) | 1 & 3 | Overall | 0.00 | 0.03 | −0.08 | .938 |
| | 1 & 3 | LIFGorb | −0.07 | 0.05 | −1.44 | 1.000 |
| | 1 & 3 | LIFG | 0.00 | 0.06 | −0.06 | 1.000 |
| | 1 & 3 | LMFG | −0.01 | 0.05 | −0.15 | 1.000 |
| | 1 & 3 | LAntTemp | 0.06 | 0.03 | 1.74 | 1.000 |
| | 1 & 3 | LPostTemp | 0.03 | 0.04 | 0.80 | 1.000 |
| | 1 & 3 | LAngG | −0.02 | 0.04 | −0.47 | 1.000 |
| Length effect in Experiment 2 (constituents) vs. Experiment 3 (mostly nonconstituents) | 2 & 3 | Overall | 0.01 | 0.03 | 0.25 | .804 |
| | 2 & 3 | LIFGorb | −0.02 | 0.05 | −0.40 | 1.000 |
| | 2 & 3 | LIFG | 0.02 | 0.06 | 0.43 | 1.000 |
| | 2 & 3 | LMFG | −0.06 | 0.05 | −1.28 | 1.000 |
| | 2 & 3 | LAntTemp | 0.06 | 0.03 | 2.29 | .316 |
| | 2 & 3 | LPostTemp | 0.07 | 0.03 | 2.07 | .316 |
| | 2 & 3 | LAngG | −0.03 | 0.03 | −0.85 | 1.000 |
| Difference in constituent length effect for real-word conditions | 2 | LAngG vs. LIFGorb | 0.14 | 0.03 | 4.47 | < .001*** |
| | 2 | LAngG vs. LIFG | 0.14 | 0.04 | 3.87 | .002** |
| | 2 | LAngG vs. LMFG | 0.15 | 0.03 | 5.27 | < .001*** |
| | 2 | LAngG vs. LAntTemp | 0.05 | 0.02 | 1.91 | .146 |
| | 2 | LAngG vs. LPostTemp | 0.07 | 0.02 | 3.15 | .009** |
| Difference in constituent length effect for Jabberwocky conditions | 2 | LAngG vs. LIFGorb | 0.10 | 0.02 | 5.20 | < .001*** |
| | 2 | LAngG vs. LIFG | 0.13 | 0.02 | 5.60 | < .001*** |
| | 2 | LAngG vs. LMFG | 0.17 | 0.03 | 6.16 | < .001*** |
| | 2 | LAngG vs. LAntTemp | 0.08 | 0.02 | 4.76 | < .001*** |
| | 2 | LAngG vs. LPostTemp | 0.14 | 0.02 | 6.72 | < .001*** |
| Difference in Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | LPostTemp vs. LIFGorb | 0.11 | 0.03 | 4.30 | .002** |
| | 2 | LPostTemp vs. LIFG | 0.08 | 0.03 | 3.27 | .021* |
p Values are generated by likelihood ratio tests of linear mixed-effects models, with by-fROI results corrected for false discovery rate (FDR) over all six fROIs using the Benjamini–Yekutieli procedure (Benjamini & Yekutieli, 2001) at a nominal significance level of α = .05. Starred p values indicate statistical significance under FDR correction (*p ≤ .05, **p ≤ .01, ***p ≤ .001). Significant regions are shown in bold in the fROI column.
Appendix 9: Results Replicate When Using PDD's ROIs as Masks to Define the Language fROIs
This study constrained the participant-specific functional localization procedure using broad masks for language areas that have been validated in prior work (e.g., Fedorenko et al., 2010). As discussed in Appendix 3, five out of six of these masks correspond closely to the ROIs reported in PDD, but the overlap between our masks and PDD's ROIs is not perfect. To ensure that our results are not driven by the choice of particular masks, in this section, we rerun our main analyses using PDD's language ROIs as localizer masks, rather than our standard localizer masks. As shown in Figure A3 and Table A2, results using PDD's parcels as localizer masks are highly similar to those reported using our standard masks in the main article, which indicates that results do not hinge critically on our choice of localizer masks.
| Contrast | Experiment | fROI | β | σ(β) | t | p |
|---|---|---|---|---|---|---|
| Constituent length for real-word conditions | 1 | Overall | 0.18 | 0.03 | 6.12 | < .001*** |
| | 1 | LIFGorb | 0.26 | 0.05 | 5.48 | < .001*** |
| | 1 | LIFGtri | 0.24 | 0.05 | 4.86 | .001** |
| | 1 | LTP | 0.13 | 0.03 | 4.64 | .001** |
| | 1 | LaSTS | 0.14 | 0.03 | 4.23 | .002** |
| | 1 | LpSTS | 0.20 | 0.04 | 5.52 | < .001*** |
| | 1 | LTPJ | 0.12 | 0.03 | 4.26 | .002** |
| Constituent length for real-word conditions | 2 | Overall | 0.15 | 0.02 | 6.28 | < .001*** |
| | 2 | LIFGorb | 0.21 | 0.03 | 6.44 | < .001*** |
| | 2 | LIFGtri | 0.20 | 0.04 | 5.80 | < .001*** |
| | 2 | LTP | 0.12 | 0.02 | 6.66 | < .001*** |
| | 2 | LaSTS | 0.14 | 0.02 | 7.04 | < .001*** |
| | 2 | LpSTS | 0.16 | 0.02 | 6.63 | < .001*** |
| | 2 | LTPJ | 0.09 | 0.01 | 5.82 | < .001*** |
| Constituent length for Jabberwocky conditions | 2 | Overall | 0.08 | 0.02 | 3.76 | .003** |
| | 2 | LIFGorb | 0.07 | 0.02 | 3.06 | .012* |
| | 2 | LIFGtri | 0.09 | 0.02 | 3.69 | .003** |
| | 2 | LTP | 0.06 | 0.02 | 3.96 | .002** |
| | 2 | LaSTS | 0.07 | 0.02 | 4.49 | < .001*** |
| | 2 | LpSTS | 0.14 | 0.02 | 7.90 | < .001*** |
| | 2 | LTPJ | 0.02 | 0.01 | 1.41 | .389 |
| Lexicality effect (real-word > Jabberwocky) | 2 | Overall | 0.67 | 0.08 | 8.39 | < .001*** |
| | 2 | LIFGorb | 0.60 | 0.09 | 6.73 | < .001*** |
| | 2 | LIFGtri | 0.60 | 0.13 | 4.53 | < .001*** |
| | 2 | LTP | 0.65 | 0.08 | 7.62 | < .001*** |
| | 2 | LaSTS | 0.76 | 0.07 | 10.95 | < .001*** |
| | 2 | LpSTS | 0.89 | 0.11 | 8.27 | < .001*** |
| | 2 | LTPJ | 0.51 | 0.07 | 7.58 | < .001*** |
| Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | Overall | 0.08 | 0.02 | 3.34 | .005** |
| | 2 | LIFGorb | 0.14 | 0.03 | 4.44 | < .001*** |
| | 2 | LIFGtri | 0.11 | 0.03 | 3.46 | .006** |
| | 2 | LTP | 0.06 | 0.02 | 2.67 | .033* |
| | 2 | LaSTS | 0.07 | 0.02 | 3.19 | .010* |
| | 2 | LpSTS | 0.01 | 0.03 | 0.51 | 1.000 |
| | 2 | LTPJ | 0.07 | 0.02 | 4.47 | < .001*** |
| Length effect in Experiment 3 (mostly nonconstituents) | 3 | Overall | 0.18 | 0.03 | 5.53 | < .001*** |
| | 3 | LIFGorb | 0.19 | 0.03 | 7.35 | < .001*** |
| | 3 | LIFGtri | 0.25 | 0.04 | 6.80 | < .001*** |
| | 3 | LTP | 0.17 | 0.03 | 6.52 | < .001*** |
| | 3 | LaSTS | 0.18 | 0.02 | 8.03 | < .001*** |
| | 3 | LpSTS | 0.23 | 0.03 | 7.92 | < .001*** |
| | 3 | LTPJ | 0.06 | 0.02 | 2.76 | .030* |
| Length effect in Experiment 1 (constituents) vs. Experiment 3 (mostly nonconstituents) | 1 & 3 | Overall | 0.03 | 0.03 | 0.92 | .733 |
| | 1 & 3 | LIFGorb | −0.02 | 0.05 | −0.44 | 1.000 |
| | 1 & 3 | LIFGtri | 0.04 | 0.06 | 0.76 | 1.000 |
| | 1 & 3 | LTP | 0.05 | 0.03 | 1.70 | .864 |
| | 1 & 3 | LaSTS | 0.04 | 0.03 | 1.22 | .864 |
| | 1 & 3 | LpSTS | 0.07 | 0.04 | 1.82 | 1.000 |
| | 1 & 3 | LTPJ | −0.03 | 0.03 | −1.00 | .864 |
| Length effect in Experiment 2 (constituents) vs. Experiment 3 (mostly nonconstituents) | 2 & 3 | Overall | 0.03 | 0.03 | 0.92 | .362 |
| | 2 & 3 | LIFGorb | −0.02 | 0.05 | −0.44 | 1.000 |
| | 2 & 3 | LIFGtri | 0.04 | 0.06 | 0.76 | 1.000 |
| | 2 & 3 | LTP | 0.05 | 0.03 | 1.70 | .697 |
| | 2 & 3 | LaSTS | 0.04 | 0.03 | 1.22 | 1.000 |
| | 2 & 3 | LpSTS | 0.07 | 0.04 | 1.82 | .697 |
| | 2 & 3 | LTPJ | −0.03 | 0.03 | −1.00 | 1.000 |
Significance tests of key contrasts with estimates, standard errors, and t values (respectively columns β, σ(β), and t). p Values are generated by likelihood ratio tests of linear mixed-effects models, with by-fROI results corrected for false discovery rate (FDR) over all six fROIs using the Benjamini–Yekutieli procedure (Benjamini & Yekutieli, 2001) at a nominal significance level of α = .05. Starred p values indicate statistical significance under FDR correction (*p ≤ .05, **p ≤ .01, ***p ≤ .001). Significant regions are shown in bold in the fROI column.
Appendix 10: Results Partially Replicate When Using PDD's ROIs as Group-level ROIs
We have advocated our use of participant-specific functional localization as a key methodological advantage of our study relative to PDD's (see Introduction and Methods sections of the main article). However, does this design choice impact results? Here, we investigate this question by following PDD's precedent and averaging responses across all voxels within each of PDD's parcels, without functional localization of participant-specific language areas. This approach thus uses the same set of voxels in all participants and does not account for interindividual variation in the precise locations of language areas.
Resulting estimates, plotted in Figure A4, are similar in important ways to those reported in the main article: In each region, responses increase (numerically) with chunk length in the real-word conditions as well as in the Jabberwocky conditions, albeit more weakly. However, as expected based on prior evidence (Fedorenko et al., 2010), sensitivity to all effects is greatly attenuated when using group-level ROIs as opposed to individual-level fROIs (Figure A4 and Table A3, see Figure A5 for the same visualizations with tighter y axes, for legibility); the only difference between the effects in Figure A4 and the comparatively stronger effects in Figure A3 is that the former averages over the entire parcel whereas the latter averages only over the 10% of the parcel that responds most strongly to the language localizer (as determined based on each individual map for the localizer contrast). This attenuation of effects is to be expected when group-level ROIs are used given that—for any given participant—only a subset of the ROI may belong to the language network and the ROI may therefore include voxels that are not language-responsive. As a concrete example: When using fROIs, all regions (with the exception of the LAngG language fROI) show a length effect in the Jabberwocky conditions (Table A2), but when using group-level ROIs, only the LpSTS shows a length effect in Jabberwocky conditions (Table A3).
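The contrast between the two averaging schemes can be sketched as follows. The data structures are hypothetical: `effect` and `localizer` are assumed to be per-participant dictionaries mapping voxel identifiers to an effect estimate and a localizer-contrast value, and `parcel` is the list of voxels in one mask. The sketch takes both maps as given and does not address how they are estimated.

```python
def group_roi_mean(effect, parcel):
    """Average the effect over every voxel in the parcel (group-level ROI)."""
    return sum(effect[v] for v in parcel) / len(parcel)


def froi_mean(effect, localizer, parcel, top_fraction=0.10):
    """Average the effect over the most localizer-responsive voxels only.

    The participant's fROI is taken to be the `top_fraction` of parcel
    voxels with the highest localizer-contrast values.
    """
    k = max(1, round(len(parcel) * top_fraction))
    top = sorted(parcel, key=lambda v: localizer[v], reverse=True)[:k]
    return sum(effect[v] for v in top) / k
```

If only a small subset of the parcel is language-responsive in a given participant, `group_roi_mean` dilutes the effect with unresponsive voxels, whereas `froi_mean` does not. This is the attenuation pattern described above.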
| Contrast | Experiment | fROI | β | σ(β) | t | p |
|---|---|---|---|---|---|---|
| Constituent length for real-word conditions | 1 | Overall | 0.08 | 0.02 | 4.70 | < .001*** |
| | 1 | LIFGorb | 0.10 | 0.03 | 2.90 | .028* |
| | 1 | LIFGtri | 0.13 | 0.03 | 3.86 | .013* |
| | 1 | LTP | 0.06 | 0.02 | 3.23 | .018* |
| | 1 | LaSTS | 0.05 | 0.02 | 3.24 | .018* |
| | 1 | LpSTS | 0.09 | 0.02 | 4.79 | .004** |
| | 1 | LTPJ | 0.04 | 0.01 | 3.38 | .018* |
| Constituent length for real-word conditions | 2 | Overall | 0.06 | 0.01 | 4.92 | < .001*** |
| | 2 | LIFGorb | 0.08 | 0.02 | 4.75 | < .001*** |
| | 2 | LIFGtri | 0.10 | 0.02 | 4.97 | < .001*** |
| | 2 | LTP | 0.05 | 0.01 | 4.43 | < .001*** |
| | 2 | LaSTS | 0.05 | 0.01 | 6.88 | < .001*** |
| | 2 | LpSTS | 0.07 | 0.01 | 5.71 | < .001*** |
| | 2 | LTPJ | 0.03 | 0.01 | 2.92 | .014* |
| Constituent length for Jabberwocky conditions | 2 | Overall | 0.02 | 0.01 | 1.97 | .061 |
| | 2 | LIFGorb | 0.02 | 0.02 | 1.10 | 1.000 |
| | 2 | LIFGtri | 0.01 | 0.02 | 0.37 | 1.000 |
| | 2 | LTP | 0.02 | 0.01 | 2.05 | .229 |
| | 2 | LaSTS | 0.02 | 0.01 | 2.29 | .200 |
| | 2 | LpSTS | 0.06 | 0.01 | 5.53 | < .001*** |
| | 2 | LTPJ | 0.01 | 0.01 | 0.60 | 1.000 |
| Lexicality effect (real-word > Jabberwocky) | 2 | Overall | 0.23 | 0.05 | 4.70 | < .001*** |
| | 2 | LIFGorb | 0.13 | 0.06 | 2.26 | .086 |
| | 2 | LIFGtri | 0.13 | 0.08 | 1.60 | .289 |
| | 2 | LTP | 0.27 | 0.04 | 6.70 | < .001*** |
| | 2 | LaSTS | 0.31 | 0.03 | 8.98 | < .001*** |
| | 2 | LpSTS | 0.37 | 0.06 | 6.19 | < .001*** |
| | 2 | LTPJ | 0.18 | 0.03 | 5.12 | < .001*** |
| Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | Overall | 0.04 | 0.01 | 2.72 | .017* |
| | 2 | LIFGorb | 0.06 | 0.02 | 3.40 | .012* |
| | 2 | LIFGtri | 0.09 | 0.02 | 3.96 | .005** |
| | 2 | LTP | 0.02 | 0.01 | 1.60 | .343 |
| | 2 | LaSTS | 0.03 | 0.01 | 2.92 | .028* |
| | 2 | LpSTS | 0.01 | 0.01 | 0.80 | 1.000 |
| | 2 | LTPJ | 0.02 | 0.01 | 1.91 | .232 |
| Length effect in Experiment 3 (mostly nonconstituents) | 3 | Overall | 0.06 | 0.02 | 3.96 | .003** |
| | 3 | LIFGorb | 0.05 | 0.01 | 4.03 | .002** |
| | 3 | LIFGtri | 0.10 | 0.02 | 4.68 | < .001*** |
| | 3 | LTP | 0.06 | 0.02 | 4.14 | .002** |
| | 3 | LaSTS | 0.07 | 0.01 | 6.94 | < .001*** |
| | 3 | LpSTS | 0.10 | 0.02 | 6.16 | < .001*** |
| | 3 | LTPJ | 0.01 | 0.01 | 0.43 | 1.000 |
| Length effect in Experiment 1 (constituents) vs. Experiment 3 (mostly nonconstituents) | 1 & 3 | Overall | −0.01 | 0.02 | −0.88 | .383 |
| | 1 & 3 | LIFGorb | −0.05 | 0.03 | −1.42 | 1.000 |
| | 1 & 3 | LIFGtri | −0.03 | 0.04 | −0.74 | 1.000 |
| | 1 & 3 | LTP | 0.00 | 0.02 | 0.01 | 1.000 |
| | 1 & 3 | LaSTS | 0.01 | 0.02 | 0.72 | 1.000 |
| | 1 & 3 | LpSTS | 0.01 | 0.02 | 0.49 | 1.000 |
| | 1 & 3 | LTPJ | −0.04 | 0.02 | −2.19 | .525 |
| Length effect in Experiment 2 (constituents) vs. Experiment 3 (mostly nonconstituents) | 2 & 3 | Overall | 0.00 | 0.02 | 0.12 | .908 |
| | 2 & 3 | LIFGorb | −0.03 | 0.03 | −1.16 | 1.000 |
| | 2 & 3 | LIFGtri | 0.01 | 0.03 | 0.28 | 1.000 |
| | 2 & 3 | LTP | 0.02 | 0.02 | 0.90 | 1.000 |
| | 2 & 3 | LaSTS | 0.01 | 0.01 | 1.05 | 1.000 |
| | 2 & 3 | LpSTS | 0.03 | 0.02 | 1.23 | 1.000 |
| | 2 & 3 | LTPJ | −0.02 | 0.02 | −1.47 | 1.000 |
| Contrast | Experiment | fROI | β | σ(β) | t | p |
|---|---|---|---|---|---|---|
| Constituent length for real-word conditions | 1 | Overall | 0.06 | 0.02 | 3.53 | .006** |
| | 1 | RIFGorb | 0.09 | 0.03 | 2.69 | .065 |
| | 1 | RIFG | 0.10 | 0.02 | 4.23 | .006** |
| | 1 | RMFG | 0.03 | 0.02 | 1.62 | .375 |
| | 1 | RAntTemp | 0.08 | 0.02 | 4.96 | .003** |
| | 1 | RPostTemp | 0.05 | 0.01 | 3.94 | .007** |
| | 1 | RAngG | 0.00 | 0.01 | 0.39 | 1.000 |
| Constituent length for real-word conditions | 2 | Overall | 0.07 | 0.02 | 3.89 | < .001*** |
| | 2 | RIFGorb | 0.07 | 0.03 | 2.66 | .028* |
| | 2 | RIFG | 0.11 | 0.03 | 3.23 | .012* |
| | 2 | RMFG | 0.08 | 0.02 | 3.01 | .017* |
| | 2 | RAntTemp | 0.07 | 0.01 | 5.74 | < .001*** |
| | 2 | RPostTemp | 0.07 | 0.02 | 3.79 | .004** |
| | 2 | RAngG | 0.04 | 0.02 | 2.66 | .028* |
| Constituent length for Jabberwocky conditions | 2 | Overall | 0.02 | 0.01 | 1.26 | .214 |
| | 2 | RIFGorb | 0.02 | 0.02 | 0.98 | 1.000 |
| | 2 | RIFG | 0.01 | 0.02 | 0.29 | 1.000 |
| | 2 | RMFG | 0.00 | 0.02 | −0.05 | 1.000 |
| | 2 | RAntTemp | 0.03 | 0.01 | 2.89 | .046* |
| | 2 | RPostTemp | 0.04 | 0.01 | 3.16 | .045* |
| | 2 | RAngG | 0.00 | 0.02 | 0.13 | 1.000 |
| Lexicality effect (real-word > Jabberwocky) | 2 | Overall | 0.22 | 0.07 | 3.29 | .004** |
| | 2 | RIFGorb | 0.24 | 0.06 | 3.80 | .002** |
| | 2 | RIFG | 0.09 | 0.09 | 1.02 | .773 |
| | 2 | RMFG | 0.10 | 0.09 | 1.17 | .737 |
| | 2 | RAntTemp | 0.35 | 0.04 | 7.88 | < .001*** |
| | 2 | RPostTemp | 0.32 | 0.06 | 5.00 | < .001*** |
| | 2 | RAngG | 0.18 | 0.07 | 2.59 | .049* |
| Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | Overall | 0.05 | 0.02 | 2.78 | .011* |
| | 2 | RIFGorb | 0.05 | 0.03 | 1.78 | .242 |
| | 2 | RIFG | 0.11 | 0.03 | 3.50 | .017* |
| | 2 | RMFG | 0.08 | 0.03 | 2.97 | .038* |
| | 2 | RAntTemp | 0.03 | 0.02 | 2.07 | .219 |
| | 2 | RPostTemp | 0.02 | 0.02 | 1.09 | .694 |
| | 2 | RAngG | 0.04 | 0.02 | 1.83 | .242 |
| Length effect in Experiment 3 (mostly nonconstituents) | 3 | Overall | 0.08 | 0.02 | 3.43 | .005** |
| | 3 | RIFGorb | 0.09 | 0.02 | 3.69 | .006** |
| | 3 | RIFG | 0.11 | 0.02 | 4.62 | < .001*** |
| | 3 | RMFG | 0.06 | 0.03 | 2.14 | .133 |
| | 3 | RAntTemp | 0.11 | 0.02 | 6.81 | < .001*** |
| | 3 | RPostTemp | 0.10 | 0.02 | 5.85 | < .001*** |
| | 3 | RAngG | −0.01 | 0.02 | −0.33 | 1.000 |
p values are generated by likelihood ratio tests of linear mixed-effects models; by-fROI results are corrected for false discovery rate (FDR) over all six fROIs using the Benjamini–Yekutieli procedure (Benjamini & Yekutieli, 2001) at a nominal significance level of α = .05. Starred p values indicate statistical significance under FDR correction (*p ≤ .05, **p ≤ .01, ***p ≤ .001). Significant regions are shown in bold in the fROI column.
| Contrast | Experiment | fROI | β | σ(β) | t | p |
|---|---|---|---|---|---|---|
| Laterality difference of constituent length for real-word conditions | 1 | Overall | 0.14 | 0.03 | 4.32 | < .001*** |
| | 1 | LIFGorb | 0.20 | 0.05 | 3.71 | .011* |
| | 1 | LIFG | 0.16 | 0.04 | 3.81 | .011* |
| | 1 | LMFG | 0.16 | 0.05 | 3.42 | .015* |
| | 1 | LAntTemp | 0.07 | 0.02 | 3.19 | .019* |
| | 1 | LPostTemp | 0.15 | 0.03 | 5.75 | < .001*** |
| | 1 | LAngG | 0.08 | 0.03 | 3.08 | .020* |
| Laterality difference of constituent length for real-word conditions | 2 | Overall | 0.11 | 0.02 | 4.96 | < .001*** |
| | 2 | LIFGorb | 0.17 | 0.03 | 6.50 | < .001*** |
| | 2 | LIFG | 0.12 | 0.03 | 3.74 | .002** |
| | 2 | LMFG | 0.17 | 0.02 | 7.00 | < .001*** |
| | 2 | LAntTemp | 0.07 | 0.02 | 4.67 | < .001*** |
| | 2 | LPostTemp | 0.09 | 0.01 | 6.30 | < .001*** |
| | 2 | LAngG | 0.05 | 0.02 | 2.81 | .019* |
| Laterality difference of constituent length for Jabberwocky conditions | 2 | Overall | 0.09 | 0.03 | 3.50 | .007** |
| | 2 | LIFGorb | 0.08 | 0.02 | 3.99 | < .001*** |
| | 2 | LIFG | 0.13 | 0.02 | 5.50 | < .001*** |
| | 2 | LMFG | 0.18 | 0.03 | 6.61 | < .001*** |
| | 2 | LAntTemp | 0.05 | 0.01 | 3.98 | < .001*** |
| | 2 | LPostTemp | 0.09 | 0.01 | 6.38 | < .001*** |
| | 2 | LAngG | 0.00 | 0.02 | 0.05 | 1.000 |
| Laterality difference of lexicality effect (real-word > Jabberwocky) | 2 | Overall | 0.53 | 0.08 | 6.46 | < .001*** |
| | 2 | LIFGorb | 0.44 | 0.09 | 5.05 | < .001*** |
| | 2 | LIFG | 0.59 | 0.11 | 5.22 | < .001*** |
| | 2 | LMFG | 0.80 | 0.09 | 9.01 | < .001*** |
| | 2 | LAntTemp | 0.42 | 0.06 | 6.67 | < .001*** |
| | 2 | LPostTemp | 0.62 | 0.08 | 8.20 | < .001*** |
| | 2 | LAngG | 0.31 | 0.07 | 4.19 | < .001*** |
| Laterality difference of Constituent-Length × Stimulus Type (real-word vs. Jabberwocky) interaction | 2 | Overall | 0.02 | 0.02 | 1.42 | .173 |
| | 2 | LIFGorb | 0.08 | 0.03 | 2.97 | .075 |
| | 2 | LIFG | 0.00 | 0.03 | −0.10 | 1.000 |
| | 2 | LMFG | −0.01 | 0.03 | −0.23 | 1.000 |
| | 2 | LAntTemp | 0.02 | 0.02 | 1.20 | 1.000 |
| | 2 | LPostTemp | 0.00 | 0.02 | −0.05 | 1.000 |
| | 2 | LAngG | 0.05 | 0.03 | 1.81 | .569 |
| Laterality difference of length effect in Experiment 3 (mostly nonconstituents) | 3 | Overall | 0.12 | 0.02 | 5.59 | < .001*** |
| | 3 | LIFGorb | 0.13 | 0.03 | 5.07 | < .001*** |
| | 3 | LIFG | 0.15 | 0.03 | 5.04 | < .001*** |
| | 3 | LMFG | 0.12 | 0.03 | 4.46 | < .001*** |
| | 3 | LAntTemp | 0.10 | 0.02 | 5.35 | < .001*** |
| | 3 | LPostTemp | 0.13 | 0.02 | 5.92 | < .001*** |
| | 3 | LAngG | 0.07 | 0.03 | 2.52 | .051 |
p values are generated by likelihood ratio tests of linear mixed-effects models; by-fROI results are corrected for false discovery rate (FDR) over all six fROIs using the Benjamini–Yekutieli procedure (Benjamini & Yekutieli, 2001) at a nominal significance level of α = .05. Starred p values indicate statistical significance under FDR correction (*p ≤ .05, **p ≤ .01, ***p ≤ .001). Significant regions are shown in bold in the fROI column.
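For readers unfamiliar with the Benjamini–Yekutieli correction named in the table notes, the following minimal Python sketch computes adjusted p values for a family of six tests (one per fROI). It is an illustration only, not the code used to produce the tables: The function name and the six input p values are hypothetical.

```python
def benjamini_yekutieli(pvals):
    """Return Benjamini-Yekutieli adjusted p values, in the input order.

    With m tests and sorted p values p_(1) <= ... <= p_(m), the adjusted
    value at rank i is the minimum over ranks j >= i of
    p_(j) * m * c(m) / j, capped at 1, where c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # cap at 1 and enforce monotonicity from the top rank down
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p values for six fROIs (not values from the study).
raw_p = [0.001, 0.2, 0.5, 0.004, 0.03, 0.9]
adjusted_p = benjamini_yekutieli(raw_p)
significant = [p <= 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

Because the scaling factor m · c(m) is about 14.7 for six tests, weak uncorrected p values are readily pushed to the cap of 1, which is consistent with the many entries of 1.000 in the tables.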
These reanalyses of our data using group-level ROIs differ from our main results (where individual-level functional ROIs are used) in that they contain several false negatives, which result from the lower sensitivity of group-level analyses (see also Fedorenko, Nieto-Castañón, et al., 2012; Nieto-Castañón & Fedorenko, 2012). These differences yield some outcomes that are more similar to those reported by PDD. In particular, using group-level ROIs, inferior frontal regions show no main effects of lexicality, and anterior temporal and temporoparietal regions show no significant Jabberwocky effects. Thus, differences between our main findings and those of PDD are plausibly due in part to differences in analysis methods: Some of our findings emerge only when a more sensitive analytic approach is adopted, one that takes interindividual variability in functional topography into account. However, the use of group-level ROIs does not fully explain the differences between our study and PDD's. For example, even when using the group-level ROIs (the same ROIs used by PDD), inferior frontal areas in our data show a significant length-by-lexicality interaction, such that responses increase more steeply with chunk length in real-word conditions than in Jabberwocky conditions. This outcome is inconsistent with PDD's interpretation that inferior frontal areas belong to an abstract syntax network. Therefore, in addition to evidence of differences driven by analytic choices, we also find straightforward replication failures: A key finding reported by PDD does not appear to hold in our sample, even when the analyses are closely matched.
Appendix 11: A Comparison of the Overlapping Sets of Conditions between Our Earlier Work (Fedorenko et al., 2010) and Experiment 2 in the Current Article
PDD conditions c12, c01, jab-c12, and jab-c01 correspond respectively to the sentence (S), word list (W), Jabberwocky (J), and nonword list (N) conditions that have been investigated in several prior studies, including by our group (Fedorenko et al., 2010). As shown in Figure A6, the pattern that we observed in Experiment 2 in the current study for this subset of conditions is remarkably similar to the patterns reported for Experiments 1 and 2 in Fedorenko and colleagues (2010). Note that the difference in overall response magnitude among the three experiments most likely arises because Experiment 1 in Fedorenko and colleagues (2010) and the current Experiment 2 used 12-word/nonword-long materials, whereas Experiment 2 in Fedorenko and colleagues (2010) used eight-word/nonword-long materials. The current Experiment 2 therefore constitutes a third within-laboratory replication (all with different sets of materials and non-overlapping sets of participants) of the pattern whereby sentences elicit the strongest response, word lists and Jabberwocky sentences an intermediate response, and nonword lists the lowest response (see, e.g., Bedny et al., 2011, for another fMRI replication; see Fedorenko et al., 2016, for a replication using electrocorticography). As discussed elsewhere, including in the main text, this pattern suggests that all the regions of the language network support both the processing of word meanings and combinatorial structure building.
Appendix 12: Analysis of Right Hemisphere Homotopic Regions
We have thus far followed PDD in exclusively analyzing LH language regions. In light of growing interest in the contribution of the right hemisphere (RH) to language processing (Martin et al., 2022), in this section, we include exploratory analyses of the key patterns within the RH homotopic language regions. Following, for example, Shain, Paunov, Chen, and colleagues (2023), we define these regions by first projecting the mirror images of our LH localizer masks onto the RH and then following the same functional localization procedure used in the main analyses (i.e., selecting the top 10% most responsive voxels to the sentences > nonwords contrast during the localizer task). This approach allows asymmetric patterns of activation across hemispheres at the individual level while continuing to ensure functionally comparable ROIs both within individuals (between hemispheres) and between individuals.
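The mirroring step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure's actual implementation: The function name is hypothetical, masks are represented as sets of integer (x, y, z) voxel indices, and the volume is assumed to be aligned so that the midsagittal plane bisects the x axis of the voxel grid.

```python
def mirror_mask(mask_voxels, nx):
    """Reflect a set of (x, y, z) voxel indices across the midsagittal
    plane of a grid that is nx voxels wide along the left-right axis."""
    return {(nx - 1 - x, y, z) for (x, y, z) in mask_voxels}


# Hypothetical LH mask in a grid that is 10 voxels wide (toy coordinates).
lh_mask = {(1, 4, 2), (2, 4, 2), (2, 5, 3)}
rh_mask = mirror_mask(lh_mask, nx=10)
print(sorted(rh_mask))
```

Participant-specific fROIs would then be selected within the mirrored mask in the same way as in the LH, so the RH regions are anatomically matched at the mask level but free to differ in the exact voxels chosen for each individual.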
Acknowledgments
We acknowledge the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research, MIT. For technical support during scanning, we thank Steve Shannon and Atsushi Takahashi. We thank Tamar Regev for help with the ROI projection figures, the audience at the Neurobiology of Language conference in 2020 for helpful discussions of this work, and Rebecca Saxe, Ted Gibson, Nancy Kanwisher, Christophe Pallier, and Stan Dehaene for comments on the article. We also thank Christophe Pallier for sharing the ROI files.
Corresponding author: Cory Shain, Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, or via e-mail: [email protected].
Data Availability Statement
All data needed to evaluate the conclusions in the article are publicly available on OSF: https://osf.io/fduve/.
Author Contributions
Cory Shain: Conceptualization; Formal analysis; Investigation; Methodology; Software; Visualization; Writing—Original draft; Writing—Review & editing. Hope Kean: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Writing—Review & editing. Colton Casto: Data curation; Formal analysis; Investigation; Methodology; Writing—Review & editing. Benjamin Lipkin: Investigation; Methodology. Josef Affourtit: Investigation; Methodology. Matthew Siegelman: Investigation; Methodology. Francis Mollica: Conceptualization; Methodology; Project administration; Supervision; Writing—Review & editing. Evelina Fedorenko: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Writing—Original draft; Writing—Review & editing.
Funding Information
E. F. was supported by the National Institutes of Health (https://dx.doi.org/10.13039/100000002), grant numbers R01-DC016607, R01-DC016950, and U01-NS121471, and by funds from the McGovern Institute for Brain Research, the Brain and Cognitive Sciences Department (https://dx.doi.org/10.13039/100019335), and the Simons Center for the Social Brain (https://dx.doi.org/10.13039/100018792).
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
Author notes
Equal contribution.
Co-senior authorship.