Abstract
The capacity for language is a defining property of our species, yet despite decades of research, evidence on its neural basis is still mixed and a general consensus remains difficult to achieve. We suggest that this is partly caused by researchers defining “language” in different ways, with a focus on a wide range of phenomena, properties, and levels of investigation. Accordingly, there is very little agreement among cognitive neuroscientists of language on the operationalization of fundamental concepts to be investigated in neuroscientific experiments. Here, we review chains of derivation in the cognitive neuroscience of language, focusing on how the hypothesis under consideration is defined by a combination of theoretical and methodological assumptions. We first attempt to disentangle the complex relationship between linguistics, psychology, and neuroscience in the field. Next, we focus on how the conclusions that can be drawn from any experiment are inherently constrained by auxiliary assumptions, both theoretical and methodological, on which their validity rests. These issues are discussed in the context of classical experimental manipulations as well as study designs that employ novel approaches such as naturalistic stimuli and computational modeling. We conclude by proposing that a highly interdisciplinary field such as the cognitive neuroscience of language requires researchers to make explicit statements concerning the theoretical definitions, methodological choices, and other constraining factors involved in their work.
INTRODUCTION
The field of cognitive neuroscience of language aims to understand the neurobiological basis of language. On the basis of decades of electrophysiological and neuroimaging studies, several researchers have attempted to use individual pieces of research (“bricks”) for building grand theories (“edifices”) of the brain basis of language (e.g., Matchin & Hickok, 2020; Friederici, 2017; Hagoort, 2016; Hickok & Poeppel, 2016; Bornkessel-Schlesewsky & Schlesewsky, 2013; Tyler & Marslen-Wilson, 2008; Ullman, 2004). However, this process of model building is not straightforward, especially because researchers have defined “language” in different ways, leading them to focus on different phenomena, properties, and levels of investigation. Moreover, researchers have used a wide range of methodological approaches to investigate “their” definition of language. Consequently, the “brickyard” (Forscher, 1963) of cognitive neuroscience of language is filled with idiosyncratically shaped bricks (i.e., research results and directions) that might not necessarily fit together or easily map onto each other. This issue may be intrinsic to any field of neuroscience research investigating higher cognitive functions. Here, we examine issues in the study of the neurocognitive basis of language, highlighting problems as well as potential solutions.
To better understand how individual bricks may be designed and organized to build an edifice of “language,” one can employ the derivation chain framework introduced by Meehl (1990). This framework describes how any theory under investigation depends on several premises, so-called auxiliary assumptions. Accordingly, conclusions drawn based on a given experimental outcome rest on the validity of these (hidden) auxiliary assumptions. If any of the auxiliary assumptions turns out to be false, the derivation chain from theory to statistical inference will be weakened or broken. This weakened derivation chain ultimately compromises the conclusions that can be drawn based on a particular experiment. In other words, the main theory is only as strong as its supporting auxiliary assumptions. These derivation chains, previously discussed in other fields (e.g., Scheel, Tiokhin, Isager, & Lakens, 2021), are highly appropriate to describe many chains of assumptions present in the cognitive neuroscience of language.
In the following, we will first discuss how a wide range of assumptions imported from linguistics, psychology, and neuroscience constrain the theoretical foundations of most of the studies conducted in the interdisciplinary field of cognitive neuroscience of language (Theoretical Assumptions section). We will then consider an equally important factor affecting the reliability of the derivation chain, which refers to the assumptions made when translating this interdisciplinary theoretical apparatus into an empirical investigation (Methodological Assumptions section). In particular, from the experimental design to the measurements and analysis, researchers incorporate a large number of assumptions in the derivation chain. Each of these decisions shapes the contribution of a given study to the field of the cognitive neuroscience of language (see the Example of a Derivation Chain section in Zaccarella, Meyer, Makuuchi, and Friederici [2017]).
THEORETICAL ASSUMPTIONS
The first step in any derivation chain is to identify the construct that one intends to investigate experimentally. Although this may sound trivial to researchers from other fields, practitioners in cognitive neuroscience of language have hardly ever agreed on how to strictly define language and associated theoretical concepts. For example, some researchers consider language primarily as a means of communication (e.g., Beckner et al., 2009; Scott-Phillips, Kirby, & Ritchie, 2009; Tomasello, 2008), whereas others treat language as an abstract formal process operating over representations that can be externalized in a spoken, written, or signed form (e.g., Chomsky, 2017; Friederici, Chomsky, Berwick, Moro, & Bolhuis, 2017). Again, others define language as the act of communicating in any form, including not just speech and sign but also communicative gesture (e.g., Goldin-Meadow & Brentari, 2017; Özyürek, 2014; Arbib, Liebal, & Pika, 2008). For any particular definition or understanding of language, a certain interpretative context is therefore set and, even within a particular definition of language, different aspects of language are in specific focus (see Box 1). Accordingly, the choice of basic definitions directly influences what a researcher will consider as the key constructs of interest, which manipulations will be adopted in a given experimental design, and how the collected data will be analyzed (see the Example of a Derivation Chain section in Zaccarella et al. [2017]).
We suggest that the lack of agreement on the strict definition of language is also partially rooted in the fact that the derivation from linguistic theory into observational terms through empirical hypotheses is not always straightforward. It is by now well established that neither linguistic theories nor processing theories map directly onto neurobiological processes and units of analysis (for detailed discussions, see Baggio, 2020; Martorell, 2018; Embick & Poeppel, 2015; Poeppel & Embick, 2013). That is, there is currently no established direct mapping from a theoretical linguistic construct (e.g., Merge) or from a construct of a psycholinguistic model (e.g., working memory) to the basic units of neuroscience (e.g., cell assemblies, frequency bands) that clearly defines the link between the computational, algorithmic, and implementational levels in our field (Figure 1). Therefore, researchers have many degrees of freedom in choosing their favorite construct, theoretical framework, processing theory, and hypothesis about their neural implementation.
Here, we do not aim to establish a consensus about how to define language or what phenomena to focus on in future research; instead, we propose that researchers should be explicit in describing the starting point of the derivation chain used in the construction of a particular edifice. One reason for this is that even similar neural data can receive divergent interpretations by different researchers, depending on basic linguistic definitions and assumptions that they explicitly or, more often, implicitly choose to adopt. Once derivation chains are made explicit, the current diversity of viewpoints may actually leave room for integration in the long run. To use the analogy introduced above, we suggest that the different basic definitions and diverging research interests within the cognitive neuroscience of language might have led to the production of differently shaped bricks and simultaneously to the construction of edifices that serve different purposes (i.e., a “chaotic brickyard”; Forscher, 1963). However, these different bricks do not necessarily need to contradict each other, and how they might fit together may become clearer once the underlying auxiliary assumptions are made explicit. For example, findings obtained in the context of studying language as primarily a means for communication may highlight aspects that are disregarded in an approach focusing on language as a computational system building up phrases and sentences and vice versa.
Finally, in addition to explicitly and implicitly made auxiliary assumptions, there may be occasions in which no commitment to a particular theoretical standpoint is or can be made at all. Many studies on the neural basis of sentence processing only implicitly commit to some form of structural representation of sentences without any explicit link or assumption to theoretical descriptions of sentence structure (i.e., as to whether these representations are derived from phrase structure or dependency grammars; see Imports from Linguistics section below). Sometimes, however, no commitment is made at all, which can result in gaps in the derivation chain: For example, this may be the case when the meaning of phrases or sentences is studied without any assumptions made as to how these meanings may be represented (i.e., as models, language of thought formulae, conceptual graphs). The distinction between implicit assumptions and actual gaps in a derivation chain can be blurry in some instances, but both can have distinct consequences within the derivation chain framework: Although implicit assumptions—that the researcher is not aware of—affect the interpretability of a derivation chain and thus the interpretation of an experiment's results, actual gaps in a derivation chain can be the result of known disconnects between disciplines or result from limitations of the methods used (see Methodological Assumptions section below).
In the remainder of this article, we will mainly use the study of syntax in linguistics, psychology, and neuroscience as one of the different subcomponents of language in focus within the field of the cognitive neuroscience of language (see Box 1). However, we wish to emphasize that some fundamentals we discuss for syntax can be and have been applied to other subcomponents such as, for example, semantics or phonology.
AT1: Imports from Linguistics
Linguistics examines language as such. Each individual researcher decides what elements of linguistics enter into the derivation chain of their experiment. In principle, researchers can choose to disregard modern linguistics altogether and perform neuroscientific experiments without any connection to linguistic theory. For example, a researcher may choose to present participants with recordings of single words randomly drawn from a thesaurus and contrast this with a control condition in which participants listen to nonspeech noise. There may be good reasons to conduct an experiment along these lines. However, the conclusions about language processing based on this experiment will likely be limited to the truism that in one condition, the participants listened to linguistic stimuli, whereas, in the other condition, they did not. Crucially, even though this example (explicitly) disregards linguistic theory, the linguistic view that language is categorical in nature (i.e., there are no half-words) is automatically adopted (Chomsky, 2002; Salmon, 1969). Moreover, words or signs are treated as discrete units despite the fact that this is never reflected in the nature of the auditory, written, or signed signal. In fact, apart from prosodic phrasal boundaries, there is no acoustic break signaling the beginning and the end of individual words in the speech stream, nor holds or pauses indicating the beginning and end of signs in the visual stream: In the case of prosodic phrase boundaries, listeners are known to perceive a boundary between syntactic phrases even without an acoustic break (Steinhauer, Alter, & Friederici, 1999).
A case in point for the current discussion is the way in which different theories describe the grammar reflecting the idealized speakers' abstract knowledge of language. The syntactic representation of a sentence can be described using many different frameworks, which, in turn, come with a number of implicit and explicit auxiliary assumptions. According to different theory-specific proposals, the syntactic structure of a sentence can therefore be analyzed assuming either binary branching (e.g., Chomsky, 1988, 1995) or n-ary branching (Pollard & Sag, 1994), which results in rather different-looking structural descriptions (see Figure 2A and 2C). Furthermore, a particular linguistic theory may start from the assumption that well-formed sentences have a so-called deep structure satisfying some structural requirements and a surface structure that complies with positional relationships (as in the early versions of the principles and parameters theory within the generative grammar tradition; Chomsky, 1988), or alternatively, that it should provide the complete derivation of a structure in its description (e.g., minimalism; Chomsky, 1995) at all times (see Figure 2A) or, conversely, that it should allow for the combination of “preassembled” structural elements already stored in memory (e.g., tree adjoining grammars; Joshi & Schabes, 1997; Joshi, Levy, & Takahashi, 1975; see Figure 2B).
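The contrast between binary and n-ary branching can be made concrete with a small sketch (the toy sentence, labels, and tuple encoding are chosen here purely for illustration and are not taken from Figure 2): the same string receives structurally different descriptions under the two assumptions, which a simple tree traversal can tell apart.

```python
# Toy phrase-structure trees encoded as nested (label, children...) tuples.
# Leaves are plain strings; everything here is an illustrative assumption.

# Binary branching (e.g., a Merge-style analysis combines exactly two
# elements at a time):
binary = ("S",
          ("NP", ("Det", "the"), ("N", "cat")),
          ("VP", ("V", "chased"),
                 ("NP", ("Det", "a"), ("N", "mouse"))))

# n-ary ("flat") branching: a single node may dominate more than two
# daughters at once:
nary = ("S",
        ("NP", ("Det", "the"), ("N", "cat")),
        ("V", "chased"),
        ("NP", ("Det", "a"), ("N", "mouse")))

def max_arity(tree):
    """Largest number of daughters any node in the tree dominates."""
    if isinstance(tree, str):          # a leaf dominates nothing
        return 0
    children = tree[1:]
    return max(len(children), *(max_arity(c) for c in children))

print(max_arity(binary))  # 2
print(max_arity(nary))    # 3
```

Under a binary-branching theory, no node ever exceeds arity 2; an n-ary theory permits flatter structures, which is exactly the kind of formal difference an experiment on combinatorial operations would have to operationalize.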
The different syntactic analyses presented in Figure 2 all accurately capture the structure of the respective sentences according to the theoretical assumptions of each of the corresponding linguistic frameworks. From the point of view of the cognitive neuroscience of language, however, this raises two major questions: First, is there any reason to adopt one approach to capturing sentence structure over another? Second, what are the implications of adopting one or the other approach for the design of a cognitive neuroscience experiment? More precisely, we have to consider whether on some level of abstraction different linguistic theories may turn out to be notational variants (for discussions, see Nefdt & Baggio, 2023; Johnson, 2015). That is, does Theory A actually predict effects that are not predicted by Theory B and, if so, how would these differences manifest themselves in brain and behavior so that they can be operationalized in an experimental design?
There are no easy answers to these questions, because researchers may have different reasons to prefer one structural description over another. As long as the phenomenon they are attempting to investigate in their experiment does not critically depend on a particular aspect of a specific theory, the choice of theory and the concomitant theoretical imports (e.g., binary branching and complete derivation) may be arbitrary. However, if an experiment attempts to investigate, for example, the nature of the syntactic combinatorial operations in language, then the choice of a theoretical description directly affects the experimental design: In a minimalist analysis, it is assumed that the combinatorial operation is binary, combining only two elements at a time, whereas more than two elements can be combined at the same time under the assumption of n-ary branching. An experiment motivated by linguistic theory attempting to study syntactic combination would have to take this theoretical difference into account (also see the Example of a Derivation Chain section in Zaccarella et al. [2017]).
AT2: Imports from Psychology
Psycholinguistics, at the interface between psychology and linguistics, examines the psychological processes involved in comprehending and producing language and considers factors that go beyond the formal and frequently atemporal descriptions of theoretical linguistics. This is best illustrated with an example: Although a listener (or signer) during online language comprehension encounters the leftmost element in a sentence first (e.g., “cats” in “cats chase mice”), the syntactic analysis of the sentence according to most linguistic theories would usually start from the last encountered element in the sequence (i.e., “mice”). Traditional linguistic theories primarily address language competence, that is, the internal linguistic knowledge of the ideal speaker/signer-hearer/viewer (Chomsky, 1965), without being concerned with how comprehenders actually process language. In contrast, psycholinguistics mainly deals with language performance, the real (as opposed to the idealized) speaker/signer-hearer/viewer's actual online language use (Harley, 2013).
An intuitive way to understand the difference between competence and performance is that real-life comprehension and production are constrained by cognitive resources and contextual information. These constraining factors are particularly evident in center-embedded structures such as in the example “The scientist that a journalist that a policeman knows cites is very famous.”, where it becomes very “expensive” (in terms of working memory resources) to link each subject to its corresponding verb without hesitation or rereading. From a theoretical–linguistic point of view, however, there is no limit to the number of potential embeddings (Miller & Chomsky, 1963)—that is, there is no rule in language that prevents the embedding of one sentence into another. Psycholinguistically, comprehenders are constrained by factors such as limited working memory capacity, as it becomes increasingly difficult to comprehend structures that go beyond two (or three) embeddings (Cecchetto, Giustolisi, & Mantovan, 2016; Lewis, 1993, 1996; Blaubergs & Braine, 1974).
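The competence–performance contrast can be sketched in a few lines: the recursive routine below generates center-embedded sentences of any depth, including the example above (the grammar itself imposes no bound; only the illustrative word lists do), even though human comprehenders struggle beyond two or three embeddings.

```python
# Illustrative vocabulary; depth is bounded only by these lists, not by
# any rule of the toy grammar itself.
NOUNS = ["the scientist", "a journalist", "a policeman", "an editor"]
VERBS = ["cites", "knows", "likes"]

def center_embed(depth):
    """Build a center-embedded sentence: N1 that N2 ... V_inner ... V_outer.

    The verbs surface in mirror order, so each embedded subject must be
    held in memory until its verb appears -- the source of the processing
    cost, even though competence permits arbitrary depth.
    """
    nouns = NOUNS[:depth]
    verbs = list(reversed(VERBS[:depth - 1]))  # innermost verb comes first
    return " ".join([" that ".join(nouns)] + verbs + ["is very famous."])

print(center_embed(1))  # the scientist is very famous.
print(center_embed(3))  # the scientist that a journalist that a policeman
                        # knows cites is very famous.
```

Depth 3 already reproduces the sentence that readers find nearly impossible to parse, while the generator would happily continue: a compact demonstration that the performance limit is not a fact about the grammar.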
Researchers have frequently opted to analyze the link between theory and behavior in the context of Marr's three-level framework of analysis (Figure 1), which allows for more precise descriptions of the distinction between competence and performance (for detailed discussions, see Van Rooij & Baggio, 2021; Baggio, Van Lambalgen, & Hagoort, 2012). This is because the intuitive distinction between competence and performance factors is likely too simplistic, as it disregards that the aggregate of performance factors constitutes more than just constraints on competence, whereas the aggregate of competence factors constitutes not just the absence of performance constraints (for discussion, see Van Rooij & Baggio, 2021; Van Rooij, 2008; Frixione, 2001). That is, performance factors as such do not merely impose constraints on competence factors but constitute an empirical domain of facts on their own, so that the algorithms actually implementing a particular linguistic function are indeed more than just the sum of performance constraints while at the same time also being different from the relevant factors in a description on the computational level.
This in turn implies that the derivation chain of studies in the cognitive neuroscience of language also adopts a number of concepts that originate from general psychology and which play a role during language processing: For example, working memory (e.g., Swets, Desmet, Hambrick, & Ferreira, 2007; Fedorenko, Gibson, & Rohde, 2006), executive functions (e.g., Shao, Janse, Visser, & Meyer, 2014; Baddeley, Hitch, & Allen, 2009), predictive processing (e.g., van der Burght, Friederici, Goucha, & Hartwigsen, 2021; Weber, Grice, & Crocker, 2006), or automaticity of higher mental processes (Pyatigorskaya, Maran, & Zaccarella, 2023). Distinct linguistic theories differ in the strength of their link to these psychological theories. For example, the assumption that linguistic structures or constructions can be stored in the mental lexicon is implemented in the Tree-Adjoining-Grammar formalism (Figure 2B) as well as Construction Grammar (Goldberg, 1995). In contrast, traditional versions of the Minimalist Program (Figure 2A) assume that linguistic constructions are not stored in the mental lexicon but are instead thought to be fully derived. Within the context of Marr's three-level framework, this difference reflects formally different assumptions about the relationship between the computational and algorithmic levels of analysis, that is, the degree to which a grammatical theory resembles a computational description or includes algorithmic components (see also Jackendoff, 2003). Although a more or less direct link between a description on the computational level to the algorithmic level does not make a certain linguistic theory more “psychologically real” than another, an explicit link to psychologically measurable constructs can be beneficial when investigating certain constructs experimentally.
Performance factors must be taken into account to gain a more complete understanding of language in the brain because any experiment (including collecting judgments about the acceptability of a particular sentence) necessarily taps into performance. At the same time, the study of an idealized notion of competence can sometimes allow the available data to be interpreted in a way that affords a level of understanding that would not necessarily emerge from the study of the algorithmic and implementational levels alone. For example, the study of language use alone may miss the suitable description of language structures as hierarchical organization of constituents (Frank, Bod, & Christiansen, 2012), whereas this insight arises relatively early when looking only at structural descriptions of linguistic competence. Similarly, the degree to which some theoretical linguistic assumptions (e.g., constituency or the full derivation under a minimalist approach) can (and should) be reflected in actual behavioral measures (e.g., RTs and response accuracy) remains an open question and subject to debate. Processing-friendly models of language competence have begun to show that some behavioral facts on language comprehension in real speakers can be better explained if fine-grained theoretical assumptions and specific directionality of derivation are taken into account (e.g., the type of lexical restrictions of the constituents involved in the structure; Chesi & Moro, 2015). Furthermore, a link between linguistics and processing was also addressed by Gibson (2000), suggesting a theory of linguistic complexity to explain components of sentence parsing, and by Levy (2008), proposing a probabilistic, expectation-based theory accounting for syntactic processing difficulties.
In general, the choice to begin conceptualizing an experiment from the algorithmic level of psychological theories of language processing (see Figure 1) or from the abstract analysis of language competence may not only lead to a focus on different phenomena but also lead to different interpretations of the data considered. This is evidenced by studies that have attempted to compare distinct models accounting for different performance factors (e.g., Chesi & Canal, 2019).
AT3: Imports from Neuroscience
Neuroscience primarily studies the mapping from the algorithmic level to the implementational level, which appears to be similarly challenging as the mapping from the computational to the algorithmic level in linguistics and psychology. Both psychology and neuroscience aim to study how language is processed from two different perspectives that may offer different explanations of the data observed. Yet, because of the general disconnect between psychology and neuroscience (e.g., Beste, 2021), the explanations offered by either cannot be straightforwardly translated. For example, it is currently not clear how different clusters observed in an fMRI study could be related to a difference in RTs, or vice versa. There is of course much more to say about the different outlooks and the actual disconnects of psychology and neuroscience in general. For example, although a psychological theory relates to how a particular kind of information (in our case: linguistic information) is being processed, it is currently still unclear how any kind of information is actually represented neurally (Poeppel & Idsardi, 2022; Gallistel, 2017; Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). Similarly, any psychological theory of (language) processing relies on a notion of memory, but it is currently unknown how neurons carry forward information in time (Langille & Gallistel, 2020; Trettenbrein, 2016; Gallistel & King, 2009).
When designing an experiment on language using neuroscientific methods, researchers therefore face a three-way disconnect in their derivation chain (see Figure 1). For example, one of the defining properties of language is its discrete infinity, that is, the fact that we can effortlessly comprehend and produce sentences that we have never encountered before in our lives. Having this in mind, different researchers in cognitive neuroscience of language have singled out this generative capacity of language that assembles constituents as their object of investigation. Some researchers have investigated the neural representation of word combination purely at the syntactic level (Zaccarella et al., 2017; Zaccarella & Friederici, 2015a, 2015b), whereas others have adopted a broader perspective on the combination of discrete linguistic elements comprising syntactic, phonological, and semantic levels (Hagoort, 2013, 2019; Hagoort & Indefrey, 2014). On a certain level of abstraction, both groups of studies have attempted to understand how and where linguistic elements are combined in the brain. In contexts where both teams of researchers investigated syntactic combinatoriality, the interpretations of the respective neural data, which have shown significant overlap in language-relevant brain regions, have at times still diverged because of the different derivation chains employed: That is, partially similar operationalizations in experimental designs and the use of the same experimental method (i.e., fMRI) yielded partially overlapping experimental results with different interpretations motivated by theoretical choices (see the Example of a Derivation Chain section in Zaccarella et al. [2017]).
Lastly, neurocomputational models have recently been proposed to bridge some of the gaps (e.g., Martin, 2020). For example, computational models of sentence processing have been proposed as explanations of how readers incrementally process ambiguous sentences (Hale, 2014, 2016) as well as how working memory principles constrain sentence comprehension (Lewis, Vasishth, & Van Dyke, 2006). Evaluating model predictions against reading time data, these models constitute attempts at bridging the disconnect between linguistics and psychology, although this approach is not without limitations (Guest & Martin, 2023; Ten Oever, Kaushik, & Martin, 2022). Furthermore, such models have been used to explain the neural dynamics during naturalistic language comprehension (Brennan, 2016). Importantly, these computational models must be interpretable, in that they connect with or implement theoretical constructs from linguistics (Hale et al., 2022). In this spirit, as one suggested solution for dealing with the wealth of available linguistic theories within the field of cognitive neuroscience of language (see Imports from Linguistics section), neurocomputational models have been proposed to be able to evaluate different linguistic theories by applying models with different parameter specifications to neural data (Brennan, 2016).
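As a toy illustration of the expectation-based approach underlying several of the models cited above, the sketch below computes word-by-word surprisal, defined as -log2 P(word | context). The bigram probabilities are invented for illustration; an actual model would derive them from a probabilistic grammar or corpus.

```python
import math

# Invented conditional probabilities for the classic garden-path sentence
# "the horse raced past the barn fell"; a real model would estimate these.
BIGRAM_P = {
    ("the", "horse"): 0.10,
    ("horse", "raced"): 0.05,
    ("raced", "past"): 0.40,
    ("past", "the"): 0.50,
    ("the", "barn"): 0.02,
    ("barn", "fell"): 0.001,  # the disambiguating word is highly unexpected
}

def surprisal(prev, word):
    """Surprisal in bits of `word` given the previous word."""
    return -math.log2(BIGRAM_P[(prev, word)])

sentence = ["the", "horse", "raced", "past", "the", "barn", "fell"]
for prev, word in zip(sentence, sentence[1:]):
    print(f"{word:>6}: {surprisal(prev, word):5.2f} bits")
```

The spike in surprisal at “fell” (about 9.97 bits under these invented numbers) is the model's stand-in for processing difficulty, which can then be regressed against reading times or neural signals, as in the naturalistic-comprehension work cited above.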
METHODOLOGICAL ASSUMPTIONS
In addition to the auxiliary assumptions on the theoretical level, there are also auxiliary assumptions defined purely by methodological choices of the experimenter (Figure 3). Whereas some of these assumptions also affect other fields of cognitive neuroscience, some of them are specific to the cognitive neuroscience of language.
Language and Stimuli
AM1: The Language (Modality) Tested Is One Where the Relevant Linguistic Phenomenon of Interest Can Be Tested
A key characteristic that sets language apart from many other cognitive domains is the diversity of forms of the world's more than 7,000 languages (UNESCO, 2021). Consequently, the ability of a given study to highlight general neurocognitive features and principles of language processing rests on the assumption that the language under investigation shares central properties that can be observed across languages (Longobardi & Roberts, 2010; Baker, 2009; Evans & Levinson, 2009; Greenberg, 2005). For instance, if a study reports on the neural correlates of syntactic processing without specifying the language used, the assumption is made that syntactic processing in the language under investigation is representative of syntactic processing as such. Depending on the process, this auxiliary assumption may not be valid given the large cross-linguistic variability in syntactic structure. Indeed, various instances of cross-linguistic differences on the neural level have been found (e.g., Goucha, Anwander, Adamson, & Friederici, 2022; Bornkessel-Schlesewsky et al., 2011). Ignoring this assumption might have amplified apparently conflicting findings in the literature. For example, investigations of syntactic processing in English have highlighted the role of the posterior temporal lobe (Law & Pylkkänen, 2021; Matar, Dirani, Marantz, & Pylkkänen, 2021; Matchin, Brodbeck, Hammerly, & Lau, 2019; Matchin, Hammerly, & Lau, 2017), whereas studies conducted in German have pointed toward a central role of the left inferior frontal gyrus (IFG) (van der Burght, Goucha, Friederici, Kreitewolf, & Hartwigsen, 2019; Goucha & Friederici, 2015; Zaccarella & Friederici, 2015a).
At a broader (geographical) level, language studies on so-called Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations (Henrich, Heine, & Norenzayan, 2010) often report findings on “language processing” in general (e.g., “Intonation guides sentence processing in the left inferior frontal gyrus”), while only explicitly mentioning the specific language studied (e.g., “German”) in the Methods section. Studies on non-WEIRD populations generally mention the language investigated in a prominent way, for example, in the title or abstract (e.g., Garrido Rodriguez, Norcliffe, Brown, Huettig, & Levinson, 2023; Matar, Pylkkänen, & Marantz, 2019; Wu, Zaccarella, & Friederici, 2019; Ohta, Koizumi, & Sakai, 2017). The increased number of studies conducted on non-WEIRD languages in recent years allows cross-linguistic examinations (e.g., Malik-Moraleda et al., 2022), which represents a first step toward a clear definition of shared and language-specific neurocognitive bases of language processing.
It is similarly important to explicitly state the chosen language modality and to consider to what extent this decision shapes conclusions about language processing and about the modalities in which language can be externalized and perceived. Visually perceived sign languages (Pfau, Steinbach, & Woll, 2012; Klima et al., 1979) and the tactile languages of the deaf-blind (Edwards & Brentari, 2020; Checchetto, Geraci, Cecchetto, & Zucchi, 2018) provide interesting testing grounds for supposed universal processes. At the same time, the choice of language and modality already constitutes the (frequently implicit) adoption of auxiliary assumptions. This decision concerns the question of whether (neuro-)cognitive processes are shared not only between comprehension and production but also between spoken, written, and sign language (Trettenbrein, Papitto, Friederici, & Zaccarella, 2021; Arana, Marquand, Hultén, Hagoort, & Schoffelen, 2020; Vigneau et al., 2011) and which processes might be unique to each of these modalities (McQueen & Meyer, 2019). In other words, to what extent can processing mechanisms in a given domain be generalized to the language system as a whole, and to what extent are they limited to linguistic processing in the studied domain only? As such, a generic label such as language processing almost always represents an underspecification, unless the experimental task used has been established to capture a processing mechanism shared across modalities (e.g., Giglio, Ostarek, Weber, & Hagoort, 2022; Arana et al., 2020; Gastaldon, Arcara, Navarrete, & Peressotti, 2020; Matchin & Wood, 2020; Heim, van Ermingen, Huber, & Amunts, 2010).
AM2: The Materials Employed Are Capable of Testing the Hypothesis without Introducing Confounds
The experiments conducted in our field usually consist of the presentation of isolated words, phrases, sentences, or discourses that vary along specific linguistic dimensions of interest. As in other cognitive neuroscientific domains, the materials employed need to be able to test the hypothesis of interest without introducing additional confounds. At present, several tools are available that allow researchers to control for the perceptual aspects of materials employed in language research (e.g., Trettenbrein & Zaccarella, 2021; Boersma, 2001). However, additional dimensions of interest relate to the psycholinguistic properties of the words employed. For example, a researcher might want to ensure that the difference between two conditions (e.g., abstract and concrete nouns) is not confounded by differences in lexical frequency or in phonological or orthographic neighborhood (i.e., how many words are pronounced or written very similarly to a given target; Marian, 2017; Marian, Bartolotti, Chabal, & Shook, 2012). This issue can be addressed by consulting linguistic corpora (i.e., databases), which allow researchers to extract psycholinguistic variables of interest (e.g., length, frequency of occurrence) for the entries (e.g., words or signs) of a specific language. Accordingly, the use of such corpora helps ensure that the stimuli of the distinct conditions are matched along the relevant linguistic dimensions (for a methodological discussion, see also Sassenhagen & Alday, 2016). In cases where corpus data are not available for a certain language (e.g., as for many sign languages), subjective ratings can be used to establish psycholinguistic parameters (e.g., Trettenbrein, Pendzich, Cramer, Steinbach, & Zaccarella, 2021; Caselli, Sehyr, Cohen-Goldberg, & Emmorey, 2017).
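Whether two stimulus lists are in fact matched on such a dimension can be checked with a simple descriptive comparison. The sketch below uses made-up log-frequency values in place of real corpus data and computes the standardized mean difference between two hypothetical condition lists; in practice, this would be reported alongside an inferential test (see Sassenhagen & Alday, 2016, for caveats about treating a null result as evidence of matching):

```python
import statistics as st

# Hypothetical log lexical frequencies for two stimulus lists;
# the values are illustrative, standing in for real corpus counts.
abstract_nouns = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3]
concrete_nouns = [2.2, 2.3, 2.0, 2.5, 2.1, 2.4, 2.2, 2.6]

def cohens_d(a, b):
    """Standardized mean difference with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * st.variance(a)
                  + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / pooled_var ** 0.5

d = cohens_d(abstract_nouns, concrete_nouns)
# A negligible effect size suggests the lists are matched on this
# dimension; a large one signals a potential confound.
print(f"Cohen's d = {d:.2f}")
```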
An additional aspect of stimulus preparation in language studies relevant especially for EEG and MEG concerns pre-target words. For example, target words of interest that differ along specific linguistic dimensions of interest (e.g., “shirt” and “phone”) can be presented within sentences or phrases (e.g., “the driver wears a shirt” vs. “the driver wears a phone” when considering semantic plausibility). In turn, pre-target words (i.e., the contexts) need to be matched across relevant perceptual and psycholinguistic variables of interest or counter-balanced across conditions (e.g., see Maran, Numssen, Hartwigsen, & Zaccarella, 2022; Hasting & Kotz, 2008). If the pre-target words elicit sustained differences across conditions, common EEG/MEG preprocessing procedures such as baseline correction might artificially create an effect at the target word (for a detailed discussion, see Steinhauer & Drury, 2012).
Crucially, potential confounding variables can (and should) be included as regressors in the statistical model (Hamilton & Huth, 2020). An implicit assumption in psycho- and neurolinguistic research is that the information provided by corpora is representative of the language as used by the participants in a study. Yet, this assumption can be violated, for example, if the corpus is based on relatively old texts that include archaic or disused words (see Brysbaert et al., 2011). This problem has led to the development of corpora based on more recent movie subtitles, which seem to more closely match how language is used in everyday life (Boada, Guasch, Haro, Demestre, & Ferré, 2020; Soares et al., 2015; van Heuven, Mandera, Keuleers, & Brysbaert, 2014; Brysbaert et al., 2011; Cuetos, Glez-Nosti, Barbón, & Brysbaert, 2011; Cai & Brysbaert, 2010; Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010; New, Brysbaert, Veronis, & Pallier, 2007). The development of accurate corpora for languages that still lack such extensive resources (e.g., some signed languages and indigenous languages) remains an important goal for the years to come.
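The logic of modeling a confound as a nuisance regressor can be illustrated with simulated data. In the sketch below, all variable names and weights are hypothetical; the point is that omitting a correlated confound (here, a mock "log frequency") inflates the estimate for the predictor of interest, whereas including it recovers the true weight:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of words/trials (arbitrary)

# Hypothetical per-word predictors: a variable of interest
# (e.g., concreteness) and a correlated confound (e.g., log frequency).
concreteness = rng.normal(size=n)
log_freq = 0.5 * concreteness + rng.normal(size=n)

# Simulated neural response driven by both predictors (true weights 1 and 2)
response = 1.0 * concreteness + 2.0 * log_freq + rng.normal(size=n)

# Fit WITHOUT the confound: the concreteness estimate absorbs the
# frequency effect and is inflated (omitted-variable bias).
X0 = np.column_stack([np.ones(n), concreteness])
b0, *_ = np.linalg.lstsq(X0, response, rcond=None)

# Fit WITH the confound as a nuisance regressor: the concreteness
# estimate is now close to its true weight of 1.0.
X = np.column_stack([np.ones(n), concreteness, log_freq])
b, *_ = np.linalg.lstsq(X, response, rcond=None)

print(f"without confound: {b0[1]:.2f}; with confound: {b[1]:.2f}")
```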
In an attempt to overcome some of these issues, studies making use of so-called naturalistic stimuli have recently gained popularity (for reviews, see Hamilton & Huth, 2020; Alday, 2019; Willems, 2015). Rather than presenting carefully controlled stimuli as part of artificial laboratory tasks, these naturalistic approaches make use of sentences extracted from spoken corpora, short narratives, or audiobooks (Stehwien, Henke, Hale, Brennan, & Meyer, 2020). Naturalistic stimuli have been successfully used to study phonetic feature encoding (Mesgarani, Cheung, Johnson, & Chang, 2014), syntactic representations (Bhattasali et al., 2019; Hale, Dyer, Kuncoro, & Brennan, 2018; Brennan, Stabler, Van Wagenen, Luh, & Hale, 2016), linguistic predictions (Heilbron, Armeni, Schoffelen, Hagoort, & de Lange, 2022; Shain, Blank, van Schijndel, Schuler, & Fedorenko, 2020; Brennan & Hale, 2019), and semantic processing (Brodbeck, Presacco, & Simon, 2018; Broderick, Anderson, Di Liberto, Crosse, & Lalor, 2018; Huth, de Heer, Griffiths, Theunissen, & Gallant, 2016). Because of the increased ecological validity of naturalistic stimuli, the assumption about the method's ability to capture the process of interest (i.e., naturalistic language processing) should be more readily met. Yet, although the stimulus material may be unconstrained and more natural, these naturalistic approaches involve numerous analytical choices that are, in turn, based on their own auxiliary assumptions. Traditional statistical techniques are often unable to account for the numerous confounding variables present in natural speech (Hamilton & Huth, 2020). Instead, testing for a variable of interest requires complex computational tools with parameters defined by the experimenter. This effectively shifts auxiliary assumptions from the experimental design to the analysis phase of the experiment (see below).
Data Acquisition
AM3: The Neuroscientific Research Technique/Method Is Capable of Providing Data That Can Be Used to Test the Hypothesis of Interest
As in other cognitive neuroscience domains, researchers in the cognitive neuroscience of language can rely on a large number of neuroimaging, neurostimulation, and neuromodulation techniques. For instance, neuroimaging techniques such as fMRI, EEG, and MEG provide correlational evidence on the link between brain functioning and cognition, whereas brain stimulation techniques (Hartwigsen & Silvanto, 2022; Bergmann & Hartwigsen, 2021) and lesion studies (Matchin et al., 2022; Vaidya, Pujara, Petrides, Murray, & Fellows, 2019) allow researchers to draw causal inferences. In addition, distinct research techniques differ in temporal precision, in susceptibility to artifacts (e.g., Abbasi, Steingräber, & Gross, 2021; Ouyang et al., 2016; Luck, 2005) and to cancellation of signals from a particular brain area (e.g., Gorno-Tempini et al., 2002; Devlin et al., 2000; Jezzard & Clare, 1999; Ojemann et al., 1997), and in sensitivity to the orientation of the electrical currents (e.g., Ahlfors, Han, Belliveau, & Hämäläinen, 2010; Ahlfors et al., 2009; Cohen & Cuffin, 1991). Consequently, each neuroscientific technique poses its own methodological constraints regarding the conclusions that can be drawn based on its data. These distinct methodological constraints might be the reason why, in language research, conclusions drawn from EEG, MEG, and fMRI data do not always converge (Wang et al., 2021; Lau, Gramfort, Hämäläinen, & Kuperberg, 2013). Some of these methodological constraints apply to cognitive neuroscience research in general—here, we focus on the specific methodological issues that require extra consideration in language research in particular.
For instance, differences in the temporal resolution of neuroimaging methods become particularly important considering that language comprehension is characterized by a series of early automatic and late controlled processes (for reviews, see Maran, Friederici, & Zaccarella, 2022; Friederici, 2011; Bornkessel & Schlesewsky, 2006). Distinct neuroimaging techniques vary in temporal precision (He & Liu, 2008) and therefore might capture distinct processing stages. For instance, EEG and MEG provide a direct measure of brain activity (Lopes da Silva, 2013) that best captures transient, feedforward neural activity (Kochari, Lewis, Schoffelen, & Schriefers, 2021; Wang et al., 2021; Vartiainen, Liljestrom, Koskinen, Renvall, & Salmelin, 2011). In contrast, fMRI, given its low temporal resolution, is largely insensitive to such transient, feedforward neural activity (Segaert, Weber, de Lange, Petersson, & Hagoort, 2013; Vartiainen et al., 2011; Bunge & Kahn, 2009; Furey et al., 2006; Arthurs & Boniface, 2002) and might thus be more suitable to capture downstream, late-stage effects that are sustained or variable in time. Accordingly, whether studies with high and low temporal resolution actually capture similar linguistic processes remains unclear, possibly limiting the integration of their findings. Consideration of convergence and divergence across imaging modalities can yield new insights into the contributions of multiple components of the language network. Another important aspect to consider, especially in studies on language comprehension, is that the distinct experimental environments in which neuroimaging studies take place (e.g., a noisy MRI scanner) affect the patterns of neural activation (Pellegrino et al., 2022; Andoh et al., 2017).
Although neuroimaging methods such as MEG, EEG, and fMRI cannot provide causal evidence for the functional relevance of particular brain regions, this issue can be overcome using TMS (Polanía, Nitsche, & Ruff, 2018). Especially when combined with concurrent EEG measurements (TMS-EEG), TMS can be used to study the causal involvement of different cortical areas in language processing as proposed by neurobiological models of language (e.g., Hagoort, 2019; Friederici, 2011, 2012) with an extremely high temporal resolution. However, the use of TMS in language research also has several disadvantages: One obstacle is the uncomfortable stimulation of surface muscle tissue when TMS is applied over frontolateral and temporoparietal language areas, brain regions considered of central relevance for language processing in most recent models in the field (Matchin & Hickok, 2020; Hagoort, 2019; Friederici, 2017). Such discomfort needs to be taken into account when designing appropriate control (sham) conditions (e.g., see Duecker & Sack, 2015). When TMS is combined with simultaneous EEG measurements, the stimulation of this muscle tissue generates long-lasting and distinctly large muscle artifacts that make it hard to interpret the underlying EEG signal (Salo, Mutanen, Vaalto, & Ilmoniemi, 2020). New real-time visualization tools (Casarotto et al., 2022) can be used in an attempt to minimize muscle twitches before the start of the measurements. This approach is important not only when one is interested in early components of the EEG response to TMS, but also because these muscle twitches are clearly perceived by the participants and therefore result in unspecific brain responses (Conde et al., 2019). Furthermore, the loud clicking sound of the TMS pulse also induces an electrophysiological response, which cannot be easily masked when one is interested in auditory language processing (Russo et al., 2022).
These sensory problems need to be carefully taken into account when designing a TMS-EEG experiment probing language. For instance, one could design (linguistic) control conditions in which the TMS-related side effects (i.e., muscle twitches, clicking sound) are equated across conditions, such that the effect of interest manifests as an interaction.
It should also be noted that, when TMS is employed, a large number of parameters need to be set by the experimenter (e.g., the choice of an online or offline stimulation protocol), which might influence whether a neurostimulation study provides causal evidence or not (Qu et al., 2022), ultimately affecting the strength of the derivation chain. Another phenomenon to consider when using TMS to study language processing is compensation within the large-scale, distributed language network (Hartwigsen, 2018). For example, during auditory language processing, the listener needs to rapidly analyze the sound, meaning, and structure of spoken words to associate the heard sound patterns with meaningful concepts. These complex processes require the interaction of numerous brain regions. Given the large-scale nature of the language network, there is great potential for adaptive plasticity to compensate for a focal perturbation of a key region when using TMS (Hartwigsen, 2018). In line with this, evidence shows that unifocal perturbation with (repetitive) TMS is often not sufficient to perturb various language-related processes (see, e.g., Maran, Numssen, et al., 2022; Kroczek, Gunter, Rysop, Friederici, & Hartwigsen, 2019), whereas the combined perturbation of two brain regions leads to an observable effect (e.g., Schroën, Gunter, & Hartwigsen, 2020; Hartwigsen et al., 2016). Such adaptive plasticity, however, makes it difficult to study the causal dynamics underlying language processing using TMS. Condition-and-perturb approaches (Hartwigsen et al., 2012) can increase the perturbation load on the language network and might be necessary to distinguish between the lack of a causal involvement of a brain region in a process and the initiation of compensatory mechanisms. Consequently, the attribution of a cognitive function to any cortical region may be skewed.
The conclusions on structure–function relationships from lesion studies are similarly complex, because they, too, rest on the (implicit) auxiliary assumption that no plastic changes have taken place.
In summary, when using neuroscientific research techniques, the various auxiliary assumptions involved introduce their own constraints on the inferences that can be drawn. In language research in particular, however, special care needs to be taken because important regions in the language network are subject to modality-specific limitations: TMS over the inferior frontal and anterior temporal lobes is particularly uncomfortable because of the proximity to facial muscle tissue, and as mentioned before, anterior temporal regions are difficult to capture using fMRI because of signal distortions (Devlin et al., 2000; Jezzard & Clare, 1999). Or, as Meehl might have put it, the implicit assumption that a given neuroimaging or neurostimulation technique is equally suitable for different parts of the cortex (e.g., AM: the signal of a given neuroscientific method is uniform throughout the cortex) might weaken the derivation chain in various language research agendas.
Data Analysis
AM4: The Analytic Approach Is Capable of Testing the Hypothesis of Interest
Data from EEG and MEG language studies are traditionally analyzed focusing on ERPs (Luck, 2005) or neural oscillations (Buzsáki & Draguhn, 2004). On the one hand, several ERP components (e.g., early left anterior negativity [ELAN], left anterior negativity [LAN], N400, P600) have been linked to specific stages of linguistic processing (e.g., phrasal building, morphosyntactic analysis, semantic composition, integration; see Fritz & Baggio, 2022; Hernández, Puupponen, & Jantunen, 2022; Maran, Friederici, et al., 2022; Brouwer, Crocker, Venhuizen, & Hoeks, 2017; Zaccarella & Friederici, 2015b; Friederici, 2011; Bornkessel-Schlesewsky & Schlesewsky, 2008). On the other, distinct neural oscillations seem to subserve the multiscale property of language, from the processing of phonetic and syllabic units to phrases, and ultimately to more complex aspects of comprehension (e.g., Benítez-Burraco & Murphy, 2019; Lewis, Wang, & Bastiaansen, 2015; Giraud & Poeppel, 2012).
Focusing the analysis on either ERPs or neural oscillations reflects an implicit assumption made by the researcher on the nature of the hypothesized effect. ERPs are the result of averaging across trials and, accordingly, capture only activity that is both time-locked (i.e., in a constant time-relationship) and phase-locked (i.e., in a constant phase-relationship) to a stimulus or process of interest (Luck, 2005). Accordingly, if a linguistic effect has a variable time-course over trials or manifests itself via non-phase-locked activity, it might not be adequately captured by ERP analyses (Maran, Friederici, et al., 2022; Kochari et al., 2021). Of note, the low temporal resolution of fMRI makes this method less affected by this potential issue (Kochari et al., 2021). Thus, when an ERP-based analysis is employed, researchers are implicitly assuming that their effect of interest manifests itself via time- and phase-locked neural activity. This is not a trivial assumption given that, for example, some jittering across trials might be present when acoustic stimuli are employed.
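The consequence of this assumption can be demonstrated with a toy simulation: a response that is time- and phase-locked across trials survives averaging, whereas an otherwise identical response with a random phase on each trial largely cancels out. The frequency, trial count, and epoch length below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_samples = 100, 500
t = np.linspace(0, 1, n_samples)  # 1-s epoch
f = 10                            # hypothetical 10-Hz response

# Time- and phase-locked activity: identical phase on every trial
locked = np.array([np.sin(2 * np.pi * f * t) for _ in range(n_trials)])

# Non-phase-locked ("induced") activity: random phase on each trial
induced = np.array([np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                    for _ in range(n_trials)])

# Trial averaging (the ERP) preserves the phase-locked response but
# largely cancels the induced one, despite identical single-trial
# amplitudes.
erp_locked = locked.mean(axis=0)
erp_induced = induced.mean(axis=0)

print(f"locked ERP peak:  {np.abs(erp_locked).max():.2f}")   # near 1
print(f"induced ERP peak: {np.abs(erp_induced).max():.2f}")  # near 0
```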
In contrast to ERPs, analyses based on neural oscillations are better suited to account for the multidimensionality of MEG/EEG data (Cohen, 2011, 2014), because this type of analysis can capture the power and phase relationships of different frequency bands. This additional information can be used to compute several measures of functional and effective connectivity (e.g., coherence, phase-locking value) at the sensor or source level (Bastos & Schoffelen, 2016; Maran, Grent-'t-Jong, & Uhlhaas, 2016; Schoffelen & Gross, 2009). Given the large number of measures that can be extracted via time–frequency analysis, researchers often need to make implicit assumptions about the effect of interest when analyzing their data. The first choice is the frequency (or frequencies) of interest. This decision might not be straightforward when naturalistic stimuli are employed, given the intrinsic temporal variability of linguistic information (e.g., syllables; Alday, 2019). Cluster-based permutation tests (Maris & Oostenveld, 2007) have been employed in some studies (e.g., Segaert, Mazaheri, & Hagoort, 2018) to address effects that might extend beyond the canonical frequency bands. Critically, this method limits the specificity (e.g., in time, space, or frequency) with which a significant effect can be delineated (Sassenhagen & Draschkow, 2019), such that a follow-up investigation tailored to its exact extent might be required. Additional assumptions are made when choosing to focus on changes either in power or in phase of neural oscillations. These assumptions should be kept in mind, because the two measures might be differently sensitive to a particular experimental manipulation (Ding & Simon, 2013; Luo & Poeppel, 2007).
Overall, these choices should be motivated by the extensive literature on the ERP (Hernández et al., 2022; Maran, Friederici, et al., 2022; Friederici, 2011) and oscillatory (Benítez-Burraco & Murphy, 2019; Meyer, 2018; Lewis et al., 2015; Murphy, 2015; Giraud & Poeppel, 2012) correlates of language processing. Furthermore, one should take into account how basic neurophysiological mechanisms might subserve linguistic operations (Friederici & Singer, 2015; Fries, 2015; Murphy, 2015).
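For illustration, a minimal one-sample (sign-flip) variant of the cluster-based permutation test of Maris and Oostenveld (2007) can be sketched on simulated data. All parameters below (cluster-forming threshold, number of permutations, effect location) are illustrative choices, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_times = 20, 100

# Simulated per-subject condition differences over time; a true
# effect is injected between samples 40 and 60.
data = rng.normal(size=(n_subj, n_times))
data[:, 40:60] += 1.0

def tstat(x):
    """One-sample t-values across subjects at each time point."""
    return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(len(x)))

def clusters(stat, thresh):
    """Return (mass, (start, stop)) for each supra-threshold cluster."""
    out, start = [], None
    for i, above in enumerate(np.abs(stat) > thresh):
        if above and start is None:
            start = i
        elif not above and start is not None:
            out.append((np.abs(stat[start:i]).sum(), (start, i)))
            start = None
    if start is not None:
        out.append((np.abs(stat[start:]).sum(), (start, len(stat))))
    return out

thresh = 2.09  # cluster-forming threshold, ~t(19) at two-sided p < .05
observed = clusters(tstat(data), thresh)

# Null distribution: flip each subject's sign at random (exchangeable
# under H0 of no condition difference) and keep the maximum cluster mass.
null_max = np.array([
    max([m for m, _ in clusters(tstat(rng.choice([-1, 1], size=(n_subj, 1))
                                      * data), thresh)], default=0.0)
    for _ in range(500)])

for mass, (lo, hi) in observed:
    p = (np.sum(null_max >= mass) + 1) / (len(null_max) + 1)
    print(f"cluster {lo}-{hi}: mass = {mass:.1f}, p = {p:.3f}")
```

Note that, in line with Sassenhagen and Draschkow (2019), the resulting p value licenses a claim about the cluster as a whole, not about any individual time point within it.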
Aside from electrophysiological methods, analyses of neuroimaging data involve various researcher degrees of freedom that might shape their outcome. When analyzing fMRI data with an ROI-based approach, it should be acknowledged that this approach is spatially more constraining and therefore strongly depends on the validity of the assumption that the process of interest can be (uniquely) attributed to one ROI. Similarly, novel analysis approaches to TMS have recently modeled the TMS-induced electric field in each individual subject. The magnitude of the electric field in a certain ROI can, in turn, be used to explain modulations of behavior in a number of language tasks (Numssen, Van Der Burght, & Hartwigsen, 2023; van der Burght, Numssen, Schlaak, Goucha, & Hartwigsen, 2023; Maran, Numssen, et al., 2022; Kuhnke et al., 2020). In this case, the auxiliary assumptions again imply that the process of interest can be localized to one or multiple ROIs and that stimulation and task performance are linearly related.
A special case of ROI-based analyses in fMRI research is the so-called localizer approach (e.g., Fedorenko, Hsieh, Nieto-Castañón, Whitfield-Gabrieli, & Kanwisher, 2010). Here, the individual brain is masked using a functional contrast image involving a language processing task and a baseline task, for example, complex sentences versus strings of unconnected pseudowords. Although in some ways regarded as more powerful than whole-brain or group-based ROI analyses, the functional localizer approach requires researchers to justify the conditions on which the localizer contrast is based and to carefully explain how the functional localizer constrains the subsequent results.
Finally, a recent development has seen the analysis of fMRI and MEG/EEG data combined with machine learning techniques, often using naturalistic stimuli. As an alternative to manually annotating the speech stimulus (e.g., Kaufeld et al., 2020), these analyses make use of a computational model that is trained on the stimulus input to predict the resulting brain responses. In the subsequent analysis, a comparison is made between the predicted and measured brain responses (Goldstein et al., 2022; Heilbron et al., 2022). Importantly, the results of such neural analyses strongly depend on the model's assumptions. For example, Generative Pre-Trained Transformer 2 (GPT-2; Radford et al., 2019), one of the most successful models, is designed to predict the next word based on a sequence of preceding words. As mentioned above, this type of model can be employed to derive computational metrics (e.g., “surprisal”) that can be correlated with changes in brain activity or reading times. In principle, such correlations might highlight neurocognitive processes related to predicting the features of upcoming words. However, the conclusions drawn from such studies necessarily rest on the assumption that the defining trait of a given model (e.g., prediction) captures key aspects of how humans process language. Furthermore, a significant correlation between a computational model and the human brain cannot by itself justify the conclusion that they implement the same process (Guest & Martin, 2023; Ten Oever et al., 2022). Accordingly, an examination of a model's assumptions and their compatibility with notions from psycholinguistics and neurolinguistics is a fundamental step in the interpretation of a given study. Here, the auxiliary assumptions involved in computational approaches may be less transparent than those involved in a traditional factorial design, especially for readers with a limited computational background.
It is therefore particularly important that in such studies the analytical assumptions are carefully justified and that any potential constraints on the results are discussed.
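As an illustration of the surprisal metric itself (not of the GPT-2 pipeline), the toy sketch below estimates next-word probabilities from a tiny bigram model and converts them to surprisal values; in actual studies, these per-word values come from a large pretrained model and are then regressed against brain activity or reading times:

```python
import math
from collections import Counter

# Toy corpus standing in for a language model's training data;
# real studies use large pretrained models such as GPT-2 instead.
corpus = ("the driver wears a shirt . the driver wears a coat . "
          "the driver drives a car .").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word, alpha=0.1):
    """Surprisal in bits, -log2 P(word | prev), with add-alpha smoothing."""
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return -math.log2(p)

# A predictable continuation carries less surprisal than an
# unexpected one.
print(surprisal("wears", "a"))    # low: "a" always follows "wears" here
print(surprisal("wears", "car"))  # high: never observed after "wears"
```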
EXAMPLE OF A DERIVATION CHAIN: ZACCARELLA AND COLLEAGUES (2017)
In the following section, we provide an example analysis of the derivation chain involved in Zaccarella and colleagues (2017) to show how the adopted linguistic theory and employed methodology can influence the experimental design and the interpretation of the observed results. This section should be read as an example of reasoning about how a certain approach drives the experimental outcome, without attempting to disqualify the original interpretation.
To summarize the study, fMRI was used to investigate the cortical regions involved in syntactic structure building. Healthy volunteers read sequences of three words presented word by word. In their 2 × 2 design, hierarchy type (sentence vs. phrase) and merge (+merge vs. −merge) were included as within-subject factors (see Table 1). This resulted in the presentation of four conditions: a sentence (the ship sinks), a phrase (on the ship), and noun lists matched with the sentence and phrase sequences (stem ship juice and leek mouth ship, respectively). Participants were instructed to judge whether the sequence was a phrase, a sentence, or a list. In a univariate analysis, functional contrasts compared neural activation between the sentences/phrases and their respective noun-list conditions (sentence vs. list and phrase vs. list). This main effect was reflected in functional activation in the left inferior frontal gyrus (IFG) and superior temporal gyrus (STG). Because the sentence and phrase conditions required syntactic structure building whereas the noun-list conditions did not, the comparison between sentences and phrases and their respective list conditions was described as the main effect of merge; the authors reasoned that composing the sequences into phrases and sentences required the operation merge, whereas the list sequences did not.
Table 1.

| Hierarchy Type | Merge + | Merge − |
| --- | --- | --- |
| Sentence | DAS SCHIFF SINKT (the ship sinks) | HALM SCHIFF SAFT (stem ship juice) |
| Phrase | AUF DAS SCHIFF (on the ship) | LAUCH MUND SCHIFF (leek mouth ship) |
AT1: Import from Linguistics
The authors adopted a known hypothesis from linguistic theory, a choice with several implications: (1) The authors assumed that, in the noun-list conditions, words could not be integrated into structures; (2) the authors assumed that phrases and sentences were formed by merge—a syntactic operation that builds syntactic structures out of single words. In doing so, the authors made a clear commitment to linguistic theories that accept merge as a structure-building operation (Berwick, Friederici, Chomsky, & Bolhuis, 2013; Chomsky, 1993). (3) Lastly, the authors assumed that phrases and sentences differ with regard to their syntactic representations.
AT2: Import from Psychology
Given the nature of the stimuli under analysis, (4) the study did not account for the directionality of structure-building processing in real-time language comprehension, but assumed that merge would assemble words into phrases and sentences regardless of which two words would occur first in a certain trial (e.g., preposition + determiner in the phrase condition; determiner + noun in the sentence condition). This means that the authors assumed that merge may work both top–down and bottom–up to form syntactic structures.
AT3: Import from Neuroscience
In linking linguistics and psychology to neuroscience, the authors assumed that (5) merge works on syntactic features that must be accessed at the lexico-syntactic level in some specific cortical region or cortical network to prompt compositional processing and that (6) the binary-formed phrases must be stored and represented in some abstract form in some specific neural assembly or cortical network and further reused.
AM1: Methodological Assumptions
On the methodological level, a number of choices can be discerned that further define the derivation chain of this study: (7) The visual, word-by-word presentation was chosen to mimic the incremental nature of spoken language. (8) The choice of fMRI rests on the assumption that structure building leads to a difference in neural activity that can be detected (via a difference in blood oxygenation) in a univariate contrast; at the same time, fMRI cannot distinguish between early and late processes or between predictive and integrative processing at work during language comprehension. This assumption, in turn, presupposes that (9) merge can be localized as an operation supported by one or several brain regions subserving syntactic structure building.
Overall, this example illustrates how the composition of the chain leads to the conclusion derived from the study: localizing a certain linguistic operation specified in linguistic theory, that is, merge, in two cortical regions of the left hemisphere (i.e., the left IFG and STG).
The analysis of the derivation chain in Zaccarella and colleagues (2017) ultimately reveals that the empirical findings of the study can actually be re-interpreted within the context of alternative models of the neurobiology of language that also posit operations for syntactic structure building (e.g., syntactic unification; Hagoort, 2005, 2013). That is, the models of Friederici and colleagues (2017) and Hagoort (2005, 2013) both rely on auxiliary assumptions from linguistics (Friederici: Adger, 2003; merge: Chomsky, 1995; Hagoort: Unify, Jackendoff, 2002; Joshi & Schabes, 1997) and posit the existence of structure-building operations that are formally distinct (see Nefdt & Baggio, 2023; Johnson, 2015). Yet, the two accounts could only be distinguished at the neural level if the experimental design explicitly operationalized the formal differences between them. In Zaccarella and colleagues (2017), the experimental manipulation compared sentences and phrases against mere lists of words to isolate neural processes related to syntactic structure building, yet did not tap into formal differences between merge and (syntactic) unification and may therefore be motivated by either framework (see, e.g., Snijders et al., 2009; Humphries, Binder, Medler, & Liebenthal, 2006). We suggest that the neural data of this study remain uninformative with regard to which theoretical definition should be preferred over another. Instead, the a priori selection of the respective auxiliary assumptions seems to define how the neural data are interpreted. Future work attempting to operationalize formal differences between distinct cognitive operations (i.e., merge and syntactic unification) in experimental designs may shed light on these issues.
Concluding Remarks
The cognitive neuroscience of language is a wide field with a great diversity of experimental findings and approaches. We have argued that this diversity mostly results from the derivation chains being substantially long, complex, and mostly implicit in our field. Consequently, researchers encounter many degrees of freedom in forming their derivation chain, shaped by the numerous theoretical and methodological choices they make between hypothesis and experimental result. Researchers may choose to start their derivation chain on the computational level of the abstract description of linguistic competence. They may instead choose to start from performance factors relevant on the algorithmic level of language processing. Finally, they may disregard either for the most part and attempt to cut their derivation chain short by starting from neurobiology (i.e., the implementational level). Although all these choices can be well motivated and justified, we have argued here that most derivation chains in our field almost always import hidden assumptions from all three fields: linguistics, psychology, and neuroscience.
We have no illusion that the field should suddenly reach a consensus on key terminology or approaches. Instead, in agreement with Scheel and colleagues (2021), we argue that a highly interdisciplinary field such as the cognitive neuroscience of language requires researchers to form explicit statements concerning their experiment's derivation chain. The auxiliary theoretical assumptions as well as the constraining factors of the methodology employed in their work should be clearly motivated and discussed. Researchers should be aware of the numerous theoretical auxiliary assumptions attached to a theory when making inferences regarding the neurobiology of language. The same applies to methodological auxiliary assumptions, which constrain the scope of a given experiment, limiting the research questions that can or cannot be addressed. If these explicit and implicit theoretical and methodological auxiliary assumptions, as well as possible gaps in the derivation chain, are not considered, an accurate appraisal of data in the field of cognitive neuroscience of language is compromised.
Lastly, we would like to point out that the prevalence of different definitions and lack of a clear consensus in cognitive neuroscience need not be considered problematic. Instead, we suggest that the observed diversity of viewpoints should be considered complementary. We acknowledge that it is not always clear from the outset how the bricks made by different brickmakers could eventually be reassembled into one large edifice. However, we suggest that for the field to move toward the long-term goal of integration, an important first step is to help other researchers to identify how each brick was made. That is, researchers should carefully describe the complete derivation chain involved in their study. A more transparent discussion of the implicit and explicit auxiliary assumptions behind our experiments may therefore significantly improve research on the neurobiology of language.
Acknowledgments
The foundations for this article were laid at the Leipzig Lectures on Language, a lecture series and symposium held online in 2021. We are grateful to our lecture series co-organizer Caroline Beese, as well as all speakers and attendants for the inspiring talks and discussions. In addition, we wish to thank Antje S. Meyer for comments on this article.
Reprint requests should be sent to Patrick C. Trettenbrein, Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany, or via e-mail: [email protected].
Author Contributions
Constantijn L. van der Burght: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Angela D. Friederici: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Matteo Maran: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Giorgio Papitto: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Elena Pyatigorskaya: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Joëlle A. M. Schroën: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Patrick C. Trettenbrein: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing. Emiliano Zaccarella: Conceptualization; Investigation; Writing—Original draft; Writing—Review & editing.
Funding Information
Patrick C. Trettenbrein: Deutsche Forschungsgemeinschaft (https://dx.doi.org/10.13039/501100001659), grant numbers: 501984557, DFG SPP 2392 “Visual Communication” (ViCom). All authors: Max-Planck-Gesellschaft (https://dx.doi.org/10.13039/501100004189).
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
Author notes
All authors contributed equally and are listed in alphabetical order.