Language has a complex grammatical system whose computational and biological underpinnings we still have to understand. However, some evolutionarily ancient mechanisms have been repurposed for grammar, so that insights from other taxa can inform possible circuit-level mechanisms of grammar. Drawing upon recent evidence for the importance of disinhibitory circuits across taxa and brain regions, I suggest a simple circuit that explains the acquisition of core grammatical rules used in 85% of the world's languages: grammatical rules based on sameness/difference relations. This circuit acts as a sameness detector. “Different” items are suppressed through inhibition, but presenting two “identical” items leads to inhibition of inhibition. The items are thus propagated for further processing. This sameness detector thus acts as a feature detector for a grammatical rule. I suggest that having a set of feature detectors for elementary grammatical rules might make language acquisition feasible based on relatively simple computational mechanisms.
Language acquisition is fast, largely based on positive evidence (or sometimes no evidence at all; Senghas, Kita, & Ozyürek, 2004; Goldin-Meadow & Mylander, 1998), goes far beyond what learners hear or see in their environment (Pinker, 1984; Chomsky, 1959), and results in a uniquely complex grammatical system that stands out in the animal kingdom (Yang, 2013; Hauser, Chomsky, & Fitch, 2002). Even seemingly straightforward “memory” problems such as learning the meanings of words hide complexities that call for human-specific grammatical adaptations (Medina, Snedeker, Trueswell, & Gleitman, 2011; Pinker & Jackendoff, 2005). Unsurprisingly, we know very little about the underlying computational mechanisms at the circuit level.
However, some linguistic mechanisms are evolutionarily ancient and have been repurposed for linguistic use (Fitch, 2017; Endress, Cahill, et al., 2009; Endress, Nespor, et al., 2009; Dehaene & Cohen, 2007). In such cases, it might be possible to identify core linguistic mechanisms whose systems-level implementation might be tractable due to their evolutionary history.
Here, I use sameness/difference relations as a case in point. I will first show that many grammatical rules are based on such relations, especially in morphology and phonology, but that similar relations are critical in many other domains and animals, suggesting that they reflect a linguistic core mechanism with evolutionarily ancient roots. I will then suggest that such relations can be computed using a ubiquitous processing motif: disinhibition among neurons or neural populations.
Sameness/Difference Relations in Language and Other Domains and Animals
Sameness/difference relations are critical for many aspects of linguistic structure, especially in phonology and morphology. For example, some 85% of the world's languages use some form of reduplication (Rubino, 2013). Among many other uses, reduplications can signal changes in word class (e.g., from noun to verb, as in the Marshallese contrast between “takin–sock” and “takinkin–to wear socks”; Moravcsik, 1978), attenuation (as in the Alabama contrast between “kasatka–cold” and “kássatka–cool”; Hardy & Montler, 1988), or intensification; they can mark differences in number (e.g., singular vs. plural), tense (e.g., past vs. present), aspect (e.g., continued vs. repeated occurrence or temporary vs. permanent), size, or case (see Rubino, 2013, and references therein).
Phonological processes also often appeal to sameness/difference relations, with some processes requiring some features to be identical within a relevant constituent and others requiring them to be different. Processes that require identical features include vowel harmony and assimilation. Specifically, in languages with vowel harmony, vowels within words (or smaller domains) need to have one or more features in common (Rose & Walker, 2011). For example, Hungarian words generally have either only back vowels or only front vowels; grammatical suffixes thus come in two varieties, one with back vowels and one with front vowels. Accordingly, the dative suffix is –nak for back-vowel words like “ablak–window” (resulting in forms like “ablaknak”) and –nek for front-vowel words like “kert–garden” (resulting in forms like “kertnek”; Hayes & Londe, 2006). Likewise, in languages with consonant assimilation, consonants must share a feature with other surrounding consonants. For example, in English, “football” might be pronounced as “foopball” because the place of articulation of the [t] at the end of [foot] gets assimilated to the place of articulation of the [b] at the start of “ball”; in contrast, in French, “football” might be pronounced as “foodball” because the voicing feature of the [t] (but not the place feature) gets assimilated to the following [b] (Darcy, Ramus, Christophe, Kinzler, & Dupoux, 2009). Both vowel harmony and assimilation thus introduce sameness relations among phonemes. Listeners use these sameness relations not only in word recognition (Darcy et al., 2009; Mitterer & Blomert, 2003; Suomi, McQueen, & Cutler, 1997) but also as cues to learn new words (Vroomen, Tuomainen, & de Gelder, 1998). Furthermore, sameness relations in the form of vowel harmony often interact with other areas of grammar, such as stress assignment or morphology (Rose & Walker, 2011).
Although vowel harmony and assimilation require sameness relations among phonemic features, other phonological processes impose difference relations. Such processes include the obligatory contour principle (OCP; Frisch, Pierrehumbert, & Broe, 2004; McCarthy, 1986). Initially, the OCP was proposed to account for the observation that, in certain tone languages, tones cannot be repeated within words, but it also applies to other phonological phenomena. For example, in Semitic languages like Arabic and Hebrew, the basic meaning of words is given by their consonantal root; roots like /k t b/ are then transformed into surface forms such as “kataba–he wrote” and “kutiba–it was written” (Frisch et al., 2004). The OCP prevents consonantal roots from having repeated consonants, whereas other morphological processes can “create” (rather than prevent) sameness relations among consonants (Frisch et al., 2004; McCarthy, 1986). Such rules might also interact with other areas of grammar (Yip, 1988), and speakers apply them even when presented with novel nonsense words (e.g., Frisch & Zawaydeh, 2001; Berent & Shimron, 1997).
Sameness relations are also important during language acquisition. Reduplications are prominent in child-directed speech across languages (Ferguson, 1964), and children themselves “invent” forms with reduplicated syllables; these reduplicated forms might be important for acquiring multisyllabic words (Schwartz, Leonard, Wilcox, & Folger, 1980) and syllable-final consonants that would otherwise be lost (Fee & Ingram, 1982).
More generally, sameness relations have been critical for defining the computational complexity of phonological rules (Manaster-Ramer, 1986; Culy, 1985), and in developmental psychology, rules based on sameness relations have been the most prominent assay for studying rule learning in human infants (Marcus, Vijayan, Rao, & Vishton, 1999), to the extent that in a recent meta-analysis of “rule learning” in infancy, rule learning was treated as synonymous with the learning of sameness relations (Rabagliati, Ferguson, & Lew-Williams, 2019).
Sameness relations are also important for other forms of language use. Not only are rhymes and alliterations important in poetry (Fabb, 2015), but many language games that spontaneously arise in children also make extensive use of sameness relations in the form of reduplications (Bagemihl, 1995). For example, in the Chinese May-ka language game, syllables are duplicated, and then the vowel of the first duplicate is replaced by “ay” and the consonant of the second duplicate is replaced by “k”; ma (mother) thus becomes may-ka (Bao, 1990; Yip, 1982).
Despite their simplicity, sameness relations thus appear to be a core part of the language faculty.
However, sameness/difference rules are clearly not specific to language. They are crucial for many other aspects of cognition, including motor learning (Brooks, 1986), any comparison of sensory input to predictions or internal state (e.g., novelty detection in the hippocampus; Kumaran & Maguire, 2007) and short-term memory tasks such as delayed match-to-sample tasks (Cope et al., 2018; Engel & Wang, 2011). Accordingly, grammar-like rules based on sameness/difference relations can be learned in many nonlinguistic domains in humans (Dawson & Gerken, 2009; Endress, Dehaene-Lambertz, & Mehler, 2007; Marcus, Fernandes, & Johnson, 2007; Saffran, Pollak, Seibel, & Shkolnik, 2007) and by many nonhuman animals (Versace, Spierings, Caffini, Ten Cate, & Vallortigara, 2017; Martinho & Kacelnik, 2016; Smirnova, Zorina, Obozova, & Wasserman, 2015; de la Mora & Toro, 2013; Neiworth, 2013; Hauser & Glynn, 2009; Murphy, Mondragón, & Murphy, 2008; Pepperberg, 1987; but see Hupé, 2017; Langbein & Puppe, 2017; van Heijningen, Visser, Zuidema, & ten Cate, 2009), possibly through a specialized sameness detector (Endress, 2013; Endress et al., 2007) that might exist from birth (Gervain, Berent, & Werker, 2012; Gervain, Macagno, Cogoi, Peña, & Mehler, 2008; Antell, Caron, & Myers, 1985). The computations underlying sameness/difference relations thus reflect a core linguistic mechanism whose systems-level implementation might be tractable due to its evolutionary history.
Here, drawing upon recent evidence stressing the importance of disinhibitory circuits (neurons that inhibit other inhibitory neurons) across a variety of taxa and brain regions (Koyama et al., 2016; Goddard, Mysore, Bryant, Huguenard, & Knudsen, 2014; Hangya et al., 2014; Xu et al., 2013; Mysore & Knudsen, 2012; Chevalier & Deniau, 1990), I suggest a simple circuit that acts as a sameness detector. Disinhibition has been observed in a variety of brain areas (Letzkus et al., 2015; Chevalier & Deniau, 1990), and some interneuron populations specifically inhibit other inhibitory interneurons (Hangya et al., 2014; Xu et al., 2013). Critically, some interneuron types receive both local and long-range input; such interneurons have been found to inhibit other inhibitory interneurons in auditory (Pi et al., 2013), visual (Pfeffer, Xue, He, Huang, & Scanziani, 2013), somatosensory (Lee, Kruglikov, Huang, Fishell, & Rudy, 2013), and prefrontal cortex (Pi et al., 2013), from where they can exert spatially remarkably specific disinhibition on other populations (Zhang et al., 2014). Accordingly, Hangya et al. (2014) argued that this disinhibitory circuit might be a cortical circuit motif. Other authors suggested a more local disinhibitory circuit motif with mutual inhibition among inhibitory neurons (Koyama & Pujala, 2018; Koyama et al., 2016; Goddard et al., 2014; Mysore & Knudsen, 2012).
Disinhibitory circuits have been proposed to account for a variety of cognitive phenomena, including attentional selection (Zhang et al., 2014; van Der Velde & de Kamps, 2001), gain control (Fu et al., 2014), sequential discriminations of stimulus strength (Miller & Wang, 2006; Machens, Romo, & Brody, 2005; but see Barak, Sussillo, Romo, Tsodyks, & Abbott, 2013), categorization of stimuli (Goddard et al., 2014; Mysore & Knudsen, 2012; Kusunoki, Sigala, Nili, Gaffan, & Duncan, 2010), behavioral response selection (Zhao et al., 2019; Jovanic et al., 2016), associative learning (Letzkus et al., 2011), plasticity (Fu, Kaneko, Tang, Alvarez-Buylla, & Stryker, 2015), and social behavior (Marlin, Mitre, D'amour, Chao, & Froemke, 2015; Owen et al., 2013). Here, I suggest that the same biological mechanisms might provide a circuit-level mechanism for a core grammatical computation based on sameness versus difference computations.
Models of Sameness/Difference Relations
A number of models of how sameness relations might be computed have been proposed in the literature (Cope et al., 2018; Arena et al., 2013; Ludueña & Gros, 2013; Engel & Wang, 2011; J. S. Johnson, Spencer, Luck, & Schöner, 2009; Wen, Ulloa, Husain, Horwitz, & Contreras-Vidal, 2008; Hasselmo & Wyble, 1997; Carpenter & Grossberg, 1987). The underlying principles and assumptions vary substantially across models. Some rely on the fact that repeatedly activated representations suffer some form of neural “fatigue” (Kumaran & Maguire, 2007; Grill-Spector, Henson, & Martin, 2006); others rely on circuitry where the “combined” input from some form of memory and from sensory representations matching (or mismatching) the memory representations must be sufficiently strong (Wen et al., 2008; Hasselmo & Wyble, 1997; Carpenter & Grossberg, 1987) or where the “difference” between input from memory and from sensory representations is the critical variable (Engel & Wang, 2011). Still other models detect reduced levels of inhibition for novel compared with previously encountered items (Cope et al., 2018; J. S. Johnson, Spencer, et al., 2009). I discuss these models in more detail in Supplementary Material 1, where I show that they fall short on at least one of two criteria of grammar learning: They either do not generalize to unseen exemplars or they require labeled counterexamples.
To better illustrate the computational principles underlying the current disinhibition-based circuit, I will first present a version of the model that can detect sameness relations in sequentially presented stimuli. Following this, I will sketch a version that can detect sameness relations in spatially distributed, simultaneously presented stimuli and, finally, a version that can detect sameness relations in both simultaneously and sequentially presented stimuli.
Sameness Detection for Sequential Stimuli
Figure 1A shows a possible disinhibition-based architecture of how sameness might be detected for sequentially presented items. (Model equations are given in Appendix A; an R implementation is available online). The model comprises two populations of neurons (hereafter “layers”) that encode features of items (e.g., frequency, color, and so on; in Figure 1, the features are represented as geometric shapes).
The “source layer” receives input; input can be sensory or nonsensory, depending on where this circuit is located in the brain. Units in the “copy layer” receive excitatory one-to-one input from units in the source layer that code for the same feature. However, they also receive feature-specific tonic inhibition from an “inhibition layer” (which might consist of interneurons); tonic inhibition has been observed in a variety of brain regions and might subserve functions such as maintaining an appropriate level of excitability or the suppression of undesirable motor programs (Benjamin, Staras, & Kemenes, 2010; Farrant & Nusser, 2005; Semyanov, Walker, Kullmann, & Silver, 2004).
Because of the inhibition from the inhibition layer to the copy layer, input from the source layer is not propagated to the copy layer with a single stimulation. The critical aspect of this circuit is that each feature in the source layer also “inhibits” the corresponding feature in the inhibition layer, which, in turn, reduces inhibitory input to the copy layer for that feature. A similar phenomenon has been observed in auditory fear conditioning, where inhibition of (inhibitory) parvalbumin-positive interneurons allowed for associations between sounds and aversive stimuli to be formed (Letzkus et al., 2011).
Accordingly, once the inhibitory input to the copy layer ceases, there will be a time window during which the excitatory input from the source layer can drive the corresponding units in the copy layer. As a result, only repeated items will be propagated to the copy layer. Any readout mechanism for the copy layer (e.g., a population of thresholded neurons) could thus act as a sameness detector.
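The sequential circuit can be sketched in a few lines of code. The toy implementation below is not the published R implementation; the function name, the parameter values, and the assumption that disinhibition lasts exactly one update cycle are illustrative choices made here. It shows the core logic: the copy layer is updated before the inhibition layer, so only a repeated feature escapes tonic inhibition.

```python
import numpy as np

def run_sequential(sequence, n_features, tonic=1.0, noise=0.0, rng=None):
    """Toy sequential sameness detector based on disinhibition.

    The source layer S drives the copy layer C one-to-one; the inhibition
    layer I tonically inhibits C; and S inhibits I (disinhibition). C is
    updated before I, so a feature reaches C only on a repeated presentation.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    I = np.full(n_features, tonic)      # tonic inhibition, one unit per feature
    trace = []
    for f in sequence:
        S = np.zeros(n_features)
        S[f] = 1.0
        S = S + rng.normal(0.0, noise, n_features)
        # copy layer first: source excitation minus current inhibition
        C = np.clip(S - I, 0.0, 1.0)
        # then the inhibition layer: suppressed where the source is active,
        # back at its tonic level otherwise (disinhibition lasts one cycle)
        I = np.clip(tonic - S, 0.0, 1.0)
        trace.append(C)
    return np.array(trace)

# An immediate repetition (A A B): the second A is propagated, B is not.
trace = run_sequential([0, 0, 1], n_features=2)
```

With noise set to zero, the repeated feature reaches full activation in the copy layer, whereas first presentations and novel items stay at zero.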
I simulated this model at various levels of noise; at each noise level, I ran 50 simulations, representing 50 virtual participants. Figure 2 (left) shows that, in the copy layer, activation for repeated features is high, whereas activation for nonrepeated features is low. Repeated items are thus highly discriminable from nonrepeated items. This result is robust to the simulated noise level. A simple disinhibition-based circuit can thus act as a sameness detector that discriminates repeated features from nonrepeated features.
Although the primary goal of this model is to detect when two temporally adjacent items are identical, whether or not it can detect the sameness of two objects with intervening material depends on the time constants of the disinhibitory effects. If disinhibition is sufficiently long lasting, the model will also detect the sameness of two nonadjacent items (e.g., of the two As in the sequence ABA). If so, it would predict that the further apart two items are (in terms of the amount of intervening time and/or the number of intervening items, which might or might not have separable effects), the harder it should become to detect the sameness of the two items. At least in infants, it might be harder to detect nonadjacent repetitions compared with adjacent repetitions (S. P. Johnson, Fernandes, et al., 2009; Kovács & Mehler, 2008, 2009).
That being said, the separation of two items is unlikely to be the only determinant of how easy it is to detect whether they are the same. For example, in a longer sequence like ABCDEDFGA, the two As are farther apart than the two Ds. Still, it might be easier to detect the sameness of the two As than that of the two Ds despite their greater distance, because initial and final items are more salient than medial items (Benavides-Varela & Mehler, 2015; Endress, Scholl, & Mehler, 2005). As a result, the representations of initial items are likely stronger than those of medial items and might thus create stronger and longer-lasting disinhibition. However, the goal of the current model is just to show that a simple and ubiquitous mechanism such as disinhibition can serve as the basis of a sameness detector, whereas more detailed predictions require a biophysically more realistic model.
Sameness Detection for Simultaneous Stimuli
In its current stage, the model can detect the sameness of sequentially presented stimuli, but not of spatially distributed, simultaneously presented stimuli, simply because space is not represented. Figure 1B shows a version of the model where items are presented simultaneously rather than sequentially. Again, there is a source layer, a copy layer, and an inhibition layer. The model differs from the sequential model in three critical aspects. First, all layers now represent space. In Figure 1B, the vertical axis represents the features as before, whereas the horizontal axis represents the spatial locations of the items (though space is presumably represented in some topological order in real neuronal populations). This change is necessary so that two simultaneously presented identical objects can be represented.
Second, the connectivity between the source layer and the inhibition layer has been changed. Units in the source layer send (i) inhibitory input to all units in the inhibition layer that code for the same feature across all locations and (ii) excitatory input to all units in the inhibition layer that code for different features; in other words, there is center-surround disinhibition among features. This ensures that, in the copy layer, different-feature input from the source layer stays inhibited, whereas same-feature input is disinhibited.
Third, the sequential model needs to update the activation of the copy layer before that of the inhibition layer; if the inhibition layer were updated first, a single presentation of a feature would be sufficient to produce disinhibition. In contrast, the simultaneous model needs to update the inhibition layer before the copy layer; if the copy layer were updated first, there would be no disinhibition for identical features.
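These three changes can be sketched as follows. Again, this is an illustrative toy version rather than the published implementation; the function name and parameter values are assumptions, and noise is omitted for clarity.

```python
import numpy as np

def run_simultaneous(features_at_locations, n_features, tonic=1.0):
    """Toy simultaneous sameness detector with center-surround disinhibition.

    Units code (feature, location) pairs. Each active source unit inhibits
    inhibition-layer units coding the same feature at all locations and
    excites those coding different features; the inhibition layer is
    updated before the copy layer.
    """
    n_locations = len(features_at_locations)
    S = np.zeros((n_features, n_locations))
    for location, f in enumerate(features_at_locations):
        S[f, location] = 1.0
    same = S.sum(axis=1, keepdims=True)   # same-feature input, summed over locations
    diff = same.sum() - same              # input from all other features
    I = np.clip(tonic - same + diff, 0.0, 1.0)   # inhibition layer updated first
    return np.clip(S - I, 0.0, 1.0)              # then the copy layer

# Two identical items (feature 0 at two locations) are propagated;
# the odd one out (feature 1) stays inhibited.
C = run_simultaneous([0, 0, 1], n_features=2)
```

Note that when all items differ, every feature receives more different-feature excitation than same-feature inhibition, so the whole display stays suppressed.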
I simulated this architecture using 50 virtual participants. As shown in Figure 2, identical items are highly discriminable from nonidentical items even at high levels of noise. A simple, disinhibition-based circuit can thus detect sameness relations among simultaneously presented identical objects.
A Combined Model of Sameness Detection for Simultaneous and Sequential Stimuli
Although the main differences between the sequential and the simultaneous circuit are simply due to how stimuli are presented (i.e., spatial representations and lateral inhibition among features could be added to the sequential model but are not necessary), the different update orders raise the question of whether a combined model can be developed that detects both sequential and simultaneous sameness relations. Practically speaking, sequential and simultaneous presentation might not be as different as they seem. For example, if observers attend simultaneously presented items one after the other (Liu & Becker, 2013; Vogel, Woodman, & Luck, 2006; but see Mance, Becker, & Liu, 2012), we need a sequential model to account for “simultaneous” sameness detection; conversely, if sequential items are placed in some kind of (short-term) memory before being compared, we need a simultaneous model for sameness detection in sequentially presented items. As such, a combined sequential/simultaneous model might be neither necessary nor desirable.
Be that as it may, such a combined model is shown in Figure 3.
This “combined” sameness detector is similar to the simultaneous sameness detector in that it comprises a source layer, a copy layer, and an inhibition layer and that the copy layer receives excitatory input from the source layer. However, (dis)inhibition is organized differently. The copy layer still receives tonic inhibition from those units in the inhibition layer that code for the same feature and spatial position. Furthermore, each feature of the input layer inhibits the corresponding feature in the inhibition layer across spatial positions (i.e., it disinhibits this feature in the copy layer) and excites all other features.
The critical difference is that disinhibition of features at the same location is delayed. To do so, I removed direct connections between the source layer and the inhibition layer that coded for the same feature at the same location (while keeping the center-surround disinhibition at other locations). Instead, I added a “self-disinhibition layer” where each unit (i) receives excitatory input from the corresponding feature and location in the source layer and (ii) sends inhibitory input to all units coding for the same feature (across locations) in the inhibition layer. (Although these modifications might seem to some extent ad hoc, as mentioned above, it is not clear if a combined sequential/simultaneous model is necessary or desirable in the first place.)
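A toy version of this combined circuit is sketched below, with illustrative names and parameters, and the self-disinhibition layer reduced to a one-cycle delay. Because same-location disinhibition arrives one update cycle late, a sequential repetition is propagated on its second presentation; in the simultaneous examples, the input is sustained for two cycles so that the self-disinhibition layer can catch up.

```python
import numpy as np

def run_combined(frames, n_features, n_locations, tonic=1.0):
    """Toy combined sequential/simultaneous sameness detector.

    Direct source-to-inhibition connections cover only the same feature at
    other locations (inhibitory) and different features (excitatory);
    same-location disinhibition is routed through a self-disinhibition
    layer D, which lags the source by one update cycle.
    """
    D = np.zeros((n_features, n_locations))
    trace = []
    for S in frames:
        per_feature = S.sum(axis=1, keepdims=True)
        other_loc_same = per_feature - S                   # same feature, other locations
        other_features = per_feature.sum() - per_feature   # different-feature excitation
        # inhibition layer first, using the lagged self-disinhibition D
        I = np.clip(tonic - other_loc_same + other_features
                    - D.sum(axis=1, keepdims=True), 0.0, 1.0)
        trace.append(np.clip(S - I, 0.0, 1.0))             # then the copy layer
        D = S.copy()                                       # D follows S with a lag
    return trace

def frame(pairs, n_features, n_locations):
    """Build a source-layer input with the given (feature, location) pairs."""
    S = np.zeros((n_features, n_locations))
    for f, location in pairs:
        S[f, location] = 1.0
    return S

# Sequential repetition at a single location: propagated on the second step.
aa = run_combined([frame([(0, 0)], 2, 1)] * 2, n_features=2, n_locations=1)
# Simultaneous identical pair plus an odd item, sustained for two cycles:
# the pair is propagated, the odd item stays inhibited.
sim = run_combined([frame([(0, 0), (0, 1), (1, 2)], 2, 3)] * 2, 2, 3)
```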
As shown in Figure 4, identical items were highly discriminable from nonidentical items in the simultaneous situation across noise levels; in contrast, in the sequential situation, discriminability suffered as noise increased.
The current results thus show that a simple and biologically realistic circuit can support a core grammatical computation that is used in some 85% of the world's languages: grammatical rules based on sameness/difference relations. In this circuit, nonidentical items are filtered out through tonic inhibition as well as center-surround inhibition. In contrast, when identical items are presented sequentially or simultaneously, inhibition is inhibited; this disinhibition of identical items then allows them to be propagated for further processing.
Unlike previous models of sameness detection (Cope et al., 2018; Arena et al., 2013; Ludueña & Gros, 2013; Engel & Wang, 2011; J. S. Johnson, Spencer, et al., 2009; Wen et al., 2008; Hasselmo & Wyble, 1997; Carpenter & Grossberg, 1987; see Supplementary Material 1), the model satisfies critical criteria of grammar acquisition: (1) It generalizes to unseen stimuli and (2) does not require any labeled counterexamples for learning, simply because this circuit architecture does not require any learning at all.
Once such a sameness detector is available, it can be used for building more complex grammatical rules. For example, after exposure to syllable sequences such as dubaba, 7-month-olds notice that the last two syllables are identical and generalize this sameness relation to new items (Marcus et al., 1999). Critically, they not only have to detect the sameness relation between the last two syllables but also have to associate it with the correct serial position (Gervain et al., 2012; Endress et al., 2007). Once a sameness detector is available, it can form associations with representations of sequential positions or other stimuli (Kabdebon & Dehaene-Lambertz, 2019), allowing learners to acquire more complex, composite rules, which is one of the hallmarks of complex cognition (Hauser & Watumull, 2017; Dehaene, Meyniel, Wacongne, Wang, & Pallier, 2015; Corballis, 2014; Fitch & Martins, 2014).
This, in turn, suggests a fundamentally new view on language acquisition. Learners might be equipped with a potentially large number of potentially complex detectors for a variety of rules that act as feature detectors for grammatical rules (Endress, Nespor, et al., 2009). Learning then involves combining these features, potentially through the use of associative mechanisms. This would be consistent with results from formal language theory, where suitable preprocessing (e.g., through feature detectors) can reduce the complexity of the required computational mechanism. For example, a finite state automaton operating on trees can recognize context-free languages (Morgan, 1986), and even humble rules based on sameness relations lie beyond the reach of context-free grammars (Manaster-Ramer, 1986; Culy, 1985).
Feature detectors for elementary grammatical rules might thus expand the range of grammars that even simple learning mechanisms (such as associative mechanisms) can learn, which, in turn, might make language acquisition feasible using relatively simple computational machinery.
APPENDIX A: MODEL EQUATIONS
A.1 Sequential Model
The feature f is encoded in the source layer, the inhibition layer, and the copy layer; the corresponding activations are S_f(t) for a unit encoding feature f in the source layer, I_f(t) for such a unit in the inhibition layer, and C_f(t) for such a unit in the copy layer. E_f(t) is the external input, and N(μ, σ) is a random value drawn from a normal distribution with mean μ and standard deviation σ.
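A minimal set of update rules consistent with this description (assuming a tonic inhibition level ι and no activation persistence across cycles; the constants in the actual implementation may differ) might read as follows; note that the copy layer is updated before the inhibition layer:

```latex
\begin{aligned}
S_f(t)   &= E_f(t) + N(0, \sigma)\\
C_f(t+1) &= S_f(t) - I_f(t) + N(0, \sigma)\\
I_f(t+1) &= \iota - S_f(t) + N(0, \sigma)
\end{aligned}
```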
At the end of each update cycle, the activations are curtailed to be between zero and one.
A.2. Simultaneous Model
In the simultaneous model, units represent both features and spatial locations. S_{f,l}(t) is thus the activation of a unit in the source layer that encodes feature f at location l, I_{f,l}(t) is the corresponding activation in the inhibition layer, and C_{f,l}(t) is the corresponding activation in the copy layer. E_{f,l}(t) is the external input.
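A minimal set of update rules consistent with this description (assuming a tonic inhibition level ι; the constants in the actual implementation may differ) might read as follows; here the inhibition layer is updated before the copy layer, with same-feature input inhibitory across all locations and different-feature input excitatory:

```latex
\begin{aligned}
S_{f,l}(t)   &= E_{f,l}(t) + N(0, \sigma)\\
I_{f,l}(t+1) &= \iota - \sum_{l'} S_{f,l'}(t) + \sum_{f' \neq f} \sum_{l'} S_{f',l'}(t) + N(0, \sigma)\\
C_{f,l}(t+1) &= S_{f,l}(t) - I_{f,l}(t+1) + N(0, \sigma)
\end{aligned}
```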
At the end of each update cycle, the activations are curtailed to be between zero and one.
A.3. Combined Model
The combined sequential/simultaneous model is similar to the simultaneous model in that it comprises a source layer, a copy layer, and an inhibition layer and that the copy layer receives excitatory input from the source layer as well as tonic inhibition from those units in the inhibition layer that code for the same feature and spatial position. Furthermore, each feature of the input layer inhibits the corresponding feature in the inhibition layer across spatial positions and excites all other features. The critical difference between the simultaneous and the combined model is that there are no connections between the source layer and the inhibition layer that code for the same feature “at the same location” (whereas disinhibition occurs for other locations) and that same-location disinhibition of features proceeds through a “self-disinhibition layer” where each unit (1) receives excitatory input from the corresponding feature and location in the source layer and (2) sends inhibitory input to all units coding for the same feature (across locations) in the inhibition layer.
The symbols for the activation in the source, inhibition, and copy layers are the same as in the simultaneous model; activation in the self-disinhibition layer for a unit coding for feature f at location l is designated as D_{f,l}(t) and is initialized with random values around zero.
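A minimal set of update rules consistent with this description (assuming a tonic inhibition level ι and a one-cycle lag in the self-disinhibition layer; the constants in the actual implementation may differ) might read:

```latex
\begin{aligned}
S_{f,l}(t)   &= E_{f,l}(t) + N(0, \sigma)\\
D_{f,l}(t+1) &= S_{f,l}(t) + N(0, \sigma)\\
I_{f,l}(t+1) &= \iota - \sum_{l' \neq l} S_{f,l'}(t) + \sum_{f' \neq f} \sum_{l'} S_{f',l'}(t) - \sum_{l'} D_{f,l'}(t) + N(0, \sigma)\\
C_{f,l}(t+1) &= S_{f,l}(t) - I_{f,l}(t+1) + N(0, \sigma)
\end{aligned}
```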
At the end of each update cycle, the activations are curtailed to be between zero and one.
Reprint requests should be sent to Ansgar D. Endress, Department of Psychology, City, University of London, Northampton Square, London EC1V 0HB, United Kingdom, or via e-mail: email@example.com.
Although I model disinhibition across different neural populations, the same computational principles could be implemented using reciprocal inhibition among inhibitory neurons as in earlier models of stimulus selection and categorization (Koyama & Pujala, 2018; Koyama et al., 2016; Goddard et al., 2014; Mysore & Knudsen, 2012). To do so, one would simply replace the inhibitory connections from the source layer to the inhibition layer with inhibition in the source layer that is itself subject to lateral inhibition.