The role of the cerebellum in speech perception remains a mystery. Given its uniform architecture, we tested the hypothesis that it implements a domain-general predictive mechanism whose role in speech is determined by connectivity. We collated all neuroimaging studies reporting cerebellar activity in the Neurosynth database (n = 8206). From this set, we found all studies involving passive speech and sound perception (n = 72, 64% speech, 12.5% sounds, 12.5% music, and 11% tones) and speech production and articulation (n = 175). Standard and coactivation neuroimaging meta-analyses were used to compare cerebellar and associated cortical activations between passive perception and production. We found distinct regions of perception- and production-related activity in the cerebellum and regions of perception–production overlap. Each of these regions had distinct patterns of cortico-cerebellar connectivity. To test for domain-generality versus specificity, we identified all psychological and task-related terms in the Neurosynth database that predicted activity in cerebellar regions associated with passive perception and production. Regions in the cerebellum activated by speech perception were associated with domain-general terms related to prediction. One hallmark of predictive processing is metabolic savings (i.e., decreases in neural activity when events are predicted). To test the hypothesis that the cerebellum plays a predictive role in speech perception, we compared cortical activation between studies reporting cerebellar activation and those without it during speech perception. When the cerebellum was active during speech perception, there was far less cortical activation than when it was inactive. The results suggest that the cerebellum implements a domain-general mechanism related to prediction during speech perception.
The cerebellum is a remarkable structure, having about 80% of the neurons in the brain but only around 10% of its mass (Herculano-Houzel, Catania, Manger, & Kaas, 2015). Compared to other primates, it is significantly larger in humans relative to the size of the neocortex. This expansion may reflect the need for complex motor programs associated with tool use and speech, hallmarks of human evolution (Barton & Venditti, 2017; MacLeod, Zilles, Schleicher, Rilling, & Gibson, 2003). Indeed, the lateral cerebellum is disproportionately expanded in mammals that learn vocally, including elephants, humans, seals, dolphins, and whales (Smaers, Turner, Gómez-Robles, & Sherwood, 2018). Consistent with this expansion in humans and with its role in vocal learning, the cerebellum plays an important role in speech production (Ackermann, Mathiak, & Riecker, 2007). This fits the view, dating to the early 1800s, that the cerebellum is primarily a motor structure: Gall considered it the organ of sexuality, whereas Rolando and Flourens provided the first evidence for its more general role in motor function (Glickstein, Strata, & Voogd, 2009; Macklis & Macklis, 1992). Over the last 50 years, it has also become apparent that the cerebellum plays some role in “nonmotor” language domains. Activity in the cerebellum is observed during lower-level auditory functions, like speech timing and phonology, and higher-level tasks involving semantics, grammar, and comprehension (Mariën & Borgatti, 2018; Ackermann & Brendel, 2016; Mariën & Manto, 2015). However, what the cerebellum contributes to these tasks remains, to quote one “consensus” article, “an ongoing enigma” (Mariën et al., 2014). Here, we address questions about the role of the cerebellum during speech perception.
Although the evidence has remained elusive, some theories claim that the function of the cerebellum is domain-general (Diedrichsen, King, Hernandez-Castillo, Sereno, & Ivry, 2019). Thus, the computation contributed by the cerebellum to vocal learning and speech production would be similar to that contributed to speech perception. This idea is captured by the Universal Cerebellar Transform (UCT) theory (Schmahmann, 2019; Schmahmann, Guell, Stoodley, & Halko, 2019), which maintains that, because the cerebellum has a relatively homogenous architecture, with repeating corticonuclear microcomplexes (Eccles, Ito, & Szentágothai, 1967), it performs a “consistent” computation. Any differences in what this computation contributes to would be determined by variations in cerebellar location and corresponding cortico-cerebellar connectivity.
There have been a number of proposals as to the nature of the domain-general computation in the cerebellum. One suggestion from the motor control literature is that it implements internal models specifically and prediction more generally (Siman-Tov et al., 2019; Popa & Ebner, 2018; Taylor & Ivry, 2014; Ito, 2008; Wolpert, Miall, & Kawato, 1998). Internal models are neural representations of an organism's interactions with the world. These can be used to predict the sensory consequences of movements and maintain movement accuracy. When differences between predicted and actual sensory states occur, internal models are updated such that, on subsequent movements, the discrepancy between predicted and actual sensory feedback is reduced (Lametti, Smith, Watkins, & Shiller, 2018; Wolpert, Diedrichsen, & Flanagan, 2011; Shadmehr, Smith, & Krakauer, 2010). As evidence that the cerebellum plays a key role in this process, participants with cerebellar disruption, because of stroke or brain stimulation, show slow or altered learning in a wide range of tasks, including movement adaptation in response to visual alterations of the limbs (Morton & Bastian, 2004; Martin, Keating, Goodkin, Bastian, & Thach, 1996; Baizer & Glickstein, 1974) and following physical perturbations of movement (Gibo, Criscimagna-Hemminger, Okamura, & Bastian, 2013; Rabe et al., 2009; Smith & Shadmehr, 2005).
Similarly, the cerebellum seems to play a key role in the predictive processing required for the maintenance of accurate speech production. Patients with cerebellar damage frequently present with a range of speech production deficits (Ackermann et al., 2007). They also exhibit impaired feedforward control of speech movements (i.e., an impaired ability to update internal models; Parrell, Agnew, Nagarajan, Houde, & Ivry, 2017). In healthy participants, altering the cerebellum with noninvasive brain stimulation has been shown to alter sensorimotor adaptation during speech production (Lametti, Smith, Freidin, & Watkins, 2017). A similar result was observed during sensorimotor adaptation associated with limb movements (Galea, Vazquez, Pasricha, de Xivry, & Celnik, 2011; Jayaram, Galea, Bastian, & Celnik, 2011), although the impact of cerebellar transcranial direct current stimulation on sensorimotor adaptation can be inconsistent (Jalali, Miall, & Galea, 2017). Linking the aforementioned limb and speech adaptation literature, a recent meta-analysis found that sensory feedback manipulations resulting in adaptation were more likely to be associated with cerebellar activity (Johnson, Belyk, Schwartze, Pinheiro, & Kotz, 2019).
Thus, according to UCT-like theories, the contribution of the cerebellum to motor control should extend to nonmotor domains. Does the cerebellum contribute predictions to speech perception (Moberget & Ivry, 2016)? Despite much theorizing (Argyropoulos, 2016; Hertrich, Mathiak, & Ackermann, 2016; Moberget & Ivry, 2016; Schwartze & Kotz, 2016; Mariën & Manto, 2015; Mariën et al., 2014; Ackermann, 2008; Ackermann et al., 2007; Callan, Kawato, Parsons, & Turner, 2007), that question is difficult to answer because the associated neurobiological research is sparse. Pubmed (queried March 2021) lists six articles with “cerebellum” or “cerebellar” and “speech perception” in the title, half of which are review articles (included in the references in the prior sentence). And yet, there are thousands of neuroimaging studies involving speech perception that report cerebellar activity (as we later show).
Explicit evidence for domain-generality of the cerebellum for speech perception does not exist (because speech perception studies by definition only study speech). Nonetheless, a number of task-battery and meta-analysis studies more or less address this topic. In two studies, participants completed task batteries, some of which were language related (King, Hernandez-Castillo, Poldrack, Ivry, & Diedrichsen, 2019; Guell, Gabrieli, & Schmahmann, 2018). The conclusion from one of these is that, despite overlap between language and social cognition tasks, the cerebellum represents cognitive functions in a domain-specific manner because of the spatially modular appearance of these different functions (Guell, Gabrieli, et al., 2018). Although the other task-battery study does not explicitly make this claim, it also shows specific functions mapped to specific cerebellar regions, suggesting specificity (Diedrichsen et al., 2019; King et al., 2019). Five meta-analyses support the idea that the cerebellum plays specific roles in audition, speech perception, and language comprehension (Riedel et al., 2015; Balsters, Laird, Fox, & Eickhoff, 2014; Keren-Happuch, Chen, Ho, & Desmond, 2014; Stoodley & Schmahmann, 2009; Petacchi, Laird, Fox, & Bower, 2005). Two of these use a large number of studies to profile clusters of cerebellar ROIs in terms of associated behavioral domains (Riedel et al., 2015; Balsters et al., 2014). Although some of these were associated with speech and language, they were also associated with a range of other domains and subdomains. Although domain-generality is not discussed, the breadth of tasks related to specific regions of cerebellar activity seems at odds with the domain-specific account suggested by task-battery studies.
Prediction likely plays an important role in speech perception. Because of differences in vocal tract lengths, accents, and speaking contexts, the acoustics of identical phonemes can vary considerably. To help solve this “lack of invariance” problem, it has long been noted that the brain uses visual and linguistic information to predict forthcoming auditory information, constraining the interpretation of acoustic signals (Skipper, 2014; Sjerps, Mitterer, & McQueen, 2011; Skipper, van Wassenhove, Nusbaum, & Small, 2007; Holt & Lotto, 2002; Ganong, 1980; McGurk & MacDonald, 1976; Ladefoged & Broadbent, 1957). This predictive process involves a wide array of “motor” regions also involved in speech production, although most of this work has only examined cortical motor regions (Skipper, Devlin, & Lametti, 2017; Skipper, 2015; Skipper, Nusbaum, & Small, 2005).
The majority of cerebellar studies pertaining to prediction have focused on “higher level” linguistic prediction (e.g., involving word meaning; Sheu, Liang, & Desmond, 2019; Pleger & Timmann, 2018; D'Mello, Turkeltaub, & Stoodley, 2017; Lesage, Hansen, & Miall, 2017; Argyropoulos, 2016; Moberget, Gullesen, Andersson, Ivry, & Endestad, 2014; Lesage, Morgan, Olson, Meyer, & Miall, 2012). Only a small number of studies have specifically investigated the role of the cerebellum in speech perception, regardless of its role in prediction. Some of these studies show that the cerebellum seems to contribute timing signals to the unfolding process of speech perception. Patients with cerebellar damage show impairments in the perception of speech sound contrasts distinguished by the purely durational cue occlusion time (e.g., the German words “boten” vs. “boden”; Ackermann, Gräber, Hertrich, & Daum, 1997). These patients do not show impairments in the identification of speech sound contrasts that can be distinguished by both durational and nondurational cues such as voice onset time (Ivry & Gopal, 1993; Repp, 1979). These two findings were later explored with fMRI, and the right cerebellum was linked to the durational characteristics of speech sounds (Mathiak, Hertrich, Grodd, & Ackermann, 2002). More generally, timing signals are clearly required for prediction (Kotz & Schwartze, 2010). Converging lesion, neuroimaging, and brain stimulation results link the cerebellum to both timing and predictive processing in speech perception (Lametti et al., 2016; Moberget & Ivry, 2016; Schwartze, Keller, & Kotz, 2016; Schwartze & Kotz, 2013, 2016; Guediche, Holt, Laurent, Lim, & Fiez, 2015; Kotz, Stockert, & Schwartze, 2014; Knolle, Schröger, Baess, & Kotz, 2012).
Existing cerebellar speech perception studies tend to involve tasks that lack ecological validity and involve motor responses, limiting claims about domain-generality and prediction. In terms of validity, the tasks used in most studies are not particularly representative of natural speech perception. For example, in one task battery study, “language” cerebellar regions are defined by activity associated with listening to and answering questions about Aesop's Fables subtracted from activity associated with reading math problems and selecting the correct answer from two alternatives (Guell, Gabrieli, et al., 2018). In another task battery study, language processing seems to be defined using tasks like verbal working memory with letters, verb generation versus reading, and/or a two-alternative forced choice semantic task following sequential reading of five words. Similarly, the tasks used in the five previously mentioned meta-analyses mostly involved single word generation, repetition, reading, or making semantic decisions (Keren-Happuch et al., 2014; Stoodley & Schmahmann, 2009; Petacchi et al., 2005).
Perhaps more problematic, most tasks used in these studies required participants to either read, leading to subvocal speech production, or make a metalinguistic judgment as indicated by a button response. Although studies often include task subtractions meant to control for motor engagement, it cannot be reasonably demonstrated that this was achieved (Friston et al., 1996; Poeppel, 1996). Similarly, even studies that discard trials containing motor responses could still show motor activation, driven by participants' expectation that responses will be required. This is an important oversight given the historical perspective that the cerebellum is predominantly a motor structure.
Theory and a small amount of empirical work suggest that the cerebellum might make a domain-general predictive contribution to speech perception similar to the contribution it makes to speech production. There are, however, no studies that directly address both domain-generality and predictive processing. Task-battery and meta-analysis studies suggest mixed results about domain-generality, whereas studies of prediction during speech perception can likely be counted on one hand and do not address domain-generality. Furthermore, there are few studies of speech perception and the cerebellum that examine natural speech perception, that is, speech perception in the complete absence of movement. To begin to address these gaps in the literature, we performed cerebellar meta-analyses, coactivation-based meta-analyses, and text-based functional profile analyses. Critically, we used a large number of studies that involve only “passive” speech perception without an overt motor response on any trial. A minority of these were passive perception studies involving tone and nonspeech sound stimuli (e.g., instrumental music), included because many languages are tonal (Yip, 2002) and nonspeech sounds activate cortical areas associated with speech perception (Peretz, Vuvan, Lagrois, & Armony, 2015). We compared these passive perception studies to studies involving speech production and articulation (Figure 1 presents a schematic overview of this work).
Our first hypothesis was that the cerebellum plays a domain-general role in speech perception—that is, it makes a contribution to speech perception that is not inherently speech specific. Rather, any speech specificity partly derives from connectivity patterns that give the cerebellum a modular topological appearance. To test this, we examined regions of activity in the cerebellum related to speech perception and production. Speech perception and production are both sensorimotor processes that share subprocesses, but they are also distinct in important ways (e.g., production involves overt articulation). Thus, we anticipated a mix of overlapping and distinct activity patterns in the cerebellum reflecting the shared and unique components of perception and production. Using coactivation meta-analysis, we predicted that networks originating from speech perception, production, and overlapping regions would have different cortical connectivity. To test for domain-generality, we analyzed task-related terms mined from the abstracts of thousands of neuroimaging studies to see which of these predicted activity in cerebellar regions associated with speech perception, production, or their overlap. We expected that these regions would also be associated with a wide range of other tasks that are not speech or domain-specific.
Our second hypothesis was that the domain-general role played by the cerebellum and its connections during speech perception is related to prediction. To assess this, we tested a fundamental tenet of predictive models that prediction results in metabolic savings because the brain has to do less processing when predictions are accurate and, correspondingly, more processing for unexpected acoustic information (Moberget et al., 2014; Skipper, 2014). We compared speech-perception-related whole-brain activity between studies reporting cerebellar activity to whole-brain activity in studies without reported cerebellar activity. We hypothesized that, if the cerebellum is involved in prediction during natural speech perception, there should be a greater amount of activity throughout the brain when the cerebellum is not active during this task.
Figure 1A outlines the article selection steps. First, we created a maximum probability mask of the cerebellum using a probabilistic cerebellar atlas (Diedrichsen, Balsters, Flavell, Cussans, & Ramnani, 2009). The latter was created by averaging the cerebellar lobule masks from 20 participants, aligned to the MNI152 template by nonlinear registration (Diedrichsen et al., 2009). We then found all of the published articles that had activity somewhere in this mask and that appeared in the Neurosynth database (Version 0.7, released July, 2018; https://github.com/neurosynth/neurosynth-data; Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011). This version contains 507,891 activation peaks or centers of mass from 14,371 studies with over 3200 term-based features. The intersection of the cerebellum mask and database resulted in 8206 articles (57% of the Neurosynth database).
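The mask-and-intersect step can be sketched as follows. This is a minimal numpy sketch, not the SUIT/Neurosynth code: the array shapes, function names, and probability threshold are illustrative assumptions.

```python
import numpy as np

def max_probability_mask(prob_maps, threshold=0.25):
    """Collapse probabilistic lobule maps, shape (n_lobules, x, y, z),
    into a binary maximum-probability mask: a voxel enters the mask when
    its most probable lobule exceeds the threshold. The threshold value
    here is illustrative, not the atlas's actual cutoff."""
    return prob_maps.max(axis=0) > threshold

def reports_cerebellar_activity(peaks_ijk, mask):
    """True if any of a study's reported peaks (as voxel indices) falls
    inside the binary cerebellar mask -- the criterion used to keep a
    study in the sample of articles with cerebellar activity."""
    return any(bool(mask[tuple(p)]) for p in peaks_ijk)
```

In the actual pipeline, the atlas and peak coordinates live in MNI space, so each study's millimeter coordinates would first be converted to voxel indices via the mask image's affine (e.g., with nibabel) before testing inclusion.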
Next, we searched Pubmed (October 2018) for all articles that matched a set of 20 search terms (acoustic, audiovisual, audition, articulate, hear, listen, music, naming, phonetic, phonological, speak, speech, speech perception, speech production, talk, tones, orofacial, pitch, vocal, and voice) and their variants (e.g., articulate, articulators, articulation, articulatory) and a set of eight Medical Subject Headings terms, a controlled vocabulary used by Pubmed for indexing life science articles (auditory perception, language, verbal behavior, hearing, hearing tests, speech, speech acoustics, speech production measurements). This search returned 1,002,940 articles. We then found the intersection of these articles and the 8206 articles in the Neurosynth database reporting cerebellar activity. This resulted in 2168 articles with cerebellum activity that potentially involved speech, language, and/or articulation (15% of the Neurosynth database).
We then went through the abstract and methods of these 2168 articles by hand to determine whether they included (1) a natural speech perception task (i.e., passive speech/sound/music perception that simply involved listening without another explicit task) or (2) a speech production task (i.e., speaking overtly/covertly or moving the articulators). We required that a number of criteria be met for studies to be included. In particular, studies that focused on reading, used patient populations, tested participants younger than 18 years, or focused on resting-state analyses were excluded. Critically, perception studies that involved any motor response no matter how minor (e.g., a button press on 5% of trials to maintain alertness) were not included. Studies that involved the passive perception of tones and nonspeech sounds (e.g., instrumental music) were included in the analysis as a minority of studies. This decision was made for the following reasons: By some estimates, 60–70% of the world's languages are tonal (Yip, 2002), tones can be produced by the human vocal tract, and they are (arguably) similar to phonemes. Converging evidence from fMRI and direct neural recordings suggests that there is overlap in cortical activity patterns associated with speech and music listening (Peretz et al., 2015). There is also behavioral evidence that music and language processing draw on a shared resource (Kunert & Slevc, 2015). More generally, the basic units of speech are unknown, and it is unclear when sound perception changes to speech perception (Skipper et al., 2017; Bybee & McClelland, 2005; Goldinger & Azuma, 2003; Lotto & Holt, 2000). Of the original 2168 articles, 72 (3.32% or 0.50% of the full Neurosynth database; n = 1321 participants) were natural speech/sound perception studies (64% speech, 12.5% sounds, 12.5% instrumental music and 11% tones) and 175 (8.07% or 1.22%; n = 3787 participants) involved speech production or articulation.
We used Neurosynth to conduct meta-analysis on our sample of natural speech perception and production studies that activate the cerebellum. Neurosynth is a database and tool for performing term-based meta-analysis (Yarkoni et al., 2011). As designed, it uses a form of kernel density analysis to compare activations reported in studies that frequently use selected psychological terms (e.g., “language,” “working memory”) to activations reported in studies in the rest of the database that do not use these terms. Instead of using Neurosynth to perform a term-based meta-analysis, we simply provided it with the articles found to involve natural speech perception or speech production. Neurosynth compared activations reported in the provided studies to activations reported in the rest of the Neurosynth database. The resulting cerebellum activity maps reflect activations that occur more consistently in our two samples as compared to other studies. We examined baseline contrasts and overlaps. For baseline contrasts, we used a false discovery rate (FDR)-corrected threshold of q < .01 across the whole brain. We examined speech perception and production overlaps at the same individual FDR-corrected thresholds. For added protection, we also required that cluster sizes be greater than 10 voxels. Results are displayed on a cerebellar flatmap using Version 3.4 of the SUIT MATLAB toolbox (Diedrichsen & Zotow, 2015).
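Stripped of Neurosynth's kernel smoothing and test statistics, the core of this step is a comparison of voxelwise activation rates between the curated study list and the remainder of the database. A minimal sketch; the matrix layout and function name are our own, not the Neurosynth API:

```python
import numpy as np

def study_list_meta(activations, selected):
    """activations: (n_studies, n_voxels) binary matrix, 1 where a study
    reported activity at that voxel (after kernel smoothing in the real
    pipeline). selected: boolean vector flagging the curated perception
    or production studies. Returns, per voxel, the difference between the
    activation rate in the selected sample and in the rest of the
    database -- the quantity Neurosynth's statistics are built on (here
    without smoothing or FDR correction)."""
    p_selected = activations[selected].mean(axis=0)
    p_rest = activations[~selected].mean(axis=0)
    return p_selected - p_rest
```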
Cerebellar Coactivation Meta-Analysis
We next did a meta-analytic coactivation analysis from the regions unique to speech perception, production, and their overlap across all 14,371 neuroimaging studies in the Neurosynth database (Figure 1B). This analysis assumes that if a cerebellar region frequently coactivates with other brain regions across many studies and statistical contrasts, then that region can be considered to be part of a network with the coactive brain regions. The principle here was to perform a formal contrast between studies that activate each of the three sets of regions as compared to studies that tend to activate the other sets of regions. The resulting statistical maps identify voxels throughout the brain that have a greater probability of coactivating with the identified regions. A two-way chi-square (χ2) test was used to calculate p values for each voxel between the sets of studies. The resulting images were again thresholded using an FDR of q < .01. We again required that cluster sizes be greater than 10 voxels. This analysis and functional profile analyses (discussed in the next section) were based on de la Vega, Yarkoni, Wager, and Banich (2018) and de la Vega, Chang, Banich, Wager, and Yarkoni (2016; https://github.com/adelavega/neurosynth-mfc/ and https://github.com/adelavega/neurosynth-lfc).
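The per-voxel test described above can be sketched as follows, assuming a binary studies-by-voxels activation matrix. The chi-square and Benjamini–Hochberg FDR steps mirror the description in the text; the function name and data layout are our own, and the real analysis was run with Neurosynth, not this code:

```python
import numpy as np
from math import erfc, sqrt

def coactivation_map(activations, seed_active, q=0.01):
    """activations: (n_studies, n_voxels) binary matrix of reported activity.
    seed_active: boolean vector marking studies that activate the seed region.
    Returns a boolean map of voxels whose activation is associated with seed
    activation (two-way chi-square per voxel, Benjamini-Hochberg FDR at q)."""
    n = len(seed_active)
    pvals = np.empty(activations.shape[1])
    for v in range(activations.shape[1]):
        a = activations[seed_active, v].sum()    # seed active, voxel active
        b = seed_active.sum() - a                # seed active, voxel inactive
        c = activations[~seed_active, v].sum()   # seed inactive, voxel active
        d = (~seed_active).sum() - c             # seed inactive, voxel inactive
        # 2x2 chi-square statistic (no continuity correction)
        den = (a + b) * (c + d) * (a + c) * (b + d)
        stat = n * (a * d - b * c) ** 2 / den if den else 0.0
        # For df = 1, the upper-tail p value has the closed form erfc(sqrt(x/2))
        pvals[v] = erfc(sqrt(stat / 2))
    # Benjamini-Hochberg step-up procedure
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, len(pvals) + 1) / len(pvals)
    passed = pvals[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    significant = np.zeros(len(pvals), bool)
    significant[order[:k]] = True
    return significant
```

The additional 10-voxel cluster-size requirement would then be applied to the thresholded map in image space.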
To provide descriptive functional labels of the resulting coactivation patterns, we calculated the Pearson correlation of each vectorized coactivation map with meta-analyses available in the Neurosynth database. This resulted in r values that reflect the spatial similarity between each coactivation map and other large-scale meta-analyses (see Table 1).
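This labeling step amounts to ranking term-based meta-analysis maps by their spatial correlation with each coactivation map. A minimal sketch; the dictionary-of-maps interface and function name are illustrative assumptions:

```python
import numpy as np

def label_by_similarity(coactivation_map, term_maps):
    """Correlate one vectorized coactivation map with a dictionary of
    term-based meta-analysis maps and return (term, r) pairs sorted by
    spatial similarity, as in the Table 1 labeling step."""
    ranked = []
    for term, term_map in term_maps.items():
        r = np.corrcoef(coactivation_map.ravel(), term_map.ravel())[0, 1]
        ranked.append((term, r))
    return sorted(ranked, key=lambda pair: -pair[1])
```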
Cerebellar Functional Profile Analyses
To test for domain-specificity or generality, we next generated functional profiles of the activity patterns in each of the speech perception, production, and overlap regions (Figure 1C). This was done by determining which of the terms in the Neurosynth database (which are mined from the text of the 14,371 abstracts) best predicted activity in each of the three sets of cerebellum regions. Specifically, this analysis determines whether a classifier could predict if a study activated specific perception, production, or overlap regions in the cerebellum given the terms mentioned in the study's abstract.
A naive Bayes classifier was trained to discriminate three sets of high-frequency terms associated with activation in each set of regions versus a set of studies that did not produce activation in those regions. Fourfold cross-validation was used for testing, and the mean score was calculated across all folds as a summary measure of performance. Models were scored using accuracy, or the fraction of samples correctly predicted. The log odds ratio (LOR), the probability that a term is present in active versus inactive studies, from the naive Bayes models from each set of regions was used to generate the functional profiles. The LOR indicates whether a term is predictive of activation in a given cerebellar set of regions. We output the terms with LORs that predicted activation in each of the sets of cerebellar regions at an uncorrected statistical threshold of p < .05. To conduct functional profile analyses, we went through the terms by hand and labeled each as either anatomical, fMRI, or task related. The anatomical label included any term related to brain anatomy (e.g., “cerebellum”); the fMRI label included any terms related to the fMRI signal (e.g., “bold signal”), stimuli (e.g., “video”), and methods (e.g., “contrasted”); and the task label was given to any task-related terms (e.g., “finger tapping”) and their associated functions (e.g., “speech production”).
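The LOR statistic described above can be sketched directly, without the full classifier pipeline. A simplified stand-in, assuming a binary abstracts-by-terms matrix; the smoothing constant and names are illustrative:

```python
import numpy as np

def term_log_odds(term_matrix, region_active, alpha=1.0):
    """term_matrix: (n_studies, n_terms) binary indicator of whether each
    abstract mentions each term. region_active: boolean vector marking
    studies that activated the cerebellar region set. Returns the per-term
    log odds ratio of a term appearing in active vs. inactive studies,
    with Laplace smoothing (alpha) -- positive values mean the term
    predicts activation in that set of regions."""
    p_active = (term_matrix[region_active].sum(axis=0) + alpha) / (
        region_active.sum() + 2 * alpha)
    p_inactive = (term_matrix[~region_active].sum(axis=0) + alpha) / (
        (~region_active).sum() + 2 * alpha)
    return np.log((p_active / (1 - p_active)) / (p_inactive / (1 - p_inactive)))
```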
We then created four further groups of terms. First, to validate the term-based approach, we labeled terms as confirmatory if they were specifically related to the cerebellum, natural/passive speech perception, or speech production. Second, to determine whether our perception, production, and overlap regions were actually speech specific, we labeled each term as to whether it was remotely speech related or whether it had no obvious relationship to speech. Third, to more generally examine the domain-specificity of regions, we labeled each term with the four gross psychological domains: perceptual, motor, cognitive, and social/emotional. Anything that did not fit into these categories was labeled as nonspecific. Finally, to provide some insight as to what general functional role the cerebellum may play in speech processing, we created a general category for terms associated with task demands (“expertise”) and mechanisms (“prediction”).
No Cerebellum Meta-Analysis
To test the prediction hypothesis, we did a second round of article selection (Figure 1D). Specifically, we repeated the article selection steps outlined above for cerebellum articles but in the n = 6165 articles in the Neurosynth database that do not report activation in the cerebellar mask. The intersection of these articles and the 1,002,940 articles from our original Pubmed search resulted in a sample of 1547 articles about speech, language, and/or articulation. We went through the abstract and methods of these articles by hand to find those involving natural speech perception (i.e., speech/sound/music perception in the complete absence of movement). Studies that explicitly stated that they did not scan the cerebellum were eliminated. This search resulted in 92 (5.94% or 0.64% of the whole Neurosynth database; n = 2026 participants) natural speech/sound perception studies that did not report cerebellum activation (64% speech, 18.5% tones, 11% sounds, and 6.5% instrumental music). We did a meta-analysis of speech perception without cerebellar activity, using an FDR-corrected threshold of q < .01 across the whole brain and a minimum cluster size of 10 voxels. We examined how this differed from speech perception when the cerebellum was active by overlapping the meta-analyses of speech perception with and without cerebellar activity. We also examined how these compared to speech production (regardless of whether the cerebellum was active or not) across the whole brain.
The aim of the study was to test the hypothesis that the cerebellum has a domain-general organization whose primary function is related to prediction, with cortico-cerebellar connectivity determining what this computation is applied to during speech perception. All articles in the Neurosynth database reporting cerebellar activity were identified (n = 8206), and from this sample, studies involving natural (i.e., completely motor free “passive”) speech perception (n = 72) and speech production (n = 175) were found and used in cerebellar meta-analyses (Figure 1A).
Figure 2A shows significant cerebellar activation during natural speech perception (red) and speech production (blue). Regions activated by both perception and production are in yellow. Speech activity is distributed through much of the cerebellum in a manner that does not correspond to cerebellar lobules. Activity patterns associated with speech perception and production lay near each other and showed abrupt transitions. This pattern was not an artifact of statistically contrasting speech perception and production. Figure 2B demonstrates that this arrangement, that is, nearby perception and production regions with small regions of overlap, remains when using less conservative corrections for multiple comparisons.
To determine if our results were skewed by the inclusion of studies that used sound stimuli not (or only partially) producible by the human vocal tract, we reran the cerebellar meta-analyses excluding these studies. Specifically, we included only studies with speech or tones, excluding those with nonvocal music or sound stimuli. This resulted in 54 studies (75% of the original studies). The resulting spatial correlation between the image in Figure 2A and the new results was r = .99. One difference was that, at our FDR-corrected threshold of .01 and a cluster size of 10 voxels, only the perception and production overlap in VIIb survives correction. However, at an FDR-corrected threshold of .05 and a cluster size of 10 voxels, the overlapping VI voxels also survive. Similarly, when studies with tones are removed, leaving 46 studies (64% of the original studies), the spatial correlation is r = .98 and the overlapping regions survive but only at a reduced corrected threshold.
Cerebellar Coactivation Meta-Analysis
We identified brain regions that significantly coactivate with perception, production, and regions of perception–production overlap in the cerebellum across thousands of studies. Cortically, perceptual cerebellar regions had greater coactivation, predominantly with the bilateral middle and anterior temporal cortex, angular gyrus, and inferior frontal gyrus (Figure 3, red, “Perception Network”). Perceptual coactivation also included the caudate bilaterally and the left thalamus. Production regions coactivate more with the precentral and postcentral sulcus and gyrus, the insula, as well as superior parietal regions (Figure 3, blue, “Production Network”). Medially, production coactivation regions also included the superior frontal gyrus and, subcortically, the putamen and thalamus bilaterally. Finally, regions in the cerebellum activated by both perception and production coactivate with the central sulcus, inferior parietal cortex, the transverse temporal gyrus and sulcus, and nearby superior temporal regions (Figure 3, yellow, “Overlap Network”). Medially, overlap coactivation regions also included the superior frontal gyrus and, subcortically, the caudate on the left and putamen and thalamus bilaterally. These subcortical regions were in different locations than clusters in the same structures associated with perceptual and production coactivation.
The perception, production, and overlap networks are similar to prior meta-analyses associated with language and semantic processing, motor planning/sequencing, and sensorimotor control, respectively. However, they were also similar to meta-analyses associated with tasks that seemingly have little or nothing to do with speech. For instance, the “Perception Network” was correlated with “theory of mind” meta-analyses and the “Overlap Network” with finger tapping (Table 1). This suggests that, although cerebellar regions were identified using studies involving only sounds and speech (without any motor response), the cerebellar networks originating from those regions are not necessarily specific to speech or language.
| Perception meta-analysis | Correlation | Production meta-analysis | Correlation | Overlap meta-analysis | Correlation |
|---|---|---|---|---|---|
| mind tom | 0.26 | premotor cortex | 0.38 | coordination | 0.23 |
The perception, production, and overlap networks in Figure 3 were individually correlated with whole-brain term-based meta-analyses. The table contains the top 20 correlations, showing that these networks are neither speech- nor domain-specific. Given the number of voxels in each correlation, these are roughly Bonferroni corrected at a p value of .01/(20 × 3) < .0001.
Cerebellar Functional Profile Analyses
Functional profiles of perception, production, and overlap regions in the cerebellum were probed to assess domain-specificity or generality. We found all terms in the Neurosynth database that predicted activity in each set of regions at an uncorrected statistical threshold of p < .05. A “confirmatory” qualitative analysis revealed that all three sets of perception, production, and overlap regions were associated with many cerebellar anatomical and speech-related terms. Speech perception regions were uniquely associated with “listened,” “listening,” “passive,” and “passively.” They were also associated with speech and language terms and corresponding anatomical regions (e.g., “comprehension,” “semantic,” “temporal”). The speech production and overlap regions (as compared to the speech perception regions) were uniquely associated with “active,” “overt,” and “speech production.” Speech production regions were associated with production terms and corresponding motor regions (e.g., “articulatory,” “motor cortex”). Activity in overlap regions was associated with terms that tended to be more sensorimotor in nature (e.g., “sensorimotor cortex”), but were not as high level as terms that predicted activity in speech perception regions (e.g., “phonological”). Taken together, these results support the validity of the term-based approach.
We next filtered out all anatomical and fMRI terms, leaving 168 task-related terms. Figure 4 shows the top 10 task-related terms (ranked by their standardized log odds ratio [LOR]) that predicted activity in cerebellar regions associated with speech perception, production, and the overlap of the two. Despite the qualitative differences noted above that conform to task differences, a wide range of tasks was associated with activity in each set of regions, and there were also similarities in terms between the sets (e.g., “motor” was the top term associated with each). To quantify this, we calculated the percentage of speech and nonspeech terms and the percentage of terms in the perceptual, cognitive, social/emotional, and motor domains associated with the speech perception, production, and overlap regions (Table 2). Only 23.81% of terms that predicted activity in the three sets of regions were speech specific, even though the regions were identified using a large number of studies that involved only speech and sounds. Speech-related terms were roughly equally distributed across perception (5.36%), production (10.71%), and overlap regions (7.74%; χ2 = 3.05, p > .05).
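The reported chi-square on speech-related terms can be reproduced from the percentages, assuming they convert to whole-term counts out of the 168 task-related terms (a sketch using scipy; the integer counts are inferred from the percentages, not taken directly from the paper's tables):

```python
from scipy.stats import chisquare

N_TERMS = 168  # task-related terms remaining after filtering anatomical/fMRI terms

# Percentages of speech-related terms per region set imply integer counts:
# perception 5.36% -> 9, production 10.71% -> 18, overlap 7.74% -> 13.
counts = [round(p * N_TERMS) for p in (0.0536, 0.1071, 0.0774)]

# Goodness-of-fit test against a uniform spread across the three region sets.
stat, p_value = chisquare(counts)
# stat = 3.05, p ≈ .22 (> .05): no evidence of an unequal distribution,
# matching the reported chi-square of 3.05.
```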
| Regions | Speech | Perceptual | Cognitive | Emotional | Motor | Unspecified |
|---|---|---|---|---|---|---|
Psychological domains. All 168 task-related terms associated with each set of regions in the cerebellum were categorized as speech-related or not, and into four gross psychological domains; counts are presented as percentages, showing that the three sets of cerebellar regions are neither speech- nor domain-specific.
Including all terms (speech and not speech), we examined whether the terms predictive of cerebellar activity in speech perception, production, and overlap regions were equally distributed across our four psychological domains: perceptual, cognitive, social/emotional, and motor. The distribution of terms was uniform in the case of the speech perception regions (χ2 = 2.48, p > .05), but not uniform in the case of the speech production (χ2 = 61.57, p < .001) and overlap regions (χ2 = 20.15, p < .01). Specifically, more than half of the terms associated with the speech production regions were from the motor category (although not speech/articulatory specific), and no terms from the social/emotional category were associated with either the speech production or overlap regions. About a quarter of all terms (24.40%) did not fall into a psychological domain and were classified as nonspecific. These terms were associated with each set of regions, but the distribution was not uniform (χ2 = 22.56, p < .001); 4 times as many of these terms were associated with production regions in the cerebellum. Taken together, the results suggest that speech perception, production, and overlap regions in the cerebellum are associated with a range of perceptual, cognitive, and motor tasks well beyond the domains of speech and language.
Finally, about 31.55% of the terms could be labeled as associated with either demands or mechanisms. About 71.70% of these came from terms that could not be labeled with the perceptual, cognitive, social/emotional, or motor categories. Demand-related terms were associated with increasing task difficulty and/or expertise (e.g., “faster”). Mechanism-related terms included “predictions” (speech perception and production regions), “sequence” and “sequential” (speech-production-related regions), and “coordination” and “timing” (speech production and overlap regions). These task-independent terms are consistent with prior accounts that link cerebellar functioning to predictive processing.
No Cerebellum Meta-Analysis
Functional profiles demonstrate that regions in the cerebellum associated with speech perception, speech production, and their overlap are also associated with a wide range of tasks well outside of the domain of speech, language, and vocal motor control. This result supports a domain-general view of cerebellar processing. A domain-general process often attributed to the cerebellum is prediction or expectancy/timing signals that are an aspect of prediction. A hallmark of predictive processing is metabolic savings (i.e., decreases in activity when events are predicted). To test whether cortical activity decreases when the cerebellum is active, we identified studies in the Neurosynth database involving natural speech perception that did not report activity in the cerebellum (n = 92; Figure 1D). These studies were used in a second meta-analysis that compared whole-brain speech-perception-related activity when the cerebellum is active versus not active.
As shown in Figure 5, during speech perception, there are striking differences in brain activity as a function of whether the cerebellum is active or not. Specifically, when the cerebellum is active (in red and yellow), cortical activity related to speech perception is primarily located in the superior temporal plane, posterior inferior frontal gyrus, and posterior aspect of the superior frontal gyrus. When the cerebellum is not active during speech perception (blue and yellow), there is 1.68 times more brain activity overall that is distributed over a much larger area of the brain. This activity encompasses the same regions as when the cerebellum is active and, additionally, more posterior aspects of the superior and middle temporal gyrus and sulcus, inferior parietal lobule, postcentral gyrus and sulcus, precentral gyrus and sulcus, the inferior frontal gyrus, and thalamus. These additional regions are partially captured by a meta-analysis of speech production (shown in white outline). This result is consistent with the idea that the cerebellum plays a role in prediction during passive speech perception.
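The 1.68× figure is an extent comparison: the count of suprathreshold voxels in one meta-analysis map divided by the count in the other. A toy illustration of that computation (the maps and threshold here are simulated stand-ins, not the study's data):

```python
import numpy as np

def activation_extent(z_map, threshold=2.0):
    """Count voxels whose statistic exceeds the threshold."""
    return int(np.sum(np.asarray(z_map) > threshold))

# Simulated z-maps standing in for the two speech perception meta-analyses.
rng = np.random.default_rng(1)
cerebellum_active = rng.normal(0.0, 1.0, size=20000)
cerebellum_inactive = rng.normal(0.3, 1.0, size=20000)  # shifted up: broader activation

# More suprathreshold voxels when the cerebellum is "inactive" (ratio > 1),
# analogous to the 1.68x difference reported for the real maps.
extent_ratio = activation_extent(cerebellum_inactive) / activation_extent(cerebellum_active)
```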
We tested the hypothesis that the cerebellum implements a domain-general, predictive mechanism that is deployed during speech perception as a function of connectivity. We identified studies from a large neuroimaging database reporting cerebellar activity during naturalistic (i.e., “passive”) speech perception without any motor response and compared these to speech production studies. We also found studies involving natural speech perception that did not report cerebellar activity. We used these in neuroimaging meta-analyses, coactivation meta-analyses, and term-based cerebellar functional profile analyses (Figure 1). We observed multiple regions of activity throughout the cerebellum related to both speech perception and production that were largely separate, but with some overlap (Figure 2). These regions had unique patterns of functional connectivity (Figure 3, red, blue, and yellow). Across thousands of studies, the functional profiles of these networks (Table 1) and their seed cerebellar regions (Table 2) were not speech or domain-specific. Regions of the cerebellum activated by speech perception studies were also associated with mechanistic terms like “timing” and “prediction.” Finally, when the cerebellum was inactive, there was more cortical activity than when it was active (Figure 5). Here, we review these results in relation to the literature on cerebellar topology and use this to discuss how the results support a domain-general, predictive account of the cerebellum in speech perception.
Consistent with other studies using a range of tasks (Diedrichsen et al., 2019; King et al., 2019; Guell, Gabrieli, et al., 2018; Guell, Schmahmann, Gabrieli, & Ghosh, 2018; Stoodley, Valera, & Schmahmann, 2012; Buckner, Krienen, Castellanos, Diaz, & Yeo, 2011), the pattern of activity observed for speech traversed lobular boundaries (Figure 2). Although activity patterns do not conform to any obvious anatomy, they do seem to be functionally modular, with sharp boundaries between speech perception and production regions. This is consistent with studies suggesting different functions map to distinct cerebellar regions with abrupt transitions between them (King et al., 2019; Guell, Gabrieli, et al., 2018; Guell, Schmahmann, et al., 2018; Marek et al., 2018; Imamizu, Kuroda, Miyauchi, Yoshioka, & Kawato, 2003). The sharp boundaries in our results are even more striking given that activity patterns were not the result of a direct contrast between speech perception and production.
From a functional modularity perspective, speech perception and production activity in different regions likely corresponds to the different subtasks that these functions can be decomposed into. Overlapping regions are likely associated with subtasks similar to both. Indeed, after task decomposition, subtasks engage different cerebellar regions. For example, working memory can be broken down into an “articulatory loop” and “phonological store” and these subtasks consistently activate different cerebellar regions (Keren-Happuch et al., 2014; Chen & Desmond, 2005a, 2005b). Conversely, language and working memory (Ashida, Cerminara, Edwards, Apps, & Brooks, 2019; Stoodley et al., 2012) and social mentalizing (Van Overwalle, Baetens, Mariën, & Vandekerckhove, 2014) may overlap in the cerebellum because they share common subtasks.
Consistent with the appearance of functional modularity, it is claimed that there is a higher level of cerebellar organization into a nonmotor (cognitive, emotional, and/or social) zone and two motor zones, corresponding to different connectivity patterns. Specifically, past work has observed a nonmotor zone in posterior lateral regions, especially Crus I and II, and two motor zones, one in the anterior lobe and the other around Lobule VIII (Stoodley & Schmahmann, 2018; Gellersen, Guo, & O'Callaghan, 2017; Buckner et al., 2011). These have even been further subdivided into triple nonmotor and double motor zones (Guell, Gabrieli, et al., 2018; Guell, Schmahmann, et al., 2018; Buckner et al., 2011).
If one focuses only on the largest regions of activity in our data, there seems to be a medial speech perception zone (Crus I/II; HVIIb; Figure 2, red) and two speech production zones, one in the anterior and one in the posterior lobe (dorsal and ventral to the medial perception zone in Figure 2, blue). Speech perception also overlaps with speech production in two somatotopically organized motor zones at the same location of lip and tongue representations (e.g., compare Figure 2 to Figure 7B in Boillat, Bazin, & van der Zwaag, 2020; Guell, Gabrieli, et al., 2018; Buckner et al., 2011; Grodd, Hülsmann, Lotze, Wildgruber, & Erb, 2001).
However, nonmotor versus two motor zones oversimplifies the observed pattern of activity (Diedrichsen et al., 2019). Our results suggest that there are up to 10 distinct zones for speech perception, and more for speech production, distributed throughout both nonmotor and motor zones. This complexity might again be attributed to the fact that speech is a sensorimotor task that can be decomposed into many subtasks that do not neatly conform to “cognitive” and “motor” categories (Skipper et al., 2017). The latter cerebellar “sandwich” (Hurley, 2001) view (of a cognitive zone between motor zones) may derive from attempting to map multiple gross functions onto the cerebellum using winner-takes-all-like strategies. Our data maps a single “function” and shows that large swathes of the cerebellum are involved in speech perception, arguing for finer task decomposition to understand individual regions of activity.
Based on the uniformity of cerebellar structure, theories like the universal cerebellar transform (UCT; Schmahmann, 2019; Schmahmann et al., 2019) claim that cerebellar functions are domain-general, performing a similar computation throughout. Functional specialization in these models is determined by variations in cerebellar location and corresponding cortico-cerebellar connectivity. The fact that our data appears functionally modular while not conforming to any obvious anatomical boundaries could be consistent with either domain-specific or general accounts.
However, domain-generality was supported by the functional profile analyses. Specifically, speech perception regions in the cerebellum were not speech specific. They were also equally associated with perceptual, motor, cognitive, and emotional terms generally (see Tables 1 and 2) and (in alphabetical order) attention, audition, finger and hand movements, language, memory, pain, speech, theory of mind, and vision terms more specifically. Our speech perception studies were all sound and speech related, with no movements associated with them. It would, therefore, be hard to explain the results with a domain-specific theory as it is unlikely that every one of the nonspeech terms associated with speech perception regions has a linguistic explanation.
It is important to note that, although domain-generality implies a common computation in the cerebellum, it is entirely possible that this computation is used by different cognitive and motor processes in different ways. For instance, the cerebellum may contribute a timing signal that is used for prediction in some tasks, coordination in other tasks, and learning in a third set of tasks. Just as the pFC contributes working memory to a variety of tasks that use this resource in a variety of ways, the cerebellum's common computation may be utilized in different ways depending on the process it is contributing to.
If the cerebellum does perform a domain-general role in speech perception, language comprehension, and any other domain, that role must be determined by cortico-cerebellar connectivity. As we reviewed, it has been observed that there are abrupt functional divisions in the cerebellum and these have been shown to conform to structural and functional connectivity, perhaps determining the functional specialization of those regions (Schmahmann, 2019; Schmahmann et al., 2019). Furthermore, this connectivity is said to conform to the division of the cerebellum into (multiple) nonmotor zones and (two) motor zones (Buckner et al., 2011). Indeed, the speech perception, production, and overlap sets of regions formed surprisingly distinct networks with other subcortical and cortical regions as determined by coactivation meta-analysis. In functional terms, speech perception regions tended to form networks with “higher-level” language regions, speech production regions formed networks with premotor and other “higher-level” motor/speech production regions, and overlap regions formed networks with primary auditory and primary motor regions. That is, overlap regions were distinctly sensorimotor (Skipper et al., 2017; Skipper & Hasson, 2017). However, these connectivity patterns were also domain-general (Table 1), suggesting, again, that cerebellar regions are not speech specific. They may become so in speech-only circumstances, but we could not determine this as we used all studies to do the coactivation analysis.
Prediction is an increasingly accepted mechanistic account of how the brain works generally (Keller & Mrsic-Flogel, 2018; Clark, 2013) and the cerebellum more specifically, especially in the domain of motor control (Moberget & Ivry, 2019; Popa & Ebner, 2018). Domain-general theories suggest that the computation the cerebellum contributes to any one domain should be similar to the computation it contributes to others. Although it is possible that different processes may use this consistent contribution in different ways, we also might expect to see commonalities between behaviors in how the cerebellum's contribution is used. Indeed, it has been argued that the predictive role that the cerebellum plays in motor control is reused in speech perception and higher-level linguistic domains like semantics (Moberget et al., 2014; Schwartze, Tavano, Schröger, & Kotz, 2012).
Consistent with this account, our functional profiles of speech perception, production, and overlap regions were all associated with the terms “coordination,” “timing,” and “prediction.” These regions were each associated with unique cerebellar locations and associated connectivity profiles, consistent with a domain-general account. This suggests the possibility that these regions and associated networks are predicting at different levels of analysis, perhaps corresponding to the superior/inferior motor zone and medial cognitive zones. Indeed, sensory-prediction-related processes that mediate sensorimotor adaptation have been demonstrated in the superior motor zone where we show speech perception/production overlap (Guediche et al., 2015). Furthermore, the distribution of activity in the medial portion of the cerebellum completely overlaps the peaks of activity in five fMRI studies of linguistic predictability (D'Mello et al., 2017; Lesage et al., 2017; Moberget & Ivry, 2016; Bonhage, Mueller, Friederici, & Fiebach, 2015; Moberget et al., 2014; Tourville, Reilly, & Guenther, 2008). Consistent with this, overlap networks were more associated with sensorimotor regions whereas the perception-only networks were more associated with regions mediating higher-level linguistic processes in prior studies. Indeed, perception-only regions were uniquely associated with the term “comprehension.”
We also generated more direct evidence for the predictive account. That is, we tested a key tenet of predictive models that they result in less activity when predictions are accurate (Moberget et al., 2014; Skipper, 2014). If the cerebellum plays a predictive role, we hypothesized that cortical activity should be reduced when the cerebellum is active in contrast to when it is not. Indeed, we found that when the cerebellum was active, there was a nearly twofold reduction of cortical activity during speech perception compared to when the cerebellum was not active. Furthermore, much of this reduction was in the aforementioned sensorimotor and higher-level linguistic networks associated with the speech perception/production overlap and speech-perception-only networks, respectively. This suggests the possibility that cortico-cerebellar and cortico-cortical predictions trade off with each other. A hallmark of cerebellar function is expertise whereas a hallmark of the neocortex is flexibility. Perhaps the cerebellum is involved in predictions associated with perceiving more well-learned speech whereas the cortex more flexibly applies predictions in new contexts. Consistent with this, the cerebellum seems to play a specific role in “automatic speech,” that is, overlearned material (Ackermann, 2008; Ackermann, Wildgruber, Daum, & Grodd, 1998).
First, our sample of natural speech perception articles includes (as a minority) studies in which participants passively listened to instrumental music, sounds, and tones. We made the decision to include these studies because it is unclear when exactly sound perception changes to speech perception, many languages are tonal (Yip, 2002), and there is neuroimaging and behavioral evidence that speech and music perception draw on a common neural resource (Kunert & Slevc, 2015; Peretz et al., 2015). To examine the impact of these studies on the observed patterns of cerebellar activation, we reran the baseline meta-analyses with just speech and tone studies and then just speech studies alone. Patterns of activation were nearly identical to the case where all studies (speech and nonspeech) were included. However, with just speech studies, regions of production–perception overlap were mostly only observed at a reduced (although still corrected) statistical threshold. This may reflect a lack of statistical power because of the reduced sample. It is also possible that the overlapping area is enhanced by nonspeech studies because of motor recruitment. There is evidence that cortical motor systems become more engaged in speech perception as auditory signals become more foreign (Wilson, 2009; Wilson & Iacoboni, 2006). The extent to which regions of perception–production overlap in the cerebellum are observed during the perception of clear, native speech needs to be further explored.
Second, there is a possibility that our sample of passive speech perception studies in which cerebellar activation was not reported (n = 92) includes studies that actually did find—but failed to report—cerebellar activations. If such studies are in this sample, they would likely reduce differences between studies reporting cerebellar activation and studies not reporting it. Removing these studies (if they exist in our sample) would likely lead to greater observed differences in cortical activity between the groups shown in Figure 5.
Finally, our findings reflect the published literature. We cannot control for the quality of the included data, for example, whether appropriate high-level contrasts were used. Furthermore, results may reflect theoretical biases. For instance, predictive coding is a trending topic and there is a known predictive role of the cerebellum in motor control. Thus, there may be a bias to discuss cerebellar activity in the context of a predictive framework. However, we included a large number of studies, almost none of which had any specific interest in the cerebellum (e.g., none of the speech perception articles had “cerebell”* in the title). They simply happened to report cerebellar activity during a task that met our criteria, likely reducing the impact of bias.
What role does the cerebellum play in speech perception? Our results are consistent with the perspective that the cerebellum plays a domain-general and predictive role in all functions, including speech. Furthermore, the type of prediction (e.g., motor or semantic) must be determined by task (and subtask)-specific cortico-cerebellar connectivity.
All data in the paper, including complete lists of all studies used in meta-analyses, are available as supplemental tables in a preprint version of the manuscript: https://www.biorxiv.org/content/10.1101/2020.06.05.136804v2.
The authors would like to thank S. Bobbitt, L. Wyatt and N. Guest for help coding the manuscripts that went into the meta-analysis. J. I. S. would like to thank a Banana.
Reprint requests should be sent to Daniel R. Lametti, Department of Psychology, Acadia University, 15 University Ave, Wolfville NS B4P 2R6, Canada, or via e-mail: firstname.lastname@example.org.
Jeremy I. Skipper: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Validation; Visualization; Writing—Original draft; Writing—Review & editing. Daniel R. Lametti: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Validation; Visualization; Writing—Original draft; Writing—Review & editing.
This work was supported by grants from the British Academy (https://dx.doi.org/10.13039/501100000286), Corpus Christi College Oxford, Acadia University, and the Natural Sciences and Engineering Research Council (NSERC) of Canada to D. R. L. (https://dx.doi.org/10.13039/501100000038). It was also partially supported by EPSRC (https://dx.doi.org/10.13039/501100000266) EP/M026965/1 to J. I. S.
Diversity in Citation Practices
A retrospective analysis of the citations in every article published in this journal from 2010 to 2020 has revealed a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .408, W(oman)/M = .335, M/W = .108, and W/W = .149, the comparable proportions for the articles that these authorship teams cited were M/M = .579, W/M = .243, M/W = .102, and W/W = .076 (Fulvio et al., JoCN, 33:1, pp. 3–7). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.