Some people experience auditory sensations when seeing visual flashes or movements. This prevalent synaesthesia-like visually evoked auditory response (vEAR) could result either from overexuberant cross-activation between brain areas and/or reduced inhibition of normally occurring cross-activation. We have used transcranial alternating current stimulation (tACS) to test these theories. We applied tACS at 10 Hz (alpha band frequency) or 40 Hz (gamma band), bilaterally either to temporal or occipital sites, while measuring same/different discrimination of paired auditory (A) versus visual (V) Morse code sequences. At debriefing, participants were classified as vEAR or non-vEAR, depending on whether they reported “hearing” the silent flashes. In non-vEAR participants, temporal 10-Hz tACS caused impairment of A performance, which correlated with improved V; conversely under occipital tACS, poorer V performance correlated with improved A. This reciprocal pattern suggests that sensory cortices are normally mutually inhibitory and that alpha-frequency tACS may bias the balance of competition between them. vEAR participants showed no tACS effects, consistent with reduced inhibition, or enhanced cooperation between modalities. In addition, temporal 40-Hz tACS impaired V performance, specifically in individuals who showed a performance advantage for V (relative to A). Gamma-frequency tACS may therefore modulate the ability of these individuals to benefit from recoding flashes into the auditory modality, possibly by disrupting cross-activation of auditory areas by visual stimulation. Our results support both theories, suggesting that vEAR may depend on disinhibition of normally occurring sensory cross-activation, which may be expressed more strongly in some individuals. Furthermore, endogenous alpha- and gamma-frequency oscillations may function respectively to inhibit or promote this cross-activation.
Some people can hear what they see: Flashing car indicator lights, animated Web browser adverts, neon shop displays, and people's footsteps may all evoke an auditory sensation (Fassnidge, Cecconi Marcotti, & Freeman, 2017; Rothen, Bartl, Franklin, & Ward, 2017; Saenz & Koch, 2008; Guttman, Gilroy, & Blake, 2005). This visually evoked auditory response (vEAR) is also referred to as “hearing-motion synaesthesia” (Saenz & Koch, 2008). Its prevalence has been estimated as at least ∼5% up to ∼20% (Fassnidge & Freeman, 2018; Fassnidge et al., 2017; Rothen et al., 2017). vEAR seems more prevalent than other more canonical varieties of synaesthesia, for example, where music, letters, or numbers can evoke perceptions of color (Simner et al., 2006), and thus insights gained from studying vEAR may be more generally applicable as a model for understanding the normal mechanisms by which information from different modalities is combined. However, since the first report (Saenz & Koch, 2008), there has been very little research on this phenomenon. Behaviorally, the ability to “hear” flashes can benefit discrimination of sequences of flashes (Fassnidge et al., 2017; Saenz & Koch, 2008; Guttman et al., 2005), possibly because the auditory modality is superior at encoding temporal structure (Guttman et al., 2005; Glenberg, Mann, Altman, Forman, & Procise, 1989). This visual discrimination advantage also correlates with impaired auditory detection in the presence of irrelevant flashes (Fassnidge et al., 2017). Although there has only been one recent correlational EEG study (Rothen et al., 2017), to our knowledge, no study has yet investigated the causal physiological mechanisms underlying vEAR.
There are two popular theories of synaesthesia that might explain vEAR. First, the cross-activation theory (Hubbard, Brang, & Ramachandran, 2011; Ramachandran & Hubbard, 2001) postulates that synaesthetic percepts result from overabundant connections between different brain regions resulting in greater cross-activation of one sensory representation by another (Tomson et al., 2011; Bargary & Mitchell, 2008; Hubbard & Ramachandran, 2005). Second, the disinhibition theory postulates that there is normal cross-connectivity in synaesthetes, which is normally inhibited in nonsynaesthetes but disinhibited in synaesthetes (Neufeld et al., 2012; Grossenbacher & Lovelace, 2001), again resulting in greater cross-activation. This latter theory has support from our finding that vEAR is associated with a range of other diverse perceptual traits, such as earworms, tinnitus, and auditory evoked phosphenes (Fassnidge & Freeman, 2018). Such generalized phenomena might be more dependent on systemic variables determining cortical excitability or inhibition, rather than on just idiosyncratic patterns of neuroconnectivity (Fassnidge & Freeman, 2018).
In this study, we used transcranial alternating current stimulation (tACS) to test experimentally whether visual and auditory cortices interact with each other differently in those who experience vEAR versus those who do not. Previous studies show that tACS can entrain endogenous oscillations, which may play a role in coordinating coherent patterns of neural activity around the cortex (Antal & Paulus, 2013). The effects of tACS may be complex and nonlinear (Herrmann, Rach, Neuling, & Strüber, 2013); thus, it is difficult to predict whether tACS should increase or decrease performance in the present case or what specific mechanisms it might affect. However, we can at least make two distinct a priori predictions about the effects of tACS in people with versus without vEAR based on the following logic: If a given group of participants express more of a given process and tACS modulates that process, then the effects of tACS on their performance should be greater than in a group that expresses less of this same process.
Following this logic, if vEAR participants express more cross-activation than non-vEAR and tACS modulates this cross-activation in some way, then tACS might affect performance of vEAR participants more than non-vEAR. For instance, if visual stimulation cross-activates auditory cortex more in vEAR (Hubbard et al., 2011; Ramachandran & Hubbard, 2001), then tACS to temporal areas might modulate visual performance more, compared with non-vEAR participants for whom there is less such visual-to-auditory cross-activation. Conversely, if vEAR participants express less inhibition than non-vEAR (Neufeld et al., 2012; Grossenbacher & Lovelace, 2001) and tACS modulates these inhibitory mechanisms, then tACS might have less effect in vEAR than non-vEAR participants.
We compared alpha- versus gamma-frequency tACS to test further hypotheses about the functional roles played by endogenous neural oscillations. Alpha frequencies are thought to play a specific role in cortical inhibition (Bonnefond & Jensen, 2015; Klimesch, 2012; Jensen & Mazaheri, 2010; Cooper, Croft, Dominey, Burgess, & Gruzelier, 2003; Fu et al., 2001) but may also serve to coordinate integration of audiovisual information (Cecere, Rees, & Romei, 2015; Romei, Gross, & Thut, 2012; Schroeder & Foxe, 2005) and active information processing (Palva & Palva, 2007). Gamma frequencies are thought to be important for active information processing and binding of multisensory information, as well as maintenance of information in working memory and intracortical signaling (Roux & Uhlhaas, 2014; Ray & Maunsell, 2010; Senkowski, Schneider, Foxe, & Engel, 2008; Yuval-Greenberg & Deouell, 2007; Senkowski, Talsma, Herrmann, & Woldorff, 2005; Başar, Başar-Eroglu, Karakaş, & Schürmann, 2001; Singer, 1999; Engel, König, Kreiter, & Singer, 1991). In keeping with these distinct roles (Wöstmann, Vosskuhl, Obleser, & Herrmann, 2018; Jensen & Mazaheri, 2010; Fries, Reynolds, Rorie, & Desimone, 2001), alpha- versus gamma-frequency tACS have respectively been found to have inhibitory versus facilitatory effects on performance, for example, in attentional paradigms with parietal stimulation sites (Wöstmann et al., 2018; Hopfinger, Parsons, & Fröhlich, 2017); however, gamma stimulation can also disrupt cross-hemispheric integration (Helfrich et al., 2014; Strüber, Rach, Trautmann-Lengsfeld, Engel, & Herrmann, 2014).
These distinct hypothetical roles of endogenous oscillations lead to frequency-specific predictions for the effects of tACS on measures of visual and auditory task performance, following the logic outlined above. If vEAR depends on reduced inhibition and alpha frequencies play a role in inhibition, we might predict that effects of alpha tACS would be smaller in vEAR than in non-vEAR participants. Alternatively, if gamma frequencies play a role in multisensory binding or signaling and if vEAR depends on increased cross-activation between vision and audition, then we might predict increased effects of gamma tACS in individuals whose performance depends on such cross-activation.
To test these hypotheses, we measured how tACS stimulation affected discriminability of pairs of either visual or auditory Morse code sequences (Saenz & Koch, 2008). We measured auditory and visual sequence discrimination performance while applying tACS at either 10 Hz or 40 Hz to either occipital or temporal areas and compared this with performance of the same task in the same modality but under sham stimulation.
Our first experiment compared the effects of alpha-frequency tACS on participants who reported that they could hear the flashes versus those who did not. We included classical musicians after having obtained pilot data that suggested they were more likely to experience vEAR, possibly due to their training and auditory imagery abilities. The second experiment compared the effect of gamma- versus alpha-frequency tACS on performance in groups defined not by subjective reports but by an implicit objective measure of any latent ability to recode visual stimuli into the auditory modality. We quantified this as the ratio between visual and auditory discriminability (V:A). Although auditory sequence discrimination is typically much better than visual (Saenz & Koch, 2008), in some participants, a latent ability to recode visual stimuli into the auditory modality might provide a specific advantage for visual sequence discrimination (Saenz & Koch, 2008; Guttman et al., 2005; Glenberg et al., 1989), thus giving them a higher V:A ratio.
Our results provide the first insight into the neural mechanisms underlying vEAR. They also offer tentative support for a general disinhibition account of vEAR and of traits such as musicianship and other synaesthesias, which we also found to be strongly associated.
In Experiment 1 (10-Hz tACS), we tested 36 paid participants who were naïve to the purpose of the study. These included 20 participants recruited from the student population and the local community (age range = 18–31 years, mean = 23.1 years, SD = 3.74; seven men). There were also 16 classical musicians from the London Royal College of Music (age range = 18–55 years, mean = 24.44 years, SD = 9.92; nine men). Experiment 2 (40-Hz tACS) had 16 naïve participants (age range = 18–28 years, mean = 20.13 years, SD = 2.64; one man) who received course credits or payment for taking part. All participants had normal or corrected-to-normal vision and reported normal hearing. All procedures were carried out with written informed consent following completion of a safety checklist and were approved by the local psychology ethics committee in accordance with the Declaration of Helsinki, under the condition that a first aid-trained experimenter was always present during sessions.
Apparatus and Stimuli
The experimental procedure was conducted using an Apple Mac Mini connected to a 17-in. Sony HMD-A420 cathode ray tube display. Auditory stimuli were presented through two Labtec PC speakers positioned next to each other directly in front of and below the center of the monitor. Screen resolution was 800 × 600 pixels with a 120-Hz refresh rate, and viewing distance was set at approximately 57 cm (controlled using a chin rest). A small white fixation point marked the center of the display. Responses were entered using the arrow keys on a standard computer keyboard. Experimental procedures and stimuli were programmed using Psychtoolbox for MATLAB.
Visual stimuli consisted of single white circular discs of luminance 81 cd/m², presented centrally on a black background. Disc diameter was 3° of visual angle. Auditory stimuli were 440-Hz sine wave tones with a maximum sound pressure level of 91 dBA. “Short” and “long” events were presented for 75 msec or 300 msec, respectively, during which stimulus amplitude immediately decayed linearly from maximum to zero.
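The auditory events described above can be sketched in a few lines. This is only an illustration of a 440-Hz tone with a linear amplitude decay, not the authors' Psychtoolbox code; the sample rate and the function name `decaying_tone` are assumptions, since neither is reported in the text.

```python
import math

SAMPLE_RATE_HZ = 44_100  # assumed; the sample rate is not reported in the text


def decaying_tone(duration_ms, freq_hz=440.0, sample_rate=SAMPLE_RATE_HZ):
    """Return samples of a sine tone whose amplitude falls linearly from 1 to 0."""
    n = int(round(sample_rate * duration_ms / 1000.0))
    samples = []
    for i in range(n):
        envelope = 1.0 - i / n  # linear decay from full amplitude toward zero
        phase = 2.0 * math.pi * freq_hz * i / sample_rate
        samples.append(envelope * math.sin(phase))
    return samples


short_tone = decaying_tone(75)   # "short" event, 75 msec
long_tone = decaying_tone(300)   # "long" event, 300 msec
```

Samples are normalized to the range −1 to 1; scaling to the 91 dBA maximum would depend on hardware calibration that is not described here.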
Stimulation was delivered using a battery-driven DC-Stimulator Plus (NeuroConn) through two 5 × 5 cm conductive rubber electrodes housed in sponge pads that had been saturated with saline. Electrodes were secured using rubber straps and placed over either the occipital pole (O1, O2) or the temporal lobe (T3, T4), following the international 10–20 system (Figure 1).
We employed a 2 × 2 × 2 × 2 design with three within-subject factors: Rhythmic sequence (visual flashes vs. auditory beeps), Stimulation site (occipital pole vs. temporal lobe), and Stimulation condition (tACS vs. sham). Double-blinding was used for the stimulation condition. The fourth between-subject factor distinguished between people who reported hearing sounds evoked by the flashes (Experiment 1) or those who showed a relative benefit for visual sequence discrimination (both experiments).
Participants performed a rhythmic sequence discrimination task (Fassnidge et al., 2017; Saenz & Koch, 2008), over two sessions on separate days. In each session, participants performed the sequence discrimination task under both sham and stimulation conditions. Order of stimulation versus sham was fully counterbalanced across sessions. Order of montage was also independently counterbalanced between sessions.
In the stimulation condition, tACS was applied for 15 min during task performance. The sham condition used identical stimulation parameters, except that duration of stimulation was 30 sec. This was included to replicate any initial sensations experienced in the stimulation session. The current had a sinusoidal waveform with a frequency of 10 Hz and an intensity of 1000 μA. When switched on, the stimulation ramped up to full intensity over 2.5 sec, then ramped back down at the end of the stimulation sequence over another 2.5 sec. Impedance was kept below 10 kΩ for all sessions by dripping saline onto the pads when required.
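As a rough illustration of the stimulation waveform, the sketch below computes the current at a given time for a sinusoid with on and off ramps. Several details are assumptions rather than reported facts: the ramps are taken to be linear and to fall inside the stated duration, the 1000 μA value is treated as peak amplitude, and the function name `tacs_current_ua` is hypothetical.

```python
import math

STIM_DURATION_S = 15 * 60  # active condition: 15 min
SHAM_DURATION_S = 30       # sham condition: 30 sec


def tacs_current_ua(t_s, total_s, freq_hz=10.0, peak_ua=1000.0, ramp_s=2.5):
    """Current in microamps at time t_s for a sinusoid with linear ramps.

    Assumes linear ramps that lie inside the total duration and treats
    peak_ua as the peak (not peak-to-peak) amplitude.
    """
    if t_s < 0.0 or t_s > total_s:
        return 0.0
    # The gain rises from 0 to 1 over the first ramp_s seconds and falls
    # back to 0 over the last ramp_s seconds.
    gain = min(1.0, t_s / ramp_s, (total_s - t_s) / ramp_s)
    return gain * peak_ua * math.sin(2.0 * math.pi * freq_hz * t_s)


# Current at a waveform peak in the middle of the active condition.
mid_stim_peak = tacs_current_ua(450.025, STIM_DURATION_S)
```

Under these assumptions the sham waveform is the same call with `total_s=SHAM_DURATION_S`, so it differs from active stimulation only in duration.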
In the first experimental session, participants read an information sheet and completed a safety screening questionnaire before providing informed consent. This questionnaire screened for contraindications, including medical history of epilepsy, brain injury, frequent headaches, and anxiety or panic attacks. The experimental task was explained and demonstrated, and the participant was given a practice block. The relevant stimulation location for that session was then located using standard 10–20 procedures. Stimulation site depended on the randomized counterbalancing that had been established beforehand. Electrodes were then applied to the relevant area and secured by a rubber strap. To ensure double-blinding, a second experimenter programmed the stimulator before each task repetition, delivering either sham or stimulation according to a secret code in the counterbalancing table, but did not interact with the participant or the first experimenter and stayed outside the lab during data collection.
In each repetition of the sequence discrimination task, there were 50 auditory and 50 visual trials. These were presented in random order in five blocks of 20 trials, interleaved with mandatory fixed breaks of 30 sec. On each trial, two successive rhythmic sequences of stimuli were presented. In half of the trials, the events were all visual, and in the other half, all events were auditory (Figure 1). The modality of each trial was randomly permuted between trials. Each sequence was composed of eight stimulus events, each of which was presented for a short (75-msec) or a long (300-msec) duration. Order of events in each sequence was randomized with the constraint that there were four to five transitions between short and long events or vice versa. The interval between events was 100 msec, and the interval between the first and second sequence was 500 msec. On half of the trials, the two sequences were identical, and on the other half, they differed. In “different” trials, the first two events and the last event were always identical between pairs, whereas the order of the remaining events was randomly permuted. Immediately following the second sequence, participants were required to indicate whether they thought the two sequences were same or different by pressing either the left or right arrow key on a computer keyboard, respectively. No error feedback was given. The response initiated the next trial.
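The sequence constraints above can be made concrete with a short sketch. It is not the authors' implementation: rejection sampling is an assumed way of meeting the four-to-five-transition constraint, and requiring the shuffled middle to differ from the original is an assumed reading of "randomly permuted". All function names are hypothetical.

```python
import random

SHORT_MS, LONG_MS = 75, 300


def count_transitions(seq):
    """Number of adjacent pairs whose durations differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def make_sequence(rng, n_events=8):
    """Rejection-sample a sequence with four or five short/long transitions."""
    while True:
        seq = [rng.choice((SHORT_MS, LONG_MS)) for _ in range(n_events)]
        if 4 <= count_transitions(seq) <= 5:
            return seq


def make_different(rng, seq):
    """Shuffle the middle events, keeping the first two and the last fixed."""
    middle = seq[2:-1]
    while True:
        shuffled = middle[:]
        rng.shuffle(shuffled)
        if shuffled != middle:  # assumed: the pair must actually differ
            return seq[:2] + shuffled + seq[-1:]


rng = random.Random(0)
first = make_sequence(rng)
second_same = list(first)                 # "same" trial
second_diff = make_different(rng, first)  # "different" trial
```

With at least four transitions in eight events, the five middle events always contain both durations, so a differing permutation exists and the loop in `make_different` terminates.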
After the first task, the participant was given a minimum 5-min break, during which the second experimenter returned to reprogram the stimulator. Participants then performed a second iteration of the sequence discrimination task. A second experimental session followed no less than 24 hr later, in which the above procedures were repeated for the other stimulation location.
At the end of both sessions, participants completed a debrief questionnaire asking about any unpleasant effects of the stimulation, such as headaches, pain or discomfort, or anxiety. We did not ask participants whether they could distinguish experimental from sham conditions, for this might have created biases on subsequent testing sessions. However, as an alternative to this, we interviewed a subset of 27 participants in Experiment 1 about milder experiences such as cutaneous tingling and also flickering phosphenes. Flickering phosphenes have been associated with alpha-frequency tACS (Kanai, Chaieb, Antal, Walsh, & Paulus, 2008), but not gamma frequency; thus, we only asked this question for Experiment 1. We reasoned that participants who could detect such sensations might have had noticeably different experiences in sham and experimental conditions, thus potentially compromising blinding; in contrast, those who did not detect any sensations in either session would be much less likely to notice a difference between conditions.
All participants of Experiment 1 were also administered a short structured interview in which they were asked a series of yes/no questions about visually evoked sounds and synaesthetic experiences (Fassnidge et al., 2017).
1. Were you aware of using the flashes as if they were sounds, for example, “flash, flash-flash” = “beep, beep-beep”?
2. Did you actually hear faint sounds when you saw flashes?
3. In everyday life, are you ever aware of hearing sounds when you see flashing lights or movement, for example, shop displays, car indicators, or people walking?
4. Do you ever experience colors associated with letters or with music, or tastes or smells associated with sounds?
5. Have you ever been diagnosed as a synaesthete, or do you suspect you might be one?
Of 36 participants, 42% answered “yes” to the second question about hearing flashes, and 66% confirmed that they were aware of using the flashes as if they were sounds. Twenty percent confirmed prior awareness of hearing visual movements, which is similar to the proportion responding positively to the same question in Fassnidge and Freeman's (2018) large-scale Internet survey. Responses to the last two questions of the final questionnaire were redundant: Only the participants who answered “yes” to the last synaesthesia question (33%) answered “yes” to the previous question about specific sensory associations (28%). For simplicity, we consider below only responses to the synaesthesia question. Fisher's exact tests revealed that frequency of reporting hearing flashes was significantly higher in musicians (75%) than in non-musicians (12%, slightly lower than estimates of ∼20% from Fassnidge et al., 2017; odds ratio = 22.00, p < .0001). Those who reported hearing flashes also tended to report synaesthetic experiences (56% in musicians vs. 12%, but note that self-assessed prevalence typically exceeds estimates based on formal synaesthesia tests; Simner et al., 2006; odds ratio = 24.00, p < .0001). Musicians also tended to report synaesthesia more frequently (odds ratio = 9.43, p = .0043). Figure 2 shows these proportions as stacked bar charts. Responses to the remaining questions (about using flashes as if they were sounds and experiences of visually evoked sounds in everyday life) showed no significant relationships with musicianship or synaesthesia.
Sensitivity measures (d′) for participants' same/different discrimination judgments were first calculated following standard psychophysical methods (Green & Swets, 1966). Three original participants were replaced by new participants because their d′ score for at least one of the auditory conditions was 0. To check whether there were any consistent carryover effects of tACS on performance in the sham condition when following stimulation, we analyzed d′ for just the sham conditions in an ANOVA for each of the two electrode montages, grouping data by the order of sham condition. Sham order had no significant main effect and did not interact significantly with Modality (visual vs. auditory), F(1, 34) < 1.24, p > .05.
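For readers unfamiliar with d′, the sketch below shows one common way to compute it from response counts. The text does not say which variant was used, so the details are assumptions: it applies the yes/no formula d′ = z(H) − z(F), counts a "different" response on a "different" trial as a hit, and adds a log-linear correction so that perfect scores stay finite. The function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction.

    The correction (add 0.5 to each count, 1 to each total) is an assumed
    choice that keeps rates of 0 or 1 from producing infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up counts for 25 "different" and 25 "same" trials in one modality.
example_d = d_prime(hits=20, misses=5, false_alarms=5, correct_rejections=20)
chance_d = d_prime(hits=10, misses=15, false_alarms=10, correct_rejections=15)
```

Equal hit and false-alarm rates give d′ = 0, which corresponds to the chance-level auditory scores that led to participants being replaced.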
d′ for participants in the 10-Hz stimulation experiment was analyzed in an ANOVA including Site (occipital vs. temporal), Modality (visual vs. auditory), and Stimulation (sham vs. tACS) as repeated measures and yes/no responses to the second “hearing flashes” debriefing question as a grouping variable. Performance was significantly better overall in participants who reported “hearing flashes,” F(1, 34) = 12.34, p = .001, ηp2 = .27, and auditory (A) d′ was significantly higher than visual (V) performance on average, F(1, 35) = 134.90, p < .00001, ηp2 = .79. Means (and SEs) for the different groups and conditions are as follows: “yes”: V 2.39 (0.17), A 4.12 (0.33), n = 15; “no”: V 1.60 (0.19), A 3.05 (0.81), n = 21. There was no significant main effect of tACS versus sham or for site. However, there was a significant interaction between Site × Stimulation × Modality, F(1, 34) = 6.43, p = .016, ηp2 = .16. Furthermore, these variables interacted significantly with “hearing flashes,” F(1, 34) = 7.00, p = .012, ηp2 = .17. Similar analyses showed no significant main effects or interactions with grouping variables based on the other questions about using flashes as if they were sounds or everyday experience of visually evoked sounds. Given the observed effect sizes and sample sizes, estimated statistical power for detecting an interaction in the present ANOVA design was high (>.9, calculated using G*Power 3.1 software; Faul, Erdfelder, Lang, & Buchner, 2007).
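The effect sizes reported above (partial eta squared, written ηp2 in the text) can be recovered from each F value and its degrees of freedom using the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The sketch below is a reader's consistency check, not part of the original analysis, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) triples as reported in the paragraph above.
reported = [
    (12.34, 1, 34),   # main effect of "hearing flashes"
    (134.90, 1, 35),  # main effect of Modality
    (6.43, 1, 34),    # Site x Stimulation x Modality
    (7.00, 1, 34),    # interaction with "hearing flashes"
]
recomputed = [round(partial_eta_squared(*row), 2) for row in reported]
# recomputed == [0.27, 0.79, 0.16, 0.17], matching the reported values
```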
To quantify the effects of tACS, we subtracted sham from tACS d′ scores for each stimulation site separately, so that negative scores represent a decrement in performance. The results are shown in Figure 3, with asterisk and bracket annotations indicating significance of Tukey multiple comparisons at p < .05. The pattern of results appears to reveal reciprocal effects on task performance of tACS applied at different sites, particularly in non-vEAR participants. For example, A performance was significantly impaired by temporal tACS relative to sham and to occipital stimulation, but there was a nonsignificant trend for A performance to slightly improve with occipital stimulation. V performance was also significantly poorer with occipital stimulation compared with A, which again showed a nonsignificant trend to improved discriminability. In contrast, there were no significant deviations from sham performance in vEAR participants.
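The difference score used here is simple, but the sign convention is easy to misread, so a small sketch may help. The data layout (a dictionary keyed by site, modality, and condition) and the function name `tacs_effect` are assumptions made for illustration, and the d′ values are made up rather than taken from the study.

```python
def tacs_effect(d_prime_scores):
    """Map (site, modality) to tACS d' minus sham d'.

    Negative values mean performance was worse under tACS than under sham.
    """
    effects = {}
    for (site, modality, condition), value in d_prime_scores.items():
        if condition == "tacs":
            sham = d_prime_scores[(site, modality, "sham")]
            effects[(site, modality)] = value - sham
    return effects


# Made-up d' values for one hypothetical participant (not study data).
example = {
    ("temporal", "A", "sham"): 3.0, ("temporal", "A", "tacs"): 2.5,
    ("occipital", "A", "sham"): 3.0, ("occipital", "A", "tacs"): 3.25,
}
effects = tacs_effect(example)
# effects[("temporal", "A")] is -0.5: auditory performance dropped under tACS
```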
An additional analysis assessed the performance of a subsample of 27 participants whose experiences of flicker and tingling sensations were probed in more detail during debriefing. Eight reported flickering phosphenes (sometimes described as occurring in peripheral vision, with bluish color) and four reported tingling in one or both sessions; the remaining participants reported no such experiences in either session. Participants reporting sensations might arguably have experienced differences between the sham and experimental conditions, leading to potential biases and placebo effects; in contrast, those who experienced no sensations would have had no such cues enabling them to distinguish between conditions. We first tested for an association between the frequency of reporting “stimulation sensations” (12 of 27) and reporting “hearing flashes” (14 of 27). If associated, then the critical effects shown in Figure 3 might be explained away as a placebo effect. However, there was no significant association (Fisher's exact test: odds ratio = 0.25, p = .13). Bayesian analysis of contingency using JASP software (Jamil et al., 2017) resulted in a Bayes factor of 2.61 (based on a joint multinomial sampling scheme), showing only weak evidence for an association, compared with the null hypothesis of no association.
We also tested in an ANOVA whether the subtracted difference between stimulation and sham d′ scores depended on “stimulation sensations” (including in the model “hearing flashes,” stimulation site, and modality as additional independent variables). There were no significant group differences between participants reporting flicker or tingling and those reporting no sensations in either session (n = 15), F(1, 23) = 0.03, ns, and no significant interactions of these groups with Stimulation site and/or Audiovisual task or “hearing flashes” groups (F ≤ 0.7). In contrast, “hearing flashes” still interacted significantly with both site and modality, F(1, 23) = 8.173, p = .009, in this subsample of participants, showing the same kind of reciprocal pattern seen in Figure 3. Complementary results were revealed by a Bayesian ANOVA using JASP software with default priors (Rouder, Morey, Speckman, & Province, 2012). This analysis specifically compared a two-factor model of the Site × Modality interaction, with the same model but including an interaction with either or both the “stimulation sensations” and “hearing flashes” grouping variables (Table 1). All other main effects and their combinations were considered under the null model. The model with the strongest evidence included the Site × Modality × Hearing Flashes interaction (Bayes factor of 17.54 relative to the null model). In contrast, Site × Modality × Stimulation Sensations had a Bayes factor of only 0.76 relative to the null model. We divided Bayes factors for models including the three-factor interaction by the Bayes factor for the two-factor model to assess whether inclusion of the additional between-subject grouping factors in the interaction provided a better explanation of our observations than the simple two-factor model alone.
We found that the Site × Modality × Stimulation Sensations interaction model had a Bayes factor of 0.32 relative to the Site × Modality model; thus, the latter simpler model was 3.12 times more likely to explain our observations. In contrast, the Site × Modality × Hearing Flashes interaction was 20.42 times more likely than the Site × Modality model alone and 63.65 times more likely than Site × Modality × Stimulation Sensations. It was also 17.41 times more likely than a more complex interaction model including all four factors. From these analyses, we can therefore conclude that it is highly unlikely that performance was affected by awareness of stimulation sensations or any resulting placebo effects and much more likely that they are explained by differences in the tendency to experience visually evoked auditory sensations.
| Models | P(Model) | P(Model\|data) | BF_Model | BF_10 | Error % |
| --- | --- | --- | --- | --- | --- |
| Site × Modality | 0.17 | 0.083 | 0.45 | 2.24 | 10.76 |
| Site × Modality × Stimulation Sensations (+ Site × Modality) | 0.17 | 0.028 | 0.15 | 0.76 | 13.30 |
| Site × Modality × Hearing Flashes (+ Site × Modality) | 0.17 | 0.65 | 9.23 | 17.54 | 55.58 |
| Site × Modality × Stimulation Sensations × Hearing Flashes (+ Site × Modality + Site × Modality × Stimulation Sensations + Site × Modality × Hearing Flashes) | 0.17 | 0.10 | 0.53 | 2.59 | 15.09 |
Table 1. Results of Bayesian ANOVA comparing two-way interaction between Occipital vs. Temporal stimulation sites (Site) and Visual vs. Auditory task (Modality), with models that also include grouping variables for ‘Hearing Flashes’ and ‘Stimulation Sensations.’ All main effects and interactions of no interest are represented under the null model, and are also included within the models of interest. See main text for details.
In case the trends seen in Figure 3 had been diluted by participants who overall experienced weaker effects of tACS than others, we correlated individual scores for each task under occipital stimulation against temporal stimulation (see Figure 4). This analysis found significant negative correlations specifically in non-vEAR participants (responding “no” to our second question; left graph), confirming that greater decrements in performance related to one stimulation site coincided reciprocally with greater improvement at the other site. In particular, impairment of A performance under temporal stimulation significantly correlated with improved performance of the same task under occipital stimulation, r(19) = −.69, p < .001. A similar significant negative correlation was observed for V performance, r(19) = −.79, p < .001, where greater impairments under occipital stimulation coincided with greater improvements under temporal stimulation. No such trends were observed for vEAR participants (right graph).
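The correlation reported above relates each participant's tACS-minus-sham score at one site to their score at the other site. The sketch below computes a Pearson coefficient by hand for made-up scores; the values and function names are hypothetical and are not the study data. The degrees of freedom in r(19) follow from n − 2 with the 21 non-vEAR participants.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def degrees_of_freedom(n_participants):
    """Degrees of freedom for a correlation test: n - 2."""
    return n_participants - 2


# Made-up tACS-minus-sham auditory scores for four hypothetical participants.
occipital_effect = [0.4, 0.1, -0.2, 0.3]
temporal_effect = [-0.5, -0.1, 0.2, -0.2]
r_example = pearson_r(occipital_effect, temporal_effect)  # negative here
```

A negative coefficient in this layout means that participants who improved more under stimulation at one site tended to be impaired more under stimulation at the other.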
We tested a separate group of 16 non-vEAR participants with 40-Hz tACS and compared their performance with the non-vEAR participants in Experiment 1, excluding the musicians (n = 17). This sample size was sufficient to achieve high power (>.9) for detecting an interaction effect in the present mixed ANOVA design, given effect sizes observed in the previous study (calculated using G*Power software). We restricted the sample to non-vEAR to test the assumption made by the disinhibition hypothesis that there is normal crosstalk between modalities even in people who do not experience synaesthetic phenomena. To assess this crosstalk, we used an implicit measure of the ratio of visual to auditory performance under sham stimulation (V:A). Higher values (e.g., closer to 1) indicate that visual performance is almost as good as auditory and thus possibly relying more on visual-to-auditory recoding.
Participants were first grouped according to whether their individual V:A ratio was below or above the median across all participants (0.47, SD = 0.25). The higher scoring V:A group included eight participants in the 10-Hz group and seven participants in the 40-Hz group. d′ values were then analyzed in a four-way ANOVA, with Modality and Site as repeated measures and with Frequency (10 Hz vs. 40 Hz) and V:A group as between-subject variables. There was a significant interaction of V:A with Modality, F(1, 29) = 10.16, p = .003, ηp2 = .26, and between V:A, Modality, and Frequency, F(1, 29) = 13.76, p < .001, ηp2 = .32. The full pattern of mean results is shown in Figure 5 (with bracket and asterisk annotations indicating Tukey multiple comparisons significant at p < .05). In the group for whom V performance was more similar to A (labeled “V ≈ A” on the right of the figure), 40-Hz stimulation specifically impaired V performance when applied to the temporal sites, t(6) = 3.08, p = .022, Cohen's d = 0.50 (see asterisk).
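The V:A grouping can be illustrated with a short sketch. It assumes each participant has a single visual and a single auditory sham d′ (how sham scores from the two sessions were combined is not stated) and that scores exactly at the median go to the lower group. The function names and the d′ values are hypothetical.

```python
from statistics import median


def va_ratio(visual_sham_d, auditory_sham_d):
    """Ratio of visual to auditory d' under sham stimulation."""
    return visual_sham_d / auditory_sham_d


def median_split(ratios):
    """Label each participant 'high' or 'low' relative to the median V:A."""
    cutoff = median(ratios.values())
    groups = {pid: ("high" if r > cutoff else "low") for pid, r in ratios.items()}
    return cutoff, groups


# Made-up sham d' scores (visual, auditory) for four hypothetical participants.
sham_scores = {"p1": (1.0, 4.0), "p2": (2.0, 4.0), "p3": (3.0, 4.0), "p4": (3.5, 4.0)}
ratios = {pid: va_ratio(v, a) for pid, (v, a) in sham_scores.items()}
cutoff, groups = median_split(ratios)
```

In this toy example the cutoff is 0.625, so `p3` and `p4` fall in the higher V:A group, the analogue of the group labeled “V ≈ A” in Figure 5.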
Further analysis of the 40-Hz data using ANCOVA with V:A as a continuous variable revealed a significant interaction between V:A and Modality, F(1, 14) = 10.29, p = .0063, ηp² = .42. In the 10-Hz data, the corresponding interaction was not significant, F(1, 15) = 0.25, ns. The 40-Hz interactions are visualized in Figure 6 as scatterplots of performance (relative to sham) against V:A scores. Both occipital and temporal 40-Hz tACS impaired V performance more in participants with higher V:A scores: occipital, ρ(14) = −0.52, p = .039; temporal, ρ(14) = −0.53, p = .035 (see left two scatterplots in Figure 6). There were no other significant correlations. These results suggest that the more an individual relies on audition to benefit visual task performance, the more their performance can be disrupted by gamma-frequency tACS. The comparable effects of tACS over both stimulation sites suggest that the stimulation may interfere with the auditory recoding of visual information and also possibly the transfer of information to auditory cortex from occipital areas.
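The ρ notation above suggests a rank correlation; assuming it denotes Spearman's coefficient, the relation between V:A scores and tACS-related change in V performance could be computed along the following lines. The data are invented, and `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: V:A score and change in visual d' under 40-Hz tACS (relative to sham).
va_scores = [0.20, 0.35, 0.50, 0.65, 0.80, 0.90]
v_change = [0.1, 0.0, -0.1, -0.3, -0.2, -0.5]

rho = spearman_rho(va_scores, v_change)
print(f"rho({len(va_scores) - 2}) = {rho:.2f}")
```

Working on ranks makes the measure sensitive to any monotonic relation and less affected by a few extreme scores, which can matter with samples of this size.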
We have performed the first transcranial electrical brain stimulation experiment to investigate the neural basis for the little-known vEAR, which may be a frequently occurring type of synaesthesia. Our results support our a priori predictions based on the hypothesis that vEAR depends on disinhibition of normally occurring latent cross-connectivity between visual and auditory cortices. Given prior evidence of the high prevalence of vEAR and present evidence of associations with other synaesthesias and musicianship, these findings may have generalizable implications for understanding mechanisms underlying individual differences in synaesthesia and in normal interactions between modalities, as well as furthering understanding of the functional roles of cortical oscillations and the effects of brain stimulation.
Although one popular theory of synaesthesia depends on greater cross-connectivity between cortical areas (Hubbard et al., 2011; Ramachandran & Hubbard, 2001), an alternative explanation is that the cross-connectivity is normal but just disinhibited in synaesthesia (Neufeld et al., 2012; Grossenbacher & Lovelace, 2001). Our results support both of these hypotheses, depending on the frequency of stimulation and the task measures used.
In support of the disinhibition hypothesis, we had reasoned that if people with vEAR had less inhibition and if 10-Hz tACS modulates such inhibition, then 10-Hz tACS should affect them less than people without vEAR. This prediction was confirmed in Experiment 1, where we found less modulation of performance from 10-Hz tACS in vEAR than in non-vEAR participants. Curiously, we found clear symmetrical reciprocal effects of stimulation at different sites in non-vEAR participants, where improvements in performance related to one stimulation site correlated with decrements related to stimulation at the other site. For example, the more visual performance improved under temporal stimulation, the more it was disrupted by occipital stimulation; conversely, the more auditory performance was impaired by temporal stimulation, the more it benefited from occipital stimulation. This antagonistic pattern is suggestive of a mutual inhibition between vision and audition in non-vEAR participants, where stimulation serves to bias the competition in favor of the nonstimulated modality. This is consistent with fMRI evidence that stimulation in one modality suppresses the BOLD response to an unstimulated modality (Langner et al., 2011; Johnson & Zatorre, 2005; Laurienti et al., 2002; Shulman et al., 1997; Haxby et al., 1994). vEAR participants appeared, on average, to be immune to this push–pull effect of alpha tACS. This pattern could suggest that there is reduced competition between modalities in both directions or conversely a general increase in cooperation between vision and audition. Such cooperation could explain why performance in vEAR participants was better for both the visual and auditory tasks, as we have previously found (Fassnidge et al., 2017).
It also seems consistent with the subjective experience of “hearing flashes” in vEAR participants, who might experience a nominally unimodal visual stimulus as effectively audiovisual, enabling them to use vision and audition together to perform the discrimination task. However, the symmetrical pattern of results also implies that people who are susceptible to visually evoked auditory sensations might also be susceptible to auditory-evoked visual sensations. Although we did not measure it here, our previous study (Fassnidge & Freeman, 2018) indeed found that reports of visually evoked auditory sensations and ratings of auditory vividness of silent movies were significantly associated with reports of the converse phenomenon of flashes triggered by sudden sounds, experienced most typically while lying in the dark (Boroojerdi et al., 2000; Jacobs, Karpik, Bozian, & Gøthgen, 1981; Lessell & Cohen, 1979).
These findings all support the theory that some kinds of synaesthesia are characterized by a reduction of physiological inhibition, which may result in unmasking of connections that are normally present in most brains (Grossenbacher & Lovelace, 2001). There is previous evidence that visual and auditory brain areas have reciprocal connections (Gleiss & Kayser, 2014; Falchier et al., 2010; Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008; Clavagnier, Falchier, & Kennedy, 2004) and that different modalities can inhibit each other (Iurilli et al., 2012; Kayser & Remedios, 2012; Ohshiro, Angelaki, & DeAngelis, 2011; Talsma, Senkowski, Soto-Faraco, & Woldorff, 2010; Ward & Mattingley, 2006; Mattingley, Driver, Beschin, & Robertson, 1997). Such inhibition may be carried by alpha-frequency oscillations (Bonnefond & Jensen, 2015; Jensen & Mazaheri, 2010; Klimesch, Sauseng, & Hanslmayr, 2007; Cooper et al., 2003; Fu et al., 2001), which might be modulated or enhanced by tACS (Wöstmann et al., 2018; Hopfinger et al., 2017; Antal & Paulus, 2013; Herrmann et al., 2013).
As illustrated in Figure 7, we tentatively propose one way in which, by interfering with such a mutually inhibitory mechanism, 10-Hz tACS might bias the balance of competition between auditory and visual areas. In vEAR individuals, tACS might have little effect, because there is weaker prior inhibition to interfere with, and consequently less competition (or more cooperation) between modalities. However, in non-vEAR participants, tACS to auditory areas might enhance auditory inhibition and/or disrupt inhibitory competition of auditory with visual areas, in both cases biasing competition in favor of visual performance, which would consequently improve. Disinhibited visual areas might then send back more competitive inhibition to auditory areas, further reducing auditory performance. Stimulation to visual areas might have the opposite effect of biasing competition in favor of auditory areas, resulting in improved auditory performance and reduced visual performance. Note that although we have schematically indicated inhibition between modalities directly, this inhibition might also come via feedback from higher multimodal areas such as STS (Neufeld et al., 2012; Grossenbacher & Lovelace, 2001). Also, it should be noted that, to our knowledge, such an effect of tACS specifically on biased competition between modalities has not been observed before; thus, this model would need to be substantiated in future research.
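The qualitative logic of this proposal can be illustrated with a deliberately simple toy simulation. It is not the model in Figure 7 and makes no claim about the underlying physiology. It assumes two linear units with equal drive that inhibit each other, and it assumes that tACS at one site strengthens the inhibition received by the stimulated area while weakening the inhibition that area sends out. The weights, the `gain` parameter, and all function names are hypothetical.

```python
def steady_state(w_va, w_av, drive=1.0):
    """Steady state of a = drive - w_va * v and v = drive - w_av * a.

    w_va: inhibition from visual onto auditory; w_av: auditory onto visual.
    Both weights are assumed to lie in [0, 1).
    """
    if not (0.0 <= w_va < 1.0 and 0.0 <= w_av < 1.0):
        raise ValueError("weights must lie in [0, 1)")
    denom = 1.0 - w_va * w_av
    auditory = drive * (1.0 - w_va) / denom
    visual = drive * (1.0 - w_av) / denom
    return auditory, visual

def tacs_effect(w, site, gain=0.25):
    """Change in (auditory, visual) activity relative to an unstimulated baseline.

    Assumption: stimulation scales the inhibition received by the stimulated
    area by (1 + gain) and the inhibition it sends out by (1 - gain).
    """
    base_a, base_v = steady_state(w, w)
    if site == "temporal":
        stim_a, stim_v = steady_state(w * (1 + gain), w * (1 - gain))
    elif site == "occipital":
        stim_a, stim_v = steady_state(w * (1 - gain), w * (1 + gain))
    else:
        raise ValueError("site must be 'temporal' or 'occipital'")
    return stim_a - base_a, stim_v - base_v

# Stronger baseline inhibition (assumed non-vEAR-like) vs. weaker (assumed vEAR-like).
for label, w in [("strong inhibition", 0.4), ("weak inhibition", 0.1)]:
    for site in ("temporal", "occipital"):
        d_a, d_v = tacs_effect(w, site)
        print(f"{label}, {site}: auditory {d_a:+.3f}, visual {d_v:+.3f}")
```

Under these assumptions the toy reproduces the push-pull pattern when baseline inhibition is strong, and the effect shrinks toward zero as the baseline weights approach zero, because scaling a weak inhibitory weight changes little.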
Although the data support an association between vEAR and reduced cross-modal inhibition, a question remains about the causal role of inhibition. On the one hand, it is possible that the apparent reduction of inhibitory interactions shown in our vEAR participants is the result of using an advantageous attentional strategy of attending to both the sight and the evoked sound of the unimodal flash sequences, whereas non-vEAR participants would normally adopt a strategy of attending selectively to just the stimulated modality. Alpha tACS could interfere with the ability to suppress the irrelevant modality. This could also explain why the musicians in our sample tended to have vEAR, if their musical training promotes joint attention to both the sound of music and the coordinated movements of the conductor and other musicians. There is evidence that musicians do indeed place greater weight on visual movement cues when assessing the quality of a performance (Tsay, 2013). However, an alternative account is that systemic individual differences in the expression of inhibitory mechanisms play a causal role in the experience of vEAR. If so, such individual differences might cause a wider variety of effects on perception than just audiovisual synaesthesia. This is supported by the evident association of vEAR not only with musicianship and general synaesthesia (as found here) but also with tinnitus and musical imagery (Fassnidge & Freeman, 2018). Further evidence from brain stimulation studies suggests that experimental manipulation of cortical excitability (which might result from reduced inhibition) can enhance or induce a variety of cross-sensory phenomena, including grapheme-color and mirror-touch synaesthesia and auditory-modulated visual phosphenes (Bolognini, Miniussi, Gallo, & Vallar, 2013; Terhune, Tai, Cowey, Popescu, & Cohen Kadosh, 2011; Bolognini, Senna, Maravita, Pascual-Leone, & Merabet, 2010; Schroeder et al., 2008). 
These diverse phenomena are not necessarily dependent on cross-modal attention nor associated with behavioral advantages. Indeed, advantages in visual sequence discrimination are correlated with poorer auditory signal detection in the presence of irrelevant visual stimulation (Fassnidge et al., 2017). Taken all together with the present results, this broad pattern of association suggests that there may be a common causal role for disinhibition in accounting for individual differences in a variety of cross-modal and unimodal phenomena.
This disinhibition theory jointly assumes not only that vEAR is characterized by a reduction of inhibition between vision and audition but also that such inhibition is necessarily mediated by normally occurring connections between these areas, whether direct or indirect. This might predict a greater effect of tACS on individuals who have greater connectivity, but Experiment 1 found less effect of tACS in vEAR individuals. This might be because, first, an analysis based just on subjective awareness of “hearing flashes” might not be sensitive to the full range of individual differences in latent cross-connectivity. For example, we previously found that some participants suffered impaired auditory signal detection in the presence of an irrelevant visual flash, consistent with a subliminal cross-activation of auditory areas from vision, but this was independent of their awareness of “hearing” these flashes (Fassnidge et al., 2017). Second, if there are latent differences in cross-connectivity, these might not have been affected by 10-Hz tACS if that frequency specifically plays a role in modulating inhibition.
Experiment 2 addressed these issues by using 40-Hz tACS combined with an implicit measure of the ratio of visual to auditory task performance, which might increase the more participants are able to recode flashes into the auditory modality (Saenz & Koch, 2008). tACS (40 Hz) might play a role in disrupting information processing, multisensory binding, or allocation of attentional resources (Roux & Uhlhaas, 2014; Senkowski et al., 2005, 2008; Yuval-Greenberg & Deouell, 2007; Başar et al., 2001; Singer, 1999; Engel et al., 1991). We specifically predicted that if 40-Hz tACS performs such functions, then participants who recode visual events into the auditory modality should find it harder to do this during stimulation to temporal areas, because this would interrupt the integration of visual inputs within auditory areas. Our results confirm this prediction: temporal (as well as occipital) 40-Hz tACS affected visual sequence discrimination, particularly in those who showed higher V:A ratios. We suggest that, in this study, 40-Hz tACS might be more effective at specifically interrupting the signaling between cross-activating brain areas, particularly in high V:A individuals whose latent ability to recode visual events into the auditory modality normally gives them a visual advantage. Given that 40-Hz oscillations may also play a role in intracortical signaling (Ray & Maunsell, 2010), it is possible that 40-Hz tACS might also have affected the processing of visual signals once they arrived in temporal areas, without necessarily affecting the intercortical transmission of those signals. Such an intracortical mechanism might therefore be expected to affect auditory processing too; however, present evidence for this specific intracortical mechanism is weaker (e.g., see right scatterplot in Figure 6).
In summary, our experimental results provide support for both of the theories outlined previously. The results of Experiment 2 support the a priori prediction based on the hypothesis of greater cross-connectivity between cortical areas (Hubbard et al., 2011; Ramachandran & Hubbard, 2001) that vEAR individuals would be affected more, rather than less, by tACS. In contrast, the results of Experiment 1 support the alternative a priori prediction based on disinhibited cross-activation in synaesthesia (Neufeld et al., 2012; Grossenbacher & Lovelace, 2001), finding that vEAR individuals are affected less by tACS.
This study had some methodological issues that are important to consider. One potentially critical issue is that it was not possible to ensure perfect blinding, given that tACS at low frequencies such as in the alpha to beta range is known to evoke flickering phosphenes in peripheral vision (Kanai et al., 2008). We consider it unlikely that peripheral flickering could have systematically interfered with sequence discrimination of flashes, which had high contrast and relatively low temporal frequency and which were presented at fixation. It also seems implausible that such response biases alone could have conspired to create the detailed pattern of reciprocal effects shown in Figure 3 and an apparent dependence on “hearing flashes.” Nevertheless, we undertook additional analyses to assess whether reported stimulation sensations could have affected performance. First, there was no significant association between “hearing flashes” and reported stimulation sensations; thus, stimulation sensations cannot fully explain away the effect of hearing flashes on performance. Second, results from classical and Bayesian analyses showed no evidence that stimulation sensations affected the critical cross-over interaction pattern in Figure 3; in contrast, the evidence was much stronger for the role of “hearing flashes” in determining participants' response to stimulation. We therefore conclude that the specific pattern of results is more likely to have depended on hearing flashes than on placebo effects due to subjective differences between stimulation conditions. We did not attempt to replicate this analysis with gamma stimulation because this does not tend to evoke phosphenes (Kanai et al., 2008); thus, there was less chance of compromised blinding.
Endogenous alpha frequency can vary between individuals from about 8 to 12 Hz (Cecere et al., 2015; Zaehle, Rach, & Herrmann, 2010), and one might therefore wonder whether the null effect of tACS in vEAR participants could reflect a systematic mismatch between endogenous and exogenous stimulation frequency. Although we cannot assess this here, we note that mismatches of as much as ±2 Hz are still effective at influencing perception and performance (Cecere et al., 2015); thus, it seems unlikely that such mismatches would be sufficient to entirely eliminate tACS effects, unless they were substantially greater than 2 Hz.
Average sequence discrimination performance tended to be higher in vEAR participants, as we have found previously (Fassnidge et al., 2017). It is therefore possible that 10-Hz tACS might have affected performance less in the vEAR group if their performance was already, on average, at a high level. However, any ceiling effects might be expected to reduce improvements more than impairments from tACS, whereas both were reduced to a similar extent in vEAR participants, on average; furthermore, this seems unlikely to explain the results of Experiment 2, which depended on groups distinguished by relative performance between auditory and visual conditions, rather than by absolute performance levels.
It is possible that the interval between sham and stimulation conditions may not have been long enough to completely rule out after-effects. However, we found no differences in the pattern of results depending on session order; in any case, any order effects would be most likely to only weaken apparent stimulation effects and could not explain away differences between stimulation sites that were tested on separate days.
Our results seem consistent with our a priori predictions that specifically assumed modulation of auditory areas. However, inference is limited by uncertainty over the precise brain areas that were stimulated by tACS. Particularly in contrast to the occipital electrode montage, our temporal montage may have caused more current flow through the center of the head; thus, the effects we observed with temporal tACS may reflect modulation of medial and subcortical structures, as well as cortical areas outside auditory cortex, in addition to auditory areas. Interpretations about the functions of stimulated areas must therefore be taken with caution. Future studies using more sophisticated “high-definition” tACS and current flow modeling will be needed to assess whether the effects observed depend specifically on stimulation to auditory and visual cortices.
Finally, it might be wondered whether a single question about whether flashes were actually heard as sounds was sufficient to attribute vEAR abilities to participants. Although no standard diagnostic of this ability exists as yet, we have previously found that this question is predictive of objective sequence discrimination performance (Fassnidge et al., 2017), whereas the question about whether flashes were used in the discrimination task as if they were sounds (i.e., not necessarily actually heard) was not predictive. The present results showed a similar dissociation, where the cross-over interaction effects of stimulation depended significantly on reports of hearing flashes (Figure 3), but not significantly on reports of merely “using” flashes as sounds. This suggests that our questions could validly distinguish people who have a genuine subjective experience of actual auditory sensations (which was predictive of tACS effects) from those who do not but who nevertheless try to use flashes as if they were sounds (which was not predictive). The present evidence suggests that our results may reflect physiological differences associated specifically with visually evoked auditory sensations, but this remains to be substantiated using direct physiological measures (e.g., EEG or fMRI).
In conclusion, our results suggest that there are normally occurring individual variations in the level of cross-activation between visual and auditory cortices, which can be disrupted by 40-Hz tACS of auditory areas; however, the subjective awareness of “hearing flashes” may additionally depend on reduced inhibition between visual and auditory cortices, resulting in cooperative rather than competitive interactions between vision and audition, as evidenced by a relative immunity to effects of 10-Hz tACS. Although different stimulation frequencies and tasks result in these apparently opposing effects, the pattern of results is altogether consistent with both the cross-connectivity and disinhibition theories of synaesthesia: There are normally occurring connections between cortical areas that can vary in their richness between individuals, and the disinhibition of these connections is correlated with conscious awareness of vEAR. In addition, our results support distinct functional roles of alpha versus gamma oscillations in respectively regulating the balance of inhibitory competition between modalities versus promoting signal integration. Altogether, the apparent prevalence of vEAR, the availability of behavioral measures, and the accessibility of auditory and visual cortices to brain stimulation all make vEAR a uniquely convenient and potentially powerful platform for experimentally testing the neural basis of both synaesthesia and normal multisensory integration and also for gaining further insights into the causal roles of cortical oscillations of different frequencies in normal perception.
This work was supported by grants from British Academy and Leverhulme (SG151380 and IAF-2017-006 to E. F.). We thank Marinella Cappelletti for comments on the manuscript and James Yearsley for advice on Bayesian analysis.
Reprint requests should be sent to Elliot Freeman, Cognitive Neuroscience Research Unit, Department of Psychology, City University London, Northampton Square, London EC1V 0HB, United Kingdom, or via e-mail: firstname.lastname@example.org.