The ability to synchronize movements to a rhythmic stimulus, referred to as sensorimotor synchronization (SMS), is a behavioral measure of beat perception. Although SMS is generally superior when rhythms are presented in the auditory modality, recent research has demonstrated near-equivalent SMS for vibrotactile presentations of isochronous rhythms [Ammirante, P., Patel, A. D., & Russo, F. A. Synchronizing to auditory and tactile metronomes: A test of the auditory–motor enhancement hypothesis. Psychonomic Bulletin & Review, 23, 1882–1890, 2016]. The current study aimed to replicate and extend this study by incorporating a neural measure of beat perception. Nonmusicians were asked to tap to rhythms or to listen passively while EEG data were collected. Rhythmic complexity (isochronous, nonisochronous) and presentation modality (auditory, vibrotactile, bimodal) were fully crossed. Tapping data were consistent with those observed by Ammirante et al. (2016), revealing near-equivalent SMS for isochronous rhythms across modality conditions and a drop-off in SMS for nonisochronous rhythms, especially in the vibrotactile condition. EEG data revealed a greater degree of neural entrainment for isochronous compared to nonisochronous trials as well as for auditory and bimodal compared to vibrotactile trials. These findings led us to three main conclusions. First, isochronous rhythms lead to higher levels of beat perception than nonisochronous rhythms across modalities. Second, beat perception is generally enhanced for auditory presentations of rhythm but still possible under vibrotactile presentation conditions. Finally, exploratory analysis of neural entrainment at harmonic frequencies suggests that beat perception may be enhanced for bimodal presentations of rhythm.
Musical rhythms tend to be experienced as a sequence of auditory events. From this sequence, a perception of a temporal regularity, referred to as the beat or a pulse, is derived. Perception of a beat is often accompanied by continuous synchronized motor movements, referred to as sensorimotor synchronization (SMS). Prevalent examples of SMS range from the discrete tapping of a foot or hand to full-body movement as seen in dance. SMS often emerges spontaneously and with minimal effort but is affected by stimulus properties. For instance, SMS has been shown to be more accurate for low-frequency (196 Hz) tones compared to high-frequency (466.2 Hz) tones (Hove, Marie, Bruce, & Trainor, 2014) and when the presentation rate falls between 1 and 2.5 Hz (van Noorden & Moelants, 1999).
The spontaneous and effortless nature of SMS has led to the proposal that the mere perception of a beat may rely on the motor system in some manner (Chen, Penhune, & Zatorre, 2008). Neuroimaging studies examining beat perception have implicated the basal ganglia (BG) and premotor cortex, in particular. These are brain regions commonly associated with the initiation and planning of movement (Balleine, Liljeholm, & Ostlund, 2009). Research has shown that the BG and premotor cortex are active when perceiving a beat and that the level of activity is positively correlated with self-reported measures of the salience of the beat (Grahn & Brett, 2007; Zatorre, Chen, & Penhune, 2007). The role of the motor system in beat perception is further corroborated by functional connectivity analyses that have revealed a strong coactivation between auditory and motor planning areas (Grahn & Rowe, 2009). In addition, rhythm discrimination is impaired in Parkinson disease (Grahn & Brett, 2009), a neurodegenerative disorder that acutely affects dopaminergic-based motor systems (Braak & Braak, 2000).
Entrainment Models of Beat Perception
Although neuroimaging studies that track the hemodynamic response have been critical in revealing the specific cortices, and networks, of the brain that are involved in beat perception, these studies are not able to provide insight into the underlying temporal dynamics of neural activity. Accordingly, methods that track neuroelectric activity of the brain, such as magnetoencephalography (MEG) and EEG, may be seen as important and complementary tools for investigating beat perception. These methods afford researchers an ability to assess neural activity across multiple time scales relevant to beat perception.
Neural activity is intrinsically oscillatory, fluctuating between peaks and troughs of increased and decreased excitability (Llinás, 2014). These peaks and troughs can align with the regularities in an exogenously occurring stimulus such as those found in musical rhythms. Using Fourier analysis, any oscillatory signal can be decomposed into “magnitude” and “phase” spectra. The natural variability present in the phase of endogenous neural oscillations affords an opportunity for alignment with an exogenous oscillator of the same frequency. This phenomenon has been referred to as neural entrainment (Buzsaki, 2006).
Dynamic attending theory (Large & Jones, 1999; Jones & Boltz, 1989; Jones, 1976) posits that patterns in our environment elicit entrainment of attentional processes, which can in turn optimize perceptual abilities (i.e., by way of temporal predictions). This entrainment of attentional processes has been formalized in neural resonance models (e.g., Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008; Large & Kolen, 1994), which are able to demonstrate that beat perception may emerge from real-time phase alignment of neural oscillations with an external rhythm (e.g., Large & Snyder, 2009). Whereas the early instantiations of dynamic attending theory focused on auditory input, more recent treatments invoking neural resonance models have focused on environmental input, which could ostensibly come from any modality (e.g., Bauer, Debener, & Nobre, 2020).
Numerous studies have assessed different aspects of neural entrainment using MEG and EEG (e.g., Nozaradan, 2014; Fujioka, Trainor, Large, & Ross, 2012; Henry & Obleser, 2012). In a seminal paper by Nozaradan et al. (2011), a method of analysis, termed frequency tagging, was proposed to quantify the level of neural entrainment. In this analysis, a Fourier transformation is applied to averaged epochs of neuroelectric data collected in response to a given stimulus. The resulting outcome provides an assessment of spectral magnitude at a given frequency. The spectral magnitude at the frequency of the beat has been used to index the extent of neural entrainment.
Although this method has been widely adopted, it is not without criticism. Concerns have been raised about the disregard of phasic information (Rajendran & Schnupp, 2019), which must ultimately be coupled with magnitude information to fully recover a time series. Moreover, there is criticism surrounding the interpretation of spikes contained in the magnitude spectra and whether they reflect a continuous oscillatory signal, or a stochastic sequence of evoked potentials. Nevertheless, recent work by Doelling, Assaneo, Bevilacqua, Pesaran, and Poeppel (2019) has demonstrated that an oscillator model better predicts electrophysiological responses to rhythmic stimuli than does an evoked model.
Research using the frequency-tagging method has demonstrated that passively listening to isochronous rhythms elicits an increase in magnitude at the frequency of the perceived beat as well as its harmonics (Nozaradan, 2014; Nozaradan, Peretz, Missal, & Mouraux, 2011). For example, an isochronous rhythm presented at 2 Hz may elicit an increase in amplitude at 2, 4, or 6 Hz. The extent of entrainment appears to depend partly on top–down attentional processes, as it has been shown that spectral magnitude at harmonics of the beat increases when the participant is instructed to imagine a beat at that frequency (Nozaradan, Peretz, & Mouraux, 2012; Nozaradan et al., 2011).
Neural entrainment also seems to be related to the fidelity of temporal judgments about auditory events. For example, detection of the simultaneity of auditory events was found to be improved when those events were embedded in an isochronous rhythm compared to silence (Lauzon, Russo, & Harris, 2020). Furthermore, tempo discriminations have been found to be more sensitive when judgments are preceded by isochronous compared to complex rhythms (Drake & Botte, 1993). One possibility is that these improvements in temporal judgments are owed to the ability of the isochronous rhythm to elicit neural entrainment (cf. interval models as described in McAuley & Miller, 2007; Miller & McAuley, 2005; McAuley & Jones, 2003; Matell & Meck, 2000). Remarkably, temporal judgments have even been shown to be influenced by the phase of neural entrainment. Henry and Obleser (2012) found that detection of a perturbation in an auditory stream was optimal when it occurred at the peak of the entrained neural oscillation as compared to its trough. It should be noted that other studies have found evidence for optimal processing at different phases of an entrained neural oscillation (Lenc, Keller, Varlet, & Nozaradan, 2020; Mathias, Zamm, Gianferrara, Ross, & Palmer, 2020; Lakatos, Gross, & Thut, 2019).
Some evidence suggests that SMS is also possible in rhythms perceived through nonauditory modalities. For example, research has demonstrated that participants are able to synchronize to the beat of rhythms presented through light flashes (Hove, Fairhurst, Kotz, & Keller, 2013; Kolers & Brewster, 1985) or vibrotactile stimulation (Ammirante, Patel, & Russo, 2016). Despite these findings, there is an abundance of evidence to support the idea of an auditory advantage for SMS (Repp & Su, 2013). It has been shown that, compared to visually presented rhythms (via flashes), SMS to auditorily presented rhythms is more accurate and less variable (Patel, Iversen, Chen, & Repp, 2005; Repp, 2003; Jäncke, Shah, & Peters, 2000). Moreover, SMS to auditorily presented rhythms tends to be more anticipatory (i.e., tapping precedes the onset of the beat) than SMS to visually presented rhythms. Anticipatory timing is a unique property of beat perception that can be distinguished from reactive timing (Lorås, Sigmundsson, Talcott, Öhberg, & Stensdotter, 2012).
In addition, when participants are presented with bimodal rhythms involving auditory and visual streams that are out of phase, tapping synchronization tends to follow the timing of the auditory stream (Repp & Penel, 2002). The evidence for an auditory advantage obtained in behavioral studies is further substantiated by a neuroimaging study, which showed increased involvement of motor planning regions during the perception of auditory rhythms compared to temporally matched visual rhythms (Grahn, Henry, & McAuley, 2011).
Evolutionary Hypothesis for Auditory Advantage
Evolutionary explanations for an auditory SMS advantage emphasize the role of action simulation. The Action Simulation for Auditory Perception hypothesis (Patel & Iversen, 2014) suggests that endogenous oscillatory motor generators responsible for periodic movement (e.g., locomotion) provide a framework from which we may perceive and synchronize to auditory beats. They further suggest that these motor generators provide an excellent context for beat perception and synchronization, as they exist in a frequency range that is comparable to the beat rate in music (0.5–4 Hz; see London, 2012).
A second postulate of the Action Simulation for Auditory Perception hypothesis is that beat perception and synchronization were exapted from the neural mechanism that evolved to support “vocal learning” (see Egnor & Hauser, 2004, for additional background). Here, vocal learning refers to the ability to “produce vocal signals based on auditory experience and sensory feedback” (Patel, 2006, p. 101). Evidence for the role of vocal learning in beat perception can be found in studies showing that nonhuman primates, classified as nonvocal learners, do not possess the ability to track a beat (Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Honing, Merchant, Haden, Prado, & Bartolo, 2012). In contrast, distally related songbirds, which are classified as vocal learners, show a remarkable capacity for tracking a beat (Patel, Iversen, Bregman, & Schulz, 2009). Although some evidence has been provided that nonhuman species are capable of SMS under certain conditions (Gámez et al., 2018; Cook, Rouse, Wilson, & Reichmuth, 2013), their ability to do so is quite limited (Patel & Iversen, 2014).
The Auditory Advantage Revisited
Despite prior evidence showing an auditory advantage for SMS and the likely involvement of neural entrainment in that advantage, recent research suggests that the extent of the auditory advantage is highly contingent on the nature of the rhythm. For example, when visual rhythms are presented using biological motion (e.g., a bouncing ball), the auditory advantage for SMS seems to dissipate (Su, 2014; Hove & Keller, 2010). Similar findings have also been observed for visual stimuli exhibiting dynamic color changes (Varlet, Marin, Issartel, Schmidt, & Bardy, 2012). Moreover, cortical regions related to beat perception seem to be more active to visual rhythms that incorporate dynamic rather than stochastic changes (Hove et al., 2013). These findings may be interpreted as suggesting a modality specialization to temporal processing based on experience. Whereas we have minimal exposure to rhythmic flashes of light, we generally have substantial exposure to rhythmic visual motion (e.g., walking and talking). Hence, a parsimonious view of the evidence is that the auditory advantage is owed to our vast experience with auditory rhythms. These findings call the nature of the auditory advantage into question.
Most of the research examining the auditory advantage has compared visual and auditory rhythms. However, a more natural modality to make comparisons with may be somatosensory. Different classes of mechanoreceptors found in the dermis layer of the skin that are responsible for transducing vibrotactile stimulation show frequency tuning that is compatible with the tonotopic organization of hair cells in the cochlea (Vallbo, Olausson, Wessberg, & Kakuda, 1995). Moving beyond mechanisms of transduction, the inferior colliculus receives substantial input from auditory as well as somatosensory channels (Gruters & Groh, 2012). Furthermore, model estimations of central nervous system processing reveal that, for unimodal conditions, temporal information is optimally integrated from auditory modalities. However, following the inverse effectiveness rule (Holmes, 2007), reliance on nonauditory information becomes more relevant when auditory cues are inconsistent or inaccurate (Elliot, Wing, & Welchman, 2010).
Behavioral research also suggests that complex auditory waveforms can be accurately discriminated on the basis of vibrotactile stimulation alone. More specifically, vibrotactile stimulation can be used by observers to discriminate between musical timbres (Russo, Ammirante, & Fels, 2012) as well as between voices of the same gender (Ammirante, Russo, Good, & Fels, 2013).
A small number of studies have investigated beat perception in rhythms that are presented through vibrotactile stimulation. One such study compared SMS to rhythms of varying complexity presented in auditory, vibrotactile, and multimodal conditions (Ammirante et al., 2016). Stimuli were presented through a matrix of voice coils on the back. This setup allowed the researchers to manipulate the vibroacoustic properties of the stimulus as well as the size of the contact area of stimulation. Results of this study showed that, for isochronous rhythms that were presented over a large contact area, tapping accuracy was comparable between vibrotactile, auditory, and bimodal conditions. However, for nonisochronous rhythms, tapping accuracy in vibrotactile conditions fell off. Moreover, in the case of isochronous rhythms, tapping was anticipatory in both auditory and tactile modalities. Combined, these results suggest that beat perception is possible with isochronous vibrotactile presentations of rhythm.
In the aforementioned study that focused on simultaneity judgments for tones presented in a rhythmic context, Lauzon et al. (2020) also manipulated the modality of rhythmic presentations. They observed that the enhancement in simultaneity judgments obtained for events embedded in an isochronous rhythm was present under auditory presentation of rhythm but was not present under vibrotactile presentation of rhythm. If we assume that the enhancement observed in the auditory condition was underpinned by neural entrainment, then these findings suggest that vibrotactile presentations of rhythm are not capable of eliciting neural entrainment.
Together, the results from these two studies suggest an unresolved debate with regard to whether the auditory modality is superior to the vibrotactile modality with respect to beat perception. Results from Ammirante et al. (2016) suggest that SMS is comparable between auditory and vibrotactile rhythms, albeit only for isochronous rhythms. Conversely, results from Lauzon et al. (2020) suggest a clear auditory advantage for temporal processing. This contradiction invites further investigation comparing auditory and vibrotactile beat perception.
The Current Study
There were two motivations for this study. The first was a replication of the behavioral methods of Ammirante et al. (2016). The replication was only partial, as we chose to drop the size of contact manipulation, focusing only on a large contactor area so as to maximize the likelihood of beat perception through somatosensory input. The second was to extend Ammirante et al. (2016) by incorporating EEG methods, allowing for direct observation of neural entrainment across modalities. In addition, a secondary, more exploratory motivation for this study was to investigate the potential for bimodal enhancement of beat perception when combining auditory and somatosensory input. To address these motivations, SMS and neural entrainment were assessed for rhythms presented in auditory, vibrotactile, and bimodal conditions. The complexity of the rhythm was also manipulated (i.e., isochronous vs. nonisochronous) to determine whether there is an interaction between complexity and presentation modality. In the interest of replicating Ammirante et al. (2016), we assessed SMS precision using variability. We also assessed mean asynchrony as a measure of SMS accuracy (after Repp, 2005). Neural entrainment was assessed using the frequency-tagging method (after Nozaradan et al., 2011).
We expected to replicate the behavioral findings of Ammirante et al. (2016), whereby (1) SMS to isochronous rhythms would be comparable for auditory and vibrotactile presentations but an auditory advantage would be revealed for nonisochronous rhythms and (2) SMS would be comparable for auditory and bimodal presentations. Patterns of neural entrainment were hypothesized to mirror patterns observed in SMS data, such that increased SMS would be associated with increased spectral power in EEG at beat frequencies.
Thirty-four participants took part in the study. Thirty participants were recruited through Ryerson University's SONA pool and compensated by course credit. The remaining four participants were recruited through the Ryerson community and were compensated $10/hour. All participants provided informed consent. Data from five participants had to be discarded because of technical malfunctions, leaving us with a sample size of 29 (17 women, 0 nonbinary; mean age = 21.8 years, age range = 17–55 years). Music training in our sample ranged from 0 to 10 years of formal training (mean = 3.1, SD = 3.2). All participants reported having normal hearing and no history of psychological disorders, neurodegenerative diseases, or brain trauma. The study was approved by Ryerson University's research ethics board (2017-389).
Rhythmic stimuli varied in complexity and were characterized as isochronous or nonisochronous based on the Povel and Essens (1985) model (see Figure 1). Signals used to generate rhythmic stimuli consisted of a stochastic series of sine tones created using the Signal Processing Toolbox developed in MATLAB (Matlab_2016b, Version 9.1.0.441655). Each sine tone had a frequency of 196 Hz and a duration of 20 msec inclusive of a linear on/off ramp of 0.05 msec. Sine tone specifications were consistent across all modalities.
All trials began with a lead-in metronome, consisting of eight isochronous tones with an 800-msec interonset interval (IOI). This lead-in was intended to draw the participant's attention to the beat level. Isochronous trials consisted of an additional 64 tones presented with an IOI of 800 msec (i.e., 1.25-Hz beat rate). Nonisochronous trials contained 15 rhythmic sequences used in Ammirante et al. (2016). These rhythmic sequences contained four isochronous tones at the beat rate (800 msec); however, they also contained additional events at the quarter, half, and three-quarter subdivisions of the beat (200, 400, and 600 msec, respectively). Nonisochronous trials were constructed from concatenating the 15 sequences in a unique random order on each nonisochronous trial. All trials had a total duration of 57 sec.
Following Ammirante et al. (2016), stimuli were blocked into three modality conditions (auditory, vibrotactile, and bimodal). Order of modality was counterbalanced between subjects. Each block contained six nonisochronous and six isochronous trials, which were interleaved in a counterbalanced order between participants. In the auditory condition, stimuli were presented using 3M E-A-RTONE 3A insert earphones, a piezoelectric earphone that minimizes interference with EEG signals. In the vibrotactile condition, rhythms were presented using the Subpac S2, a backpack containing a matrix of voice coils that cover the entire thoracic region. To mask any audible signal emanating from the Subpac during vibrotactile stimulation, white noise was presented via the piezoelectric insert earphones at a fixed level. In the bimodal condition, stimuli were presented using both the piezoelectric insert earphones and the Subpac.
Two pilot experiments were conducted. A convenience sample of five participants was selected from the laboratory to complete the pilot experiments. None of these participants was involved in the main experiment. The first pilot was conducted to identify the minimum sound level of white noise needed to mask any residual sound arising from the vibrotactile-only conditions (e.g., via bone conduction). Participants were asked to adjust the presentation level of white noise until they could no longer hear a sequence of four vibrotactile sine tones. The adjusted level was averaged across participants and used as the presentation level for the white-noise mask in the actual experiment. A second pilot experiment was conducted to match the perceived magnitude of vibrotactile and auditory stimuli. On each trial of the calibration study, a repeating sequence of two tones was presented. The first tone was auditory, and the second tone was vibrotactile-only. The participant was asked to adjust the level of the auditory tone until it appeared to match the magnitude of the vibrotactile tone. As with the noise-mask pilot, the adjusted level was averaged across participants and used as the presentation level in the actual experiment.
Participants were asked to tap along to rhythms using their dominant hand on a Roland HPD-10 MIDI (Musical Instrument Digital Interface) drum pad. All output from the drum pad was routed through a FocusRite audio interface to ProTools software, where MIDI events were logged.
A 128-channel BioSemi ActiveTwo electrode system with a sampling rate of 512 Hz was used to collect neuroelectric data. Electrode placement followed the standard 10/20 montage. Additional external electrodes were placed bilaterally at three sites. Two external electrodes were placed over the mastoid bones as a reference. Two external electrodes were placed on the outer canthus of the eyes to monitor horizontal eye movements. Two external electrodes were placed over the zygomatic bones to monitor eye blinks. EEG data were recorded using ActiView software.
Before participants began the experiment, they filled out questionnaires regarding demographic and musical training information. After the completion of the questionnaires, participants completed the tapping and passive listening tasks, the order of which was counterbalanced across participants. The entire procedure took place within one testing session lasting approximately 2 hr.
Participants completed 36 trials: 3 modalities (vibrotactile, auditory, bimodal) × 2 rhythm types (isochronous, nonisochronous) × 6 repetitions. Participants were instructed to tap along to the beat of each rhythm using the index finger of their dominant hand on the drum pad. Participants were told that the beat does not necessarily correspond to every onset. This clarification was provided to prevent participants from tapping to every onset in a nonisochronous rhythm. Before experimental trials began, participants were exposed to one nonisochronous rhythm from each modality. This allowed them to become acquainted with what to expect from each modality as well as to further ensure that participants were indeed tapping to the beat and not tapping out the entire rhythmic pattern. No auditory feedback was provided. The total time required to complete the tapping task was approximately 45 min including instructions.
After capping, participants were seated in a sound-attenuated chamber and were instructed to minimize the amount of overt movement during recording. Participants then began the passive listening task, which consisted of 36 trials in total: 3 modalities (vibrotactile, auditory, bimodal) × 2 rhythm types (isochronous, nonisochronous) × 6 repetitions. The number of trials used was determined on the basis of the minimum requirement of data for independent component analysis (ICA; Swartz Center for Computational Neuroscience, 2020). Like the tapping task, trials were blocked by modality, which was counterbalanced across participants. Total time required for the passive listening task was approximately 1 hr, inclusive of instructions and the capping protocol.
Tapping was recorded as a MIDI file, which registers the onset time and velocity of each tap. The MIDI files were subsequently processed using miditoolbox, a MATLAB toolbox designed to process MIDI files. Mean asynchrony of tapping was calculated by taking the mean time between intertap interval and interbeat interval. Variability was calculated as the standard deviation of the asynchronies. Both calculations were determined using the closest beat onset to the initial tap.
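The two tapping measures can be illustrated with a minimal sketch. It assumes that each asynchrony is the signed difference between a tap onset and the closest beat onset, that mean asynchrony is the average of those differences, and that variability is their standard deviation. The function names and the tap times below are hypothetical and are not taken from miditoolbox or from the study data.

```python
from statistics import mean, stdev


def nearest_beat_asynchronies(tap_times, beat_times):
    """Signed asynchrony (tap minus closest beat onset) for each tap, in seconds."""
    return [tap - min(beat_times, key=lambda beat: abs(tap - beat))
            for tap in tap_times]


def sms_measures(tap_times, beat_times):
    """Return (mean asynchrony, variability) for one trial."""
    asynchronies = nearest_beat_asynchronies(tap_times, beat_times)
    return mean(asynchronies), stdev(asynchronies)


# Hypothetical trial: beats every 800 msec, taps slightly ahead of each beat.
beats = [i * 0.8 for i in range(8)]
offsets = [-0.03, -0.05, -0.02, -0.04, -0.06, -0.03, -0.05, -0.04]
taps = [beat + offset for beat, offset in zip(beats, offsets)]

mean_asynchrony, variability = sms_measures(taps, beats)
```

A negative mean asynchrony in this sketch corresponds to anticipatory tapping, and a smaller standard deviation corresponds to more precise synchronization.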
All processing and signal analysis of EEG data was completed in EEGLAB (Delorme & Makeig, 2004), a MATLAB-based toolbox.
Data of each participant were initially rereferenced to the average activity across all channels. An average rereference is preferable to a mastoid reference because it keeps data full-ranked, which is useful in optimizing the segmentation of data into independent components. Slow drift can contaminate channels because of perspiration. A common technique to correct for this is applying a high-pass filter to the data with a cutoff of 1 Hz. Independent of concerns about slow drift, ICA has been shown to be optimal when using a 1-Hz high-pass filter (Winkler, Debener, Müller, & Tangermann, 2015). However, as our frequency of interest was close to 1 Hz, this method was not ideal. Thus, data were initially subjected to a 0.1-Hz high-pass filter. After filtering, bad channels were flagged and eliminated using the clean_rawdata toolbox. An average of 12.3 electrodes was rejected per participant. After rejection of bad channels, the remaining channels were rereferenced to account for changes in rank. Data were then epoched into 50.4-sec segments starting 1 sec after the onset of each trial and ending 6.6 sec before the end of each trial. Selection of a 50.4-sec epoch was motivated by (1) a desire to have an epoch duration equivalent to an integer multiple of the beat period (50.4 sec = 63 × 0.8 sec), (2) a desire to eliminate the transient audio-evoked spike in activity elicited by sound onset (Saupe, Schröger, Andersen, & Müller, 2009), and (3) a need to account for the time required for entrainment to manifest in neuroelectric data (Regan, 1989). Data were epoched into 36 distinct segments (i.e., 12 for each modality). The epochs were then subjected to an ICA using the runica algorithm initially developed by Delorme and Makeig (2004). Artifactual components, including eye blinks and lateral movements, were rejected using a machine-based method, ADJUST, which rejects components based on artifactual spectral and temporal features (Mognon et al., 2011).
Finally, epochs within a condition were averaged for each subject. This epoch averaging is thought to enhance signal-to-noise ratio by diminishing input from all non-phase-locked signals.
All electrode signals from averaged epochs were transformed into the frequency domain using a fast Fourier transformation (FFT). A Hanning window was used to calculate the magnitude spectrum for each epoch. Frequency bins were defined as 0.01984 Hz wide (1/50.4 sec), which ensured that a frequency bin would always land on the beat frequency (0.01984 × 63 = 1.25 Hz). EEG signals were assumed to contain a mix of the oscillatory signals induced from the beat as well as noise-related signals from muscle movements and other artifactual features. To isolate the neural signal induced by the beat, a noise floor was subtracted from each bin of the magnitude spectra. This method subtracts the average magnitude from four bins surrounding, but not adjacent to, the frequency bin of interest (after Nozaradan et al., 2012). Two of these four surrounding bins are selected above the frequency bin of interest (+0.059 to +0.078 and +0.078 to +0.097), and two are selected from below (−0.059 to −0.078 and −0.078 to −0.097). This method relies on the assumption that EEG activity, devoid of the induced oscillation, should be approximately the average of surrounding bins. Although our FFT parameters were defined to ensure that the analysis window contained an integer number of beat cycles, windowing a sinusoid will always lead to spectral leakage (Harris, 1978). To minimize concerns about spectral leakage, neural entrainment values were calculated for each trial and electrode using the mean amplitude across three frequency bins centered around the beat frequency (after Nozaradan et al., 2012).
In addition, separate entrainment values were calculated at the first, second, and third harmonics of the beat frequency (2.5, 3.75, and 5 Hz). Investigating entrainment at harmonic frequencies of the beat may provide further insight regarding differences in the neural response across modality conditions. Following methods used in Nozaradan et al. (2012), we averaged entrainment across all electrodes, providing a measure of whole-brain neural entrainment.
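The frequency-tagging steps described above (epoch averaging, windowed FFT, noise-floor subtraction, and averaging over three bins) can be sketched in Python with NumPy. This is an illustration on synthetic data under stated assumptions, not the EEGLAB pipeline used in the study: the function names are hypothetical, the demonstration sampling rate of 250 Hz is chosen only so that a 50.4-sec epoch contains a whole number of samples, and the noise floor is taken from the bins 3 and 4 steps away on each side, which is one reading of the offsets given in the text.

```python
import numpy as np

BEAT_HZ = 1.25     # beat frequency reported in the text
EPOCH_SEC = 50.4   # epoch length reported in the text (63 beat cycles)
DEMO_FS = 250      # hypothetical sampling rate for this demonstration only


def noise_subtracted_spectrum(epoch):
    """Hann-windowed magnitude spectrum with a local noise floor removed."""
    n = len(epoch)
    mag = np.abs(np.fft.rfft(epoch * np.hanning(n))) / n
    corrected = np.full(mag.shape, np.nan)
    for k in range(4, len(mag) - 4):
        # Noise floor: bins 3 and 4 steps away on each side (assumed mapping).
        corrected[k] = mag[k] - mag[[k - 4, k - 3, k + 3, k + 4]].mean()
    return corrected


def entrainment(epoch, freq_hz, fs):
    """Mean noise-subtracted magnitude over three bins centred on freq_hz."""
    k = int(round(freq_hz * len(epoch) / fs))
    return float(noise_subtracted_spectrum(epoch)[k - 1:k + 2].mean())


# Synthetic data: six epochs sharing a phase-locked 1.25-Hz component plus noise.
rng = np.random.default_rng(0)
t = np.arange(int(round(EPOCH_SEC * DEMO_FS))) / DEMO_FS
epochs = np.sin(2 * np.pi * BEAT_HZ * t) + rng.normal(0.0, 1.0, size=(6, t.size))
average = epochs.mean(axis=0)  # averaging attenuates non-phase-locked activity

at_beat = entrainment(average, BEAT_HZ, DEMO_FS)
off_beat = entrainment(average, 1.75, DEMO_FS)  # arbitrary comparison frequency
```

In this sketch the value at the beat frequency is clearly positive, whereas the value at the comparison frequency stays near zero, because the noise floor estimated from neighbouring bins cancels the broadband background.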
Because this study followed a repeated-measures design, there exists an intrinsic violation of the assumption of independence of observations. To obtain a rough estimate of the amount of dependence in the data, the interclass correlation coefficient (ICC) and design effect (DEFF) were calculated for neural entrainment data and tapping variability. The ICC is a measurement of the amount of homogeneity within a cluster (between participants), which is then used to calculate the DEFF, which is a measure of sampling variability attributable to the study design (McCoach & Adelson, 2010). A DEFF coefficient of 1 indicates a lack of dependence in the data, and coefficients above 2 indicate a high amount of dependence. We found that neural entrainment values did not contain high amounts of dependence (ICC = .16, DEFF = 1.8) and were thus modeled using linear regression. However, tapping variability (ICC = .41, DEFF = 15.42) and mean asynchrony (ICC = .24, DEFF = 9.38) contain high levels of dependence. To account for this, differences in tapping variability and mean asynchrony were modeled using a multilevel linear mixed-effect model, with participants as a random effects factor. This allowed for the intercept of each participant to be included as a random effect. All multilevel models were fitted using the toolbox lme4 (Bates et al., 2015) developed for R.
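The relation between the ICC and the design effect can be made concrete with a short sketch. It assumes the standard formula DEFF = 1 + (m − 1) × ICC, where m is the number of observations per cluster. The cluster sizes used below (6 condition averages per participant for entrainment, 36 trials per participant for tapping) are inferred from the design and are assumptions rather than values stated in the analysis.

```python
def design_effect(icc, cluster_size):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


# Assumed cluster sizes; ICC values are the rounded ones reported in the text.
deff_entrainment = design_effect(0.16, 6)   # 6 condition averages per participant
deff_variability = design_effect(0.41, 36)  # 36 tapping trials per participant
```

Under these assumptions the first value reproduces the reported DEFF of 1.8, and the second (15.35) is close to, though not identical with, the reported 15.42, as would be expected when starting from a rounded ICC.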
All models incorporated rhythmic complexity (isochronous, nonisochronous), sensory modality (auditory, vibrotactile, bimodal), and an interaction term as predictor variables. Musical training was included as a covariate in all models. Each predictor variable was dummy coded, with auditory and isochronous rhythms used as the reference for sensory modality and rhythmic complexity, respectively. In the event of a significant interaction, we examined the simple slopes with reverse dummy codes for rhythmic complexity and ran another regression model.
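The effect of reversing the dummy code can be illustrated with a small sketch. It ignores the random intercepts and the covariate, and it uses hypothetical cell means rather than data from the study. For a fully crossed model with an interaction term, the coefficients are simple functions of the cell means, so the vibrotactile coefficient is the auditory versus vibrotactile difference within whichever complexity level is coded as the reference.

```python
COMPLEXITIES = ("isochronous", "nonisochronous")


def design_row(modality, complexity, reference_complexity="isochronous"):
    """Dummy-coded predictors: intercept, complexity, two modality
    contrasts (auditory is the reference), and two interaction terms."""
    cx = 0 if complexity == reference_complexity else 1
    vib = 1 if modality == "vibrotactile" else 0
    bim = 1 if modality == "bimodal" else 0
    return [1, cx, vib, bim, cx * vib, cx * bim]


def saturated_coefficients(means, reference_complexity="isochronous"):
    """Coefficients of the saturated dummy-coded model from cell means."""
    ref = reference_complexity
    other = next(c for c in COMPLEXITIES if c != ref)
    base = means[("auditory", ref)]
    b_cx = means[("auditory", other)] - base
    b_vib = means[("vibrotactile", ref)] - base
    b_bim = means[("bimodal", ref)] - base
    b_cx_vib = means[("vibrotactile", other)] - means[("auditory", other)] - b_vib
    b_cx_bim = means[("bimodal", other)] - means[("auditory", other)] - b_bim
    return [base, b_cx, b_vib, b_bim, b_cx_vib, b_cx_bim]


# Hypothetical cell means (illustrative values only, not study data).
HYPOTHETICAL_MEANS = {
    ("auditory", "isochronous"): 0.040,
    ("vibrotactile", "isochronous"): 0.042,
    ("bimodal", "isochronous"): 0.040,
    ("auditory", "nonisochronous"): 0.050,
    ("vibrotactile", "nonisochronous"): 0.083,
    ("bimodal", "nonisochronous"): 0.051,
}

iso_ref = saturated_coefficients(HYPOTHETICAL_MEANS, "isochronous")
non_ref = saturated_coefficients(HYPOTHETICAL_MEANS, "nonisochronous")
print("vibrotactile - auditory, isochronous reference:", round(iso_ref[2], 3))
print("vibrotactile - auditory, nonisochronous reference:", round(non_ref[2], 3))
```

In this sketch the difference between the two simple slopes equals the interaction coefficient, which is why refitting with the reversed code is enough to read off the second simple slope.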
To examine how tapping results mapped on to neural entrainment, we also ran correlations between variability and neural entrainment and between mean asynchrony and neural entrainment. The neural entrainment and tapping data sets differed in size, because of the averaging method used to calculate entrainment (see Methods). Thus, to run correlations, averages of mean asynchrony and variability were calculated for each condition per participant.
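The aggregation step can be sketched as below: trial-level tapping values are averaged within each participant and condition so that they align one-to-one with the entrainment values, and a Pearson correlation is then computed on the aligned pairs. The data layout, names, and values are hypothetical, and the source does not specify which correlation coefficient was used.

```python
import math
from collections import defaultdict


def condition_means(trials):
    """Average trial values within each (participant, condition) cell.

    trials: iterable of (participant, condition, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for participant, condition, value in trials:
        cell = sums[(participant, condition)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trial-level tapping variability and per-cell entrainment.
tapping_trials = [
    ("p1", "auditory", 0.030), ("p1", "auditory", 0.034),
    ("p1", "vibrotactile", 0.050), ("p1", "vibrotactile", 0.054),
    ("p2", "auditory", 0.028), ("p2", "auditory", 0.030),
    ("p2", "vibrotactile", 0.061), ("p2", "vibrotactile", 0.065),
]
entrainment_values = {
    ("p1", "auditory"): 0.090, ("p1", "vibrotactile"): 0.055,
    ("p2", "auditory"): 0.085, ("p2", "vibrotactile"): 0.060,
}

tap_means = condition_means(tapping_trials)
keys = sorted(tap_means.keys() & entrainment_values.keys())
r = pearson_r([tap_means[k] for k in keys],
              [entrainment_values[k] for k in keys])
print(round(r, 3))
```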
Variability (SMS Precision)
Tapping variability obtained in each condition is plotted in Figure 2. A mixed-effect regression model was carried out to examine the extent to which rhythmic complexity and modality predicted tapping variability. Results showed that, compared to the intercept-only model that contains no predictors, a model that contained Modality and Rhythmic Complexity was a significantly better fit to the data, χ2(8) = 59.8, p < .001, accounting for 45% of the variance (conditional R2 = .45). A model containing Musical Experience as a covariate did not add to explained variance, χ2(9) = 1.74, p = .18; thus, we removed it from our final model.
Collapsing across Modality (see Figure 2A), there was less variability in isochronous compared to nonisochronous trials, b = 0.025, 95% CI [0.016, 0.033], t(870) = 5.96, p < .001. Collapsing across Rhythmic Complexity (see Figure 2B), there was less variability in auditory compared to vibrotactile trials, b = 0.018, 95% CI [0.007, 0.028], t(871) = 3.51, p < .001, but no difference between auditory and bimodal trials, b = 0.0005, 95% CI [−0.009, 0.01], t(870) = 0.1, p = .92. There was also an interaction between Rhythmic Complexity and the Modality Contrast between auditory and vibrotactile conditions (see Figure 2C), b = 0.032, 95% CI [0.0011, 0.051], t(870) = 3.13, p < .01. Further examination of the simple slopes revealed that, although there was less variability in auditory compared to vibrotactile trials in the nonisochronous condition, b = 0.033, 95% CI [0.019, 0.048], t(870) = 4.71, p < .001, this same auditory advantage was not obtained in the isochronous condition, b = 0.002, 95% CI [−0.12, 0.016], t(870) = 0.28, p = .77.
Mean Asynchrony (SMS Accuracy)
Mean asynchrony values obtained in each condition are plotted in Figure 3. A mixed-effect regression model was carried out to examine the extent to which rhythmic complexity and modality predicted mean asynchrony. Results showed that, compared to the intercept-only model that contains no predictors, a model that contained Modality and Rhythmic Complexity was a significantly better fit to the data, χ2(8) = 26.1, p < .001, accounting for 26% of the variance (conditional R2 = .26). Similar to the variability model, a model containing Musical Experience as a covariate did not add to explained variance, χ2(9) = 0.14, p = .7, and thus was removed from our final model.
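The model comparisons reported for the variability and mean asynchrony models are likelihood ratio tests. A minimal sketch of the computation follows, restricted to the 1-df case, where the chi-square tail probability reduces to a complementary error function. Treating the covariate comparisons as 1-df tests is an assumption made here for illustration; the function names are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2 * (loglik_full - loglik_reduced)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    With 1 df, the statistic is a squared standard normal deviate,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Covariate test statistics reported in the text, read as 1-df tests.
for chi2 in (1.74, 0.14):
    print(chi2, round(chi2_sf_1df(chi2), 3))
```

Under that assumption the two statistics correspond to p values of roughly .19 and .71, close to the reported values of .18 and .7.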
Collapsing across Modality, we observed a smaller mean asynchrony in nonisochronous compared to isochronous trials, b = −0.019, 95% CI [−0.039, 0.0002], t(840) = −1.93, p < .05. However, this was the only significant difference observed between conditions. There was no difference between auditory and vibrotactile trials, b = 0.008, 95% CI [−0.012, 0.028], t(840) = 0.78, p = .43, or between auditory and bimodal trials, b = 0.007, 95% CI [−0.012, 0.027], t(840) = 0.76, p = .44. Nor were there any significant interactions between Rhythm Complexity and differences in auditory and vibrotactile trials, b = −0.002, 95% CI [−0.03, 0.026], t(840) = −0.14, p = .88, or in auditory and bimodal trials, b = 0.014, 95% CI [−0.05, 0.007], t(840) = −1.54, p = .13.
Spectra obtained from EEG data for each condition are plotted in Figure 4, and mean entrainment values for each condition are plotted in Figure 5. A linear regression model was carried out to examine whether modality and rhythmic complexity predicted neural entrainment. Model fit parameters showed that the model significantly predicted neural entrainment, F(5, 143) = 6.06, p < .001, accounting for 17% (R2 = .17) of the overall variance. Furthermore, it was shown that, although a model containing Musical Experience as a covariate was a significant fit to the data, F(6, 143) = 5.26, p < .001, it did not significantly increase R2, R2 change = .007. Thus, musical experience was not included in the final model.
Collapsing across Modality (see Figure 5A), we observed a greater degree of neural entrainment in isochronous compared to nonisochronous trials, b = −0.026, 95% CI [−0.045, 0.007], t(143) = −2.71, p < .01. Collapsing across Rhythmic Complexity (see Figure 5B), we observed a greater degree of neural entrainment in auditory compared to vibrotactile trials, b = −0.032, 95% CI [−0.051, −0.012], t(143) = −3.29, p < .01, but no differences between auditory and bimodal trials, b = 0.004, 95% CI [−0.015, 0.023], t(143) = 0.41, p = .67. We did not obtain an interaction between Rhythmic Complexity and the Modality Contrast between auditory and vibrotactile, b = 0.022, 95% CI [−0.004, 0.049], t(143) = 1.64, p = .1, nor the modality contrast between auditory and bimodal trials, b = −0.004, 95% CI [−0.032, 0.022], t(143) = −0.35, p = .72.
Following Nozaradan, Peretz, and Keller (2016), we investigated whether individual differences in entrainment were predictive of behavioral measures. To account for possible interdependencies owing to our repeated-measures design, we used a generalized estimating equation (GEE) to estimate parameters of a generalized linear model. GEEs belong to a class of regression techniques that may be considered semiparametric (Ballinger, 2004), which allowed us to assess whether degree of neural entrainment was predictive of (a) mean tapping variability (precision) or (b) mean asynchrony (accuracy) across conditions. Neither regression equation was found to be significant. Assuming that data from musically trained participants may have been less noisy, we subsequently ran post hoc GEEs, limiting our sample to musically trained participants only (i.e., those with 5 or more years of formal training). Similar to Nozaradan et al. (2016), we found that increases in neural entrainment were predictive of increases in mean asynchrony, z = 2.02, p < .05.
Bimodal Enhancement at Beat Harmonics
Visual inspection of the distribution of spectral magnitude at harmonics of the beat (Figure 4) revealed large spikes in magnitude for bimodal presentations of isochronous trials. This observation motivated us to conduct a series of exploratory post hoc t tests (see Table 1) to compare differences in mean entrainment at harmonic frequencies between bimodal and auditory isochronous trials. A total of four pairwise t tests were used to examine differences between bimodal and auditory entrainment at the beat frequency (1.25 Hz) and the three subsequent harmonics (2.5, 3.75, and 5 Hz). A Holm correction was used to correct for increases in family-wise error.
Table 1. Post Hoc Comparisons of Neural Entrainment at Beat and Harmonics
|Frequency of Interest|Mean Entrainment: Auditory|Mean Entrainment: Bimodal|t Value|95% CI|
|1.25 Hz (beat)|0.085|0.087|−0.21|[−0.025, 0.021]|
|2.5 Hz|0.046|0.063|−2.21*|[−0.027, −0.001]|
|3.75 Hz|0.032|0.038|−1.36|[−0.014, 0.003]|
|5 Hz|0.034|0.046|−2.01|[−0.027, −0.00005]|
All t tests are comparisons between isochronous rhythms.
*p < .05 (after Holm correction).
Results from this analysis revealed no significant difference between auditory and bimodal entrainment at the beat frequency (1.25 Hz), the second harmonic (3.75 Hz), or the third harmonic (5 Hz). However, there was a significant difference at the first harmonic of the beat (2.5 Hz).
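The Holm correction applied to this family of four tests can be sketched as follows. This is a generic implementation of the step-down procedure, not the authors' code, and the p values are hypothetical because the uncorrected p values are not reported in the text.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order.

    The smallest p value is multiplied by m, the next by m - 1, and so
    on; a running maximum keeps the adjusted values monotone, and each
    value is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p values for a family of four comparisons.
hypothetical_p = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(hypothetical_p))
```

A comparison is declared significant when its adjusted p value falls below the chosen alpha level, which is equivalent to comparing the ordered raw p values against alpha/m, alpha/(m − 1), and so on.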
The behavioral component of the current study was a replication of Ammirante et al. (2016). The four main results pertaining to SMS precision were consistent with those obtained in the earlier study. First, we observed an enhancement in SMS precision for isochronous compared to nonisochronous rhythms. Second, we observed an enhancement in SMS precision for auditory compared to vibrotactile presentations of rhythms (i.e., an auditory advantage). Third, this enhancement in SMS precision was qualified by an interaction with rhythmic complexity, whereby the auditory advantage persisted for nonisochronous rhythms only. Fourth, the bimodal condition did not lead to an enhancement in SMS precision compared to auditory presentations of rhythm. Whereas the earlier study did not investigate SMS accuracy, we did so here and found that mean asynchrony values were consistently very small and always fell below the threshold of anticipatory tapping suggested by Repp and Su (2013, p. 405), that is, <150 msec. The notion here is that, although the tap does not precede the stimulus (on average), the asynchrony is shorter than the shortest possible RT (∼150 msec), which can be taken as evidence of anticipation (also see Mates, Radil, & Pöppel, 1992, p. 701).
The current study also represents an important extension of Ammirante et al. (2016). By collecting EEG data during passive observation, we were able to obtain an index of neural entrainment to the same rhythms for which SMS was assessed. Consistent with our SMS precision findings, we observed a greater degree of neural entrainment for (i) isochronous compared to nonisochronous rhythms, (ii) auditory compared to vibrotactile presentations of rhythm (i.e., an auditory advantage), and (iii) no differences in entrainment for bimodal compared to auditory presentations of rhythm. However, distinct from our SMS precision findings, there was no interaction between rhythmic complexity and modality in the neural data, that is, the effects were simply additive. It is worth noting that neural entrainment at the beat frequency occurred in all conditions, including vibrotactile. We assessed this using post hoc t tests by comparing entrainment values to zero in all conditions (all ps < .001). This finding suggests that, although the extent of beat perception may vary across modality conditions, it was always present.
The topographic maps in Figure 4 reveal a characteristic focal area of entrained activity around central electrodes for auditory and bimodal conditions (e.g., Lenc, Keller, Varlet, & Nozaradan, 2018; Nozaradan et al., 2011, 2012). In addition to this characteristic focal area, entrainment was pronounced over PFC, especially in the bimodal condition. This pattern of entrained prefrontal activity is consistent with the idea of entrained attentional oscillations proposed in dynamic attending models (Large & Jones, 1999) and may further suggest that attentional oscillations are preferentially entrained under bimodal conditions (Suess, Hartmann, & Weisz, 2020; Anderson, Ferguson, Lopez-Larson, & Yurgelun-Todd, 2010). The enhancement of entrained attentional oscillations in bimodal presentation conditions might facilitate top–down attentional switching between the beat and integer multiples of the beat.
The Auditory Advantage
One of our main goals in the current study was to resolve conflicting evidence concerning an auditory advantage for beat perception. In the case of isochronous rhythms, we observed comparable levels of SMS precision between auditory and vibrotactile stimuli. However, in the case of nonisochronous rhythms, we observed a significant auditory advantage in SMS precision. Overall, these SMS results coupled with the entrainment results substantiate the notion that vibrotactile beat perception is possible. Nevertheless, they do suggest that vibrotactile beat perception may be limited in rhythms with higher levels of temporal complexity.
To explain this rhythmic complexity by modality interaction, we may consider the presumed role that motor areas play in beat perception (Kung, Chen, Zatorre, & Penhune, 2013; Grahn & Brett, 2007; Zatorre et al., 2007). Prior findings have shown that increases in the temporal complexity of a rhythm predict the level of activity in motor areas (Lewis, Wing, Pope, Praamstra, & Miall, 2004). An interpretation of these results is that the role of motor areas is compensatory, in that they support the ability to perceive a beat in temporally complex stimuli. This compensatory involvement of motor areas may only manifest with auditory input because of the intrinsic connectivity in place to support auditory–motor interactions. This connectivity has been proposed to support processing of rhythmic stimuli that occur at time scales found in speech (Assaneo & Poeppel, 2018) and music (Fujioka et al., 2012; Grahn & Rowe, 2009; Chen et al., 2008). On the basis of the current results, we suggest that our ability to perceive a beat in temporally complex vibrotactile stimuli is compromised as there would likely be no compensatory support from motor areas available. Although functional connectivity between somatosensory and motor areas is known to exist (e.g., McGregor & Gribble, 2017), the feedback from somatosensory to motor is tuned to higher frequencies that may be used to support grip force adjustments (e.g., Augurelle, Smith, Lejeune, & Thonnard, 2003; Flanagan & Wing, 1997). The relevant activity in this context is above 20 Hz (Giabbiconi, Trujillo-Barreto, Gruber, & Müller, 2007), a frequency range that is well beyond beat frequencies found in music.
Another possible consideration to account for the auditory advantage in nonisochronous rhythms is based on relative experience. For hearing listeners, most of our experience with nonisochronous patterns involves auditory input (i.e., complex rhythms found in music). Thus, our vast experience with nonisochronous auditory rhythms should increase the likelihood that motor areas are recruited to support beat perception compared with nonisochronous vibrotactile rhythms. An experience-based account of these findings is corroborated by research showing that, compared to nonmusicians, musicians show increased recruitment of motor areas during auditory beat perception (Chen et al., 2008).
To examine the extent to which experience mediates beat perception, it would be interesting to examine SMS to vibrotactile rhythms in nonhearing participants (i.e., deaf and hard of hearing). It has been observed that individuals living with deafness show enhanced tactile sensitivity (Levänen & Hamdorf, 2001), which may be because of compensatory plasticity after onset of deafness (Good, Reed, & Russo, 2014). As such, it is possible that individuals living with deafness may show enhanced beat perception for vibrotactile presentations of rhythms. Although some research has compared hearing and nonhearing individuals in SMS for vibrotactile presentations of rhythms (Tranchant et al., 2017), no study has done so while varying rhythmic complexity.
Yet another possible explanation for the observed differences across modalities is related to inherent differences in the resolution of temporal processing. In particular, it is possible that the IOIs present in the nonisochronous rhythms were shorter than the upper rate (lower IOI) limit for SMS in vibrotactile stimuli. To address this potential explanation, we conducted a very preliminary psychophysical test (n = 1) to determine rate limits for SMS with isochronous rhythms presented in auditory and vibrotactile modalities, modeled after Repp (2003). The result of this test revealed an upper rate limit of 149.7 msec for auditory metronomes and 210.4 msec for vibrotactile metronomes. The vibrotactile upper rate limit falls between the auditory and visual rate limits but is nearly equivalent to the shortest IOI present in the nonisochronous rhythms tested here (200 msec). Although the rate limit data are very preliminary and more support is needed, the comparable rate limits observed between auditory and vibrotactile SMS provide initial evidence that the failure to perceive subdivisions in the nonisochronous vibrotactile rhythms cannot be fully explained by differences in SMS rate limits across modalities.
Exploratory post hoc t tests were conducted to determine if magnitude at the harmonic frequencies was enhanced in bimodal presentations of isochronous rhythms. After correcting for multiple comparisons, we observed higher levels of spectral magnitude at the first harmonic of the beat (2.5 Hz) for bimodal compared to auditory presentations of isochronous rhythm. Spectral magnitude at harmonic frequencies has been interpreted with respect to voluntary attention directed at integer multiples of the beat (Tierney & Kraus, 2014; Nozaradan et al., 2012; Nozaradan et al., 2011). Moreover, humans have a unique ability to readily switch their attention between different beat rates (Repp & Su, 2013). Neural resonance models demonstrate that perception of multiple beat rates, and the ability to switch between them, may be supported by ongoing oscillatory activity (Lakatos et al., 2008; Large & Kolen, 1994). Thus, the observed enhancement of entrainment at the first harmonic of the beat in bimodal presentations of isochronous rhythms may facilitate top–down attentional switching between the beat and integer multiples of the beat (e.g., double time). Although speculative, this account is consistent with our interpretation of the topographic maps, revealing enhanced activity at the beat frequency over PFC for bimodal conditions.
Although this interpretation of the bimodal advantage may be grounded in neural resonance models, it clearly requires further empirical study. Future research could assess SMS after a prompt to switch the tapping rate (e.g., from 1.25 Hz to 2.5 Hz). A modified measurement of task-switching costs (Wylie & Allport, 2000) could be employed to examine the difference in time to regain tapping accuracy after a prompt. A reduction in task-switching costs in bimodal rhythm conditions would further substantiate this proposed explanation for a bimodal advantage in neural entrainment.
For musically trained participants, we found that the degree of neural entrainment was predictive of mean asynchrony, such that higher levels of entrainment were associated with higher levels of mean asynchrony. This finding is similar to a correlation found by Nozaradan et al. (2016) but should be interpreted with caution given our small sample size and the post hoc nature of the analysis. In addition, it should be noted that the current study and Nozaradan et al. (2016) involve a range of mean asynchrony values, ranging from negative to positive. It is reasonable to question whether positive values can be interpreted as being more accurate. On the other hand, the positive mean asynchrony values observed in the current study were always less than +150 msec and may therefore be considered anticipatory (see Repp & Su, 2013, p. 405).
As noted earlier, an important limitation of this study is the unknown effect of experience on our results. Most hearing individuals are likely to have minimal experience interacting with vibrotactile-only presentations of rhythm. The lack of experience is most likely to manifest in diminished SMS capacity for temporally complex rhythms, where the compensatory involvement of motor areas would be expected. In addition to testing individuals living with deafness, it would be interesting to conduct a longitudinal study with hearing participants wherein differences in beat perception between auditory and vibrotactile rhythms are monitored over a series of training sessions. Our expectation would be that SMS capacity to temporally complex vibrotactile rhythms would improve over time.
A second limitation of the study concerns our ability to ensure that participants were attending to the entrained beat frequency (1.25 Hz) during the passive listening task. We attempted to predispose participants to entrain at 1.25 Hz by providing an eight-beat metronome at that rate before the start of every trial. However, despite the use of this metronome, it remains possible that participants switched their attention to other frequencies of the beat. This concern is mitigated by the observation that magnitude in the frequency spectrum was higher at the presumed beat frequency compared to its first and second harmonics (2.5 and 3.75 Hz; see Obleser & Kayser, 2019).
Finally, we acknowledge that the rhythms used in our study do not reflect those that are typically encountered in music and that the tasks were rather monotonous. As a result, data may have been partially affected by motivational factors that change over time (e.g., boredom, fatigue). This is a problem that plagues EEG studies that employ passive observation tasks. Currently, averaging techniques used in EEG analyses preclude an examination of these motivational factors; however, robust statistical modeling techniques (e.g., growth curve models, multilevel modeling) have the potential to address these questions (Volpert-Esmond, Merkle, Levsen, Ito, & Bartholow, 2018).
The current study represents a successful replication and extension of Ammirante et al.'s (2016) investigation of vibrotactile beat perception. The behavioral results of Ammirante et al. (2016) were corroborated, wherein greater SMS precision was observed for isochronous rhythms compared to nonisochronous rhythms and greater SMS precision was observed for auditory compared to vibrotactile presentations of nonisochronous rhythms. These behavioral findings were largely corroborated by the EEG results. The one area of discrepancy in the EEG results was the lack of an interaction between rhythmic complexity and modality. Overall, these findings support the idea of an auditory advantage underpinned by auditory–motor connectivity as well as the notion that vibrotactile presentations of rhythm are fully capable of supporting beat perception. Our exploratory analysis of neural entrainment at harmonic frequencies is also suggestive of a bimodal advantage.
F. A. Russo, Natural Sciences and Engineering Research Council of Canada (http://dx.doi.org/10.13039/501100000038), grant number: RGPIN-2017-06969.
Diversity in Citation Practices
A retrospective analysis of the citations in every article published in this journal from 2010 to 2020 has revealed a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .408, W(oman)/M = .335, M/W = .108, and W/W = .149, the comparable proportions for the articles that these authorship teams cited were M/M = .579, W/M = .243, M/W = .102, and W/W = .076 (Fulvio et al., JoCN, 33:1, pp. 3–7). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
We are grateful to Paolo Ammirante for early discussions about this project and to Gabriel Nespoli for his contributions to experimental design and analyses. We would also like to acknowledge Fran Copelli and Hunter Robinson for their creative input with figure design.
Reprint requests should be sent to Sean A. Gilmore, Department of Psychology, Ryerson University, 350 Victoria Street, Toronto, ON, Canada M5B 2K3, or via e-mail: firstname.lastname@example.org.
Note that our use of the terms “isochronous” and “nonisochronous” maps onto the terms simple and complex used by Ammirante et al. (2016). The rhythms that we describe as nonisochronous contain no irregular subdivisions and would be considered strongly metrical by most music theoretic accounts.