Abstract

The fusion of rhythm, beat perception, and movement is often summarized under the term “entrainment” and becomes obvious when we effortlessly tap our feet or snap our fingers to the pulse of music. Entrainment to music involves a large network of brain structures, and neural oscillations at beat-related frequencies can help elucidate how this network is connected. Here, we used EEG to investigate steady-state evoked potentials (SSEPs) and event-related potentials (ERPs) during listening and tapping to drum clips with different rhythmic structures that were interrupted by silent breaks of 2–6 sec. This design allowed us to address the question of whether neural entrainment processes persist after the physical presence of musical rhythms and to link neural oscillations and event-related neural responses. During stimulus presentation, SSEPs were elicited in both tasks (listening and tapping). During silent breaks, SSEPs were only present in the tapping task. Notably, the amplitude of the N1 ERP component was more negative after longer silent breaks, and both N1 and SSEP results indicate that neural entrainment was increased when listening to drum rhythms compared with an isochronous metronome. Taken together, this suggests that neural entrainment to music is not solely driven by the physical input but involves endogenous timing processes. Our findings break ground for a tighter linkage between steady-state and transient evoked neural responses in rhythm processing. Beyond music perception, they further support the crucial role of entrained oscillatory activity in shaping sensory, motor, and cognitive processes in general.

INTRODUCTION

Head nodding, foot tapping, and finger snapping are commonly experienced examples of the effortless and engaging fusion between rhythm perception and movement generation. These behaviors are the observable outputs of underlying cognitive timing mechanisms that enable us to extract a pulse from music, to identify a hierarchy of strong and weak beats in this pulse, to form rhythmic expectations, and to plan appropriate movements (Vuust & Witek, 2014; Large, 2008). Together, these processes fall under the concept of “entrainment” and allow us to synchronize with external events. In this study, we used EEG to investigate neural responses (event-related potentials [ERPs] and steady-state evoked potentials [SSEPs]) during listening and tapping to drum clips with three different rhythmic structures. The clips had a total length of approximately 45 sec and were interrupted by silent breaks of 2–6 sec. This design allowed us to compare measures of neural entrainment, sensorimotor synchronization, rhythm perception, and predictive processes during and directly after the presentation of a musical rhythm. By addressing the question of whether and to what extent neural entrainment processes persist after the physical presence of musical rhythms, we contribute to the discussion of the interplay between external rhythmic events and internal processing—or bottom–up and top–down processes—in rhythm and beat perception.

On a motor level, entrainment can be defined as coordinated rhythmic movement in response to auditory, visual, tactile, multimodal, or social signals (Phillips-Silver, Aktipis, & Bryant, 2010). This comprehensive definition indicates that the study of motor entrainment is relevant to various research topics like attention (Tierney & Kraus, 2013), language and reading abilities (Tierney & Kraus, 2014; Thomson, Fryer, Maltby, & Goswami, 2006), conversation and joint attention (Richardson, Dale, & Kirkham, 2007), interpersonal affiliation and cooperation (Tarr, Launay, & Dunbar, 2014; Hove & Risen, 2009; Marsh, Richardson, & Schmidt, 2009; Wiltermuth & Heath, 2009), and music perception and production (Large, 2008; Bispham, 2006; Clayton, Sager, & Will, 2005). Compared with language processing, conversation, and cooperation, entrainment to music might demand finer-grained and more precise timing mechanisms (Tierney & Kraus, 2014). Therefore, musical rhythm perception can provide a valuable pathway to explore timing processing and temporal prediction.

Over the last decades, a large number of studies have investigated the ability to move to a rhythmic pulse—most commonly in finger-tapping paradigms (for reviews, see Repp & Su, 2013; Repp, 2005). Furthermore, even without overt movements, the perception of auditory rhythms activates motor regions of the brain (Stupacher, Hove, Novembre, Schütz-Bosbach, & Keller, 2013; Chen, Penhune, & Zatorre, 2008; Grahn & Brett, 2007). This experimental evidence can be viewed in the light of two cognitive frameworks for sensorimotor synchronization and rhythmic processing: predictive coding (e.g., Friston, 2002, 2005; for a review, see Vuust & Witek, 2014) and dynamic attending (e.g., Large & Jones, 1999; for a review, see Henry & Herrmann, 2014). Both frameworks propose that the alignment of bottom–up (external rhythmic events) and higher-order top–down processes enables entrainment (Henry & Herrmann, 2014; Vuust & Witek, 2014). The higher-order processes, such as meter interpretation or rhythm prediction, can be explained on the basis of neural oscillations.

Neural oscillations can be measured noninvasively, for example, with EEG, and reflect synchronized excitability fluctuations (high and low excitability states) in groups of neurons (Fries, 2005; Singer, 1999). Synchronized neural oscillations are thought to be a key element in attention, prediction, the facilitation of neural plasticity, and the binding of parallel computations (Frey, Ruhnau, & Weisz, 2015; Arnal & Giraud, 2012; Schroeder & Lakatos, 2009; Fries, 2005; Buzsáki & Draguhn, 2004; Singer, 1999). Such “dynamic” views of higher-order processing assume that neural synchrony does not only reflect the physical aspects of an external stimulus but can be generated endogenously (Engel, Fries, & Singer, 2001). Because entrainment to music involves a large network of brain structures (Nozaradan, 2014; Grahn, 2012; Zatorre, Chen, & Penhune, 2007), neural oscillations can contribute to explaining how this network is connected.

Dynamic attending or resonance theories propose that beat perception arises from the coupling of endogenous oscillations (e.g., fluctuations of neural populations) and periodic external events (e.g., musical rhythm; Large, Herrera, & Velasco, 2015; Henry & Herrmann, 2014; Large & Jones, 1999; van Noorden & Moelants, 1999; Large & Kolen, 1994). If phase and amplitude of neural oscillations at the consistent frequency of a periodic input are stable over time, we can speak of SSEPs, which can be detected in the form of amplitude peaks in the frequency spectrum of the EEG signal. When participants listened to an isochronous metronome with a tempo of 2.4 Hz (417 msec), the imagination of either a march (i.e., an imagined accent on every second beat) or a waltz (i.e., an imagined accent on every third beat) elicited SSEPs with peaks at 1.2 Hz (2.4 Hz/2) for marches and 0.8 Hz (2.4 Hz/3) for waltzes (Nozaradan, Peretz, Missal, & Mouraux, 2011). This result indicates that neural oscillations not only represent the beat (in this case 2.4 Hz) but also the meter corresponding to the imagined march or waltz (Nozaradan et al., 2011). In an additional experiment, Nozaradan, Peretz, and Mouraux (2012a) used nonisochronous rhythms with various metrical levels to show that SSEPs are also elicited when the beat of a rhythm is not directly present in the audio signal but has to be interpreted. A recent study further showed that neural entrainment plays an important role in beat and meter processing not only in simple and abstract stimuli but also in real music (Tierney & Kraus, 2015).

The analysis of SSEPs is based on neural activity over long time periods (usually at least several seconds) and transforms the recorded waveforms into the frequency domain, whereas ERPs measure neural responses to a specific event in a short time window. ERPs help to explore predictive processes like brain responses to fulfilled predictions (e.g., repetition positivity) or violated predictions (e.g., MMN; for a review, see Bendixen, SanMiguel, & Schröger, 2012). The N1, a negative ERP component peaking about 100–200 msec after stimulus onset, is of special importance for the study of predictive processes. The auditory-evoked N1 is typically attenuated when a target matches a prediction model (e.g., temporal), compared with when there is no prediction model or the target does not match a prediction model (Sanabria & Correa, 2013; Bendixen et al., 2012; Lange, 2009, 2010). MMN and N1 are both modulated by unexpected stimuli. Previous evidence suggests that both components share the same auditory cortex resources and that the MMN arises from the adaptation of N1 activity (Jääskeläinen et al., 2004). MMNs can, for example, be observed when the temporal regularity of an auditory stimulus is violated (van Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2005). Despite the fact that both ERPs (especially N1 and MMN) and oscillatory neural activity are connected to predictive processes, studies comparing the two measures in a rhythm processing context are still missing (Nozaradan, 2014). The high temporal resolution of ERPs can provide insight into the temporal evolution of neural oscillations and therefore complement the knowledge gained from SSEPs.

In summary, empirical evidence indicates that neural oscillations representing the frequency of an auditory input play an important role for rhythm perception and sensorimotor synchronization. However, it remains an open question how neural entrainment is composed of bottom–up processes, such as passive reflections of the external rhythmic event, versus endogenous top–down processes, such as perceptual grouping or working memory. This question is relevant not only for music perception and action but also for the neuroscience of attention, prediction, time perception, and interpersonal affiliation.

In the present experiment, we combined measures of sensorimotor synchronization (finger-tapping accuracy), periodic neural activity (SSEPs), and event-related neural activity (ERPs, in particular the N1 component) to investigate how each can explain entrainment to naturalistic drum rhythms. The drum clip stimuli were interrupted by three silent breaks with different durations (approximately 2–6 sec) to examine whether the periodic neural responses continue to oscillate when the acoustic input stops. Therefore, SSEPs were analyzed separately during drum playbacks and during subsequent silent breaks. The neural responses to the onset of drum playbacks after each silent break were used for the analysis of ERPs. We hypothesized that neural oscillations corresponding to the frequencies of the drum clips are not solely driven by the physical presence of the acoustic input but partly endogenously generated and thus also exist during silent breaks. However, we expected that neural oscillations during drum playbacks would be more pronounced than during silent breaks. Additionally, we expected that the longer the silent breaks, the weaker the neural oscillations corresponding to the frequencies of the drum clips, and the more negative the N1 amplitude elicited by the onset of drum playbacks after the breaks.

METHODS

Participants

EEG recordings and finger-tapping data were collected from 14 participants (7 women, mean age = 26.1 years, SD = 5.5). Two additional participants were excluded because the reference electrodes and the EOG electrodes picked up artifacts. Two further participants were excluded because of their inability to tap accurately with the beat; their SDs of intertap intervals (ITIs) and of tap-to-beat asynchronies were larger than twice the median absolute deviation (a common outlier detection metric in small samples; Leys, Ley, Klein, Bernard, & Licata, 2013). The final sample of 14 participants consisted of amateur musicians with M = 16.4 years of musical experience (SD = 6.6) and M = 10.4 years of music lessons (SD = 6.4) on their main instrument (including drums, guitar, bass, cello, flute, and saxophone). Twelve participants played a second instrument (M = 11.9 years, SD = 8.6), and five participants played a third instrument (M = 12.2 years, SD = 9.8). Musical ensemble experience of 13 participants ranged from 4 to 23 years (M = 12.3 years, SD = 6.4). The experiment was approved by the ethics committee of the University of Graz, and participants provided written informed consent.
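For illustration, the median absolute deviation (MAD) criterion might be implemented as in the following sketch (Python; the paper does not report its implementation, and the function and variable names are our own):

```python
import numpy as np

def is_mad_outlier(value, sample, criterion=2.0):
    """True if `value` deviates from the sample median by more than
    `criterion` times the median absolute deviation (cf. Leys et al., 2013)."""
    sample = np.asarray(sample, dtype=float)
    median = np.median(sample)
    mad = np.median(np.abs(sample - median))
    return abs(value - median) > criterion * mad
```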

Auditory Stimuli

Rhythmic Structure

The periodic auditory stimuli consisted of three different drum clips that varied in rhythmic structure. Stimuli were programmed with Logic Pro X software (Apple, Inc., Cupertino, CA). Two drum clips consisting of bass drum, snare drum, and hi-hat sounds were structured similarly to common pop/rock drum rhythms (hereafter referred to as Rhythm 1 and Rhythm 2), and the third drum clip was a sequence of isochronous intervals played by snare drum and hi-hat (hereafter referred to as Metronome). Each instrument was quantized on an eighth note level, was adjusted to a fixed MIDI velocity, and had a consistent timbre. Figure 1 shows notations (A) and frequency spectra of the sound envelopes (B) of the drum clips.

Figure 1. 

(A) Notations of the drum clips. (B) Frequency spectra of the sound envelopes of the 108 BPM drum clips. Envelopes and spectra were computed with mirenvelope and mirspectrum (default options) of the MIR toolbox 1.5 for Matlab (Lartillot & Toiviainen, 2007). (C) Example sequence of one of six playback and break orders. One bar consists of four quarter notes.


Silent Breaks

Each drum clip contained three silent breaks that lasted one bar (4 quarter notes), two bars (8 quarter notes), and three bars (12 quarter notes). The order of breaks was randomized and counterbalanced. Silent breaks were preceded and followed by four bars of drum playback, resulting in a total length of 22 bars for every drum clip (Figure 1C).

Tempo

To ensure that neural oscillations were actually evoked by the stimuli and not by a random factor, drum clips were presented at two tempi: 108 and 132 beats per minute (BPM), corresponding to 1.8 Hz/555.6 msec and 2.2 Hz/454.5 msec, respectively. These tempi were selected based on previous research showing that sensorimotor synchronization is most comfortable and accurate around 2 Hz (MacDougall & Moore, 2005; Repp, 2005; Drake, Jones, & Baruch, 2000; van Noorden & Moelants, 1999). The resulting durations of drum clips were 48 sec (22 bars at 108 BPM) and 39 sec (22 bars at 132 BPM).
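For reference, beat frequency and beat period follow directly from the tempo:

$$f_{\text{beat}} = \frac{\text{BPM}}{60\ \text{sec}}, \qquad T_{\text{beat}} = \frac{1}{f_{\text{beat}}}$$

$$\frac{108}{60} = 1.8\ \text{Hz} \Rightarrow T \approx 555.6\ \text{msec}, \qquad \frac{132}{60} = 2.2\ \text{Hz} \Rightarrow T \approx 454.5\ \text{msec}$$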

Experimental Design

Presentation software (version 17; Neurobehavioral Systems, Berkeley, CA) was used to present the stimuli and to record the taps. Taps were recorded with a MaKey MaKey controller (JoyLabz, makeymakey.com), an Arduino-based circuit board connected to a 3 × 3 cm aluminum pad placed on a table on which participants rested their arms. The experiment consisted of two blocks with 18 trials each. In one block, participants passively listened to the drum clips and were instructed not to move. In the other block, participants tapped their index finger to the beat. The order of blocks was counterbalanced between participants. For the tapping task, participants were instructed to tap in time with the beat as accurately as possible and to continue tapping at the same rate during the silent breaks. To keep the listening task comparable to everyday music listening, participants were not explicitly instructed to try to imagine that the drum clips continue during the silent breaks. The 18 trials in each block resulted from a combination of the factors rhythmic structure (Rhythm 1, Rhythm 2, Metronome), tempo (108, 132 BPM), and break order. The six possible break orders were divided into two groups with three different break orders each (counterbalanced between tempo and participants). The experiment was self-paced, and participants started each trial by pressing the space bar on a computer keyboard. During trials, participants were asked to look at a fixation cross presented on a computer screen one meter in front of them.

EEG Recording

The EEG was recorded from 18 Ag/AgCl electrodes placed on the scalp according to the international 10–20 system and from two additional reference electrodes placed on the left and right mastoids. The AFz electrode was used as ground. Eye movements were recorded using three electrodes placed on the outer canthi of the eyes and above the nasion. Impedances of all electrodes were kept under 10 kΩ. Signals were recorded continuously, amplified, digitized (1000 Hz), and prefiltered with a 50-Hz notch filter (BrainAmp Standard amplifier, Brain Products GmbH, Munich, Germany).

Data Processing

EEG Preprocessing and Analyses

EEG preprocessing and analyses were performed with the software Brain Vision Analyzer 2.01 (Brain Products GmbH, Munich, Germany). EEG signals were referenced to linked mastoids, high-pass filtered (0.3 Hz, 12 dB), segmented into the 36 individual trials, and corrected for eye movement artifacts using the algorithm developed by Gratton, Coles, and Donchin (1983). EEG signals with voltage steps over 150 μV/msec or 300 μV during a time window of 200 msec were rejected (9.38% in SSEP analysis during drum playbacks, 3.70% in SSEP analysis during silent breaks, and 1.32% in ERP analysis). EEG recordings of each trial were segmented into the three breaks and four drum playbacks in each drum clip.
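As an illustration of the rejection criterion, a minimal sketch in Python (the exact algorithm used by Brain Vision Analyzer is not documented here; the thresholds are taken from the text, everything else is an assumption):

```python
import numpy as np

def reject_segment(segment, fs=1000, max_step=150.0, max_range=300.0, win_ms=200):
    """Flag an EEG segment (in microvolts) for rejection if the
    sample-to-sample voltage step exceeds max_step uV/msec or the
    peak-to-peak range within any 200-msec window exceeds max_range uV."""
    if np.max(np.abs(np.diff(segment))) * fs / 1000.0 > max_step:
        return True
    win = int(win_ms * fs / 1000.0)
    for start in range(len(segment) - win + 1):
        window = segment[start:start + win]
        if window.max() - window.min() > max_range:
            return True
    return False
```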

Steady-state Potentials

Two main analyses were conducted to determine whether beat-related SSEPs were elicited.

In the first analysis series, we investigated SSEPs during drum playback segments only. EEG recordings during the first two quarter notes of each drum playback segment were removed from the analysis to exclude ERPs caused by drum-playback onsets (Nozaradan et al., 2011). EEG segments were averaged across trials for each participant, task, tempo, and rhythm in the time domain to maximize the signal-to-noise ratio of phase-locked EEG signals. The resulting signals of the five frontocentral electrodes (Fz, FC1, FCz, FC2, Cz) were averaged, as previous research showed that SSEPs are most pronounced over these areas (Nozaradan, Peretz, & Mouraux, 2012b). Averaged signals were transformed into the frequency domain by computing a discrete fast Fourier transform with a frequency resolution of 0.1 Hz. To analyze whether SSEPs were elicited, spectral peaks at the first and second harmonic frequencies of the drum clips were normalized and compared against zero. Normalized spectral peaks were computed by subtracting the mean amplitude of all data points in a specific interval around the target frequency from the amplitude of the target frequency peak (Equation 1). Consequently, a normalized spectral peak that is significantly greater than zero indicates the presence of an SSEP.
$$\text{NormalizedPeak}(nf) = A(nf) - \operatorname{mean}_{x \in I(nf)} A(x), \qquad I(nf) = \left[ nf - \left(\tfrac{f}{2} - 0.1\right),\; nf + \left(\tfrac{f}{2} - 0.1\right) \right] \tag{1}$$

where A(x) denotes the amplitude of the EEG frequency spectrum at frequency x (in Hz).

f describes the fundamental frequency (equivalent to the first harmonic frequency) of the stimulus. In a musical rhythm, the fundamental frequency f most commonly corresponds to the quarter-note rate, that is, the inverse of the quarter-note inter-onset interval (here 1.8 Hz for 108 BPM drum clips and 2.2 Hz for 132 BPM drum clips). nf describes the nth harmonic frequency. The minimum and maximum of the normalization interval are narrowed by 0.1 Hz to exclude potential spectral subpeaks halfway between the harmonic frequencies. The resulting values used in the current experiment are shown in Table 1.

Table 1. 

Harmonic Frequencies (First Harmonic [f] and Second Harmonic [2f]) and Normalization Intervals of 108 and 132 BPM Stimuli Used for the Normalization of Spectral Peak Amplitudes

      108 BPM                           132 BPM
      Peak Frequency   Normalization    Peak Frequency   Normalization
                       Interval                          Interval
f     1.8 Hz           [1.0, 2.6]       2.2 Hz           [1.2, 3.2]
2f    3.6 Hz           [2.8, 4.4]       4.4 Hz           [3.4, 5.4]

All values in Hz.
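A minimal sketch of this normalization in Python (NumPy), assuming `avg_eeg` holds a trial-averaged frontocentral signal sampled at 1000 Hz; the names and the zero-padding detail are assumptions, not the authors' code:

```python
import numpy as np

def normalized_peak(signal, fs, target, fundamental, resolution=0.1):
    """Normalized spectral peak following Equation 1: the amplitude at the
    target frequency minus the mean amplitude within the normalization
    interval around it. Zero-padding enforces the frequency resolution."""
    n_fft = int(round(fs / resolution))              # e.g., 0.1 Hz -> 10-sec grid
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft)) / len(signal)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    half = fundamental / 2.0 - 0.1                   # interval narrowed by 0.1 Hz
    in_interval = (freqs >= target - half) & (freqs <= target + half)
    peak = spectrum[np.argmin(np.abs(freqs - target))]
    return peak - spectrum[in_interval].mean()

# Hypothetical usage for the first harmonic of a 108 BPM clip:
# ssep_f = normalized_peak(avg_eeg, fs=1000, target=1.8, fundamental=1.8)
```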

To exclude the possibility that SSEPs during the listening part of the experiment resulted from beat-related movements, EMG signals were recorded from left leg (musculus tibialis anterior) and right index finger (first dorsal interosseous) muscles using Ag/AgCl electrodes. A ground electrode was placed on the clavicle. EMG signals of hand and leg muscles during listening to the drum playbacks and during silent breaks were averaged across trials for each participant at each tempo. The resulting waveforms were transformed into the frequency domain and normalized in the same way as the EEG signals. Normalized spectral peaks of both tempi were combined to compute four one-sample t tests (first and second harmonic frequencies, during stimulus presentation and silent breaks) against zero. The t tests revealed that there were no overt movements in the listening part that could have systematically affected the SSEP results (normalized peaks between −0.05 and 0.01 μV, all ps > .11). In addition to the ocular correction, evoked peaks in the EEG spectrum were normalized by spectral subtraction of the corresponding peaks in the EOG spectrum to further exclude the possibility that SSEPs resulted from signals unrelated to brain activity.

In the second analysis series, we investigated SSEPs during the first bar of each silent break (2.2 and 1.8 sec for 108 and 132 BPM drum clips, respectively). The analysis was restricted to the first bar of each break to maximize sample size and to minimize potential effects of different signal-to-noise ratios (i.e., better signal-to-noise ratios in longer breaks). EEG segments were averaged across trials for each participant, task, and tempo in the time domain. Averaged signals were transformed into the frequency domain by computing a fast Fourier transform with a frequency resolution of 0.2 Hz. Because the break segments were shorter than the drum playback segments, we omitted the comparison of rhythmic structures to further increase the signal-to-noise ratio. All other processing steps were the same as described above.

Event-related Potentials

For the analyses of ERPs, we combined data from the 108 and 132 BPM conditions and computed two different averages. To investigate the effect of break length, ERPs from the first beat after the silent breaks of all three rhythm conditions (36 total ERPs per condition; thereof 18 during tapping and 18 during listening) were averaged for each break length. To investigate the effect of rhythmic structure, ERPs from the first beat after the silent breaks of all three break lengths were averaged for Rhythm 1, Rhythm 2, and Metronome (36 total ERPs per condition; thereof 18 during tapping and 18 during listening). In both analyses, ERP amplitudes were baseline-corrected (−100 to 0 msec before stimulus onset), and signals were low-pass filtered (20 Hz, 24 dB). Because previous research indicates that the N1 is especially meaningful in rhythm perception and predictive processing (Sanabria & Correa, 2013; Bendixen et al., 2012; Lange, 2009, 2010), we focused our ERP analysis on this component. In line with previous research showing that the N1 component is predominantly distributed over frontocentral areas, amplitudes of the Fz, FC1, FCz, FC2, and Cz electrodes were averaged (e.g., Bendixen et al., 2012). N1 peaks were defined as the most negative local peaks between 90 and 160 msec after stimulus onset. The N1 peak amplitudes (±2 data points, i.e., ±2 msec) were exported for further analyses.
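The N1 extraction can be sketched as follows (Python/SciPy; a 4th-order Butterworth filter approximates the reported 24 dB/oct roll-off, and all names are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def n1_peak_amplitude(erp, fs=1000):
    """N1 peak amplitude: the most negative local value between 90 and
    160 msec after onset, averaged over +/-2 data points. `erp` is
    assumed to be baseline-corrected and averaged over Fz, FC1, FCz,
    FC2, and Cz, with time 0 at stimulus onset."""
    b, a = butter(4, 20.0 / (fs / 2.0), btype="low")  # 20-Hz low-pass
    filtered = filtfilt(b, a, erp)
    i0, i1 = int(0.090 * fs), int(0.160 * fs)         # 90-160 msec window
    peak = i0 + int(np.argmin(filtered[i0:i1]))       # most negative point
    return filtered[peak - 2:peak + 3].mean()         # +/-2 data points
```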

Tapping Data

ITIs were computed by subtracting the absolute time of a tap n from the absolute time of the following tap n + 1. The first four taps were excluded from the analysis. After excluding ITIs that deviated from the stimulus inter-onset interval by more than 50% (indicating doubled or missing taps), ITIs that were more than 2.5 standard deviations from the mean ITI for each participant, trial, and segment (stimulus presentation vs. silent break) were removed (2.31%). With the remaining taps, we calculated tap–beat asynchronies as the difference between tap times and the quarter note onset times of the drum clips. Taps that deviated from the quarter note onsets of the drum clips by more than one fourth of the inter-onset interval (indicating “off-beat” taps, 3.21%) and taps that were more than 2.5 standard deviations from the mean tap–beat asynchrony for each participant, trial, and segment (2.65%) were excluded. Tapping performances of two participants were excluded because of a technical error of the tapping device (82% and 58% missing taps). The standard deviation of ITIs provides information about a person's ability to tap at a steady rate, whereas the standard deviation of tap–beat asynchronies represents the accuracy with which a person taps in time with the beat (Stupacher, Hove, & Janata, 2016).
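A sketch of these exclusion rules (Python; the thresholds are taken from the text, and the per-participant, per-trial grouping is left out for brevity):

```python
import numpy as np

def clean_itis(tap_times, ioi):
    """Intertap intervals (ITIs) after the exclusion steps: drop ITIs
    deviating more than 50% from the stimulus inter-onset interval
    (doubled or missing taps), then ITIs beyond 2.5 SD of the mean."""
    itis = np.diff(np.asarray(tap_times))            # tap n+1 minus tap n
    itis = itis[np.abs(itis - ioi) <= 0.5 * ioi]
    return itis[np.abs(itis - itis.mean()) <= 2.5 * itis.std()]

def clean_asynchronies(tap_times, beat_times, ioi):
    """Signed tap-beat asynchronies; taps further than ioi/4 from the
    nearest beat count as off-beat and are dropped (the 2.5 SD step
    would follow analogously)."""
    taps, beats = np.asarray(tap_times), np.asarray(beat_times)
    nearest = beats[np.argmin(np.abs(taps[:, None] - beats), axis=1)]
    asynchronies = taps - nearest
    return asynchronies[np.abs(asynchronies) <= ioi / 4.0]
```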

Statistical Analysis

Multiple comparisons were controlled for false discovery rate (Benjamini & Hochberg, 1995).
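For completeness, the Benjamini–Hochberg step-up procedure can be sketched as follows (a generic Python implementation; the paper does not report which software was used):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of p values that survive FDR control at level q
    (Benjamini & Hochberg, 1995): reject the k smallest p values,
    where k is the largest rank with p_(k) <= q * k / m."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = (np.nonzero(below)[0].max() + 1) if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask
```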

RESULTS

Behavioral Results

The mean ITI provides information about a person's tapping rate. The mean ITI with 108 BPM clips was 555.30 msec (corresponding to 1.801 Hz; SD = 2.09 msec), and the mean ITI with 132 BPM clips was 454.02 msec (corresponding to 2.203 Hz; SD = 1.25 msec). Two paired samples t tests, one for each tempo, revealed no differences between mean ITIs during drum playbacks and breaks (both ps > .2), indicating accurate tapping rates during breaks. For the following analyses, tapping data of 108 and 132 BPM clips were combined. An ANOVA on the SD of ITIs with the factors Rhythmic structure (Rhythm 1, Rhythm 2, Metronome) and Playback (stimulus presentation, silent break) revealed a main effect of Playback (F(1, 11) = 27.50, p < .001, ηp2 = .71), indicating that the variability of ITIs was higher during drum playbacks (M = 20.52 msec, SD = 3.09) than during breaks (M = 17.69 msec, SD = 3.71). No main effect of Rhythmic structure and no interaction were found (both ps > .4). As a second measure of tapping performance, we analyzed the standard deviation of tap–beat asynchronies, representing the accuracy with which a person taps in time with the beat. An ANOVA with the factors Rhythmic structure and Playback indicated that the variability of tap–beat asynchronies during drum playbacks (M = 18.35 msec, SD = 2.38) was lower than the variability during silent breaks (in this case tap–beat asynchronies from the “hypothetical” beat; M = 34.49 msec, SD = 5.15; F(1, 11) = 178.13, p < .001, ηp2 = .94). No main effect of Rhythmic structure and no interaction were found (both ps > .18). Taken together, the ITI and tap–beat asynchrony results indicate that during breaks participants tapped at a steadier rate but less in time with the beat than during stimulus presentation. They further suggest that participants' tapping performances were equally good with the three rhythmic structures.

Steady-state Evoked Potentials

EEG frequency spectra were examined to determine whether beat-related SSEPs were elicited during drum playbacks and during the first bar of silent breaks. First, we focus on SSEPs during drum playbacks. Figure 2 shows the raw frequency spectra of frontocentral electrodes (Fz, FC1, FCz, FC2, Cz) during listening and tapping to 108 and 132 BPM drum playbacks and the corresponding topographic plots at the first harmonic frequencies (1.8 Hz and 2.2 Hz). Topographic plots (Figure 2, top) show peak activations over frontal sites during listening; during tapping with the right index finger, these activations tend to extend to the left side. The frequency spectra (Figure 2, bottom) show clear amplitude peaks at the first and second harmonic of the stimulus frequencies. To test for the presence of SSEPs in the first (f) and second (2f) harmonic frequency (corresponding to quarter notes and eighth notes, respectively), normalized spectral peaks during listening and tapping to drum playbacks were compared against zero (Figure 3). Data of 108 and 132 BPM conditions were combined for this and the following analyses. Results indicate the presence of SSEPs in both harmonic frequencies during both listening and tapping (f, tapping, t(13) = 6.00, p < .001; 2f, tapping, t(13) = 5.02, p < .001; f, listening, t(13) = 4.22, p = .001; 2f, listening, t(13) = 2.83, p = .014).

Figure 2. 

Frequency spectra of averaged frontocentral electrodes (marked by asterisks in the corresponding topographic plots) during tapping (black) and listening (red) to (A) 108 BPM drum playbacks and (B) 132 BPM drum playbacks. Amplitudes represent the mean responses to Rhythm 1, Rhythm 2, and Metronome before normalization. Note that the heatmap colors were fitted to the maximum amplitude of the corresponding frequency spectrum to increase localization information.


Figure 3. 

Mean normalized spectral peaks (108 and 132 BPM combined) during tapping and listening to the drum playbacks for the first (f) and second (2f) harmonic frequency. Asterisks represent significance values (**p < .001, *p < .05) of t tests against zero, indicating the presence of beat-related SSEPs. Error bars represent ±1 SE.


SSEPs as a Function of Harmonic Frequencies

A 2 × 2 ANOVA on normalized spectral peak amplitudes with the factors Harmonic frequency (f, 2f) and Task (listen, tap) revealed a significant main effect of Harmonic frequency (F(1, 13) = 18.60, p = .001, ηp2 = .59), with greater SSEPs at the first harmonic frequency. A main effect of Task showed that normalized spectral peaks during tapping were higher than during listening (F(1, 13) = 44.90, p < .001, ηp2 = .78). Harmonic frequency interacted with Task (F(1, 13) = 12.30, p = .004, ηp2 = .49; see Figure 3).

SSEPs as a Function of Rhythmic Structure

To examine the effects of rhythmic structure on SSEPs in both harmonic frequencies (f and 2f), we computed two individual 3 × 2 ANOVAs with the factors Rhythmic structure (Rhythm 1, Rhythm 2, Metronome) and Task (listen, tap) on normalized spectral peak amplitudes. Mean amplitudes of normalized spectral peaks for the three rhythmic structures are shown in Table 2. A significant main effect of Rhythmic structure was only found in the ANOVA on normalized peaks in the second harmonic frequency (F(2, 26) = 3.98, p = .031, ηp2 = .23; first harmonic frequency, p > .2). Two separate ANOVAs on normalized spectral peaks in the second harmonic frequency revealed a main effect of rhythmic structure during listening (F(2, 26) = 4.99, p = .015, ηp2 = .28), but not during tapping (F(2, 26) = .57, p > .5). During listening, pairwise comparisons indicated higher SSEPs with Rhythm 2 compared with Metronome (t(13) = 2.84, p = .014), a tendency for higher SSEPs with Rhythm 1 compared with Metronome (t(13) = 2.01, p = .065), but no difference in SSEPs between Rhythms 1 and 2 (t(13) = −1.56, p = .143).

Table 2. 

Means of Normalized Spectral Peaks in μV during Listening and Tapping to Rhythm 1, Rhythm 2, and Metronome (108 and 132 BPM Combined)

Harmonic Frequency     Tapping                                           Listening
                       Rhythm 1        Rhythm 2        Metronome         Rhythm 1        Rhythm 2        Metronome
f (1.8 and 2.2 Hz)     0.437 (0.266)   0.448 (0.298)   0.466 (0.319)     0.153 (0.100)   0.141 (0.146)   0.233 (0.275)
2f (3.6 and 4.4 Hz)    0.113 (0.128)   0.146 (0.133)   0.112 (0.136)     0.041 (0.081)   0.084 (0.115)   0.008 (0.086)

Values are mean (SD) normalized spectral peaks in μV.

In both harmonic frequencies (f and 2f), normalized spectral peaks during tapping were greater than during listening (F(1, 13) = 26.02, p < .001, ηp2 = .67 and F(1, 13) = 11.33, p = .005, ηp2 = .47). No significant interactions between rhythmic structure and task were found (both ps > .3).

SSEPs during Silent Breaks

EEG frequency spectra during the first bar of each break were analyzed to investigate whether SSEPs persisted without external rhythmic events. Normalized spectral peaks of the first bar of each break were compared against zero. These comparisons indicate the presence of neural oscillations during silent breaks at the first harmonic frequency (f) during tapping (M = 0.120 μV, SD = 0.09; t(13) = 4.83, p < .001), but not during listening (M = −0.013 μV, SD = 0.06; t(13) = −0.75, p = .469). At the second harmonic frequency (2f), normalized spectral peak amplitudes did not significantly differ from zero (during tapping, M = 0.018 μV, SD = 0.05, t(13) = 1.36, p = .197; during listening, M = −0.005 μV, SD = 0.04, t(13) = −0.45, p = .663).

Event-related Potentials

Having confirmed oscillatory responses (i.e., SSEPs) during drum playbacks and, for the tapping task, also during silent breaks, we next analyzed the ERPs evoked by drum-playback onsets after the breaks. This additional analysis allowed us to assess whether ongoing oscillatory neural activity affects early-phase rhythm processing.

ERPs as a Function of Break Length

A 3 × 2 ANOVA on N1 amplitudes with the factors Break length (1, 2, 3 bars) and Task (listen, tap) revealed a main effect of Break length (F(2, 26) = 19.72, p < .001, ηp2 = .60), with smaller N1 components (i.e., less negative amplitudes) after one-bar compared with two-bar breaks (t(13) = 3.71, p = .003), after two-bar compared with three-bar breaks (t(13) = 3.03, p = .010), and after one-bar compared with three-bar breaks (t(13) = 5.49, p < .001; Figure 4A). N1 components during listening (M = −7.38, SD = 4.30) were smaller than during tapping (M = −10.17, SD = 4.32; F(1, 13) = 7.94, p = .015, ηp2 = .38). The interaction between Break length and Task was not significant (p = .771). Two separate ANOVAs on N1 amplitudes in each of the two tasks revealed similar effects of break length during listening (F(2, 26) = 13.24, p < .001, ηp2 = .51) and during tapping (F(2, 26) = 14.05, p < .001, ηp2 = .52; Table 3).

Figure 4. 

N1 components (between 90 and 160 msec) following drum-playback onsets (time = 0 msec) after silent breaks. (A) ERPs following one bar (4 quarter notes break), two bars (8 quarter notes break), and three bars (12 quarter notes break) of silence. ERPs of all three rhythm conditions and both tasks (tapping vs. listening) were averaged. (B) ERPs of Rhythm 1, Rhythm 2, and Metronome playback onsets. ERPs of all three break lengths and both tasks were averaged.


Table 3. 

Means and SDs of N1 Amplitudes following Drum-playback Onsets after One Bar (4 Quarter Notes Break), Two Bars (8 Quarter Notes Break), and Three Bars (12 Quarter Notes Break) of Silence during Listening and Tapping

              N1 Amplitudes during Listening    N1 Amplitudes during Tapping
              Mean        SD                    Mean        SD
4/4 break     −4.40       3.08                  −6.74       3.49
8/4 break     −7.51       3.84                  −10.79      4.55
12/4 break    −10.24      5.99                  −12.99      4.92

ERPs as a Function of Rhythmic Structure

A 3 × 2 ANOVA on N1 amplitudes with the factors Rhythmic structure (Rhythm 1, Rhythm 2, Metronome) and Task (listen, tap) revealed a main effect of Rhythmic structure (F(2, 26) = 4.88, p = .016, ηp2 = .27) with smaller N1 components for Rhythm 2 compared with Metronome (t(13) = 3.11, p = .008) and a tendency for Rhythm 1 compared with Metronome (t(13) = 2.24, p = .043, false discovery rate-corrected alpha level = .033; Figure 4B). The difference between Rhythms 1 and 2 was not significant (p = .762). Similar to the effect of break length, N1 components during listening were smaller than during tapping (F(1, 13) = 6.95, p = .021, ηp2 = .35). The interaction between Rhythmic structure and Task was not significant (p = .774). Two separate ANOVAs on N1 amplitudes in each of the two tasks revealed a similar trend in the effect of rhythmic structure during listening (F(2, 26) = 2.54, p = .098, ηp2 = .16), but not during tapping (F(2, 26) = 1.19, p = .319).

Behavioral Relevance of ERPs

To evaluate the behavioral relevance of ERPs, we examined the relationship between N1 components and tapping performance (as measured by the standard deviations of ITIs and tap–beat asynchronies). We found a significant negative correlation between the mean amplitude of N1 components and the SD of ITIs (r = −.75, p = .005; Figure 5A) and a strong tendency for a negative correlation between the mean N1 amplitude and the SD of tap–beat asynchronies (r = −.55, p = .065; Figure 5B). In both correlations, better tapping performances were associated with smaller N1 amplitudes. Similar relationships were found when focusing on the tapping performances during breaks (SD of ITIs, r = −.63, p = .028; SD of tap–beat asynchronies, r = −.45, p = .140). No significant correlations were found between N1 amplitudes and SSEPs (all ps > .1).

Figure 5. 

Relationship between tapping performance during breaks (i.e., [A] the mean standard deviation of ITIs, and [B] the mean standard deviations of tap–beat asynchronies) and mean N1 amplitude. The negative correlations indicate that the better the tapping performance, the lower (i.e., less negative) the N1 component.


DISCUSSION

Commonly discussed frameworks of rhythm perception, such as predictive coding or dynamic attending, emphasize the importance of both bottom–up (e.g., acoustic stimulus features) and top–down (e.g., beat and meter interpretation) processes for entrainment to music (e.g., Large et al., 2015; Henry & Herrmann, 2014; Vuust & Witek, 2014; Friston, 2005; Large & Jones, 1999). To disentangle these processes and to provide further evidence of neural entrainment to naturalistic rhythms, this study combined measures of ongoing rhythm-related neural oscillations (SSEPs), transient neural indicators of predictive timing processes (N1), and sensorimotor synchronization (finger tapping). The analysis of finger-tapping data indicated that participants successfully tapped in time with the beat. The variability of tap–beat asynchronies was lower during stimulus presentation compared with silent breaks, whereas the variability of ITIs was higher during stimulus presentation. These results indicate that during silent breaks participants tapped more steadily but less accurately with the beat frequency. The analysis of SSEPs during stimulus presentation revealed that neural oscillations at the beat rate occurred in the tapping task as well as in the listening task. In the silent breaks between stimulus presentations, however, SSEPs were only found during tapping. The finer-grained analysis of ERPs showed that N1 peak amplitudes were more negative after longer breaks and for the isochronous metronome compared with the two rhythmic stimuli. Additionally, the behavioral relevance of N1 components was shown by their negative relationship with tapping stability and accuracy. We will now discuss the results in the context of previous studies of rhythm perception and neural entrainment.

The presence of SSEPs and the topographic distribution over frontocentral sites during the presentation of auditory rhythms is consistent with previous research (Nozaradan et al., 2011, 2012a). On the basis of the assumption that we recorded the electrical activity of neural populations that functioned as an oscillator synchronizing to an external periodic input, we interpret the existence of SSEPs as reflecting neural entrainment (Thut, Schyns, & Gross, 2011).

Our findings suggest that neural entrainment was modulated by the rhythmic structure of the auditory input. We found a main effect of rhythmic structure on SSEPs in the second harmonic frequency (corresponding to eighth notes). During the listening task, SSEP amplitudes were higher with Rhythm 1 (a tendency) and Rhythm 2 (significantly) compared with the isochronous Metronome. Figure 1B shows that, in contrast to the Metronome, the second harmonic frequency was predominant in the two rhythms' envelope spectra, which might explain why only SSEPs in the second harmonic frequency were affected by rhythmic structure. On the other hand, we found no difference in SSEPs in the first harmonic frequency, although this frequency was more prominent in the Metronome compared with Rhythms 1 and 2. Despite the fact that the amplitude of the first harmonic frequency in the spectra of Rhythms 1 and 2 was low, SSEPs at this frequency were higher than SSEPs in the second harmonic frequency. This result suggests that neural entrainment was not solely driven by the physical properties of the sound but also depended on top–down processes such as beat and meter interpretation. A comparable effect was found by Nozaradan and colleagues (2012a).

The effect of rhythmic structure on SSEPs was supported by the analysis of ERPs corresponding to drum-playback onsets after silent breaks: N1 peak amplitudes for Rhythm 1 and Rhythm 2 onsets were smaller (i.e., less negative) than for Metronome onsets (Figure 4B). Because the attenuation of auditory-evoked N1 components has previously been linked to improved predictive timing processes (Sanabria & Correa, 2013; Bendixen et al., 2012; Lange, 2009, 2010), these results suggest that Rhythm 1 and Rhythm 2 drum clips, as compared with Metronome, enabled more stable entrainment.

Both measures, SSEP and N1 amplitudes, were similarly affected by rhythmic structure, with Rhythms 1 and 2 promoting rhythm processing in comparison with Metronome drum clips. This effect might be due to a “subdivision benefit” (Repp, 2003) resulting from the finer-grained metrical information in Rhythms 1 and 2. Up to a certain point, more complex rhythms (i.e., rhythms with denser metrical hierarchies) might enable more complex neural oscillations (i.e., oscillations with a richer harmonic spectrum), keeping people more on track. Subdivisions of beat intervals increase timing information and have been shown to facilitate sensorimotor synchronization (Madison, 2014; Repp, 2003). However, further study with more variations of auditory stimuli is needed to unravel the effects of metrical hierarchy (e.g., by including an additional eighth note metronome), timbre variations, and rhythmic complexity on neural entrainment more thoroughly. Syncopation, for example, is one form of rhythmic complexity that relates to movement induction in an inverted U-shaped fashion (Sioros, Miron, Davies, Gouyon, & Madison, 2014; Witek, Clarke, Wallentin, Kringelbach, & Vuust, 2014) and might affect neural entrainment.

The analysis of neural oscillations during silent breaks revealed SSEPs at the beat rate (f) during the tapping task, but no clear amplitude peaks in the second harmonic (2f) during the tapping task, and no peaks in either harmonic during the listening task. Although entrainment to music is based on complex sensorimotor interactions (Stupacher et al., 2013; Chen et al., 2008; Grahn & Brett, 2007), one might suspect that the SSEPs during the tapping task in silent breaks were mostly driven by motor execution. In both playback conditions (stimulus presentation and silent breaks), the comparison of tapping and listening tasks revealed that SSEP amplitudes were higher during tapping. Considering the participants' stable and accurate tapping performances, this effect might partly be due to an overlap of auditory-evoked and motor-evoked potentials. Oscillations in sensorimotor brain regions can be locked to movement kinematics in motor tasks (Bourguignon et al., 2011; Gerloff et al., 1997). In musical contexts, body movements in time with the beat can sharpen temporal structure representation (Su & Pöppel, 2012; Phillips-Silver & Trainor, 2005), timekeeping (Manning & Schutz, 2013), and sensory selection (Morillon, Schroeder, & Wyart, 2014), suggesting that the consistency between auditory and motor fluctuations improves rhythm perception and entrainment (Nozaradan, Zerouali, Peretz, & Mouraux, 2015). Chemin, Mouraux, and Nozaradan (2014) reported an effect of previously performed rhythmic movement on auditory perception. This shaping effect, however, could only be observed after a drastic intervention involving whole-body movements in synchrony with an auditory input and movements of an experimenter. In contrast, our behavioral task involved only minimal movements of the index finger. The behavioral results indicate that participants' tapping performances were equally good across the three rhythmic structures. Nonetheless, we found an effect of rhythmic structure (Rhythm 1, Rhythm 2, Metronome) on SSEP and N1 amplitudes. These findings once more suggest that neural entrainment to music is based on auditory–motor interactions and cannot be reduced to either solely auditory or solely motor neural activity. The neurodynamic model of Large et al. (2015) comes to similar conclusions. The model is based on dynamic attending and shows that, especially with more complex musical rhythms, beat and meter perception depends on the interaction of neural oscillations in sensory and motor networks.

The lack of SSEPs in the listening task during the silent breaks might be explained by two methodological issues. First, we analyzed the frequency spectra during the first 1.8 and 2.2 sec (corresponding to the shortest break in 132 and 108 BPM clips, respectively) in all three break lengths to obtain acceptable signal-to-noise ratios. In contrast to the more strongly pronounced neural oscillations during the tapping task, neural oscillations during the listening task might be too fragile to be captured in short time windows. Second, we did not explicitly instruct the participants to try to imagine that the drum clips continue during the silent breaks. Future experiments could try to maximize attention during the silent breaks. Although this might make the experimental design less comparable to everyday music listening, higher attention and more specific instructions (e.g., “Imagine the rhythm continuing during the silent breaks”) could lead to more stable neural oscillations during silence.

Yet, our experimental design allowed us to draw inferences from the results of ERP analyses about neural activity during the silent breaks. N1 peak amplitudes were more negative with longer break lengths (Figure 4A), suggesting that rhythm-related neural oscillations persisted after stimulus presentation but decreased in their amplitude or changed in their phase with increasing time of silence. Importantly, this effect was found during listening as well as during tapping.

One of the main questions of this study was whether, or to what extent, rhythmic entrainment is caused by an internal driver or by external events. Although we did not directly test the underlying mechanism, the variation of rhythmic structure, the interruption of stimuli by silent breaks, and the comparison of SSEPs and ERPs provided new insights. In the light of predictive coding theory, our N1 findings reflect less accurate predictions with longer breaks and simpler rhythmic structures. In the context of dynamic attending, our SSEP results can be interpreted as interacting neural oscillations in sensory and motor networks that are driven by both external signals and endogenous oscillations (Large et al., 2015). In summary, our findings suggest that rhythm-related neural oscillations do not just passively reflect the external rhythmic events (bottom–up processes) but at least partly represent endogenous top–down processes that enable perceptual grouping and meter interpretation and thus promote timing predictions. The involvement of both bottom–up and top–down processes is consistent with predictive coding (Vuust & Witek, 2014; Friston, 2002, 2005) and dynamic attending (Henry & Herrmann, 2014; Large & Jones, 1999; Large & Kolen, 1994).

Top–down processes have been shown to modulate rhythm perception. For example, long-term exposure to culturally dominant musical meters can narrow the metrical frameworks that can be effortlessly processed (Hannon & Trehub, 2005). Additionally, depending on our interpretation, physically identical rhythms can be heard as syncopated or not (Honing, 2012). Findings such as these show that rhythm perception is based on an interplay between automatic and controlled processes (Levitin & Tirovolas, 2009). Integrating these processes on a neural level, Schwartze, Tavano, Schröger, and Kotz (2012) proposed a model that differentiates between event-based cerebellar–thalamocortical encoding processes with high temporal resolution and attention-based evaluations of longer-range temporal structures related to the basal ganglia–thalamocortical system. The model is consistent with the view that the processing of different timescales involves different brain structures and cognitive mechanisms, for example, the cerebellum in millisecond-range motor control and the basal ganglia in larger-scale conscious estimations and interpretations (for a review, see Buhusi & Meck, 2005; for a recent review on the role of the cerebellum in perceptual processes, see also Baumann et al., 2015). Basal ganglia mechanisms are associated with beat prediction processes (Grahn & Rowe, 2013; Grahn & Brett, 2007) and may interact with higher-order cognitive mechanisms in the pFC during beat processing (Kung, Chen, Zatorre, & Penhune, 2013). The present finding of beat-related neural oscillations that partly persist after stimulus presentation might be a result of temporal structure evaluations and predictive processes involving basal ganglia–thalamocortical mechanisms.

In contrast, the N1 results can be associated with automatic and temporally fine-grained encoding processes. To further evaluate the sensorimotor aspects and the behavioral relevance of our N1 findings, we investigated the relationship between tapping performance measures and N1 peak amplitudes. Tapping variability measures and N1 peak amplitude were correlated negatively, indicating that higher tapping asynchronies were associated with more negative N1 peak amplitudes (Figure 5). These findings are consistent with the assumption that more negative N1 amplitudes reflect reactions to less predictable events. Accordingly, the N1 peak amplitude would be less prominent—reflecting less surprise—with more stable tapping performances and more precise tap–beat overlaps.

In summary, we provided evidence that suggests an involvement of endogenous timing processes in entrainment to musical rhythms. Combining the novel approach of measuring EEG spectral amplitudes during silent breaks with the analysis of ERPs allowed us to further bridge the gap between resonant oscillations and finely timed transient neural processes. Our findings strongly suggest that during music perception, neurons dynamically modulate their firing activity in response to the rhythmic pulse in a way that incorporates endogenous cognitive processes. This mechanism of stimulus-induced periodic firing might also help to better understand predictive timing in nonmusical contexts. The role of endogenous neural oscillations and how they shape sensory, motor, and cognitive processes is a large topic in neuroscience (e.g., Calderone, Lakatos, Butler, & Castellanos, 2014), and future research should further assess how and to what extent this neural oscillatory activity is influenced or primed by external events.

From a musical point of view, our study can help explain why we find rhythmic breaks, as used in funk, soul, and related genres, so catchy. Similar to the “bass drop” in electronic music, the reappearance of a clear rhythmic pulse after a break can feel engaging—even if we do not move our body. In the words of the Meters' bass player George Porter, Jr.: “[…] it is what you don't play that makes the music flow. The spaces are as important as the notes” (Porter, 2014).

Acknowledgments

Jan Stupacher was supported by a DOC fellowship of the Austrian Academy of Sciences at the Department of Psychology, University of Graz. We thank Philipp Metzger and Vanessa Hinterleitner for their assistance in data collection.

Reprint requests should be sent to Jan Stupacher, Department of Psychology, University of Graz, Universitätsplatz 2/DG, 8010 Graz, Austria, or via e-mail: jan.stupacher@uni-graz.at.

REFERENCES

Arnal, L. H., & Giraud, A.-L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive Sciences, 16, 390–398.
Baumann, O., Borra, R. J., Bower, J. M., Cullen, K. E., Habas, C., Ivry, R. B., et al. (2015). Consensus paper: The role of the cerebellum in perceptual processes. Cerebellum, 14, 197–220.
Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: A review. International Journal of Psychophysiology, 83, 120–131.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57, 289–300.
Bispham, J. (2006). Rhythm in music: What is it? Who has it? And why? Music Perception, 24, 125–134.
Bourguignon, M., De Tiège, X., de Beeck, M. O., Pirotte, B., Van Bogaert, P., Goldman, S., et al. (2011). Functional motor-cortex mapping using corticokinematic coherence. Neuroimage, 55, 1475–1479.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6, 755–765.
Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304, 1926–1929.
Calderone, D. J., Lakatos, P., Butler, P. D., & Castellanos, F. X. (2014). Entrainment of neural oscillations as a modifiable substrate of attention. Trends in Cognitive Sciences, 18, 300–309.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural representation of musical rhythms. Psychological Science, 25, 2147–2159.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex, 18, 2844–2854.
Clayton, M., Sager, R., & Will, U. (2005). In time with the music: The concept of entrainment and its significance for ethnomusicology. European Meetings in Ethnomusicology, 11, 3–124.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition, 77, 251–288.
Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top–down processing. Nature Reviews Neuroscience, 2, 704–716.
Frey, J. N., Ruhnau, P., & Weisz, N. (2015). Not so different after all: The same oscillatory processes support different types of attention. Brain Research, 1626, 183–197.
Fries, P. (2005). A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9, 474–480.
Friston, K. (2002). Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience, 25, 221–250.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360, 815–836.
Gerloff, C., Toro, C., Uenishi, N., Cohen, L. G., Leocani, L., & Hallett, M. (1997). Steady-state movement-related cortical potentials: A new approach to assessing cortical activity associated with fast repetitive finger movements. Electroencephalography and Clinical Neurophysiology, 102, 106–113.
Grahn, J. A. (2012). Neural mechanisms of rhythm perception: Current findings and future perspectives. Topics in Cognitive Science, 4, 585–606.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19, 893–906.
Grahn, J. A., & Rowe, J. B. (2013). Finding and feeling the musical beat: Striatal dissociations between detection and prediction of regularity. Cerebral Cortex, 23, 913–921.
Gratton, G., Coles, M. G. H., & Donchin, E. (1983). A new method for off-line removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55, 468–484.
Hannon, E. E., & Trehub, S. E. (2005). Metrical categories in infancy and adulthood. Psychological Science, 16, 48–55.
Henry, M. J., & Herrmann, B. (2014). Low-frequency neural oscillations support dynamic attending in temporal context. Timing & Time Perception, 2, 62–86.
Honing, H. (2012). Without it no music: Beat induction as a fundamental musical trait. Annals of the New York Academy of Sciences, 1252, 85–91.
Hove, M. J., & Risen, J. L. (2009). It's all in the timing: Interpersonal synchrony increases affiliation. Social Cognition, 27, 949–960.
Jääskeläinen, I. P., Ahveninen, J., Bonmassar, G., Dale, A. M., Ilmoniemi, R. J., Levänen, S., et al. (2004). Human posterior auditory cortex gates novel sounds to consciousness. Proceedings of the National Academy of Sciences, U.S.A., 101, 6809–6814.
Kung, S.-J., Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2013). Interacting cortical and basal ganglia networks underlying finding and tapping to the musical beat. Journal of Cognitive Neuroscience, 25, 401–420.
Lange, K. (2009). Brain correlates of early auditory processing are attenuated by expectations for time and pitch. Brain and Cognition, 69, 127–137.
Lange, K. (2010). Can a regular context induce temporal orienting to a target sound? International Journal of Psychophysiology, 78, 231–238.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), Psychology of time (pp. 189–231). Bingley, UK: Emerald.
Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Frontiers in Systems Neuroscience, 9, 159.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119–159.
Large, E. W., & Kolen, J. F. (1994). Resonance and the perception of musical meter. Connection Science, 6, 177–208.
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 237–246). Bordeaux, France.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music. Annals of the New York Academy of Sciences, 1156, 211–231.
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49, 764–766.
MacDougall, H. G., & Moore, S. T. (2005). Marching to the beat of the same drummer: The spontaneous tempo of human locomotion. Journal of Applied Physiology, 99, 1164–1173.
Madison, G. (2014). Sensori-motor synchronisation variability decreases as the number of metrical levels in the stimulus signal increases. Acta Psychologica, 147, 10–16.
Manning, F., & Schutz, M. (2013). "Moving to the beat" improves timing perception. Psychonomic Bulletin & Review, 20, 1133–1139.
Marsh, K. L., Richardson, M. J., & Schmidt, R. C. (2009). Social connection through joint action and interpersonal coordination. Topics in Cognitive Science, 1, 320–339.
Morillon, B., Schroeder, C. E., & Wyart, V. (2014). Motor contributions to the temporal precision of auditory attention. Nature Communications, 5, 5255.
Nozaradan, S. (2014). Exploring how musical rhythm entrains brain activity with electroencephalogram frequency-tagging. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 369, 20130393.
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience, 31, 10234–10240.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012a). Selective neuronal entrainment to the beat and meter embedded in a musical rhythm. Journal of Neuroscience, 32, 17572–17581.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012b). Steady-state evoked potentials as an index of multisensory temporal binding. Neuroimage, 60, 21–28.
Nozaradan, S., Zerouali, Y., Peretz, I., & Mouraux, A. (2015). Capturing with EEG the neural entrainment and coupling underlying sensorimotor synchronization to the beat. Cerebral Cortex, 25, 736–747.
Phillips-Silver, J., Aktipis, C. A., & Bryant, G. A. (2010). The ecology of entrainment: Foundations of coordinated rhythmic movement. Music Perception, 28, 3–14.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm perception. Science, 308, 1430.
Porter, G., Jr. (2014, August 25). Interview by B. Drolenga.
Repp, B. H. (2003). Rate limits in sensorimotor synchronization with auditory and visual sequences: The synchronization threshold and the benefits and costs of interval subdivision. Journal of Motor Behavior, 35, 355–370.
Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review, 12, 969–992.
Repp, B. H., & Su, Y.-H. (2013). Sensorimotor synchronization: A review of recent research (2006–2012). Psychonomic Bulletin & Review, 20, 403–452.
Richardson, D. C., Dale, R., & Kirkham, N. Z. (2007). The art of conversation is coordination: Common ground and the coupling of eye movements during dialogue. Psychological Science, 18, 407–413.
Sanabria, D., & Correa, Á. (2013). Electrophysiological evidence of temporal preparation driven by rhythms in audition. Biological Psychology, 92, 98–105.
Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences, 32, 1–16.
Schwartze, M., Tavano, A., Schröger, E., & Kotz, S. A. (2012). Temporal aspects of prediction in audition: Cortical and subcortical neural mechanisms. International Journal of Psychophysiology, 83, 200–207.
Singer, W. (1999). Neurobiology: Striving for coherence. Nature, 397, 391–393.
Sioros, G., Miron, M., Davies, M., Gouyon, F., & Madison, G. (2014). Syncopation creates the sensation of groove in synthesized music examples. Frontiers in Psychology, 5, 1036.
Stupacher, J., Hove, M. J., & Janata, P. (2016). Audio features underlying perceived groove and sensorimotor synchronization in music. Music Perception, 33, 571–589.
Stupacher, J., Hove, M. J., Novembre, G., Schütz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition, 82, 127–136.
Su, Y.-H., & Pöppel, E. (2012). Body movement enhances the extraction of temporal structures in auditory sequences. Psychological Research, 76, 373–382.
Tarr, B., Launay, J., & Dunbar, R. I. M. (2014). Music and social bonding: "Self-other" merging and neurohormonal mechanisms. Frontiers in Psychology, 5, 1096.
Thomson, J. M., Fryer, B., Maltby, J., & Goswami, U. (2006). Auditory and motor rhythm awareness in adults with dyslexia. Journal of Research in Reading, 29, 334–348.
Thut, G., Schyns, P. G., & Gross, J. (2011). Entrainment of perceptually relevant brain oscillations by non-invasive rhythmic stimulation of the human brain. Frontiers in Psychology, 2, 170.
Tierney, A., & Kraus, N. (2014). Auditory-motor entrainment and phonological skills: Precise auditory timing hypothesis (PATH). Frontiers in Human Neuroscience, 8, 949.
Tierney, A., & Kraus, N. (2015). Neural entrainment to the rhythmic structure of music. Journal of Cognitive Neuroscience, 27, 400–408.
Tierney, A. T., & Kraus, N. (2013). The ability to tap to a beat relates to cognitive, linguistic, and perceptual skills. Brain and Language, 124, 225–231.
van Noorden, L., & Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28, 43–66.
van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory organization of sound sequences by a temporal or numerical regularity—A mismatch negativity study comparing musicians and non-musicians. Cognitive Brain Research, 23, 270–276.
Vuust, P., & Witek, M. A. G. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology, 5, 1111.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science, 20, 1–5.
Witek, M. A. G., Clarke, E. F., Wallentin, M., Kringelbach, M. L., & Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS One, 9, e94446.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience, 8, 547–558.

Author notes

*These authors contributed equally to this work.