Dynamic attending theory predicts that attention is allocated hierarchically across time during processing of hierarchical rhythmic structures such as musical meter. ERP research demonstrates that attention to a moment in time modulates early auditory processing as evidenced by the amplitude of the first negative peak (N1) approximately 100 msec after sound onset. ERPs elicited by tones presented at times of high and low metric strength in short melodies were compared to test the hypothesis that hierarchically structured rhythms direct attention in a manner that modulates early perceptual processing. A more negative N1 was observed for metrically strong beats compared with metrically weak beats; this result provides electrophysiological evidence that hierarchical rhythms direct attention to metrically strong times during engaged listening. The N1 effect was observed only on fast tempo trials, suggesting that listeners more consistently invoke selective processing based on hierarchical rhythms when sounds are presented rapidly. The N1 effect was not modulated by musical expertise, indicating that the allocation of attention to metrically strong times is not dependent on extensive training. Additionally, changes in P2 amplitude and a late negativity were associated with metric strength under some conditions, indicating that multiple cognitive processes are associated with metric perception.
In the complex environments encountered in everyday life, the amount of sensory information available is vast; detailed processing of all information available to human perceptual systems would overwhelm neural resources. A large body of behavioral, electrophysiological, and neuroimaging research has demonstrated that selectively attending based on certain aspects of the incoming sensory stream allows for more detailed processing of potentially relevant information, resulting in a more efficient use of neuroperceptual resources (Alho et al., 1999; Hillyard, 1985; Hansen & Hillyard, 1983; Hillyard, Hink, Schwent, & Picton, 1973; Cherry, 1953). Studies employing the fine temporal resolution of ERPs have further demonstrated that selective attention affects perceptual as well as higher-level processing; attention modulates early portions of ERP waveforms less than 250 msec after auditory and visual onsets (Hillyard, 1985; Hillyard et al., 1973).
Much of the previous research on selective attention has focused on the use of spatial location as the selection criterion. Recent evidence demonstrates that temporal characteristics can also be used as effective attention selection criteria. Behavioral responses to information presented at an attended time show facilitation similar to that observed for information presented at an attended location. A modified version of the Posner cuing paradigm (Posner, Snyder, & Davidson, 1980) showed that responses to detected targets are faster following validly cued time intervals compared to invalidly cued time intervals (Griffin, Miniussi, & Nobre, 2002; Miniussi, Wilding, Coull, & Nobre, 1999; Coull & Nobre, 1998). Initial ERP studies of temporally selective attention demonstrated modulation of late portions of the waveforms, including the P300, as well as a low-frequency negativity that increased in amplitude prior to the attended time (contingent negative variation; Lange & Röder, 2006; Lange, Rösler, & Röder, 2003; Griffin et al., 2002).
Temporally selective attention, like spatially selective attention, affects early perceptual processing as indexed by ERPs in the first 250 msec after onset when participants are engaged in a perceptually demanding task (Sanders & Astheimer, 2008; Correa, Lupiáñez, Madrid, & Tudela, 2006; Lange & Röder, 2006; Lange et al., 2003; Griffin et al., 2002). Specifically, sounds presented at attended times elicit a larger N1 approximately 100 msec after onset compared to physically identical stimuli presented at unattended times (Lange & Röder, 2006; Lange et al., 2003). The observation of this effect in studies that used more than two possible time intervals indicates that temporal attention can be modulated at a subsecond timescale (Sanders & Astheimer, 2008). Furthermore, the observation of this effect to word onsets during speech processing without explicit attention instructions indicates that temporal attention can be guided by cues inherent to the stimulus (Astheimer & Sanders, 2009, 2011), although it remains unclear exactly which stimulus-inherent cues guide attention across time. One potential stimulus-inherent temporal attention cue is rhythmic structure; repeating rhythmic information provides strong, self-reinforcing cues about when relevant information is likely to occur.
Dynamic attending theory is a well-developed theoretical framework of how rhythms in external stimuli may guide the allocation of attention across time (McAuley & Jones, 2003; Pashler, 2001; Large & Jones, 1999; Keele, Nicoletti, Ivry, & Pokorny, 1989; Schulze, 1978; Jones, 1976). Dynamic attending theory belongs to the entrainment-based class of models describing how we perceive time and the organization of events across time. Entrainment models of time perception are developed from a nonlinear dynamical systems perspective and typically rely on an oscillator interpretation of the biological clock (McAuley, 2010). Dynamic attending theory conceptualizes attention as a persistently rhythmic process comprising multiple self-sustaining oscillatory components termed attending rhythms. These attending rhythms are subject to phase and period perturbations by external periodicities and will become phase- and period-locked (entrained) to the external periodicities after sufficient exposure. Individual attending rhythms generate periodic temporal expectancies based on their oscillations, such that when an attending rhythm becomes entrained to an external periodically recurring stimulus, it generates a pattern of temporal expectancies that predict when the stimulus will recur (Large & Jones, 1999).
Evidence for dynamic attending theory comes largely from the effects of prior rhythmic context on accuracy in duration discrimination tasks. In paradigms that require participants to identify timing deviations in tone streams, accuracy is higher when the tone onsets are exactly isochronous (Large & Jones, 1999; Jones & Yee, 1997; Yee, Holleran, & Jones, 1994). Just-noticeable differences in tempo are smaller for precisely isochronous sequences and are progressively smaller with more repetitions of the exact interval (McAuley & Kidd, 1998; Drake & Botte, 1993). Same/different discriminations between a sequence–final interval and an immediately prior standard interval are more accurate when the standard interval is validly predicted by an isochronous stream (Barnes & Jones, 2000; McAuley & Kidd, 1998). Prior rhythmic context also modulates ERP components associated with attentional allocation. Target sounds elicit a larger P300 when preceded by isochronous rhythmic contexts compared to irregular rhythmic contexts (Rimmele, Jolsvai, & Sussman, 2011; Schwartze, Rothermich, Schmidt-Kassow, & Kotz, 2011; Lange, 2009, 2010). There is some evidence that sounds elicit a larger N1 when preceded by an isochronous (compared to irregular) rhythmic context (Schwartze, Farrugia, & Kotz, 2013; Rimmele et al., 2011; Lange, 2010), although this relationship remains tentative as this effect is not always observed (Sanabria & Correa, 2013; Lange, 2009; see Lange, 2013, for discussion).
Dynamic attending theory makes specific predictions about attentional allocation in the presence of multiple external periodicities. Under such circumstances, it is predicted that separate attending rhythms will entrain to each of the periodicities present in the external stimulus. Furthermore, it is predicted that the separately entrained attending rhythms will combine additively to produce an overall temporal expectancy profile that will direct the allocation of attention across time (Large & Palmer, 2002; Large & Jones, 1999). When multiple entraining periodicities present in a stimulus are related by simple integer ratios such as 2:1, 3:1, or 4:1, the additive combination of the individual periodicities is hierarchically structured. Accordingly, the resultant additive combination of separately entrained attending rhythms is predicted to form a hierarchical profile of temporal expectancy, which is in turn predicted to generate a hierarchical allocation of attention across time. Supporting this idea, target sounds occurring at metrically strong times are detected more quickly than those occurring at metrically weak times (Bolger, Coull, & Schön, 2014; Bolger, Trost, & Schön, 2013; Cason & Schön, 2012).
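The additive combination described above can be illustrated with a deliberately simplified sketch. It assumes each entrained attending rhythm contributes one unit of expectancy at every multiple of its period; this is an illustrative simplification, not the continuous oscillator formulation of Large and Jones (1999). The function name and the unit-pulse representation are hypothetical.

```python
def expectancy_profile(n_beats, periods):
    """Sum unit pulses from one idealized attending rhythm per period.

    Periods are expressed in beats, so ratios such as 2:1 and 4:1 between
    levels correspond to periods of 1, 2, and 4.
    """
    return [sum(1 for p in periods if beat % p == 0) for beat in range(n_beats)]

# Two measures of quadruple meter: beat level, half-measure level, measure level.
profile = expectancy_profile(8, periods=(1, 2, 4))
print(profile)  # [3, 1, 2, 1, 3, 1, 2, 1]
```

In this toy profile, measure-initial beats are cued by all three levels and therefore carry the highest summed expectancy, which is the hierarchical pattern the theory predicts.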
Studies of oscillatory electrical and magnetic brain responses offer tentative biological support for the proposal that temporally hierarchical stimuli induce a temporally hierarchical attention profile (Zanto, Snyder, & Large, 2006). Isochronous sequences of alternating loud and soft tones that create explicit hierarchies modulate the power of EEG activity in the gamma band (20–60 Hz; Snyder & Large, 2005). Similar results are observed using magnetoencephalography (Iversen, Repp, & Patel, 2009). The exact relationship between gamma band activity and attentional selection has not yet been specified, but increases in gamma band power and phase synchrony have been associated with visual selective attention (Schroeder & Lakatos, 2009a; Doesburg, Roggeveen, Kitajo, & Ward, 2008; Fries, Reynolds, Rorie, & Desimone, 2001). Furthermore, gamma band power is thought to be coupled to delta band phase (Schroeder & Lakatos, 2009b; Lakatos et al., 2005), which has been associated with facilitated detection of audiotemporal targets (Stefanics et al., 2010) and is modulated by stimulus-inherent rhythms (Will & Berg, 2007). Additionally, hierarchic rhythm perception has been shown to modulate auditory steady-state evoked potentials at frequencies contained in the hierarchy (Nozaradan, Peretz, & Mouraux, 2012; Nozaradan, Peretz, Missal, & Mouraux, 2011). This suggests entrainment of some neuronal resources to different levels in the hierarchical rhythmic structure, but it is unclear whether this process is related to the attentional mechanisms implicated by dynamic attending theory because the steady-state evoked response is not necessarily specific to attentional modulation of perceptual processing.
The relationship between hierarchical rhythmic structure and the allocation of attention across time has also been investigated using ERPs. Multiple ERP studies have associated the processing of hierarchic strength with changes in later positivities considered to be part of the P300 family of components. Deviant tones elicit a larger P3b (a subcomponent of the P300) when presented at positions of hierarchic strength in spontaneous and defined subjective rhythmic hierarchies (Potter, Fenwick, Abecasis, & Brochard, 2009; Abecasis, Brochard, Granot, & Drake, 2005; Brochard, Abecasis, Potter, Ragot, & Drake, 2003). Amplitude of the P3a elicited by temporal probes is modulated by hierarchic strength in musically experienced participants (Jongsma, Desain, & Honing, 2004; Jongsma, Desain, Honing, & Rijn, 2003). Amplitude of the P300 elicited by phonemic targets in pseudowords is modulated by congruence of pseudoword stress and metric structure of a preceding rhythmic prime (Cason & Schön, 2012). Temporally selective attention can clearly modulate P300 amplitude. However, there are many other factors that modulate P300 amplitude (see Picton, 1992, for review), making it a relatively poor index of attentional allocation. Determining whether temporally selective attention is directed to points of strength in hierarchic rhythms requires measuring the effects of attention on earlier perceptual processing, which can be indexed with the amplitude of the auditory N1.
Current findings regarding the relationship between hierarchic strength and auditory N1 amplitude are mixed. None of the studies associating hierarchic strength with the P300 observed reliable effects of metric strength on N1 amplitude. Although an early processing negativity between 0 and 100 msec is reported by Potter and colleagues (2009), this negativity is clearly evident at event onset and thus cannot represent modulation of the N1 component, which is typically observed with an onset between 90 and 110 msec in adults (Ponton, Eggermont, Kwong, & Don, 2000). However, the designs of these studies were such that any changes in N1 amplitude due to hierarchic strength could have been obscured by other neural activity. Rare deviant tones embedded in a sequence of standards, such as those employed in three of the studies (Potter et al., 2009; Abecasis et al., 2005; Brochard et al., 2003), typically elicit a large, frontocentrally distributed negativity, termed the mismatch negativity (MMN), that peaks approximately 150–250 msec after stimulus onset and can obscure changes in N1 amplitude (see Näätänen, Paavilainen, Rinne, & Alho, 2007, for review). Metric expectancy violations such as those employed in Cason and Schön (2012) have also been shown to elicit an MMN (Vuust et al., 2005). Additionally, the variable timing of the temporal probes relative to the previous auditory event in the Jongsma et al. (2003, 2004) studies is known to modulate N1 amplitude (Coch, Skendzel, & Neville, 2005; Budd, Barry, Gordon, Rennie, & Michie, 1998), confounding hierarchical strength and refractory effects in these studies.
One additional ERP study reported a larger N1/P2 complex in response to sounds presented at times of greater hierarchic strength in a subjective metric hierarchy (Schaefer, Vlek, & Desain, 2011). However, although described as an N1/P2 change, this difference was actually a sustained positivity between 100 and 250 msec for hierarchically strong times compared to hierarchically weak times. This time window is too late and long to be considered an N1 effect. Furthermore, in the ERP waveforms, it appears that, although this early effect was reported separately from a later effect between 300 and 450 msec, the effects may actually be a single sustained positivity across the entire epoch in response to subjectively hierarchically strong times. If there were reliable differences in N1 amplitude between hierarchically strong and hierarchically weak beats, they would have been obscured by this long-duration effect.
The current study was designed to determine if metric structure directs temporally selective attention in a manner that modulates early perceptual processing. Specifically, identical sounds were predicted to elicit larger amplitude N1s when presented at times of greater metric strength in short melodies. Such results would provide neurophysiological evidence for one of the critical predictions of dynamic attending theory: Patterned stimuli can automatically induce listeners to allocate more attention to times that are cued by more levels of a metrical hierarchy. To determine whether such allocation of attention to metrically strong times is dependent on musical expertise, data from musicians and nonmusicians were compared. Additionally, to ensure that any observed effects were due to differences in metric strength rather than increasing reliability of a metric percept as more of a melody is heard, responses were compared for sounds presented at metrically strong and weak times both early and late in the melody.
Twenty-four adults provided the data included in these analyses. All participants were right-handed native English speakers with normal hearing and normal or corrected-to-normal vision. All participants reported having no neurological conditions and not having taken psychoactive medication within the 6 months prior to the experiment. Participants were divided into two groups on the basis of their self-reported musical training and performance backgrounds; 12 (six women) were classified as musicians and 12 (two women) were classified as nonmusicians. Data from an additional four participants were collected but excluded from analyses due to insufficiently high or low levels of musical expertise.
The musicians ranged in age from 18 to 24 years (M = 20.33, SD = 1.61) and had 6–16 years of experience playing their primary instrument (M = 10.34, SD = 3.27). All musicians had prior music theory training in either a classroom or private lesson environment, with 9 of the 12 having completed Music Theory II or above at the college level. The three musicians who had not completed Music Theory II had each received at least 7 years of individual or group lessons on their primary instrument. Nonmusicians ranged in age from 18 to 23 years (M = 20.33, SD = 1.44) and had 0–2 years of cumulative experience playing any instrument (M = 0.43, SD = 0.63). Only one nonmusician had any prior music theory training, which was at the high school level. The group of musicians scored higher on the Advanced Measures of Music Audiation (AMMA; Gordon & Alvey, 2008; Gordon, 1989) than the group of nonmusicians, although the spread of scores of the two groups overlapped. Musicians' raw AMMA subscores for tonal ranged from 20 to 39 (M = 30.50, SD = 4.54) and for rhythmic ranged from 24 to 39 (M = 31.58, SD = 3.68). Nonmusicians' raw AMMA subscores for tonal ranged from 19 to 29 (M = 24.00, SD = 3.41) and for rhythmic ranged from 18 to 31 (M = 24.83, SD = 1.11).
A total of 192 novel eight-measure melodies with simple and strongly cued metric structure were composed for use in the current study (see Figure 1 for examples). The melodies were varied on several musical factors to ensure that observed effects were due to perceived metric strength rather than stimulus familiarity. The melodies ranged in length from 10 to 20 sec and varied primarily on three fully crossed binary dimensions: presentation rate (fast: 450 msec interbeat intervals [IBIs], slow: 625 msec IBIs), intended meter (triple, quadruple), and surface rhythm (isochronous, patterned). The IBI is equivalent to the stimulus onset asynchrony of the underlying isochronous structural beat whether or not the surface rhythm is isochronous. Each major key was represented twice for each combination of presentation rate, intended meter, and surface rhythm. Two different tempi were used to reduce possible entrainment to a specific tempo over the course of the experiment; the specific tempi were chosen because individuals show a preference for beat lengths in the general range of 500–700 msec IBIs or 85–120 beats per minute (McAuley, Jones, Holub, Johnston, & Miller, 2006; London, 2002; Parncutt, 1994). The isochronous melodies contained only melodic cues to the intended meter, whereas the phrases with surface rhythm patterning contained both melodic and rhythmic cues to the intended meter. The metric cues employed were typical of Western tonal harmony. Melodic cues included relative pitch pattern repetition, placement of harmonically strong notes on measure-initial beats, and changes in pitch contour direction at measure boundaries; rhythmic cues consisted primarily of temporal pattern repetition. All stimuli were composed using MIDI authoring software (Sonar Home Studio 6) and generated using a software MIDI synthesizer (Cakewalk TTS-1, Boston, MA) to ensure the absence of dynamic expression and other performance-based variations.
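As a check on the design arithmetic, the following minimal sketch enumerates the fully crossed conditions and converts the two IBIs to beats per minute. The variable and function names are hypothetical, and the duration calculation assumes each melody fills exactly eight complete measures, which the text does not state explicitly.

```python
from itertools import product

RATES_MS = {"fast": 450, "slow": 625}    # interbeat intervals from the text
METERS = {"triple": 3, "quadruple": 4}   # beats per measure
RHYTHMS = ("isochronous", "patterned")
N_KEYS, REPS_PER_KEY, N_MEASURES = 12, 2, 8

# Three fully crossed binary dimensions give eight design cells.
cells = list(product(RATES_MS, METERS, RHYTHMS))
n_melodies = len(cells) * N_KEYS * REPS_PER_KEY

def bpm(ibi_ms):
    """Convert an interbeat interval in msec to beats per minute."""
    return 60000 / ibi_ms

def melody_duration_s(rate, meter):
    """Duration in sec, assuming eight complete measures (an assumption)."""
    return N_MEASURES * METERS[meter] * RATES_MS[rate] / 1000

print(len(cells), n_melodies)                   # 8 192
print(round(bpm(450), 1), round(bpm(625), 1))   # 133.3 96.0
print(melody_duration_s("fast", "triple"),
      melody_duration_s("slow", "quadruple"))   # 10.8 20.0
```

Under these assumptions, the shortest and longest melodies come out at roughly 10.8 and 20 sec, in line with the stated 10 to 20 sec range.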
The melodies were composed so that each contained four scale-degree constrained notes at fixed critical points that varied by metric strength and temporal position: beat 1 of measure 4 (strong, early), beat 2 of measure 5 (weak, early), beat 1 of measure 6 (strong, late), and beat 2 of measure 7 (weak, late). Strong and weak critical points were included both early and late in the melodies to disentangle metric strength from the reliability of the perceived metric hierarchy. For half of the stimuli, the scale degrees occurring at these four critical locations were 1, 5, 5, 1 (Do, Sol, Sol, Do); for the other half, the scale degree pattern was 5, 1, 1, 5 (Sol, Do, Do, Sol). Within each combination of intended meter, presentation rate, surface rhythm, and critical point scale degree pattern, there was an equal probability of any of the 12 notes in the Western music system occurring at any of the four critical points in the melody. This organization allowed comparison of ERPs elicited by physically identical stimuli among the different metric strength conditions. All notes presented at critical points were directly preceded by quarter notes in all melodies regardless of surface rhythm complexity to prevent confounding of metric strength with N1 refractoriness.
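The timing of the four critical points can be sketched as follows. The onset calculation assumes that each melody begins on beat 1 of measure 1 with no pickup notes, which the text does not specify; the names used here are hypothetical.

```python
# (metric strength, position in melody) -> (measure, beat), both 1-indexed
CRITICAL_POINTS = {
    ("strong", "early"): (4, 1),
    ("weak", "early"): (5, 2),
    ("strong", "late"): (6, 1),
    ("weak", "late"): (7, 2),
}

def onset_ms(measure, beat, beats_per_measure, ibi_ms):
    """Onset relative to melody start, assuming it starts on measure 1, beat 1."""
    beats_elapsed = (measure - 1) * beats_per_measure + (beat - 1)
    return beats_elapsed * ibi_ms

# Example: quadruple meter at the fast rate (450 msec IBIs).
for (strength, position), (measure, beat) in CRITICAL_POINTS.items():
    print(strength, position, onset_ms(measure, beat, 4, 450))
# strong early 5400
# weak early 7650
# strong late 9000
# weak late 11250
```

Because the preceding note is always a quarter note, each critical tone follows the previous onset by exactly one IBI in this scheme, regardless of metric strength.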
After providing informed consent, participants took part in a screening session during which they answered basic demographic questions, completed a questionnaire about their musical experience, and completed a computer-based self-administered version of the AMMA (Gordon & Alvey, 2008; Gordon, 1989). Participants found to have sufficiently high or low levels of musical expertise (as defined under Participants) completed an EEG session on the same or a subsequent day.
During the EEG session, participants listened to each of the 192 short melodies while fixating on a plus sign displayed on a computer monitor. A short (100 msec) burst of white noise was played either 1.0 beat (450 or 625 msec) or 1.3 beats (585 or 813 msec) after the completion of each melody. Participants were asked to identify whether the noise continued the rhythm of the preceding melody. This task was used to ensure that participants remained engaged in the experiment and actively listened to the melody on each trial. Trials were presented in a pseudorandomized order in blocks of 32 trials each with short breaks between blocks.
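The probe delays quoted above follow directly from the two IBIs. The sketch below reproduces them with integer arithmetic and half-up rounding, on the assumption that 812.5 msec was rounded up to 813; the function name is hypothetical.

```python
def probe_delay_ms(ibi_ms, tenths_of_beat):
    """Delay in msec for a probe placed tenths_of_beat / 10 beats after the
    melody ends, rounded half up using integers only."""
    return (ibi_ms * tenths_of_beat + 5) // 10

for ibi in (450, 625):
    print(ibi, probe_delay_ms(ibi, 10), probe_delay_ms(ibi, 13))
# 450 450 585
# 625 625 813
```

Integer arithmetic is used because Python's built-in `round` rounds exact halves to the nearest even integer, so `round(812.5)` would give 812 rather than the 813 reported in the text.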
Data Collection and Analysis
EEG was collected using a 128-channel electrode net (EGI, Eugene, OR) at a sampling rate of 250 Hz and a bandpass of 0.01–100 Hz. Continuous EEG was filtered offline using a 60-Hz notch filter and segmented into 700-msec epochs beginning 100 msec before the onset of a critical point. Epochs containing eyeblink, eye movement, or other artifacts were automatically detected and excluded from analyses. Averaged waveforms were re-referenced to the averaged mastoid recording and baseline corrected to the 100 msec prior to target onset.
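The segmentation and baseline correction steps can be sketched for a single channel as follows. This is a minimal illustration on toy data, not the authors' pipeline: filtering, artifact rejection, and re-referencing are omitted, and all names are hypothetical.

```python
FS_HZ = 250                   # sampling rate from the text
PRE_MS, EPOCH_MS = 100, 700   # epoch starts 100 msec before onset

def ms_to_samples(ms, fs_hz=FS_HZ):
    """Convert a duration in msec to a whole number of samples."""
    return ms * fs_hz // 1000

def extract_epoch(signal, onset_sample):
    """Cut one epoch around onset_sample and subtract the mean of the
    100 msec prestimulus baseline from every sample."""
    n_pre = ms_to_samples(PRE_MS)
    n_total = ms_to_samples(EPOCH_MS)
    start = onset_sample - n_pre
    epoch = signal[start:start + n_total]
    baseline = sum(epoch[:n_pre]) / n_pre
    return [value - baseline for value in epoch]

# Toy recording: constant 2.0 offset with a single -3.0 deflection
# placed 100 msec (25 samples) after an onset at sample 500.
signal = [2.0] * 1000
signal[500 + ms_to_samples(100)] = -1.0

epoch = extract_epoch(signal, onset_sample=500)
print(len(epoch), epoch[0], epoch[50])  # 175 0.0 -3.0
```

At 250 Hz, the 700 msec epoch spans 175 samples and the baseline spans 25, so stimulus onset falls at index 25 of each epoch.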
Mean amplitude was measured 90–120 msec (N1) and 150–190 msec (P2) after critical point onset from 81 electrodes broadly distributed across the scalp (Figure 2). Mean amplitude was also measured 250–500 msec to capture a late negativity (LN) evident in the grand-averaged waveforms (Figures 3 and 4). Separate repeated-measures ANOVAs were performed for each measurement (N1, P2, LN). Data from nine electrodes were averaged together to create nine ROIs (Figure 2), and Anterior/posterior position (AP: 3 levels) and Lateral position (LR: 3 levels) were included as within-subject factors in the repeated-measures ANOVAs. Our primary interest in the LN was to determine whether it explained strength effects in the earlier N1 time window, so to minimize post hoc comparisons, the LN ANOVA was spatially restricted to the left anterior (LA) scalp region where N1 strength effects were largest. The ANOVAs also included the factors Participant group (musicians, nonmusicians), Metric strength of the critical point (strong, weak), Presentation tempo (fast, slow), and Relative position of the critical point in the melody (early, late). Group, metric strength, and tempo were included to test the a priori hypotheses. Critical point position was included to explore the possibility that the amount of information needed to extract meter from the melodies differed for musicians and nonmusicians. The N1 ANOVA revealed interactions of Metric strength and Tempo, motivating follow-up ANOVAs for fast and slow trials separately. The N1 ANOVA for fast trials revealed interactions of Metric strength and Electrode position, motivating a follow-up ANOVA over only the LA scalp region. The P2 and LN ANOVAs revealed interactions of Metric strength, Musical experience, and Critical point location, motivating follow-up ANOVAs to isolate effects by Group and Critical point location. 
The ANOVA on P2 amplitude that included only musicians and only tones late in the melody revealed interactions of Metric strength with Tempo, motivating additional follow-up ANOVAs by tempo. For all ANOVAs, uncorrected degrees of freedom and Greenhouse–Geisser corrected p values are reported.
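The mean amplitude measurements that feed these ANOVAs can be sketched as below. The example assumes inclusive window boundaries and the 4 msec sample spacing implied by the 250 Hz sampling rate; the helper names and the toy data are hypothetical.

```python
FS_HZ = 250                   # sampling rate from the text
PRE_MS, EPOCH_MS = 100, 700   # epoch starts 100 msec before onset
WINDOWS_MS = {"N1": (90, 120), "P2": (150, 190), "LN": (250, 500)}

def window_indices(start_ms, end_ms):
    """Epoch indices whose latency lies in [start_ms, end_ms] (inclusive)."""
    step_ms = 1000 // FS_HZ
    n_samples = EPOCH_MS // step_ms
    return [i for i in range(n_samples)
            if start_ms <= i * step_ms - PRE_MS <= end_ms]

def mean_amplitude(epoch, component):
    """Mean amplitude of one averaged waveform within a component window."""
    idx = window_indices(*WINDOWS_MS[component])
    return sum(epoch[i] for i in idx) / len(idx)

def toy_epoch(n1_amplitude):
    """Flat waveform with a constant deflection inside the N1 window."""
    epoch = [0.0] * (EPOCH_MS * FS_HZ // 1000)
    for i in window_indices(*WINDOWS_MS["N1"]):
        epoch[i] = n1_amplitude
    return epoch

# One toy ROI: nine electrodes with N1 amplitudes of -1 to -9 microvolts.
roi_epochs = [toy_epoch(-float(k)) for k in range(1, 10)]
roi_n1 = sum(mean_amplitude(e, "N1") for e in roi_epochs) / len(roi_epochs)
print(len(window_indices(90, 120)), roi_n1)  # 8 -5.0
```

Each ROI value entering the analysis is then a single number per participant and condition, obtained by averaging the window means of the nine electrodes in that region.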
The behavioral task of determining whether an extra sound at the end of the melody was on- or off-beat was clearly difficult (accuracy: M = 66.56% correct, SD = 19.40%). Although this task was unrelated to extracting metric structure or differentially allocating attention across the melody, it did reflect experience with music; musicians (M = 81.34% correct, SD = 15.45%) clearly outperformed nonmusicians (M = 51.78% correct, SD = 8.45%; t(17) = 5.81, p < .001). However, response rates were high in both groups (musicians: M = 95.00%, SD = 8.39%; nonmusicians: M = 94.79%, SD = 6.37%), suggesting that all participants remained engaged in the task and actively listened to the melodies throughout the experiment.
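The reported group comparison can be checked from the summary statistics alone. The text does not name the test used; the sketch below assumes an unequal-variance (Welch) t test, which is one way to arrive at approximately 17 degrees of freedom with 12 participants per group.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sample sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Accuracy (% correct) for musicians and nonmusicians, as reported.
t, df = welch_t(81.34, 15.45, 12, 51.78, 8.45, 12)
print(round(t, 2), round(df))  # 5.81 17
```

Under that assumption the computed values match the reported t(17) = 5.81; an equal-variance test on the same data would instead have 22 degrees of freedom.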
The hypothesis that listeners direct temporally selective attention to metrically strong times in a manner that modulates early auditory processing predicts a larger N1 over anterior electrodes for metrically strong tones compared to metrically weak tones. Consistent with this hypothesis, on fast trials metrically strong tones elicited a larger N1 than metrically weak tones, F(1, 22) = 9.41, p = .006 (Figures 3 and 5). The effect for fast trials was largest over LA regions (Strength × AP × LR: F(4, 88) = 3.24, p = .031; LA only: F(1, 22) = 12.89, p = .002). At these same LA sites, the effect of Metric strength was not significant on slow trials (ps > .2; Figures 4 and 6).
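For effects with one numerator degree of freedom, the reported p values can be approximately recovered from the F statistics, because F(1, df) is the square of a t statistic with df degrees of freedom. The sketch below integrates the t density numerically with Simpson's rule using only the standard library; it is a rough check rather than the authors' analysis software, and the function names are hypothetical. With a single numerator degree of freedom, the Greenhouse-Geisser correction leaves the p value unchanged.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def p_from_f(f_value, df_error, steps=2000):
    """Upper-tail p for F(1, df_error): one minus twice the integral of the
    t density from 0 to sqrt(F), using Simpson's rule (steps must be even)."""
    upper = math.sqrt(f_value)
    h = upper / steps
    total = t_pdf(0.0, df_error) + t_pdf(upper, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    return 1 - 2 * total * h / 3

# F statistics reported for the N1 strength effect on fast trials.
for f_value in (9.41, 12.89):
    print(f_value, p_from_f(f_value, 22))
```

The values printed for F(1, 22) = 9.41 and 12.89 can be compared against the reported p = .006 and p = .002.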
The effect of Metric strength on N1 amplitude on fast trials was not modulated by musical expertise (ps > .2) or by critical point location in the melody (ps > .3; Figure 7). Furthermore, the N1 strength effect was observed on fast trials when comparing strong points late in the melodies to weak points early in the melodies (F(1, 22) = 4.27, p = .051; LA only: F(1, 22) = 10.15, p = .004), indicating that position in the melody cannot account for the metric strength effects. Although musical expertise did not modulate the effect of metric strength on N1 amplitude, mean N1 amplitude across metric strength was numerically larger in nonmusicians than in musicians over antero- and centromedial regions (Figures 5 and 6). Individual variability prevented the main effect of Group from reaching significance (p = .057), but N1 amplitude on slow trials was more negative in nonmusicians compared to musicians at these locations, F(1, 22) = 4.60, p = .045.
In nonmusicians early in the melody, metrically strong tones elicited a more positive P2 than metrically weak tones, F(1, 11) = 13.47, p = .004 (Figure 7). This effect was not modulated by presentation tempo (ps > .1). In musicians late in the melody on slow trials, metrically strong tones elicited a more positive P2 than metrically weak tones, F(1, 11) = 5.05, p = .046. Conversely, in musicians late in the melody on fast trials metrically strong tones elicited a less positive P2 than metrically weak tones, F(1, 11) = 14.85, p = .003; this effect was not observed for tones early in the melody (ps > .1).
There was some evidence that, in musicians, metrically strong tones elicited a sustained LN between 250 and 500 msec over LA electrode regions, F(1, 11) = 4.58, p = .055 (Figure 7). In nonmusicians, metrically strong tones presented late in the melodies elicited a larger LN, F(1, 11) = 8.66, p = .013, and metrically weak tones presented early in the melodies elicited a larger LN, F(1, 11) = 7.87, p = .017. The effects of Metric strength in the LN time window were larger for fast trials in both groups (SW × TMP: F(1, 22) = 5.65, p = .027).
Notes presented at metrically strong times elicited a larger amplitude N1 than identical notes presented at metrically weak times during engaged listening. As discussed below, the most likely mechanism underlying differences in early perceptual processing of physically identical stimuli is selective attention. This finding is therefore consistent with the prediction of dynamic attending theory that attention is preferentially allocated to times of metric strength during hierarchical rhythm perception. Observing this effect for tones both early and late in the melodies supports the claim that the differences are a function of metric strength rather than phrase closure or order-dependent processes such as familiarity. The larger N1 in response to sounds presented at metrically strong times was observed only on fast trials; a potential explanation for this finding is that listeners may be more likely to employ temporally selective attention when presented with rapidly changing information. Alternatively, the longer measure length on slow trials may have precluded attentional synchronization at the meter level of the hierarchy; whereas the 625 msec IBIs on slow trials fall within the general range for which individuals show a synchronization preference, the 2500 msec measure lengths on longer slow trials exceed the upper interval limits for both preferred perceptual tempo and synchronization tapping (McAuley et al., 2006; Repp, 2005). Future work examining the allocation of attention to metrically strong times over multiple rates and measure lengths is necessary to distinguish these interpretations. Additionally, the N1 attention effect was not modulated by musical expertise, indicating that the hierarchical allocation of attention during music processing is not dependent on extensive training.
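The measure lengths relevant to the second interpretation follow from the IBIs and meters used in the stimuli. A minimal sketch, with hypothetical names:

```python
IBI_MS = {"fast": 450, "slow": 625}     # interbeat intervals from the Methods
BEATS = {"triple": 3, "quadruple": 4}   # beats per measure

# Measure length in msec for every rate-by-meter combination.
measure_ms = {(rate, meter): IBI_MS[rate] * BEATS[meter]
              for rate in IBI_MS for meter in BEATS}

for condition, length in sorted(measure_ms.items(), key=lambda item: item[1]):
    print(condition, length)
# ('fast', 'triple') 1350
# ('fast', 'quadruple') 1800
# ('slow', 'triple') 1875
# ('slow', 'quadruple') 2500
```

Only the slow quadruple melodies reach the 2500 msec measure length discussed above; every fast-trial measure is shorter than every slow-trial measure.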
This finding is the first conclusive evidence that metric strength can modulate early perceptual processing of sounds. Differential neural processing of physically identical stimuli based on relevance is how the effects of selective attention on perception have been operationalized. Furthermore, the LA concentration of the observed N1 effect is similar to the frontal and often left-weighted distribution typically observed for attentional modulation of N1 amplitude during dichotic and other directed listening tasks (Sanders & Astheimer, 2008; Näätänen, Teder, Alho, & Lavikainen, 1992; Giard, Perrin, Pernier, & Peronnet, 1988; Woods & Clayworth, 1987). Although it is possible to posit other cognitive mechanisms to explain differential perceptual processing of physically identical stimuli (e.g., priming), theory, the design of the current experiment, and the specific morphology of the ERP N1 effect all support interpreting this finding as the consequence of meter directing selective attention. Additionally, this clear association of high metric strength with enhanced early auditory processing as indexed by N1 amplitude provides support for the interpretation that other, less well understood electrophysiological measures previously associated with metric strength such as changes in gamma-band activity and steady-state evoked potentials (e.g., Nozaradan et al., 2011; Snyder & Large, 2005) may also index the modulation of early perceptual processing by temporally selective attention.
Although musical meter was employed in the current study as a model hierarchical rhythm, the modulation of attentional allocation by metric strength was not dependent on musical experience. It is possible that this finding simply indicates that the “nonmusicians” in the current study had enough exposure to music to develop expertise sufficient to allocate attention in the same way as musicians during metric listening. However, it is in our opinion more probable that metrical allocation of attention across time is a domain-general cognitive process that is used in other complex auditory tasks such as speech processing. Temporally selective attention has been demonstrated to be modulated dynamically during natural speech perception (Astheimer & Sanders, 2009), but the aspects of the speech signal that direct attention to certain times have not yet been fully characterized. Although the nature of rhythm in speech is a much-debated topic, it is commonly accepted that certain speech elements receive emphasis and that this emphasis can occur at temporally local and global levels. On the basis of the current result that temporally predictable hierarchical emphasis in music guides the allocation of attention across time, it is possible that temporally predictable hierarchical emphasis patterns in speech direct the allocation of attention across time.
Neither the P2 nor the LN effect was predicted. Metrically strong notes elicited a larger P2, but only early in the melodies in nonmusicians and only late in the melodies on slow trials in musicians. Furthermore, metrically strong notes elicited a smaller P2 late in the melodies on fast trials in musicians. An LN was observed in response to metrically strong notes in musicians and to metrically strong notes late in the melodies in nonmusicians. However, this LN was also observed in response to metrically weak notes early in the melodies in nonmusicians. The observation that musical expertise and critical point location modulate the effects of metric strength on the P2 and the LN but not the N1 clearly indicates that these effects do not represent a single phenomenon. If the differences in N1 amplitude were part of a larger sustained effect lasting through the N1, P2, and LN time windows, they would be modulated by experimental factors in the same way; that they are not demonstrates that the N1 effect is distinct from the later effects. The functional significance of the P2 and LN effects is not clear based on the present data, but the observation of multiple distinct ERP effects in response to metrically strong compared with metrically weak notes suggests that multiple cognitive processes are involved in metric perception. Further work is necessary to determine what exactly these processes are, but the observations that the P2 and LN effects differ with musical expertise and position in the melody suggest that they may be sensitive to the reliability of the metric percept and could therefore reflect ongoing aspects of metric acquisition.
The current results provide support for the primary claims of dynamic attending theory that (1) rhythms guide the allocation of attention across time and (2) attention is preferentially allocated to metrically strong times. The data indicate that this process is not dependent on or affected by musical expertise, suggesting that the guidance of temporally selective attention by rhythms may not be specific to music perception. These insights, combined with the demonstration that N1 amplitude is a useful measure for investigating the relationship between hierarchical rhythms and temporal attention, can serve as a framework for future electrophysiological investigations of rhythmic and metric processing. Furthermore, the demonstration here that hierarchical rhythms modulate a classic index of selective attention indicates that future studies of temporally selective attention in nonmusical domains such as speech processing should consider the role that hierarchical rhythms may play in directing attention.
The authors would like to thank Drs. Kyle Cave, Matthew Davidson, Alexandra Jesse, Gary Karpinski, Joe Pater, Matthew Schulkind, and Adam Tierney for comments on versions of this manuscript. We additionally thank Drs. Lori Astheimer Best, Mara Breen, Will Bush, Yibei Shen, and Patrick Taylor, as well as Ben Zobel and Nicholas Planet, for thoughtful comments on the design and analysis of this study. Furthermore, we thank Nash Brodsky, Giorgio DiIorio, Ashley Fitzroy, Evan Hare, Nathaniel Kornet, Nathan Olson, and Dominique Simmons for their direct contributions to the described study. This work was partially funded by NIH R03 DC008684 to L. D. S., and A. B. F.'s training was partially funded by NIH 5T32NS007490-09 to the UMass Neuroscience and Behavior program.
Reprint requests should be sent to Ahren Fitzroy, Department of Psychological and Brain Sciences, Tobin Hall, 135 Hicks Way, University of Massachusetts, Amherst, MA 01003 or via e-mail: firstname.lastname@example.org.
Because of a software error, an intended quarter note was produced as an eighth note followed by an eighth rest in two of the slow triple melodies; one of these anomalies was at a critical point. Additionally, because of a compositional error, a note at one of the critical points in one of the slow triple melodies was not the intended scale degree, and in two of the slow quadruple melodies, the intended critical note pattern of 5115 was produced as 1551.