The music we usually listen to in everyday life consists of either single melodies or harmonized melodies (i.e., of melodies “accompanied” by chords). However, differences in the neural mechanisms underlying melodic and harmonic processing have remained largely unknown. Using EEG, this study compared effects of music-syntactic processing between chords and melodies. In melody blocks, sequences consisted of five tones, the final tone being either regular or irregular (p = .5). Analogously, in chord blocks, sequences consisted of five chords, the final chord function being either regular or irregular. Melodies were derived from the top voice of chord sequences, allowing a proper comparison between melodic and harmonic processing. Music-syntactic incongruities elicited an early anterior negativity with a latency of approximately 125 msec in both the melody and the chord conditions. This effect was followed in the chord condition, but not in the melody condition, by an additional negative effect that was maximal at approximately 180 msec. Both effects were maximal at frontal electrodes, but the later effect was more broadly distributed over the scalp than the earlier effect. These findings indicate that melodic information (which is also contained in the top voice of chords) is processed earlier and with partly different neural mechanisms than harmonic information of chords.
Throughout our life, we experience music that consists either of only one voice (e.g., a mother singing a lullaby, our own singing, somebody whistling a tune, or a solo instrument playing) or of several voices playing or singing at the same time. Behavioral data suggest that melodic processing develops earlier than chord processing during childhood (Trainor & Trehub, 1994; Sloboda, 1989; Trehub, Cohen, Thorpe, & Morrongiello, 1986; Trehub, Bull, & Thorpe, 1984; Chang & Trehub, 1977), and in Occidental music, music developed from monophonic (i.e., melodic), over heterophonic and homophonic (i.e., multipart), to polyphonic styles (e.g., Bekker, 1926). However, only little is known about differences in neural mechanisms underlying the processing of chords and melodies.
So far, a number of neurophysiological studies used chord sequence paradigms (similar to those shown in Figure 1G and I) to investigate neural correlates of music-syntactic processing in both adults (e.g., Koelsch & Jentschke, 2008; Koelsch, Jentschke, Sammler, & Mietchen, 2007; see also Leino, Brattico, Tervaniemi, & Vuust, 2007; Loui, Grent-'t Jong, Torpey, & Woldorff, 2005) and children (e.g., Jentschke & Koelsch, 2009; Jentschke, Koelsch, Sallat, & Friederici, 2008). These studies showed that irregular chord functions (for explanation of the term chord function, see Figure 2) elicit early anterior negative brain electric responses that usually emerge at approximately 100 msec, are typically maximal at approximately 180–200 msec after stimulus onset, and often (but not always) have a slight right-hemispheric weighting (e.g., Loui et al., 2005). This brain response is referred to here as the early right anterior negativity or ERAN (as in some previous studies; e.g., Koelsch & Jentschke, 2008; Koelsch et al., 2007; Leino et al., 2007; Koelsch, Gunter, Friederici, & Schröger, 2000; for a review, see Koelsch, 2009). The ERAN can be elicited by both out-of-key chords (such as the final chords in Figure 1B) and in-key chords (such as the final chords in Figure 1C; Koelsch & Jentschke, 2008; Koelsch et al., 2007); thus, the ERAN is sensitive not only to violations of the tonal key but also to violations of music-syntactic regularities within a tonal key. Because of its time course, polarity, and scalp distribution, the ERAN is reminiscent of the MMN (Näätänen, Paavilainen, Rinne, & Alho, 2007).
However, whereas the generation of the MMN is based on representations of regularities of local intersound relationships that are extracted on-line from the acoustic environment, the generation of the ERAN relies on representations of music-syntactic regularities that exist in a long-term memory format and often refer to long-distance dependencies involving hierarchical syntactic organization (for details, see Koelsch, 2009). Therefore, the ERAN can be used as an electrophysiological index of music-syntactic processing (other indices of music-syntactic processing are reviewed in Koelsch, 2009).
Only few studies have so far investigated the neural underpinnings of music-syntactic processing of melodies. Five published ERP studies (Miranda & Ullman, 2007; Besson & Faita, 1995; Paller, McCarthy, & Wood, 1992; Verleger, 1990; Besson & Macar, 1987) used familiar melodies (such as “happy birthday to you”) that consisted either of the original tones or in which the pitch of one tone was modified (e.g., D or D# instead of C), rendering that tone unexpected and music-syntactically less regular. In all of these studies, the musical stimuli (although not necessarily the unexpected tones) were somehow task relevant (except Experiment 2 of Paller et al., 1992). For example, participants had to detect timbre deviants (Miranda & Ullman, 2007), or they were told that they would be asked questions about the melodies (Besson & Faita, 1995). In all studies (except Experiment 2 of Paller et al., 1992), unexpected notes elicited a P300, suggesting that as soon as participants have any kind of melody-related task in experimental paradigms using highly familiar melodies, unexpected notes elicit P300 potentials.
However, it is remarkable that in all of these studies (Miranda & Ullman, 2007; Besson & Faita, 1995; Paller et al., 1992; Verleger, 1990; Besson & Macar, 1987), the unexpected notes also elicited a frontal negative ERP emerging approximately 100 msec after stimulus onset (peaking ∼150 msec in the study of Verleger, 1990, and ∼120 msec in the study of Paller et al., 1992; the other studies did not report peak latencies). This ERP effect resembles the ERAN (although it appears to have a shorter peak latency and a smaller amplitude compared with the ERAN), but it is important to note that this effect presumably overlapped in the mentioned studies with a subsequent N2b (the N2b is centrally maximal and often precedes the P300; Näätänen et al., 2007). An ERAN elicited by melodies without overlapping N2b has so far only been reported by Miranda and Ullman (2007) for out-of-key notes occurring in unfamiliar melodies (and it seems that a similar potential was elicited by out-of-key notes occurring in unfamiliar melodies in musicians in the study of Besson & Faita, 1995, p. 1284). In that study (Miranda & Ullman, 2007), the ERAN was maximal between 150 and 270 msec. Similar ERAN effects have been shown to be elicited by out-of-key violations in unfamiliar melodies (Brattico, Tervaniemi, Näätänen, & Peretz, 2006), and the study of Schön and Besson (2005) showed an ERAN even in response to visually induced expectancy violations (also with unfamiliar melodies).
The combined findings indicate that irregular notes in melodies also elicit frontal negative potentials, and they suggest that these potentials might have a shorter latency and a smaller amplitude than those elicited by irregular chord functions. However, so far no study has directly compared the neural correlates underlying the syntactic processing of melodies and chords, and it is not known whether brain responses elicited during the processing of melodies are identical to or different from those elicited during the processing of chords. Moreover, the fact that irregular tones in melodies elicit early anterior negative potentials that resemble the ERAN raises the question whether the ERAN observed in previous studies using chord sequence paradigms was only elicited by the top voice of the chords (which is usually the most salient voice in multipart arrangements). Thus, it is possible that the effects that have so far been reported for the processing of chords are simply due to the processing of melodic information.
To explore these issues, we used two classes of stimuli: (a) chord sequences (with chords that had already been used in previous studies; Jentschke et al., 2008; Koelsch & Jentschke, 2008; Koelsch et al., 2007) and (b) the top voice (i.e., the melody) of these chord sequences. The chord sequences ended either on a regular chord function (with the third of the tonic chord in the top voice; Figure 1A) or on an irregular chord function. In one block, the irregular chord function was a double dominant (DD) with an out-of-key note (also the third of the chord) in the top voice (Figure 1B). In another block, the irregular final chord was a supertonic (ST, also with the third in the top voice); that is, a chord not containing an out-of-key note (Figure 1C). The two different sets of chord sequences (with DDs and with STs) were used to replicate effects of previous experiments (Koelsch & Jentschke, 2008; Koelsch et al., 2007) and to investigate for which of these sets of chord sequences the ERAN previously observed in response to irregular chord functions was perhaps only due to the melody of the top voice.
Stimuli in the melody conditions were identical to the chord stimuli, except that only the top voice was presented (Figure 1D–F). All melody sequences thus started on the first scale tone (which is the root tone of a key; note that listeners tend to interpret the first tone of a melody as the root tone of a key; e.g., Krumhansl & Kessler, 1982). The regular melodies ended on the third scale tone of the key of the sequence, whereas the “irregular” melodies ended on the fourth scale tone (melodies derived from chord sequences ending on an ST) or on the augmented fourth (melodies derived from chord sequences ending on a DD). Because, at the end of a melody, the third scale tone is perceived as more stable and regular than the fourth and augmented fourth scale tones (e.g., Krumhansl, 1979), we aimed at testing whether those “irregular” tones (fourth and augmented fourth) would elicit any ERP effects (compared with regular tones) and whether these effects would differ from those elicited by chords. Such differences between chords and melodies are to be expected because melodies convey syntactic information only with one voice (and thus only in the horizontal dimension), whereas chords convey syntactic information with several voices (and thus in both the horizontal and the vertical dimension; see also Tramo, Cariani, Delgutte, & Braida, 2001). More specifically, the syntax-relevant information about a chord's function is absent in the tones of melodies because single tones of melodies do not belong unambiguously to a specific chord function (for an illustration, see legend of Figure 2).
We hypothesized that music-syntactic irregularities in melodies as well as in chords would elicit early anterior negativities (i.e., ERAN effects) as well as N5 effects (the N5 is a late negativity usually following the ERAN; Koelsch et al., 2000, 2007; Leino et al., 2007; Miranda & Ullman, 2007; Loui et al., 2005; Schön & Besson, 2005). This would support previous findings that music-syntactically irregular chord functions (e.g., Koelsch et al., 2007) as well as music-syntactically irregular tones of melodies (e.g., Miranda & Ullman, 2007) elicit ERAN potentials. Moreover, we expected larger ERAN effects in response to irregular chord functions compared with irregular melody tones because only chords contained functional-harmonic information (see above). If, however, irregular tones elicit the same effect as irregular chords, then the ERAN effect observed in previous studies in response to chords would have been simply elicited by the top voice of the chords (and would have been due to melodic processing and not due to the processing of chord functions). Finally, we did not expect differences in electric brain responses to the different irregular chord types (DDs and STs), as in a previous study (Koelsch et al., 2007). No directed hypotheses were made about possible differences in ERP effects elicited by the two types of melody endings (DD and ST).
Data were collected from 16 subjects (mean age = 22.7 years, age range = 20–31 years, 8 women). Participants did not have any formal musical training besides normal school education and had never learned to play a musical instrument, except one male subject who had had piano lessons for 3 years, 12 years before the study, and one female subject who had occasionally sung in amateur church choirs for the last 12 years. All participants were right-handed (mean laterality quotient = 97.5%) according to the Edinburgh Handedness Inventory (Oldfield, 1971) and reported to have normal hearing and no neurological disease.
Stimuli consisted of three types of chord sequences (Figure 1A–C) and three types of melodies (Figure 1D–F). The first set of chord sequences (Set 1) consisted of two 5-chord sequences (one with a regular ending and one with an irregular ending; Figure 1A and B) that were transposed to the 12 major keys, resulting in 24 different sequences. These sequences had already been used in previous studies (e.g., Jentschke et al., 2008; Koelsch et al., 2007). The first four chord functions were identical for all sequence types (tonic–subdominant–supertonic–dominant). The final chord of the regular sequence type (Figure 1A) was a tonic, and the final chord of the irregular sequence type of the first chord set was a DD (Figure 1B). The second set of chord sequences (Set 2) was identical to those of Set 1, except that the irregular chord sequence ending was an ST (Figure 1C). Auditory modelling performed in a previous study (Koelsch et al., 2007) for both DD and ST chord sequences showed that the pitch image of final DDs and STs correlated even higher than that of final tonic chords with the echoic memory representation established by the previous chords, suggesting that neither STs nor DDs represent physical deviants (also note that final DDs and STs had even more pitches in common with the first four chords than final tonic chords, see Figure 1; for details, see Koelsch et al., 2007).
Presentation time of chords was 500 msec, except for the final chords that lasted 1000 msec, followed by a 1000-msec pause (the same timing was used for the melodies: Each tone had a duration of 500 msec, except for the last tone that was presented for 1000 msec). Using only the sequences depicted in Figure 1A–C transposed to different keys gave us the maximum acoustic control over the musical stimulus (for details, see Koelsch et al., 2007; for studies investigating music processing with more naturalistic stimuli, see, e.g., Koelsch, Kilches, Steinbeis, & Schelinski, 2008; Steinbeis, Koelsch, & Sloboda, 2006; Koelsch & Mulder, 2002).
The sets of melodies consisted of the top voices of the chord sequences (Figure 1D–F). That is, each set of melodies consisted of two sequence types. In the first set of melodies (Set 1), the first two tones were identical, followed by a step of a major second upward (third tone) that was repeated by the fourth tone. The fifth tone was either another step of a major second upward (corresponding to the top voice of the regular chord sequences; Figure 1D) or a step of a major third upward (corresponding to the top voice of the irregular sequences that ended on a DD; Figure 1E). The second set of melodies (Set 2) was identical to the first set, except that the fifth tone of the irregular sequences was a step of a minor third upward (corresponding to the top voice of the irregular sequences ending on an ST; right of Figure 1F). As for the chord sequences, the two melody sequences of each set were transposed to the 12 major keys, resulting in 24 different sequences per set.
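The interval structure just described can be written down compactly. The following is a minimal sketch, assuming MIDI note numbers, an untransposed tonic of 60 (middle C), upward transposition in semitone steps, and that each melody starts on the first scale tone (as stated for the melody conditions). The function names and ending labels are illustrative and are not taken from the original stimulus-generation procedure.

```python
# Semitone sizes of the intervals named in the text.
MAJOR_SECOND, MINOR_THIRD, MAJOR_THIRD = 2, 3, 4

# Final upward step for each ending type (labels are hypothetical).
FINAL_STEP = {
    "regular": MAJOR_SECOND,  # ends on the third scale tone
    "DD": MAJOR_THIRD,        # ends on the augmented fourth
    "ST": MINOR_THIRD,        # ends on the fourth scale tone
}


def melody(tonic, ending):
    """Return the five tones of one melody as MIDI note numbers."""
    # Tone 2 repeats tone 1, tone 3 steps up, tone 4 repeats tone 3.
    steps = [0, MAJOR_SECOND, 0, FINAL_STEP[ending]]
    tones = [tonic]
    for step in steps:
        tones.append(tones[-1] + step)
    return tones


def melody_set(irregular_ending, base_tonic=60):
    """Regular and irregular melody in each of 12 keys (24 sequences)."""
    return [
        melody(base_tonic + key, ending)
        for key in range(12)
        for ending in ("regular", irregular_ending)
    ]
```

With a tonic of C, the regular melody ends on E, the DD-derived melody on F#, and the ST-derived melody on F, which matches the scale tones named for the melody conditions.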
Sound files of sequences were generated using Cubase SX 2.0 (Steinberg Media Technologies, Hamburg, Germany) with a grand piano sound (Steinberg, The Grand). Then, the RMS power of chord and melody stimuli was matched so that all stimulus sequences had the same loudness (modification and measurement of RMS power values was performed using CoolEdit 2.0; Syntrillium Software, Phoenix, AZ). In addition to the sequences played with a piano sound, we generated sequences with acoustic oddballs by filtering one chord of the sequence (for chord sequences) or one tone of the sequence (for melody sequences) with a fast Fourier transform filter (2048 points, Blackman window, suppressing frequencies less than 1600 Hz by 30%; filtering was performed using CoolEdit 2.0). Such deviants occurred with equal probability at any position of the sequences.
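RMS matching of this kind amounts to scaling each sound file by a single gain factor. The sketch below illustrates the arithmetic only; it assumes the audio is available as a list of floating-point samples, and the function names are hypothetical (the original matching was done in CoolEdit, not with this code).

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def match_rms(samples, target_rms):
    """Scale samples by one gain factor so that their RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [x * gain for x in samples]
```

Applying `match_rms` with the same target to every chord and melody file would give all files the same RMS power.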
Each sequence type (regular and irregular) of each set was presented 96 times (eight sequences per tonal key), including 12 sequences with a filtered chord. Stimulus sets were presented in blocks; the first two blocks were melody blocks followed by two chord sequence blocks (melodies were presented before chords to prevent participants from automatically harmonizing the melodies after having heard them repeatedly in previous blocks). Half of the subjects (half of them female) were first presented with the first melody set (Figure 1H) and the other half (half of them female) with the second melody set (Figure 1J). Likewise, presentation of the first (Figure 1G) and second (Figure 1I) chord sequence set was balanced across subjects.
In the experiment, sequences were presented in direct succession (for an example of the block with DDs, see Figure 1G; for an example of the block with STs, see Figure 1I; and for examples of melodic sequences, see Figure 1H and J). Regular and irregular sequences occurred equiprobably (p = .5), consecutive sequences always had a different tonal key, and not more than three sequences of the same type (regular or irregular) followed each other.
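The ordering constraints stated here (equal numbers of regular and irregular sequences, a key change between consecutive sequences, and at most three sequences of the same type in a row) can be met with a simple randomized draw. The following is a sketch of one possible procedure, not the one used in the experiment: it assumes 96 presentations per sequence type (eight per key), picks each next sequence at random among those still allowed, and restarts if it reaches a dead end. All names are hypothetical.

```python
import random
from collections import Counter


def make_block_order(n_keys=12, reps_per_key=8, max_run=3, seed=0):
    """Return a list of (sequence_type, key) pairs for one block."""
    rng = random.Random(seed)
    types = ("regular", "irregular")
    total = n_keys * reps_per_key * len(types)
    while True:  # restart whenever the random draw reaches a dead end
        remaining = Counter(
            {(t, k): reps_per_key for t in types for k in range(n_keys)}
        )
        order = []
        while len(order) < total:
            last_key = order[-1][1] if order else None
            recent = [t for t, _ in order[-max_run:]]
            # A type is blocked once it has filled the last max_run slots.
            blocked = None
            if len(recent) == max_run and len(set(recent)) == 1:
                blocked = recent[0]
            candidates = [
                item
                for item, count in remaining.items()
                if count > 0 and item[1] != last_key and item[0] != blocked
            ]
            if not candidates:
                break
            # Weight by remaining count so both types run out together.
            weights = [remaining[c] for c in candidates]
            choice = rng.choices(candidates, weights=weights)[0]
            remaining[choice] -= 1
            order.append(choice)
        if len(order) == total:
            return order
```

Weighting the draw by the number of remaining presentations keeps the two sequence types roughly balanced throughout the block, which makes dead ends near the end of the block less likely.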
Stimuli were presented at a comfortable volume using Presentation 0.52 software (Neurobehavioral Systems, Albany, NY). Participants were not informed about the regular and irregular sequence endings. Instead, they were informed about the filtered chords and tones and were asked to detect them and to indicate their detection by pressing a response button. Similar tasks have already been used in a number of previous studies (e.g., Koelsch & Jentschke, 2008; Koelsch et al., 2000, 2007; Leino et al., 2007; Miranda & Ullman, 2007; Loui et al., 2005) and allowed us to control that participants attended to the auditory stimulus, without requiring them to attend or to respond to the irregular stimuli (such a conscious detection would have elicited electric brain responses that overlap and obscure the brain responses related to the music-syntactic analysis of the stimuli; see, e.g., Koelsch et al., 2000, 2007).
Data Recording and Analysis
The EEG was recorded with 59 Ag/AgCl cap-mounted electrodes (Electrocap International, Eaton, OH) according to the extended 10–20 system (FP1, FP2, AF7, AF8, AF3, AF4, AFZ, F9, F10, F7, F8, F5, F6, F3, F4, FZ, FT9, FT10, FT7, FT8, FC5, FC6, FC3, FC4, FCZ, T7, T8, C5, C6, C3, C4, CZ, TP7, TP8, CP5, CP6, CP3, CP4, CPZ, P9, P10, P7, P8, P5, P6, P3, P4, PZ, PO7, PO8, PO3, PO4, POZ, O1, O2, OZ). The left mastoid (M1) served as reference; additional electrodes were placed on the nose tip and the right mastoid (M2). The ground electrode was located on the sternum. To monitor eye movements and blinks, horizontal and vertical EOGs were bipolarly recorded from electrodes placed on the outer canthus of each eye (horizontal EOG) as well as above and below the right eye (vertical EOG). Impedances were kept less than 5 kΩ. Signals were amplified with two synchronized PORTI-32/MREFA amplifiers (Twente Medical Systems International BV, Enschede, NL) and digitized with a sampling rate of 250 Hz.
After the measurement, data were re-referenced to the mean of both mastoids and filtered using a 0.25- to 25-Hz band-pass filter (FIR, 1055 points, −6 dB/octave, Hamming window). For artifact reduction, EEG and EOG data were rejected whenever the SD of the signal recorded at any electrode exceeded 30 μV (a) within a 200-msec gliding window (to reduce artifacts related to fast signal changes) or (b) within an 800-msec gliding window (to reduce artifacts related to slow signal changes). Trials with typical eye blinks were marked and corrected by applying EOG correction (xeog, EEP software; Advanced Neuro Technology, Enschede, NL). ERPs were then calculated separately for the regular and irregular final chords and tones of each set using a 200-msec prestimulus baseline and a 1000-msec poststimulus window. Sequences containing filtered instruments were excluded from further analysis.
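The gliding-window rejection criterion can be illustrated with a short sketch. It is not the EEP implementation: it assumes a single channel given as a list of values in μV, a window that advances one sample at a time, and the population SD (whether the original software used the population or the sample SD is not stated). Function names are hypothetical.

```python
import statistics


def exceeds_sd_threshold(signal_uv, srate_hz=250, window_ms=200, threshold_uv=30.0):
    """True if the SD within any gliding window exceeds the threshold."""
    n = int(round(window_ms * srate_hz / 1000))  # 200 msec at 250 Hz = 50 samples
    # Signals shorter than one window are never flagged by this sketch.
    for start in range(0, len(signal_uv) - n + 1):
        if statistics.pstdev(signal_uv[start:start + n]) > threshold_uv:
            return True
    return False


def reject(signal_uv, srate_hz=250):
    """Apply both criteria: 200-msec (fast) and 800-msec (slow) windows."""
    return any(
        exceeds_sd_threshold(signal_uv, srate_hz, window_ms)
        for window_ms in (200, 800)
    )
```

A slow drift of 1 μV per sample shows why two window lengths are useful: its SD stays below 30 μV within 200 msec but exceeds 30 μV within 800 msec, so only the longer window flags it.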
For statistical analysis, mean amplitude values were calculated for four ROIs: left anterior (AF3, F7, F3, FT7, FC3), right anterior (AF4, F8, F4, FT8, FC4), left posterior (T7, C3, CP5, P7, P3), and right posterior (T8, C4, CP6, P8, P4). Values were calculated for three different time windows: 100–150, 160–210, and 440–640 msec. After visual inspection of the ERP data, the two early time windows were chosen so that time windows (a) had the same length, (b) did not overlap with each other, and (c) were approximately centered around peak maxima of ERP effects. All time windows were similar to those used in previous studies (for the first time window, see, e.g., Paller et al., 1992; for the second and third time windows, see Koelsch & Jentschke, 2008).
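Computing a mean amplitude for one ROI and one time window is a matter of averaging over the ROI's electrodes and over the samples that fall inside the window. The sketch below assumes an averaged ERP stored as a dictionary from electrode name to a list of samples, an epoch that starts 200 msec before stimulus onset, the 250-Hz sampling rate reported above, and half-open windows (start ≤ t < end). The data layout, the handling of window boundaries, and the names are assumptions for illustration.

```python
import math

ROIS = {
    "left_anterior": ["AF3", "F7", "F3", "FT7", "FC3"],
    "right_anterior": ["AF4", "F8", "F4", "FT8", "FC4"],
    "left_posterior": ["T7", "C3", "CP5", "P7", "P3"],
    "right_posterior": ["T8", "C4", "CP6", "P8", "P4"],
}
WINDOWS_MS = [(100, 150), (160, 210), (440, 640)]


def roi_mean(erp, roi, window_ms, srate_hz=250, baseline_ms=200):
    """Mean amplitude over all electrodes of a ROI within a time window."""
    start_ms, end_ms = window_ms
    step_ms = 1000 / srate_hz  # 4 msec per sample at 250 Hz
    # Sample 0 lies at -baseline_ms; keep samples with start <= t < end.
    first = math.ceil((start_ms + baseline_ms) / step_ms)
    last = math.ceil((end_ms + baseline_ms) / step_ms)
    values = [v for ch in ROIS[roi] for v in erp[ch][first:last]]
    return sum(values) / len(values)
```

Calling `roi_mean` for each ROI and each entry of `WINDOWS_MS` would yield the 4 × 3 values per condition that enter the statistical analysis.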
For all of these three time windows, global ANOVAs were computed with the repeated measures factors Regularity (regular, irregular), Class (melodies, chords), Hemisphere (left, right), AntPost (anterior, posterior), and Set (DD, ST). Because no main effects of Set and no interactions between Set and Regularity were found in any of the time windows (p > .46 in all tests), data of the sets with sequences ending on STs and with sequences ending on DDs were pooled separately for each stimulus class (melodies, chords), resulting in four-way ANOVAs with factors Regularity, Class, Hemisphere, and AntPost.
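Because every factor in these ANOVAs is a repeated measures factor with two levels, each main effect and interaction has one numerator degree of freedom. Its F value, with degrees of freedom (1, n − 1), equals the squared one-sample t statistic computed on one contrast score per subject; with 16 subjects this gives the F(1, 15) values reported below. The following sketch illustrates this equivalence for the Regularity × Class interaction. It is not the analysis code used in the study, and the data layout and function names are assumptions.

```python
import math


def f_from_contrast(scores):
    """F(1, n-1) of a one-df within-subject effect from per-subject contrasts."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)  # one-sample t against zero
    return t * t


def regularity_by_class(cells):
    """Per-subject interaction contrast.

    cells: one dict per subject, mapping (class, regularity) to the mean
    amplitude of that cell (averaged over the remaining factors).
    """
    return [
        (c["chords", "irregular"] - c["chords", "regular"])
        - (c["melodies", "irregular"] - c["melodies", "regular"])
        for c in cells
    ]
```

For a main effect such as Regularity, the contrast would simply be the irregular-minus-regular difference averaged over the other factors; `f_from_contrast(regularity_by_class(cells))` gives the interaction F.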
Participants detected, on average, 99.2% of the filtered sounds, indicating that participants attended to the timbre of the musical stimulus and that they did not have difficulties in reliably detecting the timbre deviants. Hit rates did not differ between sets (DDs and STs, p > .9) or stimulus classes (melodies and chords, p > .9). Likewise, RTs (M = 719 msec, SD = 82 msec) did not differ between sets (p > .9) or classes (p > .5). The finding that the behavioral data did not differ between melody and chord blocks suggests that participants' degree of attention was similar for both conditions.
Figure 3A shows the electric brain responses to harmonically regular and irregular chord sequence endings. Irregular chords of both sets of sequences (DDs and STs) elicited negative potentials with an onset of approximately 90–100 msec and with peak latencies of approximately 180 msec (see also difference waves of Figure 4). ERPs elicited by regular and irregular melody endings are shown in Figure 3B. Irregular melody endings elicited a negative effect, which had an onset of approximately 90–100 msec, was maximal at approximately 125 msec, and had a bilateral fronto-central scalp distribution. At approximately 180 msec (i.e., around the peak latency of the ERAN elicited by chords), brain electric responses to irregular and regular melody endings virtually did not differ from each other. That is, both irregular chords and irregular melodies elicited negative potentials in an early time window (∼90–150 msec), whereas irregular chords (but not irregular melodies) elicited in addition negative potentials in a time window of approximately 150–220 msec (best to be seen in the difference waves of Figure 4 and in the isopotential maps of Figure 5; for amplitude values of effects within these time windows, see Table 1).
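Table 1. Amplitude values of effects at frontal ROIs in the three analysis time windows
|ROI||100–150 msec||160–210 msec||440–640 msec|
|Melodies|
|Left-frontal||−0.615 (0.239)||−0.222 (0.215)||−0.237 (0.318)|
|Right-frontal||−0.474 (0.268)||−0.020 (0.281)||−0.273 (0.296)|
|Chords|
|Left-frontal||−0.685 (0.251)||−1.804 (0.271)||−0.300 (0.259)|
|Right-frontal||−0.587 (0.230)||−1.593 (0.209)||−0.627 (0.244)|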
An ANOVA with factors Regularity, Class, Hemisphere, and AntPost for the early time window (100–150 msec) indicated an effect of Regularity, F(1, 15) = 8.08, p = .012, reflecting that irregular endings elicited negative effects compared with regular endings. Moreover, the ANOVA indicated interaction between Regularity and AntPost, F(1, 15) = 8.24, p = .012, reflecting that the negative effects elicited by irregular endings had a frontal preponderance. There was no interaction between factors Regularity and Class, F(1, 15) = 0.11, p = .743, indicating that amplitudes of effects elicited by irregular endings did not differ between chord sequences and melodies. The scalp topographies in Figure 5 suggest differences in the hemispheric distribution during the early time window between chords and melodies, but there was no significant interaction between Hemisphere, Class, and Regularity (p = .646), nor between AntPost, Hemisphere, Class, and Regularity (p = .649).
The analogous ANOVA for a later time window (160–210 msec) indicated an effect of Regularity, F(1, 15) = 28.24, p < .001; an interaction between factors Regularity and Class, F(1, 15) = 16.56, p = .001, reflecting that irregular chords but not irregular melodies elicited negative effects in this time window; and an interaction of Regularity, Class, and AntPost, F(1, 15) = 31.18, p < .001, reflecting that the significant effect of irregular chords had a frontal scalp distribution. Planned comparisons with user-defined contrasts revealed that the difference between the regular and the irregular chords was significant, both when considering all ROIs, F(1, 15) = 46.00, p < .001, as well as when analyzing the frontal ROIs only, F(1, 15) = 54.90, p < .001. By contrast, the difference between regular and irregular tones was not significant, neither when considering all ROIs, F(1, 15) = .97, p = .340, nor when analyzing the frontal ROIs only, F(1, 15) = .27, p = .612.
To compare the scalp distribution of the effects elicited by irregular chords between the early (100–150 msec) and the later (160–210 msec) time window, an ANOVA was computed for chords with factors Time Window (100–150 and 160–210 msec), Regularity, Hemisphere, and AntPost. This ANOVA indicated an effect of Regularity, F(1, 15) = 24.70, p < .001; an interaction between Regularity and AntPost, F(1, 15) = 17.88, p = .001; an interaction between Regularity and Time Window, F(1, 15) = 15.72, p = .001, reflecting that the effects in the earlier and the later time window had different amplitudes; and an interaction between Regularity, Time Window, and AntPost, F(1, 15) = 5.71, p = .030, reflecting that the early effect elicited by irregular chords was more frontally and the later effect more broadly distributed over the scalp.
At approximately 500–550 msec, irregular chords and tones also elicited a small frontal negativity (the N5). An ANOVA with factors Regularity, Class, Hemisphere, and AntPost for a time window from 440 to 640 msec did not indicate an effect of Regularity but an interaction between factors Regularity and AntPost, F(1, 15) = 6.30, p = .024. There was no interaction between Regularity and Class (p = .626). A follow-up ANOVA with frontal ROIs revealed only approaching significance for the factor Regularity when tested one-sided (p < .07).
Both irregular melody endings and irregular final chord functions elicited an early effect approximately 100–150 msec after the onset of the tones. In addition to this early effect, irregular chords (but not irregular melody endings) also elicited an effect in a later time window (∼160–210 msec). Moreover, the effect in the early time window had a different scalp distribution than the later effect (the early effect elicited by irregular chords was more frontally and the later effect more broadly distributed over the scalp). These findings suggest that melodic information (which is present not only in melodies but also in the top voice of chords) is processed earlier and with partly different neural mechanisms than harmonic information specific for chords. In the following discussion, we will refer to the earlier effect (elicited by the processing of melodic information) as N125 and to the later one (elicited by the processing of harmonic information of chords) as N180.
The observation of an early anterior negativity (N125) in response to irregular tones of melodies is consistent with previous studies investigating melody processing with ERPs (Besson & Faita, 1995; Paller et al., 1992; Verleger, 1990; Besson & Macar, 1987). In these studies, incongruent endings (final or penultimate tones of well-known melodies) elicited a negative anterior ERP response that emerged around the N1 and peaked at approximately 120 msec in the study of Paller et al. (1992) and at approximately 150 msec in the study of Verleger (1990; no peak latencies were reported in the other studies). Similar responses with an onset around the N1, but with longer peak latencies, were reported in studies from Miranda and Ullman (2007) and Brattico et al. (2006). In the latter two studies, the latency of effects was perhaps longer because incongruous tones occurred in the middle of melodies (where expectancies for tones are presumably weaker compared with phrase endings).
The N125 did not differ between DD and ST sequences, although the augmented fourth (on which DD sequences ended) is a less stable tone compared with the fourth scale tone (on which ST sequences ended). Perhaps this difference was too small to be reflected in the ERPs of the present study.
The peak latency of the N180 elicited by chords (but not by melodies) is the typical latency of the ERAN and consistent with a number of previous studies investigating chord processing with ERPs (for a review, see Koelsch, 2009). The finding that the N180 was observed in the present study in response to irregular chord functions, but not melodies, demonstrates that similar effects elicited in previous studies using similar chord sequence paradigms (e.g., Jentschke et al., 2008; Koelsch & Jentschke, 2008; Koelsch et al., 2007) were not simply due to the processing of the melodic information of the chords.
Subcomponents of the ERAN
Given that melodic and harmonic information appears to elicit ERP effects that differ with regard to their latency and scalp distribution (the N125 and the N180), the ERAN observed in response to music-syntactically irregular chords perhaps consists of two subcomponents: first, an earlier one (N125) reflecting the processing of a sound expectancy violation related to music-syntactic properties of tones, such as the stability of the tones of a melody (including the scale properties of tones, Krumhansl, 1979); and second, a later one (N180) reflecting the processing of a sound expectancy violation related to the harmonic function of a chord (note that the degree of consonance/dissonance of a chord, which is also an aspect of the vertical dimension of chords, is presumably processed earlier than functional-harmonic information; see also Tramo et al., 2001, but this issue remains to be specified).
This interpretation is consistent with the notion that the hierarchy of stability of chords (Bharucha & Krumhansl, 1983) is more complex than the hierarchy of stability of tones (Krumhansl, 1979; see also Krumhansl & Kessler, 1982) and consistent with the assumption that melodic processing develops earlier than the processing of chord functions during childhood (e.g., Trainor & Trehub, 1994; see also Koelsch, 2009). Note that melodic information is also more concrete than the more abstract functional-harmonic information of a chord because melodies, but not chords, can be sung (by a single individual).
An alternative, although less plausible, interpretation would be that both N125 and N180 reflect the processing of a general sound expectancy violation and that the N180 has a larger amplitude as well as a longer latency compared with the N125 because chords consisted of four voices (none of which contained a tone of the expected final tonic chord), in contrast to melodies, which consisted of only one voice. However, it is unlikely that the latency difference between the N125 and the N180 can simply be explained by the degree of a general sound expectancy violation: Two studies showed that the degree of music-syntactic irregularity affects the amplitude but much less the latency of the ERAN (Leino et al., 2007; Koelsch et al., 2000). In these two studies, Neapolitan sixth chords were presented within chord sequences at positions in which they were highly irregular and at positions in which they were only moderately irregular. In both studies, the latency of the ERAN was nominally even slightly longer for the less irregular than for the highly irregular chords (Koelsch et al., 2000, p. 536; Leino et al., 2007, p. 173). In addition, the finding that N125 and N180 have different scalp distributions lends support to the assumption that N125 and N180 reflect different processes (i.e., processing of melodic and harmonic sound expectancy violation) rather than one process reflecting the processing of a general sound expectancy violation.
Notably, the assumption that irregular chord functions elicit N180 potentials does not rule out the possibility that this effect can also be elicited by melodies: Data from Miranda and Ullman (2007) indicate that irregular tones of melodies that establish a tonal key and a harmonic context more clearly than the melodies used in the present study also elicit N180 potentials (see also Brattico et al., 2006), perhaps because out-of-key tones in melodies automatically engage processes that are also related to harmonic processing. Future studies could investigate how different degrees of the establishment of tonal key and harmonic context as well as different degrees of rhythmic and spectral complexity influence N125 and N180 potentials. Such studies could also investigate whether musical training exerts similar effects on N125 and N180 (effects of musical training on the ERAN are reviewed in Koelsch, 2009; see also Loui & Wessel, 2007; Vuust et al., 2005; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004).
The notion that N125 and N180 reflect different cognitive processes is supported by a study on different oscillatory activities to regular and irregular chord functions (Herrojo Ruiz, Koelsch, & Bhattacharya, 2009): Using the same chord sequences as those in Figure 1A, C, and I, that study reported a difference of spectral power between ST and tonic chords in the delta band at right anterior and right central electrode regions between 100 and 150 msec after chord onset (corresponding to the N125 time window) as well as differences in the spectral power of phase-locked theta activity between ST and tonic chords in the time range from 100 to 160 msec at left and right anterior electrode regions. By contrast, in the time span from 175 to 250 msec (corresponding to the N180 time window), the processing of irregular chords (compared with regular chords) was associated with a decrease in the degree of global phase synchrony in the lower alpha band (8–10 Hz; this effect was maximal at FC4 and presumably reflected the decoupling between the oscillatory activities at right fronto-central and left temporo-parietal regions mediated by long-range alpha phase synchrony).
Note, however, that in the melody condition, the pitch distance between the last regular tone and the two preceding tones was ∼10% (two semitones), whereas it was ∼16% (three semitones) and ∼20% (four semitones) for the irregular endings. Thus, refractoriness effects could have been slightly stronger for the irregular than for the regular tones and could thus account for, or at least contribute to, the N125 effect (e.g., Horváth et al., 2008). Nevertheless, because tones were spectrally rich piano tones and because irregular chords had even more pitches in common with the preceding two chords, we think that it is rather unlikely that the N125 was only due to refractoriness effects. To rule out such a possibility, future studies could try to devise stimuli in which possible refractoriness effects would be equal or even larger for regular events.
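The text does not state how the percentages for the melody condition were computed. The following is a minimal sketch, assuming twelve-tone equal temperament and expressing the frequency difference as a fraction of the higher tone's frequency; under that assumption the values come out close to the reported ∼10%, ∼16%, and ∼20%. The function name `relative_pitch_distance` is hypothetical and serves only as an illustration.

```python
def relative_pitch_distance(semitones: int) -> float:
    """Frequency difference between two tones `semitones` apart,
    as a fraction of the HIGHER tone's frequency.

    Assumption (not stated in the text): twelve-tone equal temperament,
    where one semitone corresponds to a frequency ratio of 2 ** (1 / 12).
    """
    return 1.0 - 2.0 ** (-semitones / 12.0)


# Intervals mentioned for the melody condition:
# regular ending (2 semitones), irregular endings (3 and 4 semitones).
for n in (2, 3, 4):
    print(f"{n} semitones: {relative_pitch_distance(n) * 100:.1f}%")
# 2 semitones: 10.9%
# 3 semitones: 15.9%
# 4 semitones: 20.6%
```

Taking the lower tone as the reference instead would give roughly 12%, 19%, and 26%, so the reported figures seem to fit the higher-tone convention better, although this remains an inference.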
Lateralization of the ERAN
The ERAN was not right lateralized in the present study (neither the N125 nor the N180), similar to a few previous ERP studies with chord-sequence paradigms (Leino et al., 2007; Steinbeis et al., 2006; Loui et al., 2005) and melodies (Miranda & Ullman, 2007). An important difference between these studies and the studies in which the ERAN was right lateralized (e.g., Koelsch & Jentschke, 2008; Koelsch & Sammler, 2008; Koelsch et al., 2000, 2007) is that the latter studies had relatively large numbers of participants (20 or more; Koelsch & Jentschke, 2008; Koelsch & Sammler, 2008; Koelsch et al., 2007), whereas studies in which no lateralization of the ERAN was reported have mostly measured fewer than 20 subjects (16 subjects in the present study; 18 subjects in the study of Loui et al., 2005; and 10 subjects in the study of Leino et al., 2007). This difference is important because it seems that the ERAN is lateralized more strongly in men than in women (who often show a rather bilateral ERAN; Koelsch, Maess, Grossmann, & Friederici, 2003); therefore, a relatively large number of subjects is required before the lateralization of the ERAN reaches statistical significance. Additional factors that modulate the lateralization of the ERAN might include the salience of irregular chords, attentional factors, and signal-to-noise ratio of ERP data (for a review, see Koelsch, 2009).
However, a number of functional neuroimaging studies showed that, on average, the neural activity underlying the processing of music-syntactically irregular chords has a right-hemispheric weighting (for a review, see Koelsch, 2009). Thus, even if the EEG effect is sometimes not significantly lateralized, it is reasonable to assume that the neural generators of the ERAN (particularly those of the N180) are activated more strongly in the right than in the left hemisphere. Although the ERAN effect was not lateralized in the present study, we use the term ERAN here because this term has been established for the functional significance of this ERP component rather than for its scalp distribution (see also Koelsch et al., 2007; Miranda & Ullman, 2007; Maess, Koelsch, Gunter, & Friederici, 2001). Note that similar conflicts exist for most (if not all) endogenous ERP components; for example, the P300 is often not maximal at approximately 300 msec (e.g., McCarthy & Donchin, 1981), the N400 elicited by violations in high cloze probability sentences typically starts around the P2 latency range (van den Brink, Brown, & Hagoort, 2001; Gunter, Friederici, & Schriefers, 2000), and the MMN sometimes has positive polarity in infants (e.g., Winkler et al., 2003; Friederici, Friedrich, & Weber, 2002).
Late ERP Effects (N5)
Both irregular chords and irregular melody tones elicited small N5 potentials, although the N5 only approached statistical significance at frontal electrodes when tested one-sided. This lack of clear significance was unexpected because the present experiment used chord sequences identical to a previous study (Koelsch et al., 2007), in which the N5 was significant. However, as already mentioned with regard to the lateralization of the ERAN, 24 individuals were measured in our previous study (Koelsch et al., 2007) compared with 16 individuals in the present study. Therefore, it appears that ERP studies with rather subtle music-syntactic irregularities should include a large number of subjects, ideally in combination with a large number of trials per subject (see, e.g., Koelsch & Sammler, 2008), to obtain significant N5 potentials as well as a clear lateralization of the ERAN.
The N5 has been taken to reflect processes of harmonic integration (e.g., Koelsch et al., 2000), although the present study suggests that the N5 also reflects processes of melodic integration, because both melodies and chords elicited N5 potentials. Notably, the N5 did not differ between chords and melodies (in contrast to the N180), supporting the notion that ERAN and N5 reflect different aspects of music processing: A recent study (Steinbeis & Koelsch, 2008) showed that the processes underlying the generation of the N5 (but not the processes underlying the generation of the ERAN) interact with the N400 (reflecting processes of semantic integration during the perception of language), suggesting that the N5 reflects at least partly processing of musical meaning (Steinbeis & Koelsch, 2008; see also Koelsch et al., 2000).
A previous study reported an ERAN in response to diatonic and nondiatonic violations in well-known melodies and compared these effects with responses to such violations in unfamiliar melodies (Miranda & Ullman, 2007). However, it is not possible that similar familiarity effects were at work in the present study: No well-known melodies were used, and we used only two sequence types in each block (which were transposed to the 12 major keys).
Ordering of Blocks
We always presented the melody sequences before the chord sequences to prevent participants from automatically harmonizing the melodies. This was justified because a previous study (Koelsch & Jentschke, 2008) showed that the ERAN amplitude does not increase but decreases over the course of an experiment: After 2 hours of listening to the sequences used in the present study in the ST block, the ERAN elicited by irregular chord functions was still significant, but the ERAN amplitude declined over the course of the experiment (Koelsch & Jentschke, 2008). Thus, if any session effect is to be expected, then it is an amplitude reduction of the ERAN during the chord sequence blocks, and therefore the N180 elicited by chords (compared with tones) could not have been due to the fact that chord sequence blocks were presented after melody blocks.
In conclusion, the present data indicate that melodic information (which is also contained in the top voice of chords) is processed earlier than the harmonic information contained in chords and that melodic information (of single melodies or with regard to the top voice of chords) appears to be processed with partly different neural mechanisms than harmonic information. Thus, processing of music-syntactically irregular chords might consist of two components: an earlier one reflecting the processing of a sound expectancy violation related to melodic information (such as the stability of the tones of a melody, including the scale properties of tones) and a later one reflecting the processing of a sound expectancy violation related to the harmonic function of a chord.
The authors thank Anke Heyder and Susann Steudte who helped to create the stimuli and conducted the EEG measurements. This work was supported by a grant of the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) awarded to S. K. (KO 2266/2-1/2).
Reprint requests should be sent to Stefan Koelsch, Max-Planck-Institute of Human Cognitive and Brain Sciences, Stephanstr. 1a, 04103 Leipzig, Germany, or via e-mail: firstname.lastname@example.org.
A tonal key is defined by the notes that belong to the (diatonic) scale of the key, for example, the notes C–D–E–F–G–A–B in C major.
Behavioral data collected in a previous study with the same chord sequences (Koelsch et al., 2007) indicate that the salience of DDs and STs is moderate and quite comparable: When participants were asked to indicate whether the final chords of sequences are regular or irregular, percentages of correct responses were around 80% and very similar for DD and ST blocks (DD block = 80%, ST block = 81%).
A double dominant (in major) is often also referred to as chromatic supertonic.
A supertonic (in major) is often also referred to as diatonic supertonic.
Note that endings of DD melody sequences did not represent a frank out-of-key violation because all tones of these sequences would also fit within the tones of a scale when the first tone of the sequence is interpreted as the fourth scale tone (so that, e.g., the notes in C major would be F–F–G–G–B). However, because individuals tend to interpret the first tone of a melody as the root tone of a key (Krumhansl, 1979), it is likely that participants perceived the irregular final tone of DD sequences as out-of-key tones.
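The scale-membership argument in the note above can be checked with a short sketch. It takes the example tones F–F–G–G–B given in the note and tests them against the C major scale (first tone heard as the fourth scale tone) and the F major scale (first tone heard as the root). The helper names `major_scale` and `fits_key` are hypothetical, and the sketch assumes natural note names only and pitch classes counted in semitones from C.

```python
# Pitch classes (semitones above C) for the natural note names.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone offsets of the major (diatonic) scale relative to its tonic.
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)


def major_scale(tonic_pc: int) -> set:
    """Return the set of pitch classes of the major scale on `tonic_pc`."""
    return {(tonic_pc + step) % 12 for step in MAJOR_STEPS}


def fits_key(notes, tonic: str) -> bool:
    """True if every note belongs to the major scale built on `tonic`."""
    scale = major_scale(NOTE_TO_PC[tonic])
    return all(NOTE_TO_PC[note] in scale for note in notes)


melody = ["F", "F", "G", "G", "B"]  # example tones from the note above

print(fits_key(melody, "C"))  # True: all tones lie within C major
print(fits_key(melody, "F"))  # False: B is not part of F major
```

The two results mirror the two readings described in the note: the final tone is in key when the first tone is the fourth scale tone, and out of key when the first tone is the root.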