Prosody, particularly accent, aids comprehension by drawing attention to important elements such as the information that answers a question. A study using ERP registration investigated how the brain deals with the interpretation of prosodic prominence. Sentences were embedded in short dialogues and contained accented elements that were congruous or incongruous with respect to a preceding question. In contrast to previous studies, no explicit prosodic judgment task was added. Robust effects of accentuation were evident in the form of an “accent positivity” (200–500 msec) for accented elements irrespective of their congruity. Our results show that incongruously accented elements, that is, superfluous accents, activate a specific set of neural systems that is inactive in case of incongruously unaccented elements, that is, missing accents. Superfluous accents triggered an early positivity around 100 msec poststimulus, followed by a right-lateralized negative effect (N400). This response suggests that redundant information is identified immediately and leads to the activation of a neural system that is associated with semantic processing (N400). No such effects were found when contextually expected accents were missing. In a later time window, both missing and superfluous accents triggered a late positivity on midline electrodes, presumably related to making sense of both kinds of mismatching stimuli. These results challenge previous findings of greater processing for missing accents and suggest that the natural processing of prosody involves a set of distinct, temporally organized neural systems.

In spoken communication, speakers use prosody—the melody and rhythm of speech—in ways that help the listener understand the message. The function of prosody is very prominent in West Germanic languages such as Dutch, German, and English (Vallduvi, 2002; Ladd, 1996) where speakers assign pitch accents to the most important information in the utterance (the focus element) and leave less important parts unaccented (the background elements). Languages differ in the exact instantiation of accent on elements in focus, so we will use the generic term “focus accent” to refer to the phenomenon in this article.

As an answer to the question What did the club give to the player?, the sentence They gave (background) a BONUS (focus) to the player (background) has an appropriate focus accent, which emphasizes the segment that answers the question. In contrast, a sentence in which background information receives the accent instead would be inappropriate, as in They gave (background) a bonus (focus) to the PLAYER (background). In this sense, accents “focus” the listener's attention on the most important information (Wilson & Wharton, 2006), facilitating utterance interpretation (reviewed in Cutler, Dahan, & Van Donselaar, 1997). The function of focus accent in guiding attention to what is important is also clear from the fact that implausible information that is unaccented tends not to be noticed (Wang, Bastiaansen, Yang, & Hagoort, 2011).

The distinction between focus and background information within an utterance, also called “information structure,” derives from the discourse context, which determines which information is familiar and therefore backgrounded. In context, listeners may expect the important information in a certain position within the sentence to be marked prosodically as focus. Nooteboom and Kruyt (1987) have shown that listeners are capable of recognizing inappropriate accentuation in context: in their off-line rating study, listeners rejected sentences containing unaccented elements which were expected to be in focus (“missing” focus accents). Oddly, they tolerated accents on background elements (“superfluous” background accents), despite the fact that focus accent is hypothesized to have the effect of focusing attention on important information. This pattern brings to mind the famous minimalist principle of design that “less is more.” Here, less marking of information structure than necessary, as by a missing accent, leads to more processing difficulty. Intuitively, however, one would have expected that more marking of information structure, as by a superfluous accent, would be more noticeable and hence increase processing costs.

This sort of behavioral data reported by Nooteboom and Kruyt is important but relies on a conscious judgment. ERPs have been used in a number of recent studies to investigate the neural substrates of the processing of linguistic prosody. ERPs are useful because they directly measure brain activity; changes in neural processing related to various conditions can reveal the time point at which a difference is recognized without the need for an explicit task, as well as giving an indication about the nature of the brain responses involved.

Two linguistic functions of prosody have received the most attention in the ERP literature to date: the processing of prosodic boundaries and the use of pitch accents for focus marking (see Table 1 for an overview). Prosodic boundaries, typically consisting of slowed speech tempo and a pitch change, serve to divide speech into segments, usually at syntactic boundaries. The processing of prosodic boundaries per se, as compared with a sequence without a break, consistently evokes an early positivity resembling the P2 component for acoustic differences (Li, Wang, & Lu, 2010), and a late positive component, sometimes called the closure positive shift (CPS; Steinhauer & Friederici, 2001; Steinhauer, Alter, & Friederici, 1999).

Table 1. Overview of Previous ERP Studies on Prosody Processing

Study | Task | Paradigm | Conditions | Effect | Interpretation | Possible Problems

Focus Accent
Hruska et al., 2001 (German) | Prosodic | Auditory; question–answer pairs | Superfluous accent | None |  | Time-locking (1a)
 |  |  | Missing accent | NEG-POS | N400-P600 | Matching (2a, b)
 |  |  |  |  |  | Boundary (3)
Hruska & Alter, 2004 (German) | Prosodic | Auditory; question–answer pair | Superfluous accent | POS | CPS | Matching (2a, b)
 |  |  | Missing accent | NEG-POS | N400-P600 | 
Toepel & Alter, 2004 (German) | Comprehension | Auditory; dialogues with contrastive/neutral focus | Superfluous | NEG-POS | NEG-CPS | Time-locking (1a)
 |  |  | Missing | POS | CPS | Matching (2a)
 | Prosodic |  | Superfluous | POS | CPS | 
 |  |  | Missing | NEG-POS | NEG-CPS | 
Magne et al., 2005 (French) | Prosodic | Auditory; question–answer pairs | Medial superfluous | POS | P3a + P3b | Time-locking (1b)
 |  |  | Final superfluous | NEG | N400/CNV | Matching (2a)
 |  |  | Medial missing | POS | P3b | Boundary (3)
 |  |  | Final missing | NEG | N400/CNV | 
Heim & Alter, 2006 (German) | Comprehension | Auditory; isolated sentences with even | Accent | NEG | EN | 
 |  |  | Superfluous | POS |  | 
 |  |  | Missing | POS |  | 
Toepel et al., 2007 (German) | Prosodic | Auditory; dialogues with contrastive focus | Superfluous | POS | CPS | Time-locking (1a, c)
 |  |  | Missing Visual | NEG-POS | N400-CPS | Matching (2a, b)

Semantics-Prosody Mismatch
Wang, Hagoort, & Yang, 2009 (Chinese) | Comprehension | Reading; dialogues with semantically in/appropriate focus/nonfocus | Focus inappropriate vs. appropriate | NEG | N400 | Time-locking (1d)
 |  |  | Nonfocus inappropriate vs. appropriate | None (NEG) | (Very reduced N400) | Boundary position unclear
 |  |  | Appropriate nonfocus (vs. focus) | NEG | Larger N400 | 
Wang et al., 2011 (Dutch) | None reported | Auditory; dialogues with prosodic/semantic mismatch | Missing | None |  | Time-locking (1d)
 |  |  | Superfluous | NEG | N400 | Matching (2b)
 |  |  | Focus accent > no accent | NEG | N400 | 
 |  |  | Nonfocus accent = no accent | None |  | 
 |  |  | Sem. incongruent | NEG | N400 | 

Prosodic Boundaries
Kerkhofs, Vonk, Schriefers, & Chwilla, 2007 (Dutch) | None | Auditory; dialogues, with prosodic/syntactic mismatch | Mismatch prosodic/syntactic break | POS | CPS | Time-locking (1e)
 |  |  | Match prosodic/syntactic break | POS (right) | CPS | Matching (2a, b)
Bögels, Schriefers, Vonk, Chwilla, & Kerkhofs, 2009 (Dutch) | Comprehension | Auditory; prosodic breaks in single sentences | Prosodic break | POS | CPS (larger with object verbs) | Time-locking (1f)
 |  |  |  |  |  | Boundary (3)
Li et al., 2010 (Chinese) | Comprehension | Auditory; dialogues with prosodic/syntactic mismatch | Missing prosodic boundary | NEG | LAN | Time-locking (1d)
 |  |  | Superfluous prosodic boundary | NEG | LAN + N400 | Matching (2a)
 |  |  | Prosodic boundary | POS | P2 (fronto-central) | Boundary (3)

NEG = negative shift in ERPs; POS = positive shift in ERPs; EN = expectancy negativity; LAN = left anterior negativity; 1a = time-locking to sentence onset; 1b = time-locking to focus accent onset; 1c = time-locking to verb onset; 1d = time-locking to target onset; 1e = time-locking to offset of word before boundary; 1f = time-locking to onset of stressed syllable before break; 2a = targets not matched for frequency; 2b = targets not matched for lexical stress position; 3 = targets at phrase boundary.

In contrast, it is less clear which electrophysiological components underlie the interaction of focus and accentuation. Because focus accent is a potential guide to important information during language comprehension, a number of recent studies have used ERPs to investigate how this information is processed, in particular the response to missing focus accents and superfluous background accents. An extensive list of neural components (responses with different latencies, polarities, and scalp distributions) has been reported in the literature, interpreted as reflecting various neural processes (see Table 1). We believe this variation is present because of the large differences in materials, methods, and experimental designs used in previous studies, rather than a large variability in the way focus accent is processed. Most important in our eyes, in all but two previous ERP studies, participants had to explicitly judge the prosodic well-formedness of the stimuli.

First let us examine the variability among studies. On the one hand, some studies (Bögels, Schriefers, Vonk, & Chwilla, 2011; Heim & Alter, 2006; Magne et al., 2005; Hruska & Alter, 2004; Toepel & Alter, 2004) found effects with a negative polarity. These have frequently been interpreted as evidence for difficulty in semantic processing because of a mismatch with the context, producing an effect similar to the N400, an increased negativity seen over central and parietal electrodes in response to words that do not fit semantically (Kutas & Hillyard, 1980). The second part of Table 1 lists studies that specifically address the relationship between semantic processing and focus, indicated by either syntactic structure (e.g., clefts) or by intonation.

Alternatively, as suggested by Magne et al. (2005), the negativity could be interpreted as a task-related effect such as the “contingent negative variation” (CNV), a negativity that reflects the cognitive preparation for an upcoming stimulus to which the participant must react (Rugg & Coles, 1996; Walter, Cooper, Aldridge, McCallum, & Winter, 1964). The fact that negativities have been found in studies in which an explicit judgment has been used makes this a plausible alternative to the N400 and leads to a completely different view of why the negativity occurs. Unfortunately, it is difficult to tell the two effects apart. The CNV has approximately the same scalp distribution as the N400; it can be more prolonged in duration and has an onset latency varying between 260 and 470 msec (Folmer, Billings, Diedesch-Rouse, Gallun, & Lew, 2011). If the negativity disappears when no explicit judgment task is carried out, that would suggest that the explicit task contributes to the effect and that the reported negativity should be considered a CNV rather than an N400.

A number of positivities have also been reported either instead of or in addition to negativities. Their interpretation has also varied widely, most often involving reference to the CPS or P600 components. Several studies (Toepel, Pannekamp, & Alter, 2007; Hruska & Alter, 2004; Toepel & Alter, 2004) have attributed positivities elicited by focus elements to the CPS component, a positivity found in response to prosodic parsing (Pannekamp, Toepel, Alter, Hahne, & Friederici, 2005; Steinhauer & Friederici, 2001; Steinhauer et al., 1999), which they then reinterpreted as a marker of information segmentation at focus positions. Because focus elements in these studies often occurred at phrase boundaries that give rise to prosodic parsing, the exact underlying source of the CPS remains ambiguous.

If the positivity reflects information segmentation rather than prosodic parsing, it ought not to vary across sentence position. The prosodic parsing account suggests that the CPS will only occur at clear prosodic boundaries. Existing attempts to disentangle whether the positivity depends on the sentence position of focus accents (Magne et al., 2005) have not supported either of the two views of the CPS. Although Magne et al. report distinct effects for prosodic mismatches in medial sentence position (which they interpreted as a P300) and in final sentence position (N400), no evidence for a CPS-like positivity was found. It is thus important to further investigate how these positivities are correlated with focus and boundary processing as well as to carry out an experiment in which focus accent does not occur at a prosodic boundary.

Unlike the CPS positivity that occurs irrespective of prosodic congruity, other positivities have been found in response to the incongruity of focus accents in context (Schumacher & Baumann, 2010). Such positivities may have a distinct neural source related to the processing of prosodic incongruity and can be interpreted as belonging to the P600 family, positivities that are found when language processing becomes effortful or reanalysis or repair is necessary (Brouwer, Fitz, & Hoeks, 2012; Burkhardt, 2007; Hoeks, Stowe, & Doedens, 2004; Kaan, Harris, Gibson, & Holcomb, 2000; Hagoort, Brown, & Groothusen, 1993; Osterhout & Holcomb, 1992). The functional interpretation of these positivities must take into account whether they reflect the processing of focus accents per se or are rather elicited by prosodic incongruity.

Finally, the positivity seen in a number of studies may actually be related to the P300 component, as suggested by Magne et al. (2005). The P3 is a broadly distributed positive deflection that is seen in response to novel or unexpected stimuli primarily when participants are instructed to pay attention to the stimuli. The P3 has been divided into two parts: the P3a, which is thought to be evoked by the identification of task-related novel events, and the P3b, which is generally linked to task-related decision processes (Picton, 1992; Donchin & Coles, 1988). Examining prosody processing when no secondary task is included should shed light on the extent to which novelty and decision-related processes can account for the positivity reported in these studies.

Biphasic responses have also been reported, in the form of a negativity followed by a positivity. These have been generally interpreted as an N400, followed by either a CPS or a P600. Interpreting biphasic responses is difficult because of the same issues already discussed above for negativities and positivities taken alone. For instance, the CNV is often followed by a positive component called the CNV-Resolution, which is claimed to reflect executive functions that re-establish a cognitive equilibrium such as set-shifting or resetting motor programs (Jackson, Jackson, & Roberts, 1999). Thus, the negativity may reflect expectation violation, and the positivity the resolution of the decision process. All in all, it is difficult to establish whether the findings of previous studies reflect the natural processing of prosody in context.

We have already mentioned that the choice of task may play a role in the variety of responses reported in the literature. The task-related nature of the CNV and P300 emphasizes this possibility. In fact, there is evidence that changing the metalinguistic task modifies the neural response to linguistic prosody. Toepel and Alter (2004) showed that neutral accents in a contrastive context (a kind of underspecified accent, e.g., a missing accent) did not affect processing relative to contrastive accents when participants performed a comprehension task focusing on content but led to a significant biphasic (negative–positive) ERP pattern when listeners performed a prosodic judgment task. For contrastive accents in a neutral context (a kind of overspecified accent, e.g., a superfluous accent), a negativity was seen for the comprehension task as opposed to a late positivity for the prosodic judgment task. This pattern of no negative effect for a superfluous accent accompanied by a clear negativity for a missing accent has been reported a number of times in the literature in studies using a prosodic judgment task; one goal of the current experiment is to see whether less prosodic marking of focus (missing accent) indeed corresponds to more processing effort when no explicit prosodic judgment task is employed.

The Present Study

The goal of this study is to investigate whether listeners are sensitive to the appropriateness of prosody in the discourse and whether they process missing and superfluous accents in the same way when no prosodic judgment task is employed. Using a strictly controlled naturalistic paradigm, the study focuses on the interaction of prosody and the information structure provided by the linguistic context in short dialogues in Dutch (for materials, see Table 2). In one version, the context question sets up a contrast set on the direct object; the resolution of this choice is given in the answer where the direct object is in focus, whereas in the second version, the question context includes a contrast on the prepositional object, and the direct object in the answer is background information instead. The intonation pattern of the answer is either congruent or incongruent with the context-dependent foregrounding.

Table 2. Experimental Materials

Focus | Question | Answer with Accent on Direct Object | Answer with Accent on Prepositional Object
Direct Object | Question 1 | 1a (congruous) | 1b (incongruous)
 | Did the club give a bonus or a fine to the player? | They gave a BONUS to the player. | They gave a bonus to the PLAYER.
 | Heeft de club een premie of een boete aan de speler gegeven? | Ze hebben een PREMIE aan de speler gegeven. | Ze hebben een premie aan de SPELER gegeven.
Prepositional Object | Question 2 | 2b (incongruous) | 2a (congruous)
 | Did the club give a bonus to the player or to the trainer? | They gave a BONUS to the player. | They gave a bonus to the PLAYER.
 | Heeft de club een premie aan de speler of aan de trainer gegeven? | Ze hebben een PREMIE aan de speler gegeven. | Ze hebben een premie aan de SPELER gegeven.




Questions introduced a contrastive focus on the direct object (Question 1) or on the prepositional object (Question 2). Answers had congruous accentuation (1a, 2a) or incongruous accentuation (1b, 2b). Incongruous answers always included a missing accent (1b: “bonus”) and a superfluous accent (1b: “PLAYER”). Accented elements are displayed in capitals, focus elements in bold; original Dutch stimuli are displayed in italics. The linear order of the contrastive elements in the question (e.g., “bonus” and “fine”) was counterbalanced across trials.

To avoid interference from task-related effects that may arise because of the judgment of prosodic congruity, participants performed a comprehension task on a limited number of trials that aimed to guarantee overall attention to the semantic coherence of the dialogues. Special care was taken to control for the following factors known to affect ERP responses: sentences were matched for length (in words), syntactic structure, target lemma frequency, target plausibility, and target expectedness. The last two factors, plausibility and expectedness, were of special interest (see the second section of Table 1) because they have been shown to interact with focus accent (Wang et al., 2011) and to affect the amplitude of the N400 component more generally. All target nouns had lexically stressed initial syllables with long vowels, which reduced variance in word and accent identification points and allowed us to measure accent processing exclusively without any lexical stress variation (Ladd, Mennen, & Schepman, 2000). ERPs were time-locked to the acoustic onset of each target word rather than to sentence onset; time-locking to sentence onset would introduce jitter that could mask relatively short-lasting effects. Because congruous and incongruous sentences were identical (see Table 2), the baseline should not be an issue. Most importantly and in contrast to previous studies, special care was taken to place targets away from intonational phrase boundaries by placing the finite verb at the end of the sentence and making sure that no prosodic breaks were present at or close to the onset and offset of targets, as these may elicit a CPS (Steinhauer et al., 1999). We believe that by having taken these measures, our study provides an uncluttered view of the neural substrate underlying prosody processing in context.
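The time-locking logic described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the function names, the single-channel list representation, and the 200-msec prestimulus baseline and 1-sec epoch length are all assumptions chosen for illustration. Each epoch is cut around the acoustic onset of a target word and baseline-corrected, so that responses are aligned on the target rather than on sentence onset.

```python
def epoch(signal, onsets, sfreq, tmin=-0.2, tmax=1.0):
    """Cut baseline-corrected epochs from one channel.

    signal: list of voltage samples for a single electrode
    onsets: target-word onsets in seconds (hypothetical annotation)
    sfreq:  sampling rate in Hz
    tmin, tmax: epoch limits relative to onset (assumed values)
    """
    pre = round(-tmin * sfreq)   # samples in the prestimulus baseline
    post = round(tmax * sfreq)   # samples after target onset
    epochs = []
    for onset in onsets:
        center = round(onset * sfreq)
        start, stop = center - pre, center + post
        if start < 0 or stop > len(signal):
            continue  # skip epochs that run off the recording
        segment = signal[start:stop]
        baseline = sum(segment[:pre]) / pre
        epochs.append([x - baseline for x in segment])
    return epochs


def average(epochs):
    """Point-by-point mean across epochs, i.e., the ERP for one condition."""
    return [sum(samples) / len(samples) for samples in zip(*epochs)]
```

Because every epoch is aligned on the target onset, variation in the duration of the preceding sentence material does not smear the averaged response, which is the jitter problem that time-locking to sentence onset would create.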

In line with earlier findings in the literature showing a shallow processing of unaccented and backgrounded information (Li & Ren, 2012; Wang et al., 2011), it is hypothesized that in normal processing, missing accents will not be more noticeable than superfluous accents. On the basis of the known function of focus accent, which is to draw attention to important information, we expect the semantic content of the accented lexical item to be attended and the presence of an incongruous accent on background information to be noted. It is possible that missing accents will be responded to in the same way, but if there is a difference in processing, superfluous accents should require more processing resources than missing accents.

Participants

Twenty-nine right-handed Dutch native speakers (13 men, age = 18–29 years, mean = 21 years) with normal or corrected-to-normal vision and without any reported neurological, psychiatric, hearing, or language impairments were paid for participating. Participants gave written informed consent in accordance with the Declaration of Helsinki. An additional six participants (two men) were not included because they did not meet predefined inclusion criteria (a minimum of 60% artifact-free trials for any electrode used in the analysis in any condition). On average, the analysis was performed on 85% valid data over all conditions.
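The inclusion criterion can be stated as a simple check. The sketch below is illustrative only: the function name, the dictionary layout, and the electrode labels in the usage note are hypothetical, and the 60% threshold is the only value taken from the criterion above.

```python
def meets_inclusion(valid_trials, total_trials, min_fraction=0.60):
    """Return True if every condition/electrode cell keeps enough trials.

    valid_trials: {condition: {electrode: number of artifact-free trials}}
    total_trials: {condition: number of trials presented}
    """
    return all(
        n_valid / total_trials[condition] >= min_fraction
        for condition, per_electrode in valid_trials.items()
        for n_valid in per_electrode.values()
    )
```

For example, with 30 trials per condition, a participant with 17 artifact-free trials at one electrode in one condition (about 57%) would be excluded, even if all other cells were complete.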

Stimuli

Stimulus construction started with 120 dialogue items (a question followed by an answer) in Dutch, the language used in this study, as illustrated in Table 2. Each question contained a contrastive set with a target noun (selected in the answer; bonus) and a nontarget noun (not selected in the answer; fine). To avoid variability in word identification points, both words had lexical stress on the initial syllable and equivalent average lemma frequencies (taken from the CELEX corpus; Baayen, Piepenbrock, & Van Rijn, 1993). Across conditions, contrast sets in the questions referred either to the direct object (“bonus or fine”) or to the prepositional object (“to the player or to the trainer”); the resolution of the choice represented the focus in the answer. Two further versions of each question were created, in which the linear order of the two contrasted items (e.g., “… bonus or fine …”) was reversed (e.g., “… fine or bonus…”); these versions were counterbalanced to avoid effects of linear presentation. Questions were followed by answers that were either prosodically congruous with a focus accent on one of the contrasted elements (answers (a) in Table 2) or prosodically incongruous with a focus accent on a backgrounded element from the question (answers (b) in Table 2). None of the answers contained semantically inappropriate information. Of interest in these answers were the direct object (i.e., bonus) and the prepositional object (i.e., player).

The plausibility of all target words was tested in Off-line Study 1 (“plausibility study”) with 96 non-Linguistics students who did not participate in the ERP experiment. Participants rated how plausible each target (bonus) and nontarget (fine) was to serve as an answer to the question (on a scale of 1 = very poor fit to 7 = very good fit). We also measured target expectedness by having participants indicate which word of the contrast pair (bonus or fine) they would select as the best answer to the question. On the basis of the results presented in Table 3, the 120 dialogue items were assigned to four item-groups with 30 dialogues each. Target-related factors did not differ significantly between conditions or across lists (all ps > .24).

Table 3. Stimulus Characteristics

Item Group | Direct Object Frequency | Direct Object Plausibility | Direct Object Expectancy | Prepositional Object Frequency | Prepositional Object Plausibility | Prepositional Object Expectancy | Sentence Words
Group 1 | 1.1 | 5.1 | 0.45 | 1.2 | 5.2 | 0.49 | 7.9
Group 2 | 1.1 | 5.3 | 0.48 | 1.2 | 5.1 | 0.44 | 7.9
Group 3 | 1.0 | 5.1 | 0.42 | 1.1 | 5.1 | 0.44 | 8.2
Group 4 | 1.1 | 5.3 | 0.45 | 1.3 | 5.3 | 0.48 | 8.0

Characteristics of the four experimental item groups (Groups 1–4). Item group refers to a group of items that all occur in the same condition across lists. Frequency indicates lemma frequency in the CELEX lexical database (in number of occurrences per million). Plausibility is measured on a scale from 1 to 7 (1 = very poor fit, 7 = very good fit); scores are the results from Off-line Study 1 (plausibility study, n = 96; see Methods). Expectancy refers to the proportion of participants who selected the target from the contrastive set as an answer to the question (0 = not selected, 1 = selected; a score of 0.5 indicates that both elements in the contrastive pair are equally likely to be selected; see Off-line Study 1 in Methods). Average number of words per sentence is given under Words.

To investigate prosody processing in naturally elicited speech, experimental stimuli were recorded as dialogues between two phonetically naive speakers: a male speaker produced the questions and a female speaker produced the answers. The speakers recorded clearly accented dialogues as a unit, speaking at a natural speech rate (6.4 syllables/sec) without any excessive emphasis. None of the stimuli contained any disfluencies or phrase boundaries; in fact, all utterances were produced as a single intonational phrase (Gussenhoven, 2005). To prevent unintended intonational differences between conditions, only congruous dialogues were recorded. Incongruous dialogues were generated on the basis of these congruous conditions by recombining questions and answers.

A total of 960 dialogues (120 dialogues × 2 linear orders × 2 question types × 2 answer types) were assigned to eight lists following a Latin square design. Each participant was presented with one list of 120 dialogues (30 items × 4 conditions). No participant listened to more than one version of any sentence, and every participant heard the experimental stimuli in a pseudorandomized order that excluded more than two consecutive presentations of the same condition. In each list, half of the dialogues had focus on the direct object (n = 60) and the other half had focus on the prepositional object (n = 60). In each focus condition, half of the answers were prosodically congruous (focus was accented, n = 30), whereas the other half were prosodically incongruous (background was accented, n = 30). ERP processing differences cannot be attributed to differences in the acoustic characteristics of the stimuli, because the congruous and incongruous conditions were physically identical sentences (1a = 2b, 1b = 2a). All stimuli were normalized in loudness and analyzed acoustically.
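The list construction can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes that the eight lists arise from four Latin-square rotations of the item groups crossed with the two linear orders, it assigns items to groups by a simple modulo rule (the study used the matched groups of Table 3), and it enforces the run-length constraint by reshuffling until the constraint holds, which is a stand-in for whatever pseudorandomization routine was used.

```python
import random

CONDITIONS = ["1a", "1b", "2a", "2b"]  # condition labels as in Table 2
N_ITEMS, N_GROUPS = 120, 4


def build_lists():
    """Return 8 lists of (item, condition, linear_order) trials."""
    lists = []
    for order in (0, 1):               # assumed: two linear-order versions
        for shift in range(N_GROUPS):  # Latin-square rotation of item groups
            trials = []
            for item in range(N_ITEMS):
                group = item % N_GROUPS  # hypothetical grouping rule
                condition = CONDITIONS[(group + shift) % N_GROUPS]
                trials.append((item, condition, order))
            lists.append(trials)
    return lists


def longest_run(trials):
    """Length of the longest streak of consecutive same-condition trials."""
    longest = run = 1
    for prev, cur in zip(trials, trials[1:]):
        run = run + 1 if prev[1] == cur[1] else 1
        longest = max(longest, run)
    return longest


def pseudorandomize(trials, max_run=2, seed=0):
    """Reshuffle until no condition occurs more than max_run times in a row."""
    rng = random.Random(seed)
    trials = list(trials)
    while True:
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return trials
```

Under these assumptions, each list contains every item exactly once with 30 trials per condition, and the eight lists together cover all 960 dialogue versions.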

An additional Off-line Study 2 (“prosodic congruity study”) was conducted with a subset of the stimuli to test whether mismatch conditions can be discriminated correctly. Seventeen Linguistics students who did not participate in the ERP study or Off-line Study 1 listened to a subset of dialogues taken from all conditions and indicated whether the question and the answer of a dialogue matched (scale of 1 = very poor fit to 7 = very good fit). No instruction was given with respect to prosodic well-formedness. A repeated-measures ANOVA with Accented Element (direct object vs. prepositional object) and Congruity (congruous vs. incongruous accent) as within-participants factors showed a highly significant main effect of Congruity, F(1, 16) = 245.6, p < .001, indicating that listeners were able to discriminate between congruous and incongruous prosody (average scores are given in Figure 1). No other effects were significant (all ps > .18).

Figure 1. 

Results of Off-line Study 2 (prosodic congruity study). Off-line Study 2 tested whether listeners are able to differentiate between congruous and incongruous conditions in recorded dialogues. Participants indicated the overall match of question and answer on a scale from 1 (= very poor fit) to 7 (= very good fit) without any explicit instruction to attend to prosody.


Acoustic Analysis

Acoustic measures were performed using the software package Praat (Boersma & Weenink, 2010) and are displayed in Figure 2A and B. Accented direct objects and prepositional objects had a longer acoustic duration and a higher fundamental frequency (f0) relative to unaccented ones. Segmental lengthening under accentuation was larger for direct objects (86 msec) than for prepositional objects (36 msec). Accentuation also affected pitch excursion (difference between f0 max and f0 min), which was higher for accented elements (80 Hz) than for unaccented elements (28 Hz; Figure 2B).

Figure 2. 

(A) Acoustic duration of target sentences. Average acoustic duration of segments in sentences with accented direct objects (1) and accented prepositional objects (2; in msec, standard deviation in brackets) and duration of accented (black bars) versus unaccented elements (gray bars). Abbreviations: start = interval from sentence onset until direct object onset; DO = duration of direct object; PO-DO = interval from direct object offset until prepositional object onset; PO = duration of prepositional object; end = interval from prepositional object offset until sentence offset. (B) Fundamental frequency and pitch excursion of target stimuli. The figure displays targets' absolute fundamental frequency values (f0, in Hz) and pitch excursion (difference between maximal and minimal f0) for accented (black) and unaccented (gray) direct and prepositional objects.


The f0 contours of the experimental stimuli were transcribed according to the Transcription of Dutch Intonation (ToDI) convention (Gussenhoven, 2005). Focus accents on direct objects (Figure 3A) and on prepositional objects (Figure 3B) showed the typical falling pitch contour for Dutch focus accents. In ToDI, the contour is transcribed as an H*L accent, where the letters indicate the direction of pitch movement in the accented syllable, here a falling movement from H (high) to L (low) pitch, whereas the star denotes the pitch of the tone target in the accented syllable, here H (high). Figure 3 shows that the signal did not contain any disruptions of the f0, such as silent pauses or phrase tones, in the vicinity of targets that would indicate a phrase boundary.

Figure 3. 

(A) Plot of all pitch contours of accented direct objects. Black vertical lines indicate the onset (dotted line) and offset (solid line) of direct objects, and gray lines display the onset (dotted line) and offset (solid line) of prepositional objects. The small arrows close to the 400-Hz line display the standard deviation of onset and offset times for direct objects. (B) Plot of all pitch contours of accented prepositional objects. Gray vertical lines indicate the onset (dotted line) and offset (solid line) of prepositional objects, and black lines display the onset (dotted line) and offset (solid line) of direct objects. The small arrows close to the 400 Hz line display the standard deviation of onset and offset times for prepositional objects.


EEG Procedure and Recordings

After electrode application, participants were seated in front of a computer screen in an electrically shielded room and completed a practice session before the actual experiment. Stimuli were presented auditorily via loudspeakers and were divided in two blocks of 60 dialogues (approximate block duration was 12 min). To minimize eye movement artifacts, participants fixated a black cross against a gray background, which appeared 100 msec before stimulus presentation and remained there until the end of the dialogue. In each trial, a question was presented (average duration = 2000 msec), followed by silence (500 msec), an answer (average duration = 2000 msec), and silence again (1200 msec). To encourage attentive processing, participants performed a comprehension task on 25% of all trials and indicated whether a probe word presented on the screen was semantically related to the preceding dialogue. Correct and incorrect responses were counterbalanced. After the response (or after the last silence period in trials without the comprehension task), four stars appeared on the screen (duration = 2000 msec) to indicate that participants had the opportunity to blink.

The EEG was recorded at 250 Hz using a 64-channel cap with Ag/AgCl electrodes, placed according to the international extended 10–20 system (Electro Cap International, Eaton, OH). All channels were amplified against the average of all connected inputs of the amplifier (TMS International, Enschede, The Netherlands). The amplifier measured DC without a high-pass filter but with a digital finite impulse response filter (cutoff frequency of 67.5 Hz) to avoid aliasing effects. After recording, electrodes were re-referenced to the algebraic average of left and right mastoid electrodes. Vertical eye movements and blinks were monitored via electrodes below and above the left eye, and horizontal movements from electrodes at the left and right canthus of each eye. Impedances were kept below 5 kΩ. All data were filtered off-line with a band-pass filter of 0.01–30 Hz.
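Re-referencing to the algebraic average of the two mastoids amounts to subtracting, at every sample, the mean of the two mastoid channels from each channel. A minimal sketch follows; the channel labels `M1` and `M2` and the dictionary data layout are assumptions, since the text does not name the mastoid electrodes or the software used for this step.

```python
def rereference_to_mastoids(data, left="M1", right="M2"):
    """Subtract the mean of the two mastoid channels from every channel.

    `data` maps channel name -> list of voltages. The mastoid labels are
    hypothetical placeholders.
    """
    ref = [(a + b) / 2 for a, b in zip(data[left], data[right])]
    return {ch: [v - m for v, m in zip(samples, ref)]
            for ch, samples in data.items()}


# Toy two-sample recording in microvolts (not study data).
raw = {"Cz": [10.0, 12.0], "M1": [2.0, 4.0], "M2": [4.0, 0.0]}
rereferenced = rereference_to_mastoids(raw)
```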

EEG Analysis

Trials containing movement artifacts, ocular artifacts, or electrode drifts (determined by a ±75 μV voltage maximum) were rejected. Only participants with at least 60% valid data in all conditions for any electrode used in the analyses were included (n = 29). On average, EEG analysis was performed on 85% of the data per condition (SD = 24%). The number of rejected trials did not differ between conditions. ERPs were time-locked to the acoustic onset of each target word, which coincided with the onset of its accented syllable.
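The rejection and inclusion criteria can be expressed as two simple checks. The sketch below is an illustration under assumptions: the data layout (an epoch as a list of channels), the function names, and the treatment of values exactly at the threshold are not specified in the text.

```python
THRESHOLD_UV = 75.0   # rejection criterion from the text (+/- 75 microvolts)
MIN_VALID = 0.60      # inclusion criterion from the text


def is_clean(epoch, threshold=THRESHOLD_UV):
    """True if no sample on any channel exceeds +/- threshold.

    `epoch` is a list of channels, each a list of voltages in microvolts.
    """
    return all(abs(v) <= threshold for channel in epoch for v in channel)


def valid_proportion(epochs):
    """Proportion of epochs that survive artifact rejection."""
    return sum(is_clean(e) for e in epochs) / len(epochs)


def include_participant(epochs_by_condition, min_valid=MIN_VALID):
    """Keep a participant only if every condition retains enough trials."""
    return all(valid_proportion(eps) >= min_valid
               for eps in epochs_by_condition.values())
```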

ERP differences were identified in three time windows post target onset: early time window (100–220 msec), N400 time window (300–500 msec), and late P600 time window (700–1000 msec). Average ERPs for a number of ROIs were computed as the average over several electrodes. Lateral ROIs included left anterior (FP1, AF3, AF7, F3, F5, F7), right anterior (FP2, AF4, AF8, F4, F6, F8), left central (FC3, FC5, C3, C5, CP3, CP5), right central (FC4, FC6, C4, C6, CP4, CP6), left posterior (P3, P5, P7, PO3, PO7, O1), and right posterior (P4, P6, P8, PO4, PO8, O2). Midline ROIs included anterior (FPz, AFz, Fz), central (FCz, Cz, CPz), and posterior (Pz, POz, Oz).
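The dependent measure, mean voltage per ROI and time window, can be sketched as follows. The 250-Hz sampling rate, the window boundaries, and the electrode groupings come from the text. The epoch start of −200 msec, the inclusion of both window edges, the data layout, and the function names are assumptions, and only two of the ROIs are listed for brevity.

```python
SFREQ_HZ = 250           # sampling rate from the text
EPOCH_START_MS = -200    # assumed epoch start (onset of prestimulus baseline)

ROIS = {
    "left_anterior": ["FP1", "AF3", "AF7", "F3", "F5", "F7"],
    "right_posterior": ["P4", "P6", "P8", "PO4", "PO8", "O2"],
}
WINDOWS_MS = {"early": (100, 220), "n400": (300, 500), "late": (700, 1000)}


def ms_to_sample(t_ms, epoch_start_ms=EPOCH_START_MS, sfreq=SFREQ_HZ):
    """Index of the sample recorded at latency `t_ms` relative to target onset."""
    return round((t_ms - epoch_start_ms) * sfreq / 1000)


def roi_window_mean(erp, roi, window):
    """Mean voltage over the electrodes of `roi` and the samples of `window`.

    `erp` maps electrode name -> list of voltages (one averaged waveform).
    Both window edges are included (an assumption).
    """
    start, stop = (ms_to_sample(t) for t in WINDOWS_MS[window])
    values = [v for ch in ROIS[roi] for v in erp[ch][start:stop + 1]]
    return sum(values) / len(values)
```

At 250 Hz each sample spans 4 msec, so the early window of 100–220 msec corresponds to samples 75 through 105 of an epoch that starts 200 msec before target onset.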

Repeated-measures ANOVAs were conducted separately for lateral and midline ROIs. ANOVAs for lateral electrodes were calculated with four within-subject factors: Accent (accented element vs. unaccented element), Congruity (contextually congruous accent vs. contextually incongruous accent), Anteriority (anterior vs. central vs. posterior regions), and Hemisphere (left hemisphere vs. right hemisphere). ANOVAs for midline ROIs included all factors except Hemisphere. ANOVAs were performed on mean voltage values and adjusted with the Huynh–Feldt correction for nonsphericity where appropriate. For direct objects, a 200-msec prestimulus baseline correction was calculated for segments with a duration of 1300 msec. For prepositional objects, a 100-msec within-stimulus baseline was chosen because processing differences were expected to have arisen after the perception of mismatches on the direct object (for similar reasoning and procedure, see, e.g., Mueller, 2009; Philips, Kazanina, & Abada, 2005).
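The two baseline procedures differ only in the interval whose mean is subtracted from the waveform. The sketch below shows this with a toy waveform. The function name, the half-open interval convention, and the data are assumptions for illustration, not the authors' implementation.

```python
def baseline_correct(waveform, times_ms, baseline_ms):
    """Subtract the mean voltage in the baseline interval from every sample.

    `times_ms[i]` is the latency of `waveform[i]` relative to target onset.
    The interval is half-open: start <= t < stop (an assumption).
    """
    start, stop = baseline_ms
    base = [v for v, t in zip(waveform, times_ms) if start <= t < stop]
    offset = sum(base) / len(base)
    return [v - offset for v in waveform]


# 250 Hz sampling gives one sample every 4 msec; 1300-msec segment.
times = list(range(-200, 1100, 4))
toy_wave = [2.0 + 0.001 * t for t in times]   # hypothetical data

# Direct objects: 200-msec prestimulus baseline.
direct_object = baseline_correct(toy_wave, times, (-200, 0))
# Prepositional objects: 100-msec within-stimulus baseline.
prepositional_object = baseline_correct(toy_wave, times, (0, 100))
```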

Behavioral Results

Participants judged the semantic relatedness of a probe word to the preceding dialogue in 25% of all trials. Participants were attentive and comprehended dialogues successfully (average accuracy of 87% correct). Task performance was not affected by prosodic congruity.

ERP Results for Direct Objects

ERP analyses concentrate on the direct object, whereas data for prepositional objects are regarded as exploratory: half of the time, prepositional objects were preceded by a direct object in a mismatch condition, which will have contaminated their processing (cf. Figure 2A for the average position of both elements in the sentence). Effects involving scalp distribution will be reported only if modified by the cognitive factors.

Statistical results are presented in Table 4, and ERP waveforms for all conditions are displayed in Figure 4. Marginally significant main effects or interactions (.05 ≤ p ≤ .10) will be reported in footnotes for future reference but will not be followed up or interpreted.

Table 4. 

Statistical Results for Direct Objects

| Direct Object | df | F (100–220 msec) | p (100–220 msec) | F (300–500 msec) | p (300–500 msec) | F (700–1000 msec) | p (700–1000 msec) |
|---|---|---|---|---|---|---|---|
| Lateral | | | | | | | |
| ACC | 1, 28 | | | 6.207 | .019 | | |
| ACC × CONG | 1, 28 | | | 3.388 | .076 | | |
| CONG × ANT | 2, 56 | | | 3.513 | .058 | 9.292 | .002 |
| ACC × HEM | 1, 28 | 3.832 | .06 | | | 3.569 | .069 |
| CONG × HEM | 1, 28 | | | 7.466 | .011 | | |
| ACC × CONG × HEM | 1, 28 | 3.256 | .082 | 4.416 | .045 | | |
| ACC × CONG × ANT × HEM | 2, 56 | 3.223 | .048 | | | | |
| Midline | | | | | | | |
| ACC | 1, 28 | | | 9.726 | .004 | 3.677 | .065 |
| CONG × ANT | 2, 56 | | | | | 3.378 | .043 |

F values with p ≥ .1 are not included; marginal effects with .05 ≤ p < .10 are included for future reference. ACC = Accent; CONG = Congruity; ANT = Anteriority; HEM = Hemisphere.

Figure 4. 

ERP waveforms for direct objects. ERPs are time-locked to the onset of the direct object with a prestimulus baseline of −200 to 0 msec and show waveforms to accented (black) and unaccented (gray) direct objects. Solid lines represent congruous accentuation, and dotted lines represent incongruous accentuation.


Early Time Window 100–220 msec

A four-way interaction of Accent × Congruity × Anteriority × Hemisphere, F(2, 56) = 3.223, p < .05, was found. Follow-up analyses with Anteriority as the split variable revealed an Accent × Congruity × Hemisphere interaction that was significant on posterior, F(1, 28) = 6.935, p < .05, marginal on central, F(1, 28) = 3.817, p = .06, and not significant on anterior regions, F(1, 28) = .075, p = .79. Following up on the posterior interactions with Accent as the split variable, the interaction of Congruity × Hemisphere was not significant for accented direct objects, F(1, 28) = 2.54, p = .12, or for unaccented direct objects, F(1, 28) = 2.659, p = .11. The mean voltage data suggest that the Congruity × Hemisphere interaction is triggered by left-sided posterior positivities for incongruously accented direct objects relative to congruous ones, probably coupled with a greater positivity for incongruously accented elements on the right. When looking at the Congruity effect on posterior regions separately for accented and unaccented direct objects, we found that it is present only for accented direct objects, F(1, 28) = 3.492, p = .07, because of a positivity for incongruously accented elements (superfluous accents) relative to congruously accented elements. No Congruity effect was found for incongruously unaccented direct objects (missing accents; F(1, 28) = .725, p = .4).

N400 Time Window 300–500 msec

There was a main effect of Accent, F(1, 28) = 6.207, p < .05, showing that accented direct objects elicited positive waveforms relative to unaccented ones. There was a three-way interaction of Accent × Congruity × Hemisphere, F(1, 28) = 4.416, p < .05, showing a significant Congruity × Hemisphere interaction for accented elements, F(1, 28) = 11.807, p < .01, but not for unaccented elements, F(1, 28) = .319, p = .58. Post hoc tests on accented direct objects revealed a significant Congruity effect at right sites, F(1, 28) = 4.8, p < .05, but not at left sites, F(1, 28) = .190, p = .67. The mean voltage values in Figure 5 show that the Congruity effect was a right-lateralized negativity for incongruously accented elements (superfluous accents on background elements) as compared with congruously accented ones (focus accents). No such negative effect was elicited by incongruously unaccented elements (missing accents on focus elements; cf. Figure 6). For midline electrodes, there was a main effect of Accent, F(1, 28) = 9.726, p < .01, indicating that ERPs to accented direct objects were more positive than ERPs to unaccented ones. No other main effects or interactions were significant.

Figure 5. 

ERP waveforms for accented direct objects: superfluous versus congruous accents. Incongruously accented elements (superfluous accents, dotted line) elicited early left-lateralized positive effects at posterior sites (100–220 msec) and a right-lateralized centro-posterior negativity (N400, 300–700 msec) as compared with congruously accented elements (solid line). Between 700 and 1000 msec, incongruously accented elements triggered a late posterior positivity (P600).

Figure 6. 

ERP waveforms for unaccented direct objects: missing versus congruous accents. Incongruously unaccented elements (dotted line) elicited a late posterior positive effect (P600) with a latency of 700–1000 msec poststimulus onset as compared with congruously unaccented elements (solid line). No other effects were significant.


Late P600 Time Window 700–1000 msec

For lateral electrodes, the Congruity × Anteriority interaction was significant, F(2, 56) = 9.292, p < .01. Follow-up comparisons suggest that this is because of a marginal effect of Congruity (incongruous more positive than congruous) on posterior regions, F(1, 28) = 3.842, p = .06, and the absence of such an effect on anterior and central regions (all p values > .1; see Footnote 1). On midline electrodes, there was also a significant interaction of Congruity × Anteriority, F(2, 56) = 3.378, p < .05; together these suggest that irrespective of whether the accent is missing or superfluous, direct objects with incongruous accentuation were more positive than congruous direct objects, but only at posterior sites. No other effects were significant.

ERP Results for Prepositional Objects

As mentioned above, the analysis of prepositional objects has an exploratory character because the ERPs to the prepositional object will be affected by the processing of the preceding violation on the direct object. Statistical results are presented in Table 5, and ERP waveforms for all four conditions are displayed in Figure 7.

Table 5. 

Statistical Results for Prepositional Objects

| Prepositional Object | df | F (100–220 msec) | p (100–220 msec) | F (300–500 msec) | p (300–500 msec) | F (700–1000 msec) | p (700–1000 msec) |
|---|---|---|---|---|---|---|---|
| Lateral | | | | | | | |
| ACC | 1, 28 | 4.027 | .055 | 9.732 | .004 | 4.391 | .045 |
| ACC × ANT | 2, 56 | | | 7.281 | .007 | 19.591 | .000 |
| CONG × ANT | 2, 56 | 4.158 | .04 | 9.942 | .002 | 6.014 | .013 |
| CONG × HEM | 1, 28 | 5.420 | .027 | | | 3.801 | .061 |
| ACC × ANT × HEM | 2, 56 | | | 7.195 | .002 | | |
| ACC × CONG × ANT × HEM | 2, 56 | 3.315 | | 3.361 | .049 | | |
| Midline | | | | | | | |
| ACC | 1, 28 | 6.314 | .018 | 14.924 | .001 | 6.714 | .015 |
| ACC × ANT | 2, 56 | | | 5.214 | .017 | 13.308 | .001 |
| CONG × ANT | 2, 56 | | | 6.789 | .009 | 4.727 | .021 |

F values with p ≥ .1 are not included; marginal effects with .05 ≤ p < .10 are included for future reference. ACC = Accent; CONG = Congruity; ANT = Anteriority; HEM = Hemisphere.

Figure 7. 

ERP waveforms for prepositional objects. ERP waveforms are time locked to the onset of the prepositional object with a within-stimulus baseline of 0–100 msec poststimulus onset. The figure displays accented (black) and unaccented (gray) prepositional objects. Solid lines indicate congruous accentuation, and dotted lines indicate incongruous accentuation.


Early Time Window 100–220 msec

On lateral electrodes, there was a Congruity × Hemisphere interaction, F(1, 28) = 5.420, p < .05, but follow-ups did not reveal a statistically reliable Congruity effect (all ps > .42). According to the means, the interaction must have been due to a positivity for incongruous elements over the right hemisphere and a negativity over the left hemisphere. A Congruity × Anteriority interaction, F(2, 56) = 4.158, p < .05, did not show significant differences on any region (all ps > .18). The means suggest that the interaction came about by an anterior negativity and a posterior positivity for incongruous elements. On midline electrodes, there was a main effect of Accent, F(1, 28) = 6.314, p < .05, showing a positivity for accented prepositional objects relative to unaccented ones. No other effects reached significance (see Footnote 2).

N400 Time Window 300–500 msec

The ERPs for prepositional objects did not show the negativity that we found in this time window for direct objects but predominantly rather large positive-going waves starting around 300 msec that were elicited for both accented and unaccented incongruous elements [Accent × Congruity × Anteriority × Hemisphere, F(2, 56) = 3.361, p < .05]. Follow-up analyses suggested that this interaction resulted from centro-posterior positivities associated with incongruity, for both accented and unaccented words, with the effects for accented words larger at right hemisphere sites. These positivities were accompanied by anterior negativities that were larger for unaccented words, most clearly so at left frontal sites. For midline electrodes, accented prepositional objects elicited a positivity relative to unaccented ones that was reflected in a main effect of Accent, F(1, 28) = 14.924, p = .001. In addition, both accented [Accent × Anteriority, F(2, 56) = 5.214, p < .05] and incongruous prepositional objects [Congruity × Anteriority, F(2, 56) = 6.789, p < .01] triggered centro-posterior positivities.

Late P600 Time Window 700–1000 msec

Accented prepositional objects were more positive than unaccented ones, as reflected in a main effect of Accent, F(1, 28) = 4.391, p < .05. The positivity for accented elements had a centro-posterior distribution [Accent × Anteriority interaction, F(2, 56) = 19.591, p < .001; Accent effect at central sites, F(1, 28) = 4.237, p < .05, and at posterior sites, F(1, 28) = 35.282, p < .001]. Incongruous prepositional objects elicited posterior positivities relative to congruous ones [Congruity × Anteriority, F(2, 56) = 6.014, p < .05], which were due to a Congruity effect at posterior sites, F(1, 28) = 5.091, p < .05. On midline electrodes, accented prepositional objects were likewise more positive than unaccented ones, as reflected in a main effect of Accent, F(1, 28) = 6.714, p < .05. The positivity was distributed at centro-posterior sites, as revealed by an Accent × Anteriority interaction, F(2, 56) = 13.308, p = .001, with an Accent effect over central, F(1, 28) = 7.081, p < .05, and posterior regions, F(1, 28) = 30.403, p < .001.

Summary of Results

Accent (present vs. absent) and Congruity (match vs. mismatch) interacted with each other at the direct object in both the early and the N400 time windows, but not in the later time window. In the early time window (100–220 msec), Congruity had an effect primarily on accented words: direct objects with superfluous accents elicited early positivities on left posterior sites, relative to direct objects with congruous accentuation. In the N400 time window (300–500 msec), incongruent accents elicited a right-lateralized centro-posterior negativity. No such effect was obtained for incongruous unaccented words (missing accents). In the late P600 time window (700–1000 msec), there was no interaction and both types of incongruous prosody, that is, missing and superfluous accents, were more positive than congruous prosody.

The exploratory analyses for the prepositional objects showed posterior positivities for incongruent relative to congruent prosody in the N400 and in the late time window, similar to the positivity for incongruence elicited by the direct object. In general, the processing of accentuation was evident in a broadly distributed main effect of Accent (300–500 msec), showing a positivity for accented relative to unaccented direct objects. A positive Congruity effect was also apparent for accented prepositional objects but it started somewhat earlier (100–220 msec, 700–1000 msec). Because the processing of the prepositional object is affected by the congruity of the direct object earlier in the sentence, the following discussion addresses only ERP patterns associated with the direct objects.

The current study investigated the processing of linguistic prosody in context, particularly whether superfluous accents on background information and missing accents on focus information evoke distinct neural mechanisms in a natural paradigm without a prosodic task. Earlier studies have shown effects that we conjectured might be due to the specific tasks used in those studies. Additional variability in the results (see Table 1) may have resulted from issues involving the presence of prosodic boundaries and the time-locking and matching of stimuli, which we controlled in the current study.

The neural correlate of accentuation was evident as a broadly distributed positivity for accented elements relative to unaccented elements which started around 300 msec post onset of the direct object and 100 msec post-onset of the prepositional object. The positivity is independent of whether accent is congruent with the context or not and replicates earlier reports (Wang et al., 2011; Heim & Alter, 2006) of a positivity associated with the occurrence of a pitch accent. The effect can be interpreted as belonging to the P200 component for the processing of physical characteristics of accented elements or to the P300 for the attentive processing of accented elements (Heim & Alter, 2006). In our view, this effect is best described as an “accent positivity,” which consists of a sensory aspect related to the processing of acoustic features as well as of a cognitive aspect, which implies the attentive processing of prosodic prominence. The accent positivity is independent of information structure and contextual congruity and suggests that focus accent is processed in a bottom–up manner.

Possibly because of our strict time-locking and stimulus selection procedures, we were also able to discover very early effects (around 100 msec after target onset) for incongruous prosody, even in the absence of explicit instructions to attend to prosodic aspects of the stimuli. This early congruity effect (incongruous more positive than congruous) is present for accented words, but absent for unaccented words and likely reflects top–down processing of focus accent based on contextual information. Further evidence for a more elaborate processing of superfluous accents than missing ones was the negativity in the N400 time window that was triggered by superfluous accents but was absent for missing accents. This is not to say that missing accents on focused information went unrecognized. Both missing and superfluous accents triggered a late posterior positivity, resembling the P600 component. Thus, the neural response to prosodic congruity is qualitatively different between superfluous and missing accents early on, but very similar in the later P600 time window.

Our results are strikingly different from those reported in most previous studies, in which a prosodic judgment task was employed. These studies found that a missing accent leads to more processing difficulty than a superfluous one. However, our results are consistent with Toepel and Alter (2004), who also found evidence for a clear difference between the response to focus accent, depending on whether a prosodic judgment task was used or not. When no prosodic task was used, they found a broadly distributed negativity for superfluous accents (see also Wang et al., 2011). In our experiment, the superfluous accents also gave rise to a negativity. However, there were also important differences. Toepel and Alter did not find any other effect of incongruity for either superfluous or missing accents. In contrast, we found evidence for an increased processing cost in both mismatch conditions: For superfluous accents, there was an additional early positive effect (100–220 msec); for both types of incongruity there was a late posterior positivity (700–1000 msec). The use of stimuli which were time-locked to the onset of the accented syllable of targets might have been responsible for our divergent results, as well as the avoidance of phrase boundaries in the vicinity of targets, which allowed us to provide a clearer view on the neural correlates of processing focus accent per se.

Adding a Prosodic Judgment Task

The present results make it clear that specific, task-related neurocognitive mechanisms are active when a prosodic task is added (Table 1; current results; Toepel & Alter, 2004). In the Introduction, it was suggested that previous findings in studies employing a prosodic judgment task can be accounted for by what we know about task-related ERP components. Under a prosodic judgment task, a missing accent has most generally elicited a biphasic N400-P600 pattern (Hruska & Alter, 2004; Toepel & Alter, 2004; Hruska, Alter, Steinhauer, & Steube, 2001; but see Magne et al., 2005), whereas a superfluous accent gives rise to (i) no effect (Hruska et al., 2001), (ii) a negativity (however, only with a comprehension task, Toepel & Alter, 2004), or (iii) a late positivity (Hruska & Alter, 2004).

Missing and superfluous accents both represent violations of the normal alignment of prosody and information structure, namely the use of information in focus with an accent and background information without an accent. Despite this, there is an apparent asymmetry in neural processing between missing and superfluous accents, which becomes clear if one considers that their detection proceeds qualitatively differently, depending on the task at hand. With a prosodic judgment task, the listener may very well exploit the linguistic context, that is, the focus elements from the question, to predict the position of focus; thus, the detection of a missing accent where one is expected provides sufficient evidence for a prosodic mismatch decision. This may give rise to an effect like the CNV, which reflects the cognitive preparation for an upcoming stimulus to which the participant must react (cf. Magne et al., 2005). This negativity is often followed by a positive component called the CNV-Resolution, which is claimed to reflect executive functions that re-establish a cognitive equilibrium such as set-shifting or resetting motor programs (Jackson et al., 1999). Thus, the negativity often found for missing accents may reflect expectation violation, and the positivity could then index the resolution of the decision process: The participant becomes aware that the expected accent is indeed missing and that the stimulus is prosodically not well formed.

Processing a superfluous accent in the prosodic judgment task condition is different. There is no “warning” signal in the context that a critical stimulus is imminent and that a choice must be made at this particular point in the sentence. The superfluous accent is unexpected and most likely creates a surprise effect that might evoke a P300-like positivity for unexpected events (Picton, 1992; Donchin & Coles, 1988) rather than a CNV-like negativity for task-related expectation mismatch. In some cases, though, the superfluous accent may, for unknown reasons, escape detection altogether (Hruska et al., 2001). In summary, the findings in earlier studies seem to us to be artifacts of the added prosodic judgment task, obscuring the processes that are operational during “normal” speech processing.

Processing Prosody in Context without a Prosodic Task

Without a prosodic task that can modify the effects of incongruous prosody, we still find asymmetries in the processing of missing and superfluous accents. However, these seem to go in the opposite direction, with superfluous accents noticed earlier than missing ones. Superfluous accents give rise to an early positivity and an N400-like negativity, whereas no such effects are obtained for missing accents. This does not mean that the missing accent was “missed”: We did find a later positivity in response to both missing and superfluous accents. In addition, sentences with missing accents were clearly recognized as infelicitous in our Off-line Study 2. The exact nature of the late positivity is not completely clear. It resembles a P600, which has frequently been reported in cases where it is difficult, for various reasons, to create a coherent representation (Brouwer et al., 2012; Burkhardt, 2007; Hoeks et al., 2004; Kaan et al., 2000; Hagoort et al., 1993; Osterhout & Holcomb, 1992). In line with previous accounts, we interpret the positivity for incongruous prosody as indicating effortful processing initiated to arrive at a coherent interpretation with respect to the preceding context; we will discuss this effect more extensively below.

Superfluous accents also gave rise to a prominent early positive effect (100–220 msec post-onset). This early congruity effect has not been reported before, and we believe that we were able to detect it because of our straightforward time-locking procedure and the extensive matching of experimental stimuli. The exact nature of this early positivity, however, is still a puzzle. It could be related to the P200 component evoked by changes in pitch direction (Friedrich, Kotz, Friederici, & Alter, 2004), but this seems unlikely, as our positivity is triggered by accented elements (congruous focus accents vs. superfluous focus accents) in physically identical sentences, which differed only with respect to the preceding context. The positivity must thus be related to the incongruity with respect to the preceding discourse context. Exploration of the functional meaning of this early positive congruity effect must await further research.

Around 300 msec after the onset of a direct object with superfluous accent, a right-lateralized centro-posterior negativity was found, which resembles a standard N400 effect superimposed on the positive main effect of accentuation. Under the standard view of the N400 (Kutas & Hillyard, 1980), this negativity might reflect semantic integration demands caused by the interpretation of the prosodic mismatch and can straightforwardly be interpreted as an N400 effect. That is, the superfluous accent may hinder the interpretation of background information as “given” and require its reinterpretation as “new” and in focus. Alternatively, Dutch focus accents can be used to indicate contrast when they occur in unexpected positions (Swerts, Krahmer, & Avesani, 2002). An unexpected accent might spur listeners to construct a contrastive interpretation for the element with a superfluous accent, and because contrast is not supported by the context, additional effort may be necessary. These effects are also very similar to effects seen in response to information structure mismatches such as the repeated name penalty (LeDoux, Camblin, Swaab, & Gordon, 2006; Gordon, Grosz, & Gilliom, 1993). Using a reference form that is more prominent and elaborate than strictly required gives rise to an increase in N400 amplitude. For instance, in a sentence such as “Pam washed the dishes while Pam talked about politics,” the second occurrence of Pam, in a position where a reduced form (e.g., she) is more appropriate, engenders a significantly larger N400 than in a control sentence. In a similar vein, in the current experiment the superfluous accent signals that the word contains important new information (e.g., Wilson & Wharton, 2006; Gussenhoven, 2005), which turns out not to be the case.

Exactly how the N400 in response to superfluous accents should be interpreted is not completely clear: It is definitely a sign of additional difficulty as discussed above, but it may merely signal an information structure mismatch, or it may also reflect semantic activation or reprocessing. Future experiments should address this question.

P600 as Reanalysis of Prosodic Incongruity

As we have shown, late positivities were elicited by superfluous and missing accents in this study. They likely reflect the effortful processing of incongruous prosody aimed at salvaging an ill-formed utterance, as listeners try to make sense of what the speaker just communicated. Previous studies on prosody processing have attributed similar late positivities to the CPS (see Table 1), which is implicated in the processing of prosodic boundaries (Steinhauer & Friederici, 2001) or in the information segmentation at focus positions in context (Hruska & Alter, 2004; Toepel & Alter, 2004).

Unlike these CPS positivities, the late positivity in this study is clearly related to the processing of a mismatch between prosody and context and is therefore analyzed as a P600 effect. Late positivities in the current study cannot be straightforwardly interpreted as being effects of closure positivity as our stimuli were strictly controlled to avoid confounds with boundary-induced effects. As shown in Figure 3A and B, none of the experimental conditions contained any silent pauses, breaks, or pitch changes in the signal in the vicinity of targets which could have been confounded with a CPS response for prosodic parsing. Moreover, the positivities do not exclusively occur at focus positions but are elicited by both focus and background elements with incongruous prosody; thus, they cannot be exclusively attributed to focus segmentation.

Importantly, the comparison of congruous and incongruous conditions only used physically identical stimuli, and hence contextual congruity represents the only source of the positivity. The strongest evidence that late positivities in the present experiment do not reflect boundary processing is the fact that they occur not only after incongruously accented targets (Ladd, 1986) but also after incongruously unaccented targets. One might assume that accented words generate the impression of a boundary because of their acoustic lengthening. However, no such segmental lengthening was measured for unaccented words, and these also gave rise to late positivities in the incongruous condition. The distribution of the positive congruency effect over posterior lateral and midline electrodes is identical in both conditions, which provides further evidence for a common neural source.

We argue that the late positivities in our data are part of the P600 family and reflect general processes of making sense that are activated by prosodic mismatches (similar to Schumacher & Baumann, 2010). These positivities mark the workings of a general mechanism for the extended analysis of complex information, in this case prosodically misrealized information, and its integration in the discourse (e.g., Brouwer et al., 2012; Hoeks, Hendriks, Redeker, & Stowe, 2010; Burkhardt, 2007). A number of the studies reported in the literature have found late positivities for prosodic mismatches (see Table 1), regardless of whether a prosodic task was carried out, which suggests that the presence of a prosodic task is not the main source of the late positivity, though future research using strictly controlled materials will be needed to determine whether this is the case.

Conclusion

The current study has demonstrated that when listeners are not engaged in a conscious prosodic judgment task, they respond more strongly to accented background information than to unaccented focus information, and that this response occurs quite early (around 100 msec). This is not to say that listeners are unaware of missing accents: They clearly react to the incongruity of both sorts of contextual mismatch in a later stage of processing, which underlines the importance of prosodic information to normal processing and integration of incoming information into the discourse context. Unlike previous studies in which a prosodic judgment task was used, however, our participants did not find that “less is more.”

This work was supported by an Ubbo Emmius Grant awarded to Diana V. Dimitrova. We would like to thank the three anonymous reviewers for their insightful suggestions, Ryan Taylor for technical assistance, and Myrte Gosen and Albert Everaarts for lending their voices to create the stimuli.

Reprint requests should be sent to Diana V. Dimitrova, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands, or via e-mail: [email protected].

1. The Accent × Hemisphere interaction was marginally significant, F(1, 28) = 3.569, p = .07, reflecting a trend for accented elements to be more negative than unaccented elements on electrodes over the left hemisphere.

2. A number of marginal effects were found, including a main effect of Accent on lateral electrodes, F(1, 28) = 4.027, p = .06, because of a positivity for accented prepositional objects, and an interaction of Accent × Congruity × Anteriority × Hemisphere, F(2, 56) = 3.315, p = .06, most probably because missing accents gave rise to a positivity on right anterior and central sites and to a negativity on right posterior sites.

Baayen, H. R., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical database [CD-ROM]. Philadelphia, PA.
Boersma, P., & Weenink, D. (2010). Praat: Doing phonetics by computer [Version 5.1.35, Computer program]. Retrieved June 10, 2010, from www.praat.org/.
Bögels, S., Schriefers, H., Vonk, W., & Chwilla, D. (2011). Pitch accents in context: How listeners process accentuation in referential communication. Neuropsychologia, 49, 2022–2036.
Bögels, S., Schriefers, H., Vonk, W., Chwilla, D. J., & Kerkhofs, R. (2009). The interplay between prosody and syntax in sentence processing: The case of subject- and object-control verbs. Journal of Cognitive Neuroscience, 22, 1036–1053.
Brouwer, H., Fitz, H., & Hoeks, J. C. J. (2012). Getting real about semantic illusions: Rethinking the functional role of the P600 in language comprehension. Brain Research, 1446, 127–143.
Burkhardt, P. (2007). The P600 reflects cost of new information in discourse memory. NeuroReport, 18, 1851–1854.
Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141–201.
Donchin, E., & Coles, M. G. H. (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11, 355–372.
Folmer, R. L., Billings, C. J., Diedesch-Rouse, A. C., Gallun, F. J., & Lew, H. L. (2011). Electrophysiological assessments of cognition and sensory processing in TBI: Applications for diagnosis, prognosis, and rehabilitation. International Journal of Psychophysiology, 82, 4–15.
Friedrich, C. K., Kotz, S. A., Friederici, A. D., & Alter, K. (2004). Pitch modulates lexical identification in spoken word recognition: ERP and behavioral evidence. Cognitive Brain Research, 20, 300–308.
Gordon, P. C., Grosz, B. J., & Gilliom, L. A. (1993). Pronouns, names and the centering of attention in discourse. Cognitive Science, 17, 311–348.
Gussenhoven, C. (2005). Transcription of Dutch intonation. In S. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 118–145). Oxford: Oxford University Press.
Hagoort, P., Brown, C. M., & Groothusen, J. (1993). The syntactic positive shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes, 8, 439–483.
Heim, S., & Alter, K. (2006). Prosodic pitch accents in language comprehension and production: ERP data and acoustic analysis. Acta Neurobiologiae Experimentalis, 66, 55–68.
Hoeks, J. C. J., Hendriks, P., Redeker, G., & Stowe, L. A. (2010). Gricean brainwaves: Brain responses to violations of the pragmatic maxim of quantity. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, August 11–14, Portland, Oregon (pp. 1325–1329). Austin, TX: Cognitive Science Society.
Hoeks, J. C. J., Stowe, L. A., & Doedens, L. H. (2004). Seeing words in context: The interaction of lexical and sentence level information. Cognitive Brain Research, 19, 59–73.
Hruska, C., & Alter, K. (2004). Prosody in dialogues and single sentences: How prosody can influence speech perception. In A. Steube (Ed.), Information structure: Theoretical and empirical aspects (pp. 211–226). Berlin: Walter de Gruyter.
Hruska, C., Alter, K., Steinhauer, K., & Steube, A. (2001). Misleading dialogs: Human's brain reaction to prosodic information. In C. Cave, I. Guaitella, & S. Santi (Eds.), Orality and gestures (pp. 425–430). Paris: L'Harmattan.
Jackson, S., Jackson, G., & Roberts, M. (1999). The selection and suppression of action: ERP correlates of executive control in humans. NeuroReport, 10, 861–865.
Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15, 159–201.
Kerkhofs, R., Vonk, W., Schriefers, H., & Chwilla, D. J. (2007). Discourse, syntax, and prosody: The brain reveals an immediate interaction. Journal of Cognitive Neuroscience, 19, 1421–1434.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Ladd, D. R. (1986). Intonational phrasing: The case for recursive prosodic structure. Phonology Yearbook, 3, 311–340.
Ladd, D. R. (1996). Intonational phonology. Cambridge: Cambridge University Press.
Ladd, D. R., Mennen, I., & Schepman, A. (2000). Phonological conditioning of peak alignment of rising pitch accents in Dutch. Journal of the Acoustical Society of America, 107, 2685–2696.
LeDoux, K., Camblin, C. C., Swaab, T. Y., & Gordon, P. C. (2006). Reading words in discourse: The modulation of lexical priming effects by message-level context. Behavioral and Cognitive Neuroscience Reviews, 5, 107–127.
Li, X., & Ren, G. (2012). How and when accentuation influences temporally selective attention and subsequent semantic processing during on-line spoken language comprehension: An ERP study. Neuropsychologia, 50, 1882–1894.
Li, X., Wang, Y., & Lu, Y. (2010). How and when prosodic boundaries influence syntactic parsing under different discourse contexts: An ERP study. Biological Psychology, 83, 250–259.
Magne, C., Astésano, C., Lacheret-Dujour, A., Morel, M., Alter, K., & Besson, M. (2005). On-line processing of “pop-out” words in spoken French dialogues. Journal of Cognitive Neuroscience, 17, 740–756.
Mueller, J. L. (2009). The influence of lexical familiarity on ERP responses during sentence comprehension in language learners. Second Language Research, 25, 43–76.
Nooteboom, S. G., & Kruyt, J. G. (1987). Accents, focus distribution, and the perceived distribution of given and new information: An experiment. Journal of the Acoustical Society of America, 82, 1512–1524.
Osterhout, L., & Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31, 785–806.
Pannekamp, A., Toepel, U., Alter, K., Hahne, A., & Friederici, A. D. (2005). Prosody-driven sentence processing: An event-related brain potential study. Journal of Cognitive Neuroscience, 17, 407–421.
Philips, C., Kazanina, N., & Abada, S. H. (2005). ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research, 22, 407–428.
Picton, T. W. (1992). The P300 wave of the human event-related potential. Journal of Clinical Neurophysiology, 9, 456–497.
Rugg, M. D., & Coles, M. G. H. (1996). Electrophysiology of mind: Event-related brain potentials and cognition. Oxford: Oxford University Press.
Schumacher, P. B., & Baumann, S. (2010). Pitch accent type affects the N400 during referential processing. NeuroReport, 21, 618–622.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196.
Steinhauer, K., & Friederici, A. D. (2001). Prosodic boundaries, comma rules, and brain responses: The closure positive shift in ERPs as a universal marker for prosodic phrasing in listeners and readers. Journal of Psycholinguistic Research, 30, 267–295.
Swerts, M., Krahmer, E., & Avesani, C. (2002). Prosodic marking of information status in Dutch and Italian: A comparative analysis. Journal of Phonetics, 30, 629–654.
Toepel, U., & Alter, K. (2004). On the independence of information structure processing from prosody. In A. Steube (Ed.), Information structure: Theoretical and empirical aspects (pp. 227–240). Berlin: Walter de Gruyter.
Toepel, U., Pannekamp, A., & Alter, K. (2007). Catching the news: Processing strategies in listening to dialogs as measured by ERPs. Behavioral and Brain Functions, 3, 53.
Vallduvi, E. (2002). The informational component. New York: Garland.
Walter, W. G., Cooper, R., Aldridge, V. J., McCallum, W. C., & Winter, A. L. (1964). Contingent negative variation: An electric sign of sensorimotor association and expectancy in the human brain. Nature, 203, 380–384.
Wang, L., Bastiaansen, M., Yang, Y., & Hagoort, P. (2011). The influence of information structure on the depth of semantic processing: How focus and pitch accent determine the size of the N400 effect. Neuropsychologia, 49, 813–820.
Wang, L., Hagoort, P., & Yang, Y. (2009). Semantic illusion depends on information structure: ERP evidence. Brain Research, 1282, 50–56.
Wilson, D., & Wharton, T. (2006). Relevance and prosody. Journal of Pragmatics, 38, 1559–1579.