Neural responses to an attended event are typically enhanced relative to those to an unattended one (attention enhancement). Conversely, neural responses to a predicted event are typically reduced relative to those to an unpredicted one (prediction suppression). What happens to events that are both attended and predicted remains to be established. To examine the interaction between attention and prediction, we combined two robust paradigms developed for studying attention and prediction effects on ERPs into an orthogonal design. Participants were presented with sounds in attended or unattended intervals, with onsets that were either predicted by a moving visual cue or unpredicted (no cue was provided). We demonstrated an N1 enhancement effect for attended sounds and an N1 suppression effect for predicted sounds; furthermore, we found an interaction between these effects that emerged early in the N1 (50–95 msec), indicating that attention enhancement occurred only when the sound was unpredicted. This pattern of results can be explained by the precision of the predictive cue, which reduces the need for attentional selection in the attended and predicted condition.
The sheer amount of change in the sensory environment (potential information) poses a considerable challenge for real-time perception and cognition. The ability to prioritize sensory processing according to the demands of a task (attention) and to exploit prior probabilities (prediction) provides a basis for the efficient processing of sensory information. At a behavioral level, studies have shown that attending to an object or event improves detection speed and accuracy (e.g., Carrasco, 2011; Posner, 1980). Likewise, predicted events also benefit from faster and more accurate responses (e.g., Paris, Kim, & Davis, 2013; Nobre, Correa, & Coull, 2007; Bar, 2004).
Despite the similarity of the effects of attention and prediction on behavior, it is striking that at a neural level these processes have opposite effects. That is, neural responses to attended events are typically enhanced (e.g., Lange, Rösler, & Röder, 2003; Kastner & Ungerleider, 2000), whereas those to predicted events are often suppressed (e.g., Vroomen & Stekelenburg, 2010; Summerfield & Egner, 2009). Given these opposing neural effects, the current study examined what occurs when an event is both attended and predicted. To do this, we combined two paradigms that have commonly been used to assess the effects of attention or prediction. In what follows, we outline these paradigms in detail.
The first paradigm, known as temporal cueing, requires participants to attend to a specific point in time. Neural responses to events occurring at this attended time point are compared with those to events at unattended time points (Lange, 2013; Lange et al., 2003). In this paradigm, participants are presented with two sounds separated by either a short (600 msec) or a long (1200 msec) interval and are asked to direct their attention to one of these time points following the first sound. Depending upon which interval has been cued, the second sound is either attended or unattended. To ensure that attention is paid to the relevant time point, participants are instructed to respond to the second sound only when it is a deviant (a different sound) occurring at the attended interval. Such studies typically show a neural enhancement at the N1 latency (approximately 100 msec after the auditory event) for attended relative to unattended tones (Lange, 2012; Sanders & Astheimer, 2008; Lange & Röder, 2006; Lange et al., 2003, 2006).
The paradigm used to assess the effect of temporal prediction employs a visual cue to an upcoming auditory event. Neural responses to a temporally cued event (i.e., one preceded by the visual cue) are then compared with those to a noncued event (Arnal, Morillon, Kell, & Giraud, 2009; Stekelenburg & Vroomen, 2007; Van Wassenhove, Grant, & Poeppel, 2005; Paris, Kim, & Davis, in press). For example, studies have used visual cues such as a hammer moving toward a nail or the movement of the jaw and lips, which provide information about the timing of the onset of an auditory event (the sound of the hammer hitting the nail or speech, respectively). These studies have shown that the amplitude of the N1 response to the sound is reduced when the participant sees such anticipatory movements compared with when they do not. In addition to such ecological stimuli, this “prediction suppression” effect has also been shown using more precisely controlled artificial stimuli. In a recent study by Vroomen and Stekelenburg (2010), participants were presented either with tones in isolation (unpredicted) or with tones preceded by a moving disc that collided with a bar, producing a sound (predicted).
Taken together, the results of these studies suggest that combining the two paradigms would provide a straightforward tool for investigating the combined effects of attention and prediction on the N1 ERP. To our knowledge, these two paradigms have never been used in combination, nor has the question of how attention and prediction may interact been explicitly tested in this way. Previous studies do, however, motivate two different proposals: (1) the effects of attention and prediction will interact such that prediction suppression will occur for unattended stimuli whereas prediction enhancement will occur for attended ones, and (2) these effects will not interact.
The first proposal is based on the model of predictive coding (Friston, 2005). Predictive coding proposes that neural responses reflect the residual difference (prediction error) between bottom–up sensory information and top–down predictions and that these prediction errors are also weighted according to their precision (i.e., inverse variance; Feldman & Friston, 2010). Importantly, attention is known to enhance precision (Friston, 2009; Rao, 2005) leading to a multiplicative effect on the magnitude of the neural response (prediction error) and subsequently to an interaction between attention and prediction (Arnal & Giraud, 2012; Feldman & Friston, 2010).
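The multiplicative logic can be illustrated with a toy calculation. The sketch below is a hypothetical illustration rather than a model taken from Friston (2005) or Feldman and Friston (2010): the response is taken to be the absolute prediction error scaled by its precision, attention is assumed to double precision, and all numerical values are invented. Because precision multiplies the error, the prediction effect is larger under attention, so the difference of differences (the interaction contrast) is nonzero, whereas a purely additive combination of the two factors would give zero. This simple version yields an interaction but does not, by itself, reverse the direction of the prediction effect.

```python
# Toy illustration only (hypothetical values, not fitted to any data):
# response = precision * |input - prediction|, with attention raising precision.

def precision_weighted_error(sensory_input, prediction, precision):
    """Prediction error scaled by its precision (inverse variance)."""
    return precision * abs(sensory_input - prediction)

SOUND = 1.0                                          # hypothetical input magnitude
PREDICTION = {"unpredicted": 0.0, "predicted": 0.6}  # hypothetical top-down prediction
PRECISION = {"unattended": 1.0, "attended": 2.0}     # assumption: attention doubles precision

responses = {
    (att, pred): precision_weighted_error(SOUND, PREDICTION[pred], PRECISION[att])
    for att in PRECISION
    for pred in PREDICTION
}

def prediction_effect(att):
    """Prediction suppression: unpredicted minus predicted response."""
    return responses[(att, "unpredicted")] - responses[(att, "predicted")]

# Difference of differences; 0 would mean the two factors combine additively.
interaction = prediction_effect("attended") - prediction_effect("unattended")
print(responses)
print(f"interaction contrast = {interaction:.2f}")  # 0.60
```

Here the prediction effect is 0.6 for unattended and 1.2 for attended sounds, so the two factors do not simply add.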
A recent study has found support for this proposal using fMRI (Kok, Rahnev, Jehee, Lau, & de Lange, 2012). In this study, the words “left” and “right” were used to indicate the likely location (75% probability) of an upcoming visual stimulus, and a triangular cue was used to direct attention to one side of the visual field. If the target stimulus occurred in the attended visual field, participants were required to respond. The results showed that prediction suppression occurred when attention was directed to the opposite visual field (unattended trials); however, the reverse (prediction enhancement) occurred for attended trials. This research suggests that attention and prediction do interact; yet, owing to the coarse temporal resolution of fMRI, it is unknown whether this interaction would also occur at the N1 latency.
The alternative proposal is that the effects of attention and prediction on ERPs will not interact, that is, that the component effects of attention and prediction will be additive. This hypothesis is consistent with the results of studies that investigated one of the component processes (e.g., attention) but also included elements of the other (e.g., prediction). For instance, a recent review by Lange (2013) pointed out that a number of temporal attention studies also included manipulations of temporal prediction. Lange suggested that studies that used “pure” manipulations of temporal attention showed a more pronounced N1 enhancement (Lange et al., 2003; Schröger, 1993, Experiment 1; Sanders & Astheimer, 2008) than those in which the paradigm also included elements of temporal prediction (Lampar & Lange, 2011; Schröger, 1993, Experiment 2). This is best illustrated by two studies from Lange's own work. In 2003, Lange and colleagues used a temporal cueing paradigm and found strong N1 attention enhancement effects. However, in a subsequent study in which the target tones were made predictable by a probabilistic cue, a relatively smaller N1 enhancement was found (Lampar & Lange, 2011). On the basis of this, Lange proposed that the effects of attention and prediction are additive, arguing that the amplitude of the N1 response indicates the relative degree of attentional and/or predictive processing of a sensory event, with attention enhancing the neural response and prediction reducing it. It should be noted, however, that this proposal was drawn from the results of multiple studies and has yet to be evaluated within a single study.
To test these two alternative proposals about the combined effect of attention and prediction, we conducted an experiment manipulating temporal attention and prediction in an orthogonal design. We achieved this by combining the attention enhancement paradigm of Lange et al. (2003) and the prediction suppression paradigm of Vroomen and Stekelenburg (2010) into a single study. These paradigms were selected because, in both, attention and prediction cueing operate on the same (temporal) dimension, and the effects of attention and prediction have been shown at around the same ERP latency (approximately 100 msec).
On the basis of previous research, we expected to find both main effects at the N1 latency: attention enhancement for attended relative to unattended sounds and prediction suppression for predicted relative to unpredicted sounds. Given the methodological differences from previous research, it was not known whether or how attention and prediction would interact in the current study.
Thirty-seven participants (21 women) from the University of Western Sydney took part in the experiment. Their age ranged from 19 to 37 years (M = 24, SD = 4.31). All participants reported having normal or corrected-to-normal vision and hearing and were right-handed. The study was conducted with approval of the ethics committee of the University of Western Sydney.
Four participants were excluded because they failed to perform adequately on the behavioral task (Lange, 2012); that is, they made fewer than 60% correct responses to attended targets or more than 30% incorrect responses to unattended targets. A further four participants were excluded from data analysis due to insufficient data (fewer than 40 trials in any condition after artifact rejection). The final sample consisted of 29 participants (15 women, mean age = 24.5 years).
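The two exclusion rules can be expressed as a short check. This is a minimal sketch: the function and argument names are hypothetical, the thresholds are those stated above, and the per-condition trial counts in the usage line are invented for illustration.

```python
def include_participant(attended_hit_rate, unattended_response_rate,
                        trials_per_condition,
                        min_hit_rate=0.60, max_unattended_rate=0.30,
                        min_trials=40):
    """Return True if a participant passes both exclusion criteria (hypothetical helper)."""
    # Behavioral criterion: enough hits to attended deviants and few
    # responses to deviants at the unattended time point.
    behaviour_ok = (attended_hit_rate >= min_hit_rate
                    and unattended_response_rate <= max_unattended_rate)
    # Data criterion: at least `min_trials` artifact-free epochs in every condition.
    data_ok = min(trials_per_condition.values()) >= min_trials
    return behaviour_ok and data_ok

# Hypothetical participant: passes the behavioral rule but has only 38 clean trials in one cell.
counts = {"AO_att": 55, "AO_unatt": 52, "AV_att": 49, "AV_unatt": 38}
print(include_participant(0.82, 0.10, counts))  # False: fails the trial-count rule
```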
Stimuli and Experimental Design
Auditory stimuli were three different tones (220, 440, and 880 Hz). All tones were 100 msec in length with a 10-msec rise/fall time. Sounds were presented at a comfortable listening volume, and the intensity was kept constant across participants. The visual stimuli were a static image of a disc above a horizontal bar and a video of a disc that “fell” from a fixed position onto the horizontal bar (visual angle 4.5° × 4.5°). There was 600 msec of anticipatory movement before the disc hit the bar.
There were three presentation conditions: auditory only (AO), auditory-visual (AV), and visual only (VO; see Figure 1). In AO trials, the 440-Hz tone was presented first (as an auditory cue), followed by a low-pitch (220 Hz) or high-pitch (880 Hz) tone after either 600 or 1200 msec. This auditory sequence was presented along with a static image of the disc above a horizontal bar. In AV trials, the video of the falling disc was added to the AO condition, and the collision of the disc with the horizontal bar always coincided with the onset of the second tone. The VO condition consisted of the same stimuli as the AV condition but without the second tone. In this condition, the disc hit the horizontal bar either 600 or 1200 msec after the initial auditory cue.
Attention was manipulated by instructing participants to attend to either the early or the late time point. If a deviant (low-pitch) tone occurred at that time point, participants were asked to respond using the keyboard. The attention manipulation was counterbalanced across four blocks of 238 trials, and participants were reminded during the breaks which time point they were attending to. Prediction was manipulated by presenting participants with the falling disc in predicted (AV) trials and the static disc in unpredicted (AO) trials. The VO condition was included so that the falling disc was not a fully reliable cue to the onset of the tone, thus reducing the likelihood that the predictive cue would be used to orient attention. Accordingly, the visual stimulus was followed by a sound (AV condition) in 65.3% of trials and by no sound (VO condition) in the remaining 34.7% of trials.
Overall, there were 952 trials in the experiment, consisting of four repetitions of 66 standard AO and 66 standard AV trials, 28 deviant AO and 28 deviant AV trials, and 50 VO trials. In this way, there was an equal number of short and long intervals presented with AO, VO, and AV stimuli. On trials with the falling disc (AV and VO), the second sound was a deviant in 19.4% of trials, a standard in 45.8% of trials, and not presented (VO trials) in the remaining 34.7%.
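The trial numbers can be checked with a few lines of arithmetic. This sketch assumes that the per-block repetition counts apply separately to the AO and AV conditions, which is the reading that reproduces the stated total of 952 trials (238 per block); under that reading, the stated percentages correspond to proportions of the trials that contained the falling disc (AV and VO). The variable names are illustrative.

```python
# Per-block trial counts, read so that 4 blocks reproduce the stated total of 952.
PER_BLOCK = {
    "AO_standard": 66, "AV_standard": 66,
    "AO_deviant": 28, "AV_deviant": 28,
    "VO": 50,
}
N_BLOCKS = 4

trials_per_block = sum(PER_BLOCK.values())
total_trials = N_BLOCKS * trials_per_block

# Trials that show the falling disc (AV and VO).
disc_trials = PER_BLOCK["AV_standard"] + PER_BLOCK["AV_deviant"] + PER_BLOCK["VO"]

def pct_of_disc_trials(n):
    """Percentage of falling-disc trials, rounded to one decimal place."""
    return round(100 * n / disc_trials, 1)

print(trials_per_block, total_trials)                                          # 238 952
print(pct_of_disc_trials(PER_BLOCK["AV_standard"] + PER_BLOCK["AV_deviant"]))  # 65.3 (sound)
print(pct_of_disc_trials(PER_BLOCK["VO"]))                                     # 34.7 (no sound)
print(pct_of_disc_trials(PER_BLOCK["AV_deviant"]))                             # 19.4 (deviant)
print(pct_of_disc_trials(PER_BLOCK["AV_standard"]))                            # 45.8 (standard)
```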
The experiment took place in an electrically shielded room. Visual stimuli were presented on a 1024 × 768 resolution 51-cm CRT monitor positioned 1 m from the participant so that the videos subtended a visual angle of 3.15°. Sounds were presented through transducer earphones (ER-30 insert).
Each trial began with a static image of the disc above a horizontal bar, displayed for a variable duration between 500 and 1000 msec (10-msec steps, rectangular distribution). This was followed by the AO, AV, or VO stimulus sequence, which lasted 2 sec, and by a further 600 msec before the onset of the next trial. Each trial therefore lasted 3.35 sec on average. Stimuli were presented using Psychtoolbox (Brainard, 1997).
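The average trial length follows from the timing parameters: a rectangular distribution over 500–1000 msec in 10-msec steps has a mean of 750 msec, and 750 + 2000 + 600 msec gives 3.35 sec. The sketch below uses hypothetical names and Python's random module (not the Psychtoolbox code used in the study) to compute the expectation and draw sample trial durations.

```python
import random
from statistics import mean

PRE_STIMULUS_MS = range(500, 1001, 10)  # 500-1000 msec in 10-msec steps (51 values)
STIMULUS_MS = 2000                      # AO, AV, or VO sequence
INTER_TRIAL_MS = 600                    # gap before the next trial

expected_trial_ms = mean(PRE_STIMULUS_MS) + STIMULUS_MS + INTER_TRIAL_MS

def sample_trial_ms(rng):
    """Draw one trial duration with a uniformly distributed pre-stimulus period."""
    return rng.choice(PRE_STIMULUS_MS) + STIMULUS_MS + INTER_TRIAL_MS

rng = random.Random(1)
print(expected_trial_ms)                         # 3350
print([sample_trial_ms(rng) for _ in range(3)])  # three random durations in 3100-3600 msec
```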
Before the experiment, participants were presented with example stimuli to become familiar with the AO, VO, and AV trials as well as with the attention task. Subsequent to this, participants completed two 5-min practice blocks and were given their overall score (hits − false alarms). Additional trials and training were given to participants who scored less than 0.7 on this measure. To keep participants alert and motivated during the experiment proper, breaks were given every 5 min and participants were given feedback on their performance. The experiment lasted approximately 1 hr.
EEG Recording and Analysis
The EEG was recorded at a sampling rate of 512 Hz using 64 active electrodes (Biosemi, Amsterdam, The Netherlands) positioned according to the extended standard 10–10 system (Oostenveld & Praamstra, 2001). An additional eight electrodes were used: two on the mastoids, four ocular electrodes to detect eye blinks (horizontal and vertical EOG), and two that served as reference (CMS/DRL). Before cap placement, participants brushed their hair to improve the conductance between scalp and electrodes (Mahajan & McArthur, 2010). The EEG was referenced offline to the average of the mastoids and low-pass filtered (30 Hz, 12 dB/octave). Large artifacts were removed before ICA decomposition, after which stereotyped eye-blink components were rejected. Preprocessing and data analysis were performed using EEGLAB version 10 (Delorme & Makeig, 2004) and custom Matlab scripts (The MathWorks, Natick, MA).
The data were segmented into 600-msec epochs (from 200 msec before to 400 msec after the end of the short interval) for standard tones only, with a baseline period of 0–50 msec. This baseline window was chosen to account for any influence of slow drift (CNV) in the attention conditions (see Lampar & Lange, 2011; Correa & Nobre, 2008; Correa, Lupiáñez, Madrid, & Tudela, 2006). Epochs with amplitudes exceeding ±150 μV at any channel were rejected.
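The epoching, baseline, and rejection steps can be sketched with NumPy. This is not the EEGLAB pipeline used in the study; it assumes a continuous array in microvolts (channels × samples) and event onsets given as sample indices, and it applies baseline correction before the ±150 μV rejection, an ordering the text does not specify. Window edges are rounded to the nearest sample at 512 Hz, and the function name is hypothetical.

```python
import numpy as np

FS = 512  # sampling rate (Hz) stated in the text

def make_epochs(data, onsets, fs=FS, tmin=-0.2, tmax=0.4,
                baseline=(0.0, 0.05), reject_uv=150.0):
    """Cut, baseline-correct, and threshold epochs around event onsets.

    data   : array (n_channels, n_samples), assumed to be in microvolts
    onsets : sample indices of the short-interval tone onsets
    Returns (times, epochs) with epochs shaped (n_kept, n_channels, n_times).
    """
    start = int(round(tmin * fs))   # -102 samples at 512 Hz
    stop = int(round(tmax * fs))    # 205 samples at 512 Hz
    times = np.arange(start, stop) / fs
    in_baseline = (times >= baseline[0]) & (times < baseline[1])

    kept = []
    for onset in onsets:
        if onset + start < 0 or onset + stop > data.shape[1]:
            continue  # skip events too close to the edges of the recording
        seg = data[:, onset + start:onset + stop].astype(float)
        # Subtract each channel's mean over the baseline window.
        seg -= seg[:, in_baseline].mean(axis=1, keepdims=True)
        # Reject the epoch if any channel exceeds the amplitude threshold.
        if np.abs(seg).max() <= reject_uv:
            kept.append(seg)

    if not kept:
        return times, np.empty((0, data.shape[0], stop - start))
    return times, np.stack(kept)

# Usage: flat 10-μV signal with one large artifact after the second event.
eeg = np.full((2, 2000), 10.0)
eeg[:, 1050] += 200.0
times, epochs = make_epochs(eeg, onsets=[500, 1000])
print(epochs.shape)  # (1, 2, 307): the second epoch is rejected
```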
We analyzed RTs and accuracy (d′ = Z(hits) − Z(false alarms)) for both predicted and unpredicted conditions at short and long intervals. A two-way ANOVA revealed a significant interaction between Interval and Condition, F(1, 25) = 4.85, p < .05; the results for each interval were therefore analyzed separately. The mean RT in the short interval was 590 msec after auditory onset, and the mean d′ score was 3.05. Neither of these scores differed between predicted (M = 573 msec, d′ = 3.01) and unpredicted (M = 575 msec, d′ = 2.95) trials (t(29) = 0.19, p > .05; t(29) = 0.18, p > .05). However, the pattern of results differed in the long-interval condition. RTs were significantly slower in the long-interval condition (M = 611 msec; t(29) = 3.58, p < .05), and in this condition only, responses were faster to predicted (M = 601 msec) than to unpredicted deviant sounds (M = 622 msec, t(29) = 2.31, p < .05).
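For reference, the sensitivity index d′ is the difference between the z-transformed hit and false-alarm rates. The sketch uses the inverse normal CDF from Python's statistics module; the function name is hypothetical. The text does not say how hit or false-alarm rates of 0 or 1 were handled, so a log-linear correction (adding 0.5 to each count and 1 to each total) is included here as an assumption and can be switched off.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections, loglinear=True):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    loglinear=True applies an assumed correction (+0.5 per count, +1 per total)
    so that rates of exactly 0 or 1 do not produce infinite z scores.
    """
    pad, extra = (0.5, 1.0) if loglinear else (0.0, 0.0)
    hit_rate = (hits + pad) / (hits + misses + extra)
    fa_rate = (false_alarms + pad) / (false_alarms + correct_rejections + extra)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant and condition.
print(round(d_prime(84, 16, 16, 84, loglinear=False), 2))  # about 1.99
print(d_prime(28, 0, 0, 66) > 0)  # perfect performance still gives a finite d'
```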
Only ERP responses to short-interval trials were analyzed, because differences in the ongoing probability of a sound between the early and late time points may have interacted with the visual prediction manipulation. For example, the overall probability of a sound occurring at the early interval increased when the visual cue was presented (39.5% vs. 65.3%); however, at the later interval, this probability decreased (79.0% vs. 69.3%), indicating that a sound was less likely when a predictive cue was seen at this interval. Thus, any prediction effect due to the visual cue at the late interval would be confounded by ongoing probability changes. In addition, only ERP responses to standard tones were analyzed, as these trials contained no potential motor artifacts from behavioral responses.
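The early-interval probabilities can be reproduced from the per-block trial counts. This sketch assumes that each condition (94 AO, 94 AV, and 50 VO trials per block, counting standards and deviants together) was split equally between short and long intervals, consistent with the equal numbers of short and long intervals described above, and that the first figure is the proportion of all trials with a sound at the early time point.

```python
# Per-block trial counts from the design (standard + deviant for AO and AV).
N_AO = 66 + 28
N_AV = 66 + 28
N_VO = 50
# Assumption: every condition is split equally between short and long intervals.
AO_SHORT, AV_SHORT, VO_SHORT = N_AO // 2, N_AV // 2, N_VO // 2

total = N_AO + N_AV + N_VO                # all trials in a block
sound_early = AO_SHORT + AV_SHORT         # trials with a tone at the early point
disc_hits_early = AV_SHORT + VO_SHORT     # trials where the disc lands early

p_sound_early = sound_early / total              # regardless of the visual cue
p_sound_given_disc = AV_SHORT / disc_hits_early  # when the disc lands early

print(f"{100 * p_sound_early:.1f}% vs. {100 * p_sound_given_disc:.1f}%")  # 39.5% vs. 65.3%
```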
All ERP analyses were performed using cluster-based permutation testing over all electrodes and time points between 0 and 150 msec.¹ This method was chosen to fully describe and compare the spatial and temporal properties of attention and prediction effects while controlling for the Type I error rate. A detailed outline of this procedure can be found in Maris and Oostenveld (2007). The threshold for cluster inclusion was set at 0.05, with an average of 6.2 electrodes per cluster. A maximum T statistic was used, and 2000 permutations were run. To test for significance, an alpha value of .05 was used for one-tailed tests where the direction of the effect was hypothesized a priori (attention and prediction effects), and an alpha value of .025 was used when the direction of the effect was unknown (interaction analysis).
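The logic of the cluster-based permutation test can be sketched for a single channel, clustering over time only. This is a simplified illustration, not the authors' Matlab code: the full method of Maris and Oostenveld (2007) also forms clusters over neighboring electrodes and is implemented in toolboxes such as FieldTrip. Here, each participant's condition difference wave is tested with a one-sample t statistic at each time point; adjacent supra-threshold points form clusters whose summed t values (cluster mass) are compared with the largest cluster mass obtained after randomly flipping the sign of each participant's difference wave. The test is one-tailed (positive direction), the function names are hypothetical, and the cluster-forming threshold is passed as a t value (about 1.70 for a one-tailed α of .05 with 28 degrees of freedom).

```python
import numpy as np

def t_values(x):
    """One-sample t statistic across participants (axis 0) at each time point."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

def find_clusters(mask):
    """Return (start, stop) index pairs for runs of consecutive True values."""
    runs, start = [], None
    for i, above in enumerate(mask):
        if above and start is None:
            start = i
        elif not above and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs

def cluster_permutation_test(diff, t_thresh, n_perm=2000, seed=0):
    """One-tailed cluster-mass permutation test on paired difference waves.

    diff     : array (n_participants, n_times), condition A minus condition B
    t_thresh : cluster-forming threshold expressed as a t value
    Returns a list of (start, stop, cluster_mass, p_value) for observed clusters.
    """
    rng = np.random.default_rng(seed)
    t_obs = t_values(diff)
    observed = [(s, e, t_obs[s:e].sum())
                for s, e in find_clusters(t_obs > t_thresh)]

    # Null distribution of the largest cluster mass under random sign flips,
    # which assumes the two conditions are exchangeable within participants.
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diff.shape[0])[:, None]
        t_perm = t_values(diff * signs)
        masses = [t_perm[s:e].sum() for s, e in find_clusters(t_perm > t_thresh)]
        null_max[k] = max(masses, default=0.0)

    return [(s, e, mass, (np.sum(null_max >= mass) + 1) / (n_perm + 1))
            for s, e, mass in observed]

# Simulated example: 29 participants, 100 time points, true effect at samples 40-59.
rng = np.random.default_rng(42)
diff = rng.normal(0.0, 1.0, size=(29, 100))
diff[:, 40:60] += 1.5
for start, stop, mass, p in cluster_permutation_test(diff, t_thresh=1.70, n_perm=500):
    print(start, stop, round(mass, 1), round(p, 3))
```

Using the maximum cluster mass from each permutation is what controls the family-wise error rate across all time points.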
To investigate the effect of attention, we compared neural responses to attended and unattended events in both the AO and AV conditions. Of primary interest was the comparison between attention conditions in the unpredicted (AO) condition, as this condition (no visual cue) was most similar to that used in previous studies. The cluster-based permutation test found one significant channel–time cluster from 48 to 84 msec in the unpredicted condition (p = .042; Figure 4A, top row), indicating larger responses to attended relative to unattended sounds. No significant clusters were identified in the comparison between attended and unattended sounds in the predicted condition (p > .05; Figure 4A). ERP waveforms representing the effect of attention can also be seen in Figure 2.
To investigate the effect of prediction, we compared neural responses to predicted (AV − VO) and unpredicted (AO) sounds for both attended and unattended trials. Of primary interest was the comparison between prediction conditions in the unattended condition, as this condition was isolated from attentional processes. In this condition, the cluster-based permutation test found one significant channel–time cluster from 86 to 150 msec (p = .033; Figure 3, bottom row). There was also one significant cluster for this comparison in the attended condition, from 77 to 146 msec (p = .014). For both significant clusters, the neural response was reduced for predicted relative to unpredicted trials (Figure 3, top row).
We analyzed the interaction between attention and prediction effects by comparing the magnitude of the attention effect between predicted and unpredicted conditions. This analysis was chosen because the other possible interaction analysis (comparing the prediction effect between attended and unattended conditions) is problematic. When comparing predicted and unpredicted responses, it is important to subtract out the evoked response to the visual stimulus (i.e., AO vs. AV − VO; Vroomen & Stekelenburg, 2010; Stekelenburg & Vroomen, 2007) to account for the differences between the AV and AO stimuli. However, this subtraction is problematic when attention is also directed to the same time point as the visual cue, as the AV(attended) − VO(attended) subtraction may also remove components of attention that occur in the VO condition. For this reason, we measured the interaction by specifically comparing the attention effect (attended − unattended) between the AV and AO conditions.
To perform this interaction analysis, we first normalized each participant's responses (to zero mean and unit variance) across the attended and unattended conditions in both the AV and AO conditions. Difference scores were then computed by subtracting the normalized unattended ERPs from the normalized attended ERPs in both the predicted and unpredicted conditions, and these were submitted to a permutation analysis (Attended(predicted) − Unattended(predicted) vs. Attended(unpredicted) − Unattended(unpredicted)). This analysis revealed one significant cluster from 48 to 95 msec at frontocentral electrodes (p = .023; Figure 4B).
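The normalization and difference-score step can be sketched as follows. Because the text does not specify exactly which data were pooled for normalization, this sketch assumes one possible reading: for each participant and prediction condition, the attended and unattended ERPs (channels × time) are pooled and z-scored with a common mean and standard deviation before the attended − unattended difference is taken. The function names and dictionary keys are hypothetical. The resulting interaction scores (predicted minus unpredicted attention effect) would then be the input to a permutation test.

```python
import numpy as np

def normalized_attention_effect(attended, unattended):
    """Attended minus unattended ERP after z-scoring the pair jointly.

    Assumed reading: both ERPs (channels x time) are pooled and scaled by a
    common mean and standard deviation. The common mean cancels in the
    subtraction, so the result equals (attended - unattended) / pooled SD.
    """
    pooled = np.concatenate([attended.ravel(), unattended.ravel()])
    mu, sd = pooled.mean(), pooled.std()
    return (attended - mu) / sd - (unattended - mu) / sd

def interaction_scores(erps_by_subject):
    """Predicted minus unpredicted attention effect for each participant.

    erps_by_subject: list of dicts keyed by (prediction, attention) tuples,
    e.g. ("predicted", "attended"); these keys are hypothetical.
    Returns an array (n_subjects, n_channels, n_times).
    """
    scores = []
    for erps in erps_by_subject:
        predicted = normalized_attention_effect(
            erps[("predicted", "attended")], erps[("predicted", "unattended")])
        unpredicted = normalized_attention_effect(
            erps[("unpredicted", "attended")], erps[("unpredicted", "unattended")])
        scores.append(predicted - unpredicted)
    return np.stack(scores)

# Usage with random stand-in data: 29 participants, 64 channels, 77 samples.
rng = np.random.default_rng(0)
keys = [(p, a) for p in ("predicted", "unpredicted") for a in ("attended", "unattended")]
subjects = [{k: rng.normal(size=(64, 77)) for k in keys} for _ in range(29)]
print(interaction_scores(subjects).shape)  # (29, 64, 77)
```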
We also evaluated the effect of attention on the ongoing negativity of the ERP waveform. One component, known as the “Negative Difference” or Nd, has been shown to reflect processing differences between attended and unattended events (Näätänen, 1990; Näätänen & Michie, 1979). On the basis of prior studies, we evaluated the difference in mean ERP amplitude from 200 to 300 msec between attended and unattended responses (McDonald, Teder-Sälejärvi, Di Russo, & Hillyard, 2003; McDonald & Ward, 2000), using the cluster permutation method described above (i.e., across all electrodes and time points between 200 and 300 msec). The results indicated no significant attention effect in this time window in either the predicted or the unpredicted condition (ps > .05). In addition, the magnitude of the attention effect did not differ significantly between prediction conditions (p > .05).
The main aim of this study was to investigate whether the processes of attention and prediction would interact in the modulation of the N1 ERP. Previous studies suggested two possible outcomes: either attention and prediction interact such that attention enhances neural responses to predicted events (Kok et al., 2012), or these effects are additive and do not interact (Lange, 2012).
Before discussing the results with regard to the main issue, it is important to point out that the separate effects of attention and prediction were found, as in previous studies (Lange, 2013; Vroomen & Stekelenburg, 2010). That is, an attention enhancement effect was found when there was no influence of prediction (unpredicted condition), and a prediction suppression effect was found when there was no influence of attention (unattended condition; see Figure 3). These results indicate that the current manipulations of attention and prediction were successful.
One aspect of the current attention effect that appears to differ from those reported previously is that it occurred relatively early, that is, 48–84 msec after auditory onset. Previous research has typically found such N1 effects around 100 msec (e.g., between 70 and 120 msec in Lange et al., 2003). However, earlier effects have been reported; for example, Lange and Röder (2006) performed a point-wise analysis on 20-msec intervals and found that the earliest effect of attention emerged 60 msec after sound onset. Another apparent departure from previous findings is that the window of the current attention effect did not extend past 84 msec. This apparent truncation may be due to the permutation analysis, which is known to be less sensitive in detecting effects than parametric statistics performed on a predefined subset of data, such as at 100 msec over central electrodes (Maris & Oostenveld, 2007). Indeed, qualitative evidence for a later effect of attention can be seen in the topographic plots (Figure 4), which show attention enhancement over central electrodes between 100 and 140 msec.
With regard to the main issue, possible interactions between the processing of attended and predicted events were tested by comparing the magnitude of the attention effect in the AV (predicted) and AO (unpredicted) conditions. This analysis detected a significant cluster of electrodes between 50 and 95 msec, indicating that the effect of attention was larger in the unpredicted condition. The latency of this effect overlaps considerably with that of the attention enhancement effect, so it would seem that this interaction was driven by a significant attention enhancement effect in the unpredicted condition that was absent in the predicted condition.
One way to understand the interaction is in terms of the mechanism by which temporal attention increases ERP amplitude and the extent to which this mechanism operates in the predicted and unpredicted conditions. The basic idea is that temporal attention leads to relatively greater neural activity because, given limited processing resources, it allows more resources to be directed to the relevant temporal period. For example, in the unpredicted condition, without an attentional cue, neural resources would be spread diffusely over a relatively large time window; when attention is directed to the end of a particular interval (either short or long), neural resources can be oriented to that specific time point, leading to larger neural responses. We propose that this process of selection is made redundant when a visual stimulus precisely predicts when a sound will occur. That is, given the clear visual prediction of sound onset in the predicted condition, additional attentional selection is not needed and therefore does not influence the response. Such a prediction effect would occur in both the unattended and attended conditions, and so there is no modulating effect of attention.
In the above proposal, the precision with which an event is predicted is critical for the interplay between attention and prediction. Indeed, variation in the precision of the prediction cues might explain why the current study showed a different pattern of interaction effects compared with two previous studies (Hsu, Hämäläinen, & Waszak, 2014; Kok et al., 2012). In the study by Hsu and colleagues (2014), participants were presented with tone pairs in which the second tone was always either higher or lower than the first (predictable stream) or was of a random pitch (unpredictable stream). Importantly, the specific frequency of the second tone varied from trial to trial. Similarly, in the study by Kok and colleagues (2012), the prediction cue was given at the start of a block of trials and provided no additional information about the features of the upcoming stimulus (such as when it would occur). In contrast, the prediction cue used in the current study provided continuous and precise information about when the predicted sound would occur. According to the current proposal, less precise predictions would have little effect on attentional selection, and thus attention enhancement should still occur. In line with this, attention enhancement was observed in the predicted conditions of both of these studies.
We have argued that an important difference between prior studies and the current one is the precision of the prediction cue. It is also worth noting that the previous studies used spatial (stimulus location) and semantic (tone pitch) predictions rather than temporal ones. Given this, the different results we observed could be due to some fundamental difference in how temporal and spatial/semantic prediction cues are processed. For instance, research has shown that temporal cues can influence the phase of neural activity (Busch & VanRullen, 2010; Schroeder & Lakatos, 2008). That is, visual-to-auditory predictions, as used in the current study, can cause neural responses to become phase-locked (aligned) to the onset of expected stimuli; such oscillatory effects have been shown to be involved in the processing of cross-modal temporal predictions (Lakatos et al., 2009; Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008) and have also been associated with N1 ERP effects (Arnal & Giraud, 2012; Arnal et al., 2009). This mechanism may explain why the processing of temporal predictions differs from that of spatial and semantic ones. This idea could be tested in a study that manipulates cue precision in both the temporal and spatial/semantic domains. If prediction precision (and not temporal cueing specifically) contributed to the current set of results, then similar results should be observed when using stimuli that afford more precise spatial or semantic predictions.
In summary, this study combined two classic paradigms that have been used to examine the effects of attention and prediction into a single, orthogonal design. The component effects of attention enhancement and prediction suppression were found, but the novel result was an interaction between these effects, such that attention enhancement occurred for unpredicted but not for predicted events. We propose that, at least in the temporal domain, the presentation of a precise prediction cue obviated the need for attention. This interpretation calls for further studies to explore how attention and prediction interact given differing degrees of prediction precision in both temporal and spatial domains.
Reprint requests should be sent to Tim Paris, The MARCS Institute, University of Western Sydney, Bullecourt Avenue, Milperra, 2214, NSW, Australia, or via e-mail: firstname.lastname@example.org.
We originally performed the permutation analysis on a fixed window of 50 msec either side of the N1 peak (i.e., 50–150 msec); however, this yielded significant effects from the earliest time point, so we extended the window to encompass all effects of interest (0–150 msec; Kiesel, Miller, Jolicœur, & Brisson, 2008).