Abstract
Stimuli that have been generated by a person's own willed motor actions generally elicit a suppressed electrophysiological, as well as phenomenological, response compared with identical stimuli that have been externally generated. This well-studied phenomenon, known as sensory attenuation, has mostly been investigated by comparing ERPs evoked by self-initiated and externally generated sounds. However, most studies have assumed a uniform action–effect contingency, in which a motor action leads to a resulting sensation 100% of the time. In this study, we investigated the effect of manipulating the probability of action–effect contingencies on the sensory attenuation effect. In Experiment 1, participants watched a moving, marked tickertape while EEG was recorded. In the full-contingency (FC) condition, participants chose whether to press a button by a certain mark on the tickertape. If a button press had not occurred by the mark, a sound would be played a second later 100% of the time. If the button was pressed before the mark, the sound was not played. In the no-contingency (NC) condition, participants observed the same tickertape; in contrast, however, if participants did not press the button by the mark, a sound would occur only 50% of the time (NC-inaction). Furthermore, in the NC condition, if a participant pressed the button before the mark, a sound would also play 50% of the time (NC-action). In Experiment 2, the design was identical, except that a willed action (as opposed to a willed inaction) triggered the sound in the FC condition. The results were consistent across the two experiments: Although there were no differences in N1 amplitude between conditions, the amplitudes of the Tb and P2 components were smaller in the FC condition compared with the NC-inaction condition, and the amplitude of the P2 component was also smaller in the FC condition compared with the NC-action condition.
The results suggest that the effect of contingency on electrophysiological indices of sensory attenuation may be indexed primarily by the Tb and P2 components, rather than the N1 component which is most commonly studied.
INTRODUCTION
Stimuli that have been generated by a person's own willed motor actions generally elicit a suppressed electrophysiological, as well as phenomenological, response compared with physically identical stimuli that have been externally generated (Hughes, Desantis, & Waszak, 2013). This well-studied phenomenon, known as sensory attenuation, has mostly been studied by comparing ERPs evoked by self-initiated and externally generated sounds (Pinheiro, Schwartze, Gutiérrez-Domínguez, & Kotz, 2020; Pinheiro, Schwartze, Gutierrez, & Kotz, 2019; Horváth, 2015; Gentsch & Schütz-Bosbach, 2011; Ford, Roach, & Mathalon, 2010; Baess, Widmann, Roye, Schröger, & Jacobsen, 2009; Aliu, Houde, & Nagarajan, 2008; Martikainen, Kaneko, & Hari, 2005; Schafer & Marcus, 1973). A large body of literature shows that certain components of the auditory-evoked potential are reduced in amplitude when participants hear sounds initiated by their own motor actions (e.g., overt speech, button press elicited tones), compared with when they passively listen to the same sounds. This effect has been most commonly observed with the N1 component (Jack et al., 2021; Harrison et al., 2021; Klaffehn, Baess, Kunde, & Pfister, 2019; Pinheiro et al., 2019; Elijah, Le Pelley, & Whitford, 2018; Neszmélyi & Horváth, 2017; Mifsud et al., 2016; Oestreich et al., 2016; van Elk, Salomon, Kannape, & Blanke, 2014; Bäß, Jacobsen, & Schröger, 2008) but has also been identified with the Tb (SanMiguel, Widmann, Bendixen, Trujillo-Barreto, & Schroger, 2013; Saupe, Widmann, Trujillo-Barreto, & Schröger, 2013) and P2 components (Horváth & Burgyán, 2013; Knolle, Schröger, Baess, & Kotz, 2012).
Within the sensory attenuation literature, a topic that has remained relatively unexplored is the role of stimulus contingency (Horváth, 2015). Here, we operationalize contingency as the probability of a stimulus (in this case, a sound) occurring because of a willed action (or willed inaction). Although there have been some attempts to investigate contingency in the sensory attenuation literature, most previous studies have operationalized contingency in terms of temporal contingency; specifically, they have manipulated contingency by varying the temporal delay between the motor action and sound. For example, Bäß et al. (2008) investigated the effect of tone frequency and onset predictability on N1 suppression. In their motor–auditory and auditory-only conditions, frequency and onset predictability were manipulated in a 2 × 2 design. Sounds could be (a) predictable in both frequency (a 1000-Hz sound) and onset, in which the sound immediately followed the button press; (b) unpredictable in frequency (ranging from 400 to 1990 Hz) but predictable in onset; (c) predictable in frequency but unpredictable in onset, in which a random delay of 500–1000 msec was imposed between the action and effect; and (d) unpredictable in both frequency and sound onset. They found that N1 suppression occurred regardless of the predictability of the frequency and onset of the sound. Pinheiro et al. (2019) conducted a study in which various delays were inserted between the action and sound within action–effect contingencies so as to induce temporal prediction errors. In 30% of trials, sounds that followed the button press were presented with a delay of either 50, 100, or 250 msec. They found that N1 suppression still occurred with delays of up to 100 msec between the action and the sound.
Another method of manipulating contingency is by changing the probability of a sound resulting from an action. Typical self-stimulation experiments often have action–effect contingencies of 100%, meaning that actions (e.g., button presses) lead to sounds in 100% of trials (Horváth, 2015). Action–effect contingencies—especially in ecological settings—do not always occur in the uniform manner that they do in the typical laboratory setting. For example, Knolle, Schröger, and Kotz (2013) tested whether N1 suppression effects were larger when self-generated sounds were correctly predicted rather than when they were not. Self- and externally generated sounds were categorized as frequent standard-pitched sounds (70% probability) or infrequent deviant-pitched sounds (30% probability). Although N1 suppression occurred for both standard and deviant sounds, the size of the N1 suppression effect was smaller for deviant sounds compared with the standard sounds, indicating that predictions for self-generated sounds contained specificity. In another study, SanMiguel, Widmann, et al. (2013) recorded EEG from human participants while they pressed a button that either consistently (88%), inconsistently (50%), or never (0%) initiated a sound. During conditions where button presses consistently produced a sound (88%), EEG responses were also obtained when sounds were omitted in 12% of trials. Their results showed that during such omitted trials, EEG responses showed a pattern of activity that shared brain sources and time course with exogenous responses to trials where there were actual stimuli. Based on these results, the authors argued that the brain activates a template of its response to predicted stimuli before sensory input arrived at our senses.
Ecologically, people sometimes incorrectly predict the outcome of an action (e.g., when trying to guess the outcome of a button press on a slot machine). Manipulating action–effect contingency is, in essence, a method of modifying participants' confidence in their predictions regarding the sensory consequences of their actions. Both forms of operationalization (manipulating the temporal onset of sounds and manipulating the probability of sounds occurring) serve to reduce the participant's confidence that the expected sensory event will occur.
In this study, we manipulated action–effect contingency by varying the probability that actions (or inaction) would result in a tone being played. In these two experiments, participants observed a short animation and, on each trial, were required to decide whether to press a button (the space bar of the keyboard). This decision determined whether a sound would subsequently be presented after a significant delay. The two experiments were based on the experimental paradigm used in Han, Jack, Hughes, Elijah, and Whitford (2021), which demonstrated that willed inaction might also result in sensory attenuation. In both experiments, there were two conditions: the full-contingency (FC) condition and the no-contingency (NC) condition. In the FC condition, sounds always followed a willed inaction (in Experiment 1) or always followed the button press (in Experiment 2). In contrast, sounds followed willed inaction (Experiment 1) or a button press (Experiment 2) only 50% of the time in the NC condition. The definition of contingency—the probability of an event occurring as a result of an action minus the probability of an event occurring in the absence of that action—was taken from Elsner and Hommel (2004).
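The contingency definition above can be made concrete with a short calculation. The following is a minimal sketch rather than part of the study materials: the function name `delta_p` and the variable names are hypothetical, and the probabilities are simply those described for the FC and NC conditions. In Experiment 1, the "action" of interest is taken to be the willed inaction.

```python
def delta_p(p_effect_given_action: float, p_effect_given_no_action: float) -> float:
    """Contingency in the sense of Elsner and Hommel (2004):
    P(effect | action) - P(effect | no action)."""
    return p_effect_given_action - p_effect_given_no_action


# FC, Experiment 1: withholding the press always produces the sound,
# whereas pressing never does (inaction is treated as the "action" here).
fc_experiment_1 = delta_p(1.0, 0.0)

# FC, Experiment 2: pressing always produces the sound, withholding never does.
fc_experiment_2 = delta_p(1.0, 0.0)

# NC, both experiments: the sound follows either choice on 50% of trials.
nc = delta_p(0.5, 0.5)

print(fc_experiment_1, fc_experiment_2, nc)  # 1.0 1.0 0.0
```

Under this definition, the FC condition has a contingency of 1 and the NC condition a contingency of 0, which is the sense in which the two conditions are labeled "full" and "no" contingency.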
Like most previous studies in the sensory attenuation literature, we analyzed the N1 and P2 components of the ERP. We also included the Tb component, which is believed to reflect activity of the secondary auditory cortex (Rihs et al., 2013; Tonnquist-Uhlen, Ponton, Eggermont, Kwong, & Don, 2003; Gallinat et al., 2002; Wolpaw & Penry, 1975) and which has been found to be dependent on the extent to which participants had agency over the generation of auditory stimuli (Han et al., 2021). We analyzed all ERPs time-locked to the sound. Previous research using the traditional self-stimulation paradigm has demonstrated suppression across the N1, Tb, and P2 components in the active condition relative to the passive condition (SanMiguel, Todd, & Schröger, 2013; Knolle et al., 2012). However, given that our studies used a different methodology compared with the traditional self-stimulation paradigm (Schafer & Marcus, 1973), we did not expect to find any evidence of N1 suppression in this study. This hypothesis was based on the results of Han et al. (2021), which used a similar experimental methodology. Consequently, effects of contingency were expected to be mediated by the Tb and P2 components. We therefore hypothesized that Tb amplitudes would be smaller and P2 amplitudes would be larger when probabilities of sound occurrence were 100% relative to when probabilities were set at 50%.
METHODS
Experiment 1
Participants
Forty undergraduate students from the University of New South Wales (Sydney, Australia) participated in the study in exchange for course credit. All participants gave written informed consent before the experiment. Two participants were removed from analysis because of an insufficient number of artifact-free epochs (as described in the EEG Recording and Analysis section), leaving a final sample of n = 38 participants (mean age = 21 years, SD = 7.5 years, 16 women). Study sample size was based on prior research by Han et al. (2021), which used a similar experimental paradigm. The study was approved by the Human Research Ethics Advisory Panel (Psychology) at the University of New South Wales.
Stimuli, Materials, and Procedure
The audio stimulus was a sinusoid tone of frequency of 1000 Hz, 100-msec duration, with a 5-msec linear rise/fall time. Audio stimuli were sent to participants through Sennheiser HD 210 headphones. Audio input/output was controlled by a specially written MATLAB script and was delivered via an AudioFile Stimulus Processor (Cambridge Research Systems). Participants made responses by pressing the space bar of a low-latency keyboard (DuckyShine 4, 1000 Hz report rate). Visual stimuli were displayed on a high-performance 24-in. monitor (BenQ XL2420T).
During each trial, participants observed a visual animation, which was adapted from the paradigm employed by Whitford et al. (2017) and Jack et al. (2019) and identical to that employed by Han et al. (2021). The animation lasted for about 6 sec. A schematic of the animation is presented in Figure 1. The animation consisted of a central red fixation line that sat in the middle of a green horizontal bar, which is referred to as the ticker tape. Participants were instructed to keep their eyes fixated on the fixation line during the trial. There was also a blue decision line and a green trigger line located on the right side of the ticker tape. The trigger line was initially positioned on the far right-hand side of the ticker tape; the decision line sat to the left of the trigger line (Figure 1A).
A schematic of the experimental protocol. Participants were instructed to fixate their eyes on the central red fixation line (A). After a 1-sec delay, the blue decision line and the green trigger line moved slowly toward the central red fixation line at a rate of 6.5°/sec (visual angle; B–C). Participants were told that they had the option of pressing the space bar of the keyboard by the decision time (D). In the FC condition in Experiment 1, if the participant did not press the space bar before the decision line overlapped with the fixation line, this would cause the audio stimulus to be played at trigger time (E). If the participant pressed the space bar during this time frame, the audio stimulus would not play at the trigger time. In the FC condition in Experiment 2, this contingency was reversed, such that if participants did not press the space bar before the decision time, this would inhibit the audio stimulus from being played at the trigger time. Conversely, if participants pressed the space bar during this time frame, this would cause the audio stimulus to be played at the trigger time. In the NC condition in both Experiments 1 and 2, the probabilities were set so that audio onset would follow button presses only 50% of the time. Likewise, audio onset would follow inaction 50% of the time. Participants were not told that the probability in the NC condition was 50% but were told that sounds “may or may not be played.” The lines continued to move for another 1 sec, before the animation concluded and the trial was completed (F).
Upon commencement of the trial, after a 1-sec delay, both the decision line and the trigger line started to move toward the fixation line at a constant rate of 6.5°/sec (visual angle). The decision line intersected the fixation line after approximately 3 sec. The trigger line intersected the fixation line after approximately 4 sec; at this point, the auditory stimulus was presented (depending on the trial, as described below). The lines continued to move for another 1 sec, before the trial was completed.
There were two conditions in the experiment: the FC condition and the NC condition. In the FC condition, participants had the option of pressing the space bar on the keyboard before the decision line intersected the fixation line (hereon referred to as the decision time). Participants were told that if they did not press the button by the decision time (Figure 1B)—that is, if they performed a willed inaction—this would cause the audio stimulus to be played at the exact moment that the trigger line intersected the fixation line (hereon referred to as the trigger time; Figure 1C). If they chose to press the space bar before the decision time, this prevented the audio stimulus from being played at the trigger time. The probabilities were set so that inaction would always cause the audio stimulus to be played at the trigger time; conversely, pressing the button would always cause no sound to be played at the trigger time. Participants were asked to press the space bar on approximately half the trials while trying not to conform to an obvious pattern of responses. At the start of every trial, participants were reminded (by means of instructions on the screen) as to what their options were and what the consequences would be.
In the NC condition, participants were given instructions that were nearly identical to the instructions in the FC condition. However, the probabilities were set so that a sound followed a button press on only 50% of trials. Likewise, a sound followed inaction on 50% of trials. In effect, whether the participant did or did not hear a sound on any given trial was random. Participants were not told the exact probabilities; instead, the instructions stated that sounds “may or may not” occur as a result of their actions.
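The trial rules described for the two conditions can be summarized as a small decision function. This is an illustrative sketch only; the original experiment was controlled by a MATLAB script that is not reproduced here. The function name `sound_is_played`, its arguments, and the use of an independent random draw on each NC trial are assumptions.

```python
import random


def sound_is_played(experiment: int, condition: str, pressed: bool,
                    rng: random.Random) -> bool:
    """Return True if the tone is presented at the trigger time."""
    if condition == "NC":
        # No contingency: the sound follows action and inaction alike
        # on 50% of trials (assumed: an independent draw on every trial).
        return rng.random() < 0.5
    if condition == "FC":
        if experiment == 1:
            # Experiment 1: a willed inaction triggers the sound.
            return not pressed
        if experiment == 2:
            # Experiment 2: a button press triggers the sound.
            return pressed
        raise ValueError("experiment must be 1 or 2")
    raise ValueError("condition must be 'FC' or 'NC'")


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulate many NC trials to confirm the sound rate is close to 50%
# whatever the participant chooses.
n_trials = 10_000
nc_rate_press = sum(sound_is_played(1, "NC", True, rng) for _ in range(n_trials)) / n_trials
nc_rate_no_press = sum(sound_is_played(1, "NC", False, rng) for _ in range(n_trials)) / n_trials
print(nc_rate_press, nc_rate_no_press)
```

Because the NC branch ignores `pressed`, the participant's choice carries no information about whether the sound will occur, which is what distinguishes it from the FC branch.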
The experiment consisted of five FC blocks and 10 NC blocks, totaling 15 blocks for the whole experiment. For ease of reference, we make the following distinction between three different trial types:
FC trials in which participant inactions resulted in a sound 100% of the time,
NC-action trials in which participant action resulted in a sound 50% of the time, and
NC-inaction trials in which participant inaction resulted in a sound 50% of the time.
It is important to note that NC-action and NC-inaction trials occurred within the same block.
There were twice as many NC blocks as FC blocks in the experiment. This was done to achieve an approximately equal number of usable epochs, given that the sound was only presented on approximately 50% of trials in the NC condition. We only analyzed trials in which the auditory stimulus was played. Each block contained 24 trials. The order of the blocks alternated between the FC and NC blocks so that there were two NC blocks for every one FC block. Within each grouping of three blocks (two NC blocks and one FC block), the order was counterbalanced between participants. The starting block was also counterbalanced between participants.
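The rationale for running twice as many NC blocks can be checked with simple arithmetic. The sketch below computes the expected number of sound trials per trial type in Experiment 1, assuming that participants pressed the button on exactly half of the trials (they were asked to do so on approximately half). The constant and variable names are hypothetical.

```python
N_FC_BLOCKS = 5
N_NC_BLOCKS = 10
TRIALS_PER_BLOCK = 24
P_PRESS = 0.5      # assumed: participants press on exactly half of the trials
P_SOUND_NC = 0.5   # probability of a sound after either choice in NC blocks

fc_trials = N_FC_BLOCKS * TRIALS_PER_BLOCK   # 120 trials
nc_trials = N_NC_BLOCKS * TRIALS_PER_BLOCK   # 240 trials

# Experiment 1, FC: every inaction trial produces a sound.
expected_fc_sounds = fc_trials * (1 - P_PRESS)

# NC: each choice occurs on about half of the trials, and half of those
# trials are followed by a sound.
expected_nc_action_sounds = nc_trials * P_PRESS * P_SOUND_NC
expected_nc_inaction_sounds = nc_trials * (1 - P_PRESS) * P_SOUND_NC

print(expected_fc_sounds, expected_nc_action_sounds, expected_nc_inaction_sounds)
# 60.0 60.0 60.0
```

Under these assumptions, each of the three trial types yields about 60 sound trials before artifact rejection.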
EEG Recording and Analysis
EEG was recorded with a BioSemi ActiveTwo system from 64 Ag/AgCl active electrodes (FP1, FPz, FP2, AF7, AF3, AFz, AF4, AF8, F7, F5, F3, F1, Fz, F2, F4, F6, F8, FT7, FC5, FC3, FC1, FCz, FC2, FC4, FC6, FT8, T7, C5, C3, C1, Cz, C2, C4, C6, T8, TP7, CP5, CP3, CP1, CPz, CP2, CP4, CP6, TP8, P9, P7, P5, P3, P1, Pz, P2, P4, P6, P8, P10, PO7, PO3, POz, PO4, PO8, O1, Oz, O2, Iz). A vertical EOG was recorded by placing an electrode above and below the left eye; a horizontal EOG was recorded by placing an electrode on the outer canthus of each eye. Electrodes were also placed on each mastoid and the nose. During data acquisition, the reference was composed of sites CMS and DRL, and the sampling rate was 2048 Hz.
For data analyses, we re-referenced the EEG data off-line to the nose electrode, as is common in studies investigating the components of interest and necessary for extracting the Tb component (SanMiguel, Widmann, et al., 2013; Näätänen & Picton, 1987). Data were band-pass filtered from 0.1 to 30 Hz using a half-amplitude phase shift-free Butterworth filter and then notch-filtered (50 Hz) to remove mains artifact. The filtered data were segmented into 500-msec epochs, from −100 msec prestimulus to 400 msec poststimulus. Only trials in which the auditory stimulus was played were analyzed. Epochs were baseline-corrected to the mean voltage from −100 to 0 msec. We corrected the epochs for eye blinks and movement artifacts using the technique described by Gratton, Coles, and Donchin (1983) and Miller, Gratton, and Yee (1988). We excluded all epochs in which the signal exceeded a peak-to-peak amplitude of 200 μV or a maximal allowed voltage step of 50 μV/msec. We analyzed the amplitudes of the N1, Tb, and P2 components of the auditory-evoked potential, which were calculated as the average voltage within time windows (30 msec width), the centers of which were defined using the collapsed localizer approach (Luck & Gaspelin, 2017). The collapsed localizer approach is a technique whereby one first averages (or collapses) the ERP waveforms across all conditions for all participants. The components of interest (e.g., N1, Tb, P2) are identified on this collapsed waveform, and a time window is centered around these peaks, which is then used for the statistical analysis of the original (or uncollapsed) waveforms (Luck & Gaspelin, 2017).
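The collapsed localizer procedure can be sketched in a few lines. The example below is illustrative and is not the analysis code used in the study: it operates on synthetic waveforms sampled every 1 msec (the recordings were sampled at 2048 Hz), and the function names, the peak search range, and the toy amplitudes are all assumptions. It collapses two condition waveforms, centers a 30-msec window on the peak of the collapsed waveform, and then measures mean amplitude in that window for each uncollapsed waveform.

```python
import math
from statistics import fmean


def grand_average(waveforms):
    """Average a list of equal-length waveforms sample by sample."""
    return [fmean(samples) for samples in zip(*waveforms)]


def localizer_window(times_ms, collapsed, search_ms, polarity, width_ms=30.0):
    """Centre a window of width_ms on the peak of the collapsed waveform.

    search_ms is a (start, end) range in which to look for the peak; this
    range is an assumption of the sketch, not a value from the study.
    polarity is -1 for a negative-going peak (e.g., N1) and +1 for a
    positive-going peak (e.g., P2).
    """
    candidates = [(polarity * v, t) for t, v in zip(times_ms, collapsed)
                  if search_ms[0] <= t <= search_ms[1]]
    _, peak_time = max(candidates)
    return peak_time - width_ms / 2, peak_time + width_ms / 2


def mean_amplitude(times_ms, waveform, window):
    """Mean voltage of one waveform inside the window (inclusive bounds)."""
    return fmean(v for t, v in zip(times_ms, waveform)
                 if window[0] <= t <= window[1])


# Synthetic epoch from -100 to 399 msec with a negative peak at 120 msec.
times = [float(t) for t in range(-100, 400)]


def toy_erp(peak_amplitude):
    return [peak_amplitude * math.exp(-((t - 120.0) ** 2) / (2 * 15.0 ** 2))
            for t in times]


condition_a = toy_erp(-4.0)  # hypothetical condition waveforms (microvolts)
condition_b = toy_erp(-6.0)

collapsed = grand_average([condition_a, condition_b])
window = localizer_window(times, collapsed, (80.0, 160.0), polarity=-1)
print(window)  # (105.0, 135.0)

amp_a = mean_amplitude(times, condition_a, window)
amp_b = mean_amplitude(times, condition_b, window)
```

Because the window is chosen from the waveform collapsed across conditions, its placement does not favor either condition, which is the motivation Luck and Gaspelin (2017) give for the approach.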
For the N1, Tb, and P2 components, mean voltage in the analysis window was submitted to paired samples t tests. All paired samples t tests were analyzed two-tailed. For each component, there were two contrasts of interest: (1) FC versus NC-action and (2) FC versus NC-inaction. Electrodes of interest for the N1 component were Fz, FCz, and Cz, whereas the electrodes of interest for the P2 component were FCz, Cz, and CPz. For the N1 and P2 components, electrodes and analyses were chosen to be consistent with Han et al. (2021), Whitford et al. (2017), and Jack et al. (2019). Electrodes for the Tb component (T7 and T8) were based on recommendations by Tonnquist-Uhlen et al. (2003) and SanMiguel, Widmann, et al. (2013).
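For readers who wish to reproduce the form of these contrasts, the sketch below computes a paired samples t statistic and a standardized effect size from two lists of per-participant mean amplitudes. The data are invented for illustration, and the function name is hypothetical. The text does not state which variant of Cohen's d was computed; the sketch assumes d_z (the mean difference divided by the standard deviation of the differences), which equals t divided by the square root of n.

```python
import math
from statistics import fmean, stdev


def paired_t_and_dz(condition_a, condition_b):
    """Paired samples t statistic, degrees of freedom, and Cohen's d_z."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)                    # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))  # paired samples t statistic
    dz = mean_diff / sd_diff                  # equivalently t / sqrt(n)
    return t, n - 1, dz


# Invented mean amplitudes (microvolts) for four hypothetical participants.
fc_amplitudes = [1.0, 2.0, 3.0, 4.0]
nc_amplitudes = [0.5, 1.0, 2.5, 2.0]

t_value, df, dz = paired_t_and_dz(fc_amplitudes, nc_amplitudes)
print(round(t_value, 3), df, round(dz, 3))  # 2.828 3 1.414
```

A two-tailed p value would then be obtained from the t distribution with n − 1 degrees of freedom, for example with a statistics package.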
Experiment 1 Results
The summary results of Experiment 1 are illustrated in Figure 4. In the FC condition, on average, participants opted for inaction, which led to a sound 61.3 (SD = 4.9) times. After accounting for artifact exclusion, this led to an average of 58.5 (SD = 7.1) epochs used for analysis. In the NC condition, on average, there was a distribution of 60.8 sounds (SD = 6.9) because of inaction and 62.4 sounds (SD = 6.7) because of action. After accounting for artifact exclusion, this led to an average of 57.2 (SD = 7.4) epochs used for analysis in the NC-inaction condition and an average of 59.8 (SD = 6.9) epochs used for analysis in the NC-action condition.
N1.
Figure 2A shows the grand average N1 component elicited in the FC and NC conditions. The time window for the N1 analysis was 109.5–139.5 msec. The contrast comparing FC versus NC-inaction did not reach significance, t(37) = 1.769, p = .085, d = 0.29. Similarly, the contrast comparing FC versus NC-action also did not reach significance, t(37) = −.336, p = .739, d = 0.06. These results indicate that the N1 amplitude did not differ between the FC and NC conditions, regardless of whether the sounds in the NC condition were associated with an inaction or a button press.
Experiment 1: Waveforms showing ERPs elicited by the FC condition and the NC-action and NC-inaction conditions in addition to corresponding topographic mappings. White dots illustrate the electrodes used in the analysis. (A) The N1 component was measured at electrodes Fz, FCz, and Cz, with time window 109.5–139.5 msec. (B) The Tb component was measured at electrodes T7 and T8, with time window 122.2–152.2 msec. (C) The P2 component was measured at electrodes FCz, Cz, and CPz, with time window 224.7–254.7 msec. (D) Raincloud graph containing density plots and scatter plots of mean amplitudes for the N1, Tb, and P2 components for the FC and NC conditions.
Tb.
Figure 2B shows the Tb component analysis elicited in the FC and NC conditions. The time window for the Tb analysis was 122.2–152.2 msec. The contrast comparing FC versus NC-inaction reached significance, t(37) = 2.586, p = .014, d = 0.42, whereas the contrast comparing FC versus NC-action did not reach significance, t(37) = 1.162, p = .253, d = 0.19. The results indicate that the Tb amplitude of the FC condition was suppressed relative to the amplitude of the NC condition, but only in the case where sounds in the NC condition were elicited via inaction. However, Tb amplitude in the FC condition did not differ from Tb amplitude in the NC condition when sounds in the NC condition were elicited by button press.
P2.
Figure 2C shows the P2 component analysis elicited in the FC and NC conditions. The time window for the P2 analysis was 224.7–254.7 msec. The contrast comparing FC versus NC-inaction was significant, t(37) = −3.523, p = .001, d = 0.57, as was the contrast comparing FC versus NC-action, t(37) = −3.447, p = .001, d = 0.56. The results indicate that the P2 amplitude of the FC condition was strongly suppressed relative to the amplitude in the NC condition both when sounds in the NC condition were elicited by button press or when they were elicited via a willed inaction.
Experiment 1 Discussion
In Experiment 1, participants completed a variation of the traditional self-stimulation task in which the sound was not time-locked to a motor action. In the FC condition, participants were instructed that choosing not to perform an action by the decision time would cause a sound to be played at the subsequent trigger time. In the NC condition, participants were given the same choice, but the probability under which sounds occurred was only 50% following a participant choice. That is, in the NC condition, electing not to press a button (NC-inaction) led to a sound on 50% of trials, whereas electing to press the button (NC-action) also led to the sound being presented on 50% of trials. The results revealed that the amplitude of the N1 component did not differ between the FC and NC conditions. The amplitude of the Tb component did differ, with Tb suppressed in the FC condition relative to the NC-inaction condition, but not the NC-action condition. The P2 component in the FC condition was strongly suppressed relative to both the NC-inaction and NC-action conditions.
In Experiment 1, participants were instructed that the performance of a willed inaction would elicit a sound in the FC condition. They were also instructed that a willed inaction “may or may not” cause a sound to be played in the NC-inaction condition. The aim of Experiment 2 was to investigate whether the key results of Experiment 1 (i.e., suppression of the Tb and P2 components in the FC condition relative to the NC conditions) would be replicated if the instructions were reversed, that is, if participants were instructed that the performance of a willed action (button press) would elicit a sound in the FC condition. Furthermore, this second experiment could help to elucidate whether the selectivity of the Tb suppression effect to the NC-inaction condition was related to the fact that it was inaction that triggered sounds in the FC condition.
Experiment 2
Participants
Forty-nine undergraduate students from the University of New South Wales (Sydney, Australia) participated in the study in exchange for course credit. The participant samples of Experiments 1 and 2 were unique; none of the participants in Experiment 2 had previously participated in Experiment 1. All participants gave written informed consent before the experiment. Four participants were removed from analysis because of an insufficient number of artifact-free epochs (as described in the EEG Recording and Analysis section), leaving a final sample of n = 45 participants (mean age = 19 years, SD = 1.8 years, 24 women). Study sample size was based on Experiment 1 and on prior research by Han et al. (2021), which used a similar experimental paradigm. The study was approved by the Human Research Ethics Advisory Panel (Psychology) at the University of New South Wales.
Stimuli, Materials, and Procedure
The stimuli and materials were identical to Experiment 1. The only difference between the experiments was the action–effect contingency in the FC condition. Specifically, in Experiment 1, the audio stimulus was played if the participant opted not to press the button before the decision time, and participants were informed of this fact. In Experiment 2, this contingency was reversed: The audio stimulus was played if and only if the participant elected to press the button before the decision time, and participants were informed of this fact. As in Experiment 1, the audio stimulus was played at the trigger time, which occurred 1 sec after the decision time. There was no change to the instructions in the NC condition.
EEG Recording and Analysis
The EEG recording and analysis were identical to Experiment 1.
Experiment 2 Results
In the FC condition, on average, participants opted for a button press, which led to a sound 62.4 (SD = 4.4) times. After accounting for artifact exclusion, this led to an average of 60.3 (SD = 5.4) epochs used for analysis. In the NC condition, on average, there was a distribution of 62.8 sounds (SD = 6.4) because of action and 57.3 sounds (SD = 6.0) because of inaction. After accounting for artifact exclusion, this led to an average of 60.3 (SD = 7.1) epochs used for analysis in the NC-action condition and an average of 54.8 (SD = 7.2) epochs used for analysis in the NC-inaction condition.
N1.
Figure 3A shows the N1 component analysis elicited in the FC and NC conditions. The time window for the N1 analysis was 85.6–115.6 msec. The contrast comparing FC versus NC-action did not reach significance, t(44) = −1.102, p = .276, d = 0.16. Similarly, the contrast comparing FC versus NC-inaction did not reach significance, t(44) = −1.766, p = .084, d = 0.26.
Experiment 2: Waveforms showing ERPs elicited by the FC condition and the NC-action and NC-inaction conditions and the corresponding topographic mappings. White dots illustrate the electrodes used in the analysis. (A) The N1 component was measured at electrodes Fz, FCz, and Cz, with time window 85.6–115.6 msec. (B) The Tb component was measured at electrodes T7 and T8, with time window 133.9–163.9 msec. (C) The P2 component was measured at electrodes FCz, Cz, and CPz, with time window 175.9–205.9 msec. (D) Raincloud graph containing density plots and scatter plots of mean amplitudes for the N1, Tb, and P2 components for the FC and NC conditions.
Tb.
Figure 3B shows the Tb component analysis elicited in the FC and NC conditions. The time window for the Tb analysis was 133.9–163.9 msec. The contrast comparing FC versus NC-action did not reach significance, t(44) = 0.801, p = .427, d = 0.12. However, the contrast comparing FC versus NC-inaction was statistically significant, t(44) = 2.126, p = .039, d = 0.32, with the NC-inaction condition showing a larger Tb amplitude than the FC condition.
P2.
Figure 3C shows the P2 component analysis elicited in the FC and NC conditions. The time window for the P2 analysis was 175.9–205.9 msec. The contrast comparing FC versus NC-action reached significance, t(44) = −2.208, p = .032, d = 0.33, with the NC-action condition showing a larger P2 amplitude than the FC condition. Similarly, the contrast comparing FC versus NC-inaction also reached significance, t(44) = −3.305, p = .002, d = 0.49, with the NC-inaction condition showing a larger P2 amplitude than the FC condition.
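As a consistency check on the contrasts reported above, the effect sizes can be recovered from the t statistics. The sketch below assumes that the contrasts are paired-samples t tests on 45 participants (inferred from the 44 degrees of freedom) and that d was computed as |t| / sqrt(n), a common convention for within-subject designs. Neither assumption is stated in this section, and the function and variable names are illustrative only.

```python
import math

# Assumption: t(44) implies n = 45 participants in a paired-samples design.
N_PARTICIPANTS = 45


def cohens_dz(t_value: float, n: int) -> float:
    """Effect size for a paired contrast, d_z = |t| / sqrt(n)."""
    return abs(t_value) / math.sqrt(n)


# t statistics exactly as reported for Experiment 2 (FC versus each NC condition).
reported_t = {
    ("N1", "NC-action"): -1.102,
    ("N1", "NC-inaction"): -1.766,
    ("Tb", "NC-action"): 0.801,
    ("Tb", "NC-inaction"): 2.126,
    ("P2", "NC-action"): -2.208,
    ("P2", "NC-inaction"): -3.305,
}

# Effect sizes rounded to two decimals, as in the text.
effect_sizes = {
    contrast: round(cohens_dz(t, N_PARTICIPANTS), 2)
    for contrast, t in reported_t.items()
}

for (component, condition), d in effect_sizes.items():
    print(f"{component}: FC vs {condition}: d = {d:.2f}")
```

Under these assumptions, the computed values agree with the effect sizes reported for all six contrasts.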
GENERAL DISCUSSION
The results of both experiments are summarized in Figure 4. In this study, participants were required to perform either a willed inaction (in Experiment 1) or a willed action (a button press, in Experiment 2) for a sound to be played a second or more later. Participants' electrophysiological response to the sound was then measured with EEG. The pattern of results was identical across the two experiments. N1 amplitudes did not differ significantly across the FC and NC conditions, regardless of whether the audio stimulus was triggered by an inaction (NC-inaction) or a button press (NC-action). The P2 component was attenuated in the FC condition relative to both the NC-inaction and NC-action conditions, in both experiments. The results of the P2 component analyses suggest that the effect of contingency on electrophysiological indices of sensory attenuation may be indexed by the P2 component. The Tb component was attenuated in the FC condition relative to the NC-inaction condition in both experiments. The results of the Tb analyses suggest that the Tb component may index the increased sense of agency associated with the 100% action–effect contingency in the FC condition. Given that Tb amplitude has been shown to be modulated by the degree to which participants had agency over sound generation (Han et al., 2021), this may explain why the Tb amplitudes in the NC-action condition were driven closer toward the FC condition in the current experiment. Even though the probability of a tone resulting from a button press was in fact only 50% (and the button press therefore had no actual contingency with the sound, as inaction also resulted in a sound 50% of the time), the existence of the button press meant that a tenuous connection could be made between the action and effect (similarly to the phenomenon of illusory control; Harris & Osman, 2012; Dixon, 2000).
However, it should be noted that it is uncommon for evaluative (or higher-order cognitive) processes—such as judgments of agency—to influence ERP amplitude until after 300 msec (Wascher & Beste, 2010). According to Synofzik, Vosgerau, and Newen's (2008) multifactorial account of agency, the feeling of agency is the sense of agency someone experiences when they perform a motor action that is followed by a sensory event. This is what the literature typically refers to when discussing agency within the context of internal forward models (Synofzik et al., 2008). The judgment of agency, on the other hand, requires an explicit cognitive judgment of one's agency and does not rely on sensorimotor indicators. As such, given that there was no explicit measure of agency in this study, it is not exactly clear whether participants were experiencing feelings or forming judgments of agency. In a previous study that used a similar ticker tape design (which included two experiments, the first of which used inaction to elicit sounds; Han et al., 2021), it was argued that participants formed judgments of agency rather than feelings of agency because, in the first experiment, it was inaction that led to a sound and, in the second experiment, sounds were not time-locked to motor actions (with a gap of at least 1 sec between action and effect). However, much like the current study, in Han et al. (2021), there was no direct measure of the sense of agency, leaving the possibility that the ERP modulations reflect differences in the feelings of agency rather than judgments of agency. There is evidence that inaction can lead to temporal binding (Weller, Schwarz, Kunde, & Pfister, 2020), a phenomenon that arguably reflects feelings of agency as it occurs on a perceptual level. We suggest that more research be done to investigate the link between inaction–effects and sense of agency.
Bar graphs of Experiments 1 and 2 illustrating mean amplitudes for the N1, Tb, and P2 components for the FC and NC conditions. Error bars show the SEM. Asterisks represent levels of significance (*p < .05, **p < .01).
In both experiments, we found no difference in N1 amplitude between the FC and NC conditions, whether triggered by inaction (NC-inaction) or a button press (NC-action). Considering the different probabilities under which sounds occurred, one might have expected differences in N1 amplitude between the FC and NC conditions. Under the predictive coding account of perception, neuronal responses to stimuli have been argued to reflect prediction errors (Friston, 2005). In accordance with this account, stimuli that are more predictable trigger smaller neuronal responses than unpredictable stimuli because they result in smaller prediction errors (SanMiguel, Widmann, et al., 2013; Timm, SanMiguel, Saupe, & Schröger, 2013; Bendixen, SanMiguel, & Schröger, 2012; Schafer, Amochaev, & Russell, 1981). For example, Roth, Ford, Lewis, and Kopell (1976) investigated the effects of probability on auditory processing by delivering auditory stimuli in which a sequence of regularly occurring 65-dB pip sounds was randomly interrupted by white noise bursts. Half of the white noise bursts were preceded by a warning tone (and were thus highly predictable), and half were not (and were thus not predictable). They found that passively presented unpredictable white noise bursts elicited larger N1 amplitudes than passively presented predictable noise bursts. Given the results of Roth et al. (1976), one might expect that the N1 amplitude generated in the NC conditions would be larger than in the FC condition, considering that the occurrence of the sounds was less predictable in the NC conditions. However, this discrepancy may arise because the type of predictability manipulated in the current study differed from that manipulated by Roth et al. (1976). It has been demonstrated that sensory inputs are not only predicted/mispredicted but can also be unpredicted (Hsu, Hämäläinen, & Waszak, 2018; Hsu, Le Bars, Hämäläinen, & Waszak, 2015; Arnal & Giraud, 2012).
Mispredicted stimuli refer to instances where incoming stimuli are predicted incorrectly; they are generally associated with larger prediction errors because these errors combine an incorrect prediction with sensory input that is not anticipated. On the other hand, unpredicted stimuli refer to situations where sensory input is simply not anticipated. Here, prediction errors tend to be smaller because they involve only one type of prediction error: sensory input that is not anticipated. In the current study, the timing of auditory stimuli is always predictable, whereas the chance of stimuli being played is 50%, meaning that stimuli are unpredicted rather than mispredicted. However, Roth et al.'s (1976) study involved stimuli that were mispredicted rather than unpredicted. This may explain why prediction errors in the current study and that of Roth et al. (1976) produced different effects on N1 amplitude.
Furthermore, although lower sound probability (and hence higher prediction error when a sound does occur) normally results in N1 amplitudes that are more negative when compared with sounds with higher probabilities, the long interstimulus intervals (ISIs) may have given participants time to prepare cognitive resources to accommodate for whether the sound would play. For example, in a study by Polich (1990) investigating the P3 component, participants listened to a stream of 1000-Hz tones with random presentations of a 2000-Hz target tone, which participants had to identify via a finger tap. The chance of a 2000-Hz target tone appearing was either 20% or 80%, depending on the condition. The results, as reported by Polich (1990), demonstrated enhanced P3 amplitudes for target sounds of 20% probability compared with target sounds of 80% probability during low ISI conditions (about 2–3 sec), but not during high ISI conditions (about 4–10 sec). The design used by Polich (1990) could thus be adapted to investigate the relationship between ISIs and probability for the N1 component.
A central design feature of the current paradigm is that the action (i.e., either a button press or a willed inaction) was temporally dissociated from the outcome (i.e., the sound). This design feature allowed us to explore the electrophysiological index of sense of agency while minimizing the potential confounds of motor-evoked activity. Previous studies have explored the impact of willed inactions on sense of agency using the measure of temporal binding, which is a phenomenon closely related to sensory attenuation, and found a temporal binding effect for inactions, providing further evidence that willed inactions can result in a sense of agency (Weller et al., 2020). However, the aforementioned design features of the current paradigm are—we suggest—likely also partly responsible for our failure to identify any modulation of the N1 component in the experiments. Given this, we suggest that it would be worthwhile studying the role of contingency in a more traditional self-stimulation paradigm in which the action is time-locked to the outcome, and for which we would expect to observe N1 suppression in the active condition relative to the passive condition, based on the existing literature. In such a hypothetical experiment, the active condition would be the traditional motor–auditory condition of the self-stimulation paradigm (Hughes et al., 2013), where participants perform a motor action (such as a button press) to elicit a sound that is time-locked to the motor action. The passive condition would be the traditional auditory condition of the self-stimulation paradigm, where participants passively listen to an identical sound. It is worth noting that this study did not include a passive condition—willed inaction triggering a sound does not count as a passive condition as normally defined.
Across both experiments, the P2 component was suppressed in the FC condition relative to both NC conditions (i.e., NC-inaction and NC-action). The functional significance of the P2 component is less clear than that of the N1 component (Crowley & Colrain, 2004). For example, although the P2 component has shown results like that of the N1 component in previous studies of sensory attenuation (i.e., suppression of the active condition relative to the passive condition; Horváth & Burgyán, 2013; Knolle et al., 2012), it has also demonstrated opposite results (Pinheiro et al., 2019), although only when there was a delay between the button press and the sound in their active condition. These inconsistent results may be attributed to the fact that N1 and P2 suppression are likely driven by different factors (van Elk et al., 2014; Knolle et al., 2012), even though the two components have sometimes been seen as part of a single “N1–P2 complex” (Jack et al., 2021; Timm, SanMiguel, Keil, Schröger, & Schönwiesner, 2014; Crowley & Colrain, 2004).
In the current set of experiments, we observed P2 suppression, a result in line with some other studies that have investigated the P2 component in sensory attenuation (Horváth & Burgyán, 2013; Knolle et al., 2012). Roth et al. (1976) demonstrated that sounds with a higher probability elicited N1 components with smaller amplitudes but P2 components with larger amplitudes. The results in this study contradict this; sounds that were presented in the FC condition (and hence had a 100% probability of occurring after a button press) instead showed smaller P2 amplitudes compared with lower-probability sounds in the NC condition. However, as mentioned before, the types of predictability involved between the current study and that of Roth et al. (1976) were different. Therefore, we suggest that more research is needed to elucidate the nature of the relationship between stimulus probability and P2 amplitude.
One limitation of the current series of experiments was that—because of lack of time in the experimental session—we were not able to include a passive condition (i.e., a condition where participants passively listen to auditory stimuli identical to those in the FC and NC conditions), as is common in the traditional self-stimulation paradigm. As a result of this, we were unable to determine the effect of manipulating action–outcome contingency within the phenomenon of sensory attenuation. In the future, it might be worthwhile conducting similar experiments with a between-subjects design, with one group completing an FC condition contrasting with a passive condition and another group completing the NC conditions contrasting with a passive condition. Furthermore, participants were not asked to rate their sense of agency in the current study. Therefore, the interpretation of the Tb component for the NC-action condition being suggestive of a residual sense of agency could be further investigated in future studies by asking participants to judge the degree to which they felt a sense of agency in the three different conditions.
It is also worth noting that although the present studies investigated action–outcome probability, the experiments only used probability levels of 100% and 50%. There are several issues related to this design that are worth noting. First, this design does not assess probability on multiple levels (e.g., 100%, 75%, 50%, and 25%), which may help to address whether there is a direct correlation between probability levels and auditory suppression or component amplitude. Second, setting the uncertain probability at 50% does not allow priors toward one outcome to be formed, as sounds were neither more likely to appear than not nor less likely. Without such a prior, prediction errors are less likely to be formed, meaning that the study was only assessing unpredictions rather than mispredictions (Hsu et al., 2015, 2018; Arnal & Giraud, 2012). Studies that have investigated the effect of probability on auditory processing have generally used probabilities such as 10% versus 90% or 20% versus 80% (Pereira et al., 2014; Pincze, Lakatos, Rajkai, Ulbert, & Karmos, 2002; Polich, 1990). Using the example of 10% versus 90%, when the probability of a sound occurring is 10%, then participants generally expect the sound not to occur; when the probability is 90%, the general expectation is that the sound will mostly occur. Future studies should therefore include more contingency conditions that will examine a broader range of probabilities as well as investigate the relationship between mispredictions versus unpredictions in sensory attenuation (this was not done in the present set of studies because of time constraints that would occur in a within-subject design). Different aspects of contingency could also be manipulated. One could manipulate not only how reliably a button press results in a sound but also how reliably sounds are associated with button presses.
For example, a button press might result in a sound 100% of the time, but inaction could also result in a sound 50% of the time. This would lead to a 75% contingency rate with strong expectations of sound occurrence, but with the possibility of reducing a participant's sense of agency.
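The text does not state how the 75% figure is derived. One reading, sketched below under stated assumptions, is that it is the proportion of trials on which the outcome follows the rule "action leads to a sound, inaction leads to silence," assuming action and inaction trials are equally frequent. For comparison, the sketch also computes ΔP, a contingency index commonly used in the contingency-learning literature, defined as P(sound | action) − P(sound | inaction). The function names and the equal-frequency assumption are illustrative and are not taken from the study.

```python
def delta_p(p_sound_given_action: float, p_sound_given_inaction: float) -> float:
    """Contingency index: P(sound | action) - P(sound | inaction)."""
    return p_sound_given_action - p_sound_given_inaction


def agreement_rate(
    p_sound_given_action: float,
    p_sound_given_inaction: float,
    p_action: float = 0.5,  # assumption: action and inaction are equally frequent
) -> float:
    """Proportion of trials consistent with 'action -> sound, inaction -> silence'."""
    consistent_action = p_action * p_sound_given_action
    consistent_inaction = (1.0 - p_action) * (1.0 - p_sound_given_inaction)
    return consistent_action + consistent_inaction


# (P(sound | action), P(sound | inaction)) for each design.
designs = {
    "FC (Experiment 2)": (1.0, 0.0),
    "NC": (0.5, 0.5),
    "Proposed hybrid": (1.0, 0.5),
}

for name, (p_act, p_inact) in designs.items():
    print(
        f"{name}: agreement = {agreement_rate(p_act, p_inact):.2f}, "
        f"delta P = {delta_p(p_act, p_inact):.2f}"
    )
```

On this reading, the hybrid design yields an agreement rate of 0.75 and a ΔP of 0.50, whereas the NC condition yields 0.50 and 0.00. The two indices order the designs in the same way but assign them different magnitudes, so a future study would need to state which definition of contingency it manipulates.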
In conclusion, the results of the study suggest that differences in action–effect contingency may be indexed by the P2 and Tb components rather than the N1 component. The results are consistent with the idea that the Tb component may index judgments of agency, with participants in the FC condition experiencing full agency over the sounds as opposed to partial agency in the NC conditions. The results of the N1 analyses provide further support for the idea that N1 amplitude is not modulated by actions (or willed inactions) that are not time-locked to sounds.
Reprint requests should be sent to Nathan Han, School of Psychology, University of New South Wales, Sydney, NSW 2052, Australia, or via e-mail: [email protected].
Author Contributions
Nathan Han: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Visualization; Writing—original draft; Writing—Review & editing. Bradley N. Jack: Conceptualization; Formal analysis; Methodology; Software; Writing—Review & editing. Gethin Hughes: Writing—Review & editing. Thomas J. Whitford: Conceptualization; Funding acquisition; Methodology; Project administration; Resources; Supervision; Writing—Review & editing.
Funding Information
Thomas J. Whitford: Ideas grant from the National Health and Medical Research Council (NHMRC) of Australia, grant number: APP2004067; Discovery Projects from the Australian Research Council, grant numbers: DP200103288 and DP170103094. Nathan Han: Australian Government Research Training Program Scholarship. Bradley Jack: ARC DECRA, grant number: DE220100739.
Data Availability Statement
All raw EEG data and BrainVision Analyzer history templates are available on the Open Science Framework at https://osf.io/p5f8n/. All enquiries about data processing and analysis procedures can be made to the corresponding author.
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance. The authors of this article report its proportions of citations by gender category to be as follows: M/M = .42; W/M = .42; M/W = .08; W/W = .08.