The neural underpinnings of perceptual awareness have been extensively studied using unisensory (e.g., visual alone) stimuli. However, perception is generally multisensory, and it is unclear whether the neural architecture uncovered in these studies directly translates to the multisensory domain. Here, we use EEG to examine brain responses associated with the processing of visual, auditory, and audiovisual stimuli presented at near-threshold levels of detectability, with the aim of deciphering similarities and differences in the neural signals indexing the transition into perceptual awareness across vision, audition, and combined visual–auditory (multisensory) processing. More specifically, we examine (1) the presence of late evoked potentials (later than ∼300 msec), (2) the across-trial reproducibility, and (3) the evoked complexity associated with perceived versus nonperceived stimuli. Results reveal that, although perceived stimuli are associated with the presence of late evoked potentials across each of the examined sensory modalities, across-trial variability and EEG complexity differed for unisensory versus multisensory conditions. Whereas across-trial variability and complexity differed for perceived versus nonperceived stimuli in the visual and auditory conditions, this was not the case for the multisensory condition. Taken together, these results suggest that there are fundamental differences in the neural correlates of perceptual awareness for unisensory versus multisensory stimuli. Specifically, the work argues that the presence of late evoked potentials, as opposed to neural reproducibility or complexity, most closely tracks perceptual awareness regardless of the nature of the sensory stimulus. In addition, the current findings suggest a greater similarity between the neural correlates of perceptual awareness for unisensory (visual and auditory) stimuli than between those for unisensory and multisensory stimuli.
During waking hours, signals are continually impinging upon our different sensory organs (e.g., eyes, ears, skin), conveying information about the objects present and the events occurring within our environment. This flood of information challenges the limited processing capabilities of our central nervous system (James, 1890). As a consequence, much work within cognitive psychology and neuroscience has sought to understand how the human brain tackles this challenge by effectively filtering, segregating, and integrating the various pieces of sensory information to generate a coherent perceptual Gestalt (Murray & Wallace, 2012; Treisman & Gelade, 1980; Broadbent, 1958).
The bulk of the evidence to date regarding the intersection between this information-processing bottleneck and perceptual awareness has been derived from studies of the visual system (Dehaene, Lau, & Kouider, 2017; Koch, 2004; Zeki, 2003). In fact, all major neurobiological theories of perceptual awareness, each emphasizing the importance of engaging widely distributed brain networks (Tallon-Baudry, 2012; van Gaal & Lamme, 2012; Naghavi & Nyberg, 2005), have been derived from observations within the visual neurosciences (Faivre, Arzi, Lunghi, & Salomon, 2017; Sanchez, Frey, Fuscà, & Weisz, 2017). In parallel, the neural markers associated with perceptual awareness have been derived from observations probing the visual system. Early fMRI (Dehaene et al., 2001), EEG (Del Cul, Baillet, & Dehaene, 2007; Sergent, Baillet, & Dehaene, 2005), and electrocorticographic (Gaillard et al., 2009) studies suggested that perceptual awareness was associated with the broadcasting of neural signals beyond primary (visual) cortex (Lamme, 2006), and more specifically with the engagement of frontoparietal regions (Dehaene, Changeux, Naccache, Sackur, & Sergent, 2006). Arguably, the most consistent signature associated with this generalized neural recruitment is the P3b ERP component. Namely, although early EEG components are similar regardless of whether stimuli enter perceptual awareness, stimuli that are perceived (vs. nonperceived) additionally yield components at later latencies. Subsequent studies converged on the observation that perceived stimuli broadcasted or triggered activity beyond that disseminated by nonperceived stimuli but emphasized that the neural ignition associated with awareness resulted in neural patterns that were both more reproducible (Schurger, Pereira, Treisman, & Cohen, 2010) and more stable (Schurger, Sarigiannidis, & Dehaene, 2015) than patterns seen for nonperceived stimuli. In the latest iteration of the argument emphasizing the recruitment of global neural networks, researchers have highlighted the pivotal role of neural networks that are both integrated and differentiated (Cavanna, Vilas, Palmucci, & Tagliazucchi, 2017; Koch, Massimini, Boly, & Tononi, 2016a, 2016b; Tononi, Boly, Massimini, & Koch, 2016). Within this latter framework, the complexity of both resting-state and evoked neural responses has emerged as a marker for perceptual awareness (Schartner, Carhart-Harris, Barrett, Seth, & Muthukumaraswamy, 2017; Andrillon, Poulsen, Hansen, Léger, & Kouider, 2016; Sarasso et al., 2015; Schartner et al., 2015; Casali et al., 2013).
It has been assumed that these theories of, and neural markers for, perceptual awareness gleaned from the visual system apply across sensory domains, an assumption that does come with some supporting evidence. For example, late sustained neural activity is present for perceived, as opposed to nonperceived, auditory stimuli (Sadaghiani, Hesselmann, & Kleinschmidt, 2009). However, there are also important differences across sensory modalities, such as the association of auditory awareness with neural activity in frontotemporal, as opposed to frontoparietal, networks (Joos, Gilles, Van de Heyning, De Ridder, & Vanneste, 2014). In an important recent contribution, Sanchez and colleagues (2017) demonstrated that, by applying machine learning techniques, it is possible to decode perceptual states (i.e., perceived vs. nonperceived) across different sensory modalities (i.e., vision, audition, and somatosensation). Although it is interesting that decoding of perceptual states across modalities is feasible, this observation does not tell us whether (and how) the brain performs this task. Last, although Sanchez and colleagues (2017) probed perceptual states across unisensory modalities, to the best of our knowledge, no study has characterized differences between perceived and nonperceived stimuli across both unisensory and multisensory modalities. This knowledge gap is important, as in recent years keen interest has emerged concerning the role played by multisensory integration in the construction of perceptual awareness (Faivre et al., 2017; O'Callaghan, 2017; Deroy, Chen, & Spence, 2014; Spence & Deroy, 2013). Indeed, as discussed above, theoretical models posit an inherent relationship between the integration of sensory information and perceptual awareness. For example, mathematical and neurocognitive formulations, such as integrated information theory (Tononi, 2012), global neuronal workspace theory (Dehaene & Changeux, 2011), and recurrent/reentrant networks (Lamme, 2006), postulate—explicitly or implicitly—that the integration of sensory information is a prerequisite for perceptual awareness. Integrated information theory, for instance, posits that a particular spatio-temporal configuration of neural activity culminates in subjective experience when the amount of integrated information is high. In many of these views, subjective experience (i.e., perceptual awareness) relates to the degree to which information generated by a system as a whole exceeds that independently generated by its parts.
Motivated by this theoretical perspective emphasizing information integration in perceptual awareness, and noting that our perceptual Gestalt is built upon a multisensory foundation, we argue that multisensory neuroscience is uniquely positioned to inform our understanding of perceptual awareness (Salomon et al., 2017; Blanke, Slater, & Serino, 2015; Noel, Wallace, & Blake, 2015; Faivre, Mudrik, Schwartz, & Koch, 2014; Mudrik, Faivre, & Koch, 2014; in addition, see Deroy et al., 2014, for a provocative argument implying that unisensory-derived theories of perceptual awareness cannot be applied to multisensory experiences). Consequently, in the current work, we aim to characterize electrophysiological indices of perceptual awareness across both unisensory (visual alone, auditory alone) and multisensory (combined visual–auditory) conditions. More specifically, we aim to establish whether previously reported neural markers of visual awareness generalize across sensory modalities (from vision to audition) and extend to multisensory experiences. In the current study, we examine EEG responses to auditory, visual, and combined audiovisual stimuli presented close to the bounds of perceptual awareness. Analyses are centered on previously reported indices of visual awareness: the presence of late components in evoked potentials during perceived but not nonperceived trials (e.g., Dehaene et al., 2017; Dehaene & Changeux, 2011), as well as changes in neural reproducibility (Schurger et al., 2010) and complexity (Koch et al., 2016a, 2016b; Tononi et al., 2016).
Twenty-one right-handed graduate and undergraduate students from Vanderbilt University (mean age = 22.5 ± 1.9 years, median = 21.2 years, range = 19–25 years; nine women) took part in this experiment. All participants reported normal hearing and had normal or corrected-to-normal eyesight. All participants gave written informed consent to take part in this study, the protocols for which were approved by Vanderbilt University Medical Center's Institutional Review Board. EEG data from two participants were not analyzed because we were unsuccessful in driving their target detection performance into a predefined range (see below; see Figure 1, dotted lines); thus, data from 19 participants form the basis of the analyses presented here.
Materials and Apparatus
Visual and auditory target stimuli were controlled via a microcontroller (SparkFun Electronics RedBoard, Boulder, CO) driven by purpose-written MATLAB (The MathWorks, Natick, MA) and Arduino scripts. The microcontroller drove the onset of a green LED (3-mm diameter, 572- to 596-nm wavelength, 150 mcd) and a piezo buzzer (12-mm diameter, 9.7 mm tall, 60 dB SPL, 4 kHz, 3-V rectangular wave). Target stimuli were 10 msec in duration (square wave, onset and offset < 1 msec, as measured via oscilloscope). The LED was mounted on the piezo buzzer, thus forming a single audiovisual object that was placed at the center of a 24-in. computer monitor (Asus VG248QE, LED-backlit, 1920 × 1080 resolution, 60-Hz refresh rate). In addition to the targets, and in order to adjust participants' detection rates, we adjusted online the luminance and amplitude of background visual and auditory white noise using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The luminance (achromatic and uniform) of the screen upon which the audiovisual target object was mounted was adjusted between 0 and 350 cd/m2 in steps of 4 RGB units (RGB range = 0–255 units, initial value = [140, 140, 140] RGB), and auditory noise comprised variable-intensity white noise broadcast from two speakers placed symmetrically to the right and left of the monitor (Atlas Sound EV8D 3.2 Stereo). The white noise track was initialized at 49 dB and adjusted in 0.4-dB increments (44.1-kHz sampling rate). Visual and auditory noise were adjusted by a single increment every 7–13 trials (uniform distribution) to maintain unisensory detection performance between 30% and 45%, as sketched below. This low unisensory detection rate was chosen to ensure satisfactory bifurcation between "perceived" and "nonperceived" trials in both unisensory and multisensory trials (Murray & Wallace, 2012).
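In pseudocode terms, the adaptive-noise rule amounts to the following (a minimal MATLAB sketch; variable names such as noiseLevel and recentHits are ours, not those of the purpose-written task scripts, and we assume each modality's noise tracks that modality's running hit rate):

```matlab
% Sketch of the adaptive background-noise rule for one modality (assumed names).
noiseStep = 0.4;                        % e.g., 0.4 dB for audio; 4 RGB units for vision
nextCheck = randi([7 13]);              % trials until the next adjustment (uniform 7-13)
if trialsSinceAdjust >= nextCheck
    pHit = mean(recentHits);            % running unisensory detection rate
    if pHit > 0.45                      % too detectable: raise the masking noise
        noiseLevel = noiseLevel + noiseStep;
    elseif pHit < 0.30                  % too masked: lower the masking noise
        noiseLevel = noiseLevel - noiseStep;
    end
    trialsSinceAdjust = 0;
    nextCheck = randi([7 13]);
end
```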
Procedure and Experimental Design
Participants were fitted with a 128-electrode EGI NetStation EEG net and seated 60 cm away from the stimulus and noise generators. Participants completed 12–14 blocks of 200 target detection trials each, in which no-stimulus (catch), auditory-only, visual-only, and audiovisual trials were distributed equally and interleaved pseudorandomly. We employed a subjective measure of awareness (similar to a yes/no detection judgment; Merikle, Smilek, & Eastwood, 2001; see Figure 1) in conjunction with an extensive set of EEG analyses (electrical neuroimaging framework; Brunet, Murray, & Michel, 2011; see below). Thus, although perceptual awareness may arguably occur without the capacity for explicit report (see Eriksen, 1960), here we operationalize perceptual awareness as the detection and report of sensory stimuli (see below for signal detection analyses suggesting that the criterion for detection was unchanged across experimental conditions and that detection reports likely reflected perceptual awareness). Participants were asked to respond manually (button press), as quickly as possible, whenever they detected a stimulus. The interstimulus interval comprised a fixed duration of 800 msec plus a uniformly distributed random duration between 0 and 2000 msec. The total duration of the experiment was approximately 3 hr 30 min, with rest periods of approximately 5 min between blocks.
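For concreteness, the trial sequencing and interstimulus interval can be sketched as follows (an illustrative MATLAB fragment with assumed variable names, not the actual task code):

```matlab
% Build one block: four trial types, equally frequent, pseudorandomly interleaved.
nTrials = 200;                                      % trials per block
types   = repmat({'catch', 'A', 'V', 'AV'}, 1, nTrials / 4);
types   = types(randperm(nTrials));                 % pseudorandom interleaving
% Interstimulus interval: fixed 800 msec plus uniform random 0-2000 msec.
isiSec  = 0.8 + 2.0 * rand(1, nTrials);             % one ISI per trial, in seconds
```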
EEG Data Acquisition and Rationale
We contrasted participants' EEG responses for perceived (i.e., detected) versus nonperceived (i.e., nondetected) unisensory (i.e., either visual or auditory) and multisensory (i.e., conjoint visual and auditory) stimuli to determine whether indices of visual awareness generalize across sensory domains. High-density continuous EEG was recorded from 128 electrodes with a sampling rate of 1000 Hz (Net Amps 200 amplifier, Hydrocel GSN 128 EEG cap; EGI Systems, Inc.) and referenced to the vertex. Electrode impedances were maintained below 50 kΩ throughout the recording procedure and were reassessed at the end of every other block. Data were acquired with Netstation 5.1.2 running on a Macintosh computer and online high-pass filtered at 0.1 Hz.
Data were compiled for detection as a function of the sensory modality stimulated, where "detection" refers to a manual response immediately after presentation of a stimulus or a pair of stimuli. Two participants generated false alarm rates (reports of stimulus detection on catch trials, when no stimulus was presented) that exceeded the population average by more than 2.5 SDs (false alarm rates = ∼20% compared with 8.2%; see Figure 1), leading to exclusion of their data from further analysis. Data were analyzed for RTs and in light of signal detection theory (Macmillan & Creelman, 2005; Tanner & Swets, 1954). To quantify sensitivity and response bias in the detection of near-threshold sensory stimuli across the different sensory modalities, reports of detection during the presence of auditory, visual, or audiovisual stimuli were scored as hits. Analogously, reports of the presence of sensory stimulation during a catch trial were taken to index false alarms. Noise and signal distributions were assumed to have equal variance, and sensitivity (i.e., d′) and response criterion (i.e., c) were calculated according to the equations in Macmillan and Creelman (2005). Note that the assumption of equal variance does not affect quantification of the response criterion and simply scales sensitivity. Regarding RTs, trials in which participants responded within 100 msec of stimulus onset were trimmed (0.9% of total data), and the remaining data were then aggregated.
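Concretely, under the equal-variance Gaussian model, sensitivity and criterion reduce to the standard z-transform equations (a MATLAB sketch; hitRate and faRate are assumed to have been corrected away from 0 and 1):

```matlab
% Equal-variance signal detection theory (Macmillan & Creelman, 2005).
zH = norminv(hitRate);          % z-transformed hit rate
zF = norminv(faRate);           % z-transformed false alarm rate
dprime    = zH - zF;            % sensitivity (d')
criterion = -(zH + zF) / 2;     % response criterion (c)
```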
As illustrated in Figure 1A, after 200 trials of each sensory condition (four blocks), relatively few adjustments of auditory and visual noise were needed to maintain participants within the predefined range of 30–45% unisensory detection performance (see also Control Analyses in Supplementary Materials online). That is, 65.42% of all audio noise adjustments were undertaken during the first 200 trials (thus, 34.58% occurred during the last 500 experimental trials), and 60.75% of visual noise adjustments happened during that same initial period (leaving 39.25% of visual noise changes occurring during the 500-trial experimental phase). Thus, EEG analysis (below) was restricted to the last 400–500 trials per sensory condition to reduce variability in the stimulus statistics. Data from these trials were exported to EEGLAB (Delorme & Makeig, 2004), and epochs were sorted according to sensory condition (i.e., A, V, AV, or none) and detection (perceived vs. nonperceived). Epochs from −100 to 500 msec relative to target onset were high-pass filtered (zero phase, eighth-order Butterworth filter) at 0.1 Hz, low-pass filtered at 40 Hz, and notch filtered at 60 Hz. EEG epochs containing skeletomuscular movement, eye blinks, or other noise transients and artifacts were removed by visual inspection. After epoch rejection, every condition (4 [sensory modality: none, audio, visual, and audiovisual] × 2 [perceptual report: perceived and nonperceived]) comprised an average of 179.16 ± 39 trials (average epoch rejection = 23.5%), with the exception of the catch perceived condition, which had 23.2 ± 3.9 trials, and the catch nonperceived condition, which had 307.45 ± 31.5 trials. Excluding catch trials, there was no effect of sensory modality or perceptual report, and no interaction between these, with regard to the total number of trials (all ps > .19). Channels with poor signal quality (e.g., broken or excessively noisy electrodes) were then removed (6.2 electrodes on average, 4.8%). Data were rereferenced to the average and baseline corrected to the prestimulus period. Excluded channels were reconstructed using spherical spline interpolation (Perrin, Pernier, Bertrand, Giard, & Echalier, 1987). To account for the inherent multiple comparisons problem in EEG, we set alpha to .01 for at least 10 consecutive time points (Guthrie & Buchwald, 1991), and most statistical reporting in the Results states significant periods as "all ps < .01."
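The temporal criterion for significance can be implemented in a few lines (a sketch; pVals is assumed to be a vector of pointwise p values sampled at 1000 Hz):

```matlab
% Guthrie & Buchwald (1991) criterion: p < .01 at >= 10 consecutive samples.
sig    = pVals < .01;                                    % pointwise threshold
runLen = 10;                                             % required run length
okRuns = conv(double(sig), ones(1, runLen), 'valid') == runLen;
% okRuns(t) is true when samples t through t+9 are all significant
```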
Global Field Power
The global electric field strength was quantified using global field power (GFP; Lehmann & Skrandies, 1980). This measure is equivalent to the standard deviation of the trial-averaged voltage values across the entire electrode montage at a given time point and represents a reference-independent and topography-independent measure of evoked potential magnitude. It is used here to index the presence (or absence) of late evoked potentials during perceived versus nonperceived visual, auditory, and audiovisual trials. On a first pass, we calculated average GFPs for each participant as well as for the sample as a whole (i.e., grand average) and for every condition. Then, the topographic consistency test (TCT; Koenig & Melie-Garcia, 2010) was applied across the entire epoch (−100 to 500 msec relative to stimulus onset) for each condition to determine whether there was statistical evidence for a consistent evoked potential. Subsequently, the TCT was applied at each time point for those conditions demonstrating a significant evoked potential, to ascertain the periods during which potentials were reliably evoked. For these analyses, alpha was a priori set to .05, false discovery rate (FDR) corrected (Genovese, Lazar, & Nichols, 2002)—the default alpha assumed by the test (Koenig & Melie-Garcia, 2010). After demonstrating the presence of evoked potentials relative to baseline (see above), we conducted a 3 (Sensory modality: audio, visual, audiovisual) × 2 (Perceptual state: perceived vs. nonperceived) repeated-measures ANOVA at each time point (−100 to 500 msec relative to stimulus onset). Separate t tests across states of perception (perceived vs. nonperceived) for the different modalities (audio, visual, and audiovisual) were also conducted. Last, to ascertain true multisensory interactions, we contrasted the GFP evoked by the audiovisual condition with the sum of the unisensory responses (e.g., Cappe, Thelen, Romei, Thut, & Murray, 2012). As a control, we also indexed the GFP evoked by detected (i.e., false alarm) and nondetected (i.e., correct rejection) catch trials to ascertain whether either the noise features used to mask targets or the simple act of reporting detection was sufficient to engender a GFP differentiation between conditions. This GFP analysis was conducted only on participants with at least 20 false alarm trials (13 of 20 participants). For this analysis, a random subset of correct rejection trials was drawn for each individual to match the number of false alarm and correct rejection trials at the individual-participant level. Complementing the GFP analyses, the topographies exhibited by the different conditions were likewise examined. However, these are presented in the supplementary materials (see Figure S2) and not in the main text, as no strong theoretical prediction exists regarding a topographic neural correlate of consciousness across unisensory and multisensory domains (although see Britz, Díaz Hernàndez, Ro, & Michel, 2014).
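For reference, GFP amounts to the spatial standard deviation of the average-referenced, trial-averaged voltages at each time point (a minimal MATLAB sketch; erp is assumed to be an nElectrodes × nTimepoints trial average):

```matlab
% GFP (Lehmann & Skrandies, 1980): spatial SD across the montage per time point.
erpAvgRef = erp - mean(erp, 1);          % re-reference to the montage average
gfp = sqrt(mean(erpAvgRef .^ 2, 1));     % one GFP value per time point
```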
Intertrial Variability Analyses
To probe the reproducibility of evoked potentials during different perceptual states, and as elicited by stimuli of different modalities, in a relatively simple manner, principal component analysis (PCA) was performed within each participant. More specifically, PCA identified the number of orthogonal dimensions, expressed as a proportion of the total possible (e.g., the number of trials analyzed), needed to express a given amount of the trial-to-trial variability (90% in the present case) for each channel. In a deterministic system with highly stereotyped responses, only a few dimensions are needed to capture most of the variability. To the extent that trial-to-trial recordings differ from one another, total variability increases, and PCA dimensionality increases. In the present case, each participant's data were divided into channel- and condition-specific matrices of single-trial data, with trials as rows and time points as columns. The dimensionality of each matrix was determined as the minimum number of principal components capturing 90% of the variance across trials. This number was further expressed as a percentage of the total number of dimensions and was taken as a measure of trial-to-trial variability for a given channel. For the audio, visual, and audiovisual conditions (for both perceived and nonperceived trials), the 120 trials whose mean most faithfully represented the average GFP (determined via minimization of absolute-value residuals), and thus the average response, were analyzed in order to keep the number of potential dimensions equal across conditions. For the catch trials, all false alarm catch trials were taken, and an equal number of correct rejection catch trials were randomly selected on a participant-by-participant basis. This PCA was performed on a 101-msec-wide sliding window (the first window spanning −100 to 0 msec relative to stimulus onset; 1-msec step size) to determine the temporal time course of the trial-to-trial variability (note that this time course is thus smoothed). Results (Figure 5) are reported as the percentage of extra dimensions needed for each sensory modality to explain trial-to-trial variance in the perceived versus nonperceived conditions. As for the GFP analyses, catch trials were analyzed separately as a control procedure. A random subset of correct rejection catch trials was sampled for each participant to match the number of correct rejections and false alarm trials. This last analysis was undertaken only for participants with at least 20 false alarm trials (13 of 20 participants).
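A minimal sketch of this dimensionality measure for a single channel and condition follows (X is assumed to be an nTrials × nTimepoints matrix; pca here is the Statistics and Machine Learning Toolbox function):

```matlab
% Sliding-window PCA dimensionality: minimum components for 90% of variance.
winLen = 101;                                        % 101-msec window
nWins  = size(X, 2) - winLen + 1;                    % 1-msec step size
nDim   = zeros(1, nWins);
for t0 = 1:nWins
    Xwin = X(:, t0:t0 + winLen - 1);                 % trials x window samples
    [~, ~, ~, ~, explained] = pca(Xwin);             % percent variance per component
    nDim(t0) = find(cumsum(explained) >= 90, 1);     % components for 90% variance
end
pctDim = 100 * nDim / min(size(Xwin));               % as percent of possible dimensions
```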
Last, Lempel-Ziv (LZ) complexity was quantified for each condition as a measure of complexity indirectly related to functional differentiation/integration (Sanchez-Vives, Massimini, & Mattia, 2017; Koch et al., 2016a, 2016b; Tononi et al., 2016; Casali et al., 2013). LZ is the most widely used of the practical estimators of Kolmogorov complexity (variants of it underlie common formats such as TIFF and ZIP); it measures the approximate amount of nonredundant information contained within a string by estimating the minimal size of the "vocabulary" necessary to describe the entirety of the information contained within the string in a lossless manner. LZ can be used to quantify distinct patterns in symbolic sequences, especially binary signals. Before applying the LZ algorithm, as implemented in calc_lz_complexity.m, we first down-sampled our signal from 1000 to 500 Hz and converted it to a binary sequence. For every participant and every trial separately, we first full-wave rectified the signal and then assigned a value of "1" to a time point if the response was 2 SDs above the mean baseline value for that particular trial (−100 to 0 msec relative to stimulus onset); otherwise, a value of "0" was assigned. Next, binary strings were constructed for each trial by column-wise concatenating the values at each of the 128 electrodes (Casali et al., 2013) for the entire poststimulus period. The LZ complexity algorithm then determined the size of the dictionary needed to account for the pattern of binary strings observed. The same procedure was repeated after shuffling the binarized, concatenated data; this yielded surrogate data with a priori maximal complexity given the entropy in the original data set. Finally, LZ was normalized by expressing it as the ratio of the nonshuffled complexity to that of the shuffled version of the measure (see Andrillon et al., 2016, for a similar approach).
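The binarization and normalization steps can be sketched as follows (one plausible reading of the procedure; trialEEG is assumed to be an nElectrodes × nTimepoints single-trial matrix already down-sampled to 500 Hz, and the call signature assumes the MATLAB File Exchange implementation of calc_lz_complexity.m cited above):

```matlab
% Binarize: 1 where the rectified signal exceeds baseline mean + 2 SD.
base   = abs(trialEEG(:, baseIdx));                     % -100 to 0 msec, rectified
thresh = mean(base, 2) + 2 * std(base, 0, 2);           % per-electrode cutoff
binSig = abs(trialEEG(:, postIdx)) > thresh;            % poststimulus binary matrix
binStr = binSig(:)';                                    % column-wise concatenation
% LZ complexity, normalized by a shuffled surrogate of maximal complexity.
lzRaw  = calc_lz_complexity(binStr, 'exhaustive', false);
lzShuf = calc_lz_complexity(binStr(randperm(numel(binStr))), 'exhaustive', false);
lzNorm = lzRaw / lzShuf;                                % entropy-normalized LZ
```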
As expected from classical multisensory paradigms, a one-way repeated-measures ANOVA with four sensory conditions (none, audio, visual, and audiovisual) demonstrated a significant effect on RTs (F(3, 60) = 103.193, p < .0001). As illustrated in Figure 2A, this effect may have been driven by false alarms during catch trials, which were very slow (mean for catch trials = 0.975 ± 0.10 sec [mean ± 1 SEM]) because no stimulus was presented. In fact, the mean RT for catch trials (0.975 sec) was no different from the statistically expected value of 0.9 sec, that is, half of the 1.8-sec mean interstimulus interval (a fixed duration of 800 msec plus a uniformly distributed random duration between 0 and 2 sec; one-sample t test against 0.9, p = .09). In other words, on average, participants false alarmed halfway through the interstimulus interval. Thus, a one-way repeated-measures ANOVA with three conditions (audio, visual, and audiovisual) was performed and demonstrated a significant effect of Sensory modality (F(1, 20) = 720.19, p < .001, η2 = 0.97). The main effect was driven by the multisensory condition being fastest (M = 0.555 ± 0.09 sec), followed by the auditory (M = 0.588 ± 0.10 sec) and then the visual (M = 0.633 ± 0.11 sec) conditions (all comparisons paired-samples t tests with p < .046, Bonferroni corrected). Detection of audiovisual stimuli was also faster than detection of the fastest unisensory stimulus defined on a participant-by-participant basis (audiovisual vs. fastest unisensory, p = .012; see Figure 2A and Methods for details).
On average, participants responded "yes" on 8.2% (mean) ± 1.1% (SEM) of the catch trials (i.e., false alarms), 45.1 ± 3.9% of the audio trials (d′ = 1.21, c = 0.92), 41.7 ± 4.6% of the visual trials (d′ = 1.27, c = 0.89), and 64.8 ± 4.7% (d′ = 1.83, c = 0.88) of the audiovisual trials (Figure 2B). Thus, we were successful in driving participants' performance to a detection rate that allowed the bifurcation of the data with regard to perceptual report: perceived versus nonperceived. Note that, as illustrated in Figure 1, false alarm rates remained constant throughout the experiment, suggesting minimal fatigue or learning effects. A one-way ANOVA and subsequent paired-samples t tests on sensitivity (i.e., d′) values extracted from the signal detection analyses (Macmillan & Creelman, 2005; Tanner & Swets, 1954) indicated that participants were most sensitive to the multisensory presentations (F(2, 40) = 19.84, p < .001; paired-samples t test on audiovisual d′ vs. the most detected unisensory d′, p = .007). Last, the response criterion (i.e., c) was unchanged across the different sensory conditions (F(2, 40) = 0.05, p = .94; see Figure 2B). Thus, the behavioral data from this task illustrate multisensory facilitation in the frequency, sensitivity, and speed of stimulus detection, while showing no change in response criterion. This last observation is particularly important, as it suggests that participants' overt reports of stimulus detection reflect perceptual awareness as opposed to a change in what they consider "reportable."
TCT (Koenig & Melie-Garcia, 2010) over the entire poststimulus interval demonstrated a reliable evoked potential when participants were presented with auditory, visual, or audiovisual stimuli, both when participants reported perceiving the stimuli and when they did not (all ps < .01, FDR corrected). In contrast, no consistent evoked potential was apparent during catch trials, regardless of whether participants reported a stimulus (all ps > .08, FDR corrected).
For auditory stimuli, examination of the temporal time course of evoked potentials revealed deviations from baseline between 64 and 112 msec poststimulus and then from 134 msec poststimulus throughout the rest of the epoch for trials in which the stimulus was perceived and for the interval between 72 and 448 msec poststimulus and then again from 461 msec poststimulus throughout the rest of the epoch when the stimuli were not perceived. For visual stimuli, deviations from baseline were seen between 76 and 90 msec poststimulus and then from 138 msec poststimulus throughout the rest of the epoch when the stimuli were perceived and between 90 and 354 msec poststimulus and then from 387 msec poststimulus throughout the rest of the epoch when the stimuli were not perceived. Finally, for the audiovisual condition, evoked potentials were consistently seen beginning at 45 msec poststimulus and throughout the rest of the epoch when the stimuli were perceived and beginning at 92 msec poststimulus and throughout the rest of the epoch when the stimuli were not perceived.
Contrasts of the GFPs between conditions demonstrated a significant difference between perceived versus nonperceived stimuli for each of the three sensory conditions (see Figure 3). The statistically significant difference between perceptual states (i.e., main effect of Perceptual state in a 2 [Perceptual state] × 3 [Sensory modality (excluding catch trials)] repeated-measures ANOVA, n = 19, all ps < .01) was transient for the interval spanning 53–72 msec poststimulus onset and sustained after 102 msec, with an almost complete absence of late (i.e., >300 msec) response components for nonperceived stimuli (see Sanchez et al., 2017; Sperdin, Spierer, Becker, Michel, & Landis, 2015; Gaillard et al., 2009; Del Cul et al., 2007; Sergent et al., 2005; Dehaene et al., 2001, for similar results, as well as Dehaene & Changeux, 2011, for a review). Stated simply, both perceived and nonperceived stimuli generated similar early sensory responses (<∼120 msec poststimulus onset). In contrast, the presence of relatively late (>∼120 msec poststimulus onset) response components was associated with perceived stimuli. Also statistically significant was the main effect of Sensory modality in the intervals between 110 and 131 msec poststimulus (n = 19, all ps < .01; this likely reflects auditory evoked potentials) and between 194 and 240 msec poststimulus (n = 19, all ps < .01; this likely reflects visual evoked potentials; Luck, 2005). Not surprisingly, given the lack of significant evoked potentials in these conditions (see above), paired-samples t tests revealed no difference in the GFP evoked by "perceived" and "nonperceived" catch trials (all ts(12) < 1, all ps > .57), although this analysis relied on a considerably reduced number of trials (see Methods). Furthermore, results revealed a significant interaction between Perceptual state and Sensory modality from 115 msec poststimulus onset onward. Separate t tests across perceptual states (perceived vs. nonperceived) for the different sensory conditions (audio, visual, and audiovisual) revealed that, for auditory stimuli, the GFP diverged for perceived versus nonperceived stimuli at 121 msec poststimulus onset. For visual stimuli, this divergence occurred at 219 msec, whereas for multisensory stimuli, the divergence began 234 msec after stimulus onset.
Next, we determined whether the difference in GFP magnitude for perceived versus nonperceived multisensory stimuli could be explained by a simple combination of the unisensory responses. To do so, we compared the multisensory responses (perceived and nonperceived) to the sum of the unisensory responses (perceived and nonperceived; see Cappe et al., 2012; Cappe, Thut, Romei, & Murray, 2010, for a similar analysis). For this comparison, the evoked potentials for the unisensory conditions were first summed, and the GFP was then extracted from the summed signal (see Methods). This analysis showed a significant main effect of Sensory modality (A + V > AV; see Figure 4) beginning at 183 msec (n = 19, repeated-measures ANOVA, all ps < .01) and a main effect of Perceptual state (perceived > nonperceived; see Figure 4, bottom) between 97 and 188 msec poststimulus onset and from 222 msec onward (n = 19, repeated-measures ANOVA, all ps < .01). Most importantly, the results indicated a significant interaction: for perceived stimuli, multisensory responses were weaker than the sum of the two unisensory responses, whereas this was not the case for nonperceived stimuli (n = 19, 2 [Perceptual state] × 2 [Sum unisensory vs. multisensory] repeated-measures ANOVA interaction, all ps < .01, 251 msec onward; see Figure 4, dark area and line indicating significance). Follow-up analyses using paired t tests showed no difference between the pair and the sum when stimuli were not perceived (all ps > .043) but showed a difference between these conditions beginning 194 msec poststimulus onset (p < .01) when the stimuli were perceived.
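In terms of the analysis pipeline, this additive-model comparison amounts to summing the unisensory evoked potentials before computing the GFP (a sketch with assumed variable names; erpA, erpV, and erpAV are nElectrodes × nTimepoints trial averages):

```matlab
% A + V ("sum") versus AV ("pair") at the level of GFP.
erpSum = erpA + erpV;                                         % additive prediction
gfpSum = sqrt(mean((erpSum - mean(erpSum, 1)) .^ 2, 1));      % GFP of the sum
gfpAV  = sqrt(mean((erpAV  - mean(erpAV,  1)) .^ 2, 1));      % GFP of the pair
isSubadditive = gfpAV < gfpSum;                               % per time point
```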
Collectively, these GFP results highlight that audiovisual stimuli that are perceived result in late evoked potentials that are absent when stimuli are not perceived, mirroring what has been well established within the visual neurosciences (e.g., see Dehaene & Changeux, 2011, for a review) and what seems to be emerging within the auditory neurosciences (e.g., see Sadaghiani et al., 2009). Interestingly, the presence of this late component exhibits subadditivity when contrasting the sum of the unisensory conditions with the multisensory condition (e.g., see Cappe et al., 2012, for similar results), an observation that does not hold when stimuli are not perceived—due to the lack of late evoked potentials.
To extend analyses beyond response strength, we further employed measures that capture the variability (i.e., reproducibility) and complexity (next section) of EEG responses. Several such measures have been leveraged successfully for the characterization and differentiation of states of consciousness (e.g., coma vs. awake vs. anesthetized vs. dreaming; Schurger et al., 2015; Ecker et al., 2014; Casali et al., 2013). In the current work, we implement a relatively straightforward version of this strategy. To evaluate response variability across sensory conditions and perceptual states, we performed PCA on the EEG signal for each trial and participant on an electrode-by-electrode basis and identified the minimum number of principal components needed to capture 90% of the trial-to-trial variability (McIntosh, Kovacevic, & Itier, 2008). As illustrated in Figure 5, more dimensions were needed to account for intertrial response variability in perceived (vs. nonperceived) conditions. However, this difference was more prominent for unisensory conditions than for the multisensory condition (Figure 5). More specifically, a 2 (Perceived vs. nonperceived) × 3 (Sensory modality: A, V, AV) repeated-measures ANOVA demonstrated a significant main effect of Sensory modality beginning 107 msec poststimulus onset and persisting throughout the entire epoch (p < .01), a main effect of Perceptual state beginning at 95 msec poststimulus onset and persisting throughout the rest of the epoch (p < .01), and a significant interaction between these variables beginning 99 msec poststimulus onset and persisting throughout the rest of the epoch (p < .01). The interaction is explained by a difference in the time at which the PCA dimensionality bifurcated between perceptual states (if at all) for the different sensory conditions. For the unisensory conditions, beginning at 91 msec after the auditory stimulus and at 239 msec after the visual stimulus, there was a significant increase in response variability for trials in which the stimulus was perceived (p < .01, n = 19, paired-samples t test for both contrasts; Figure 5). In contrast, this increased variability for perceived trials was not apparent for the audiovisual stimuli (p > .09, n = 19, paired-samples t test). Intertrial variability as quantified by the PCA was similar across perceptual states for the catch trials (all ts(12) < 1, p > .74).
The final theory-driven measure of interest here is a measure of capacity for information reduction: LZ complexity. This measure is of interest because of recent observations indicating that perceptual awareness may not emanate simply from the recruitment of broadly distributed networks but rather from the differentiation and integration of activity among these networks (see Cavanna et al., 2017, for a recent review). These networks are postulated to fulfill axiomatic observations related to awareness (Tononi & Koch, 2015) that embody complex neural signatures of that mental state. Thus, here, LZ complexity—a measure of information reducibility—was measured across the poststimulus period of audio, visual, and audiovisual stimuli that were either perceived or not, and we queried whether similar patterns of complexity would apply across modalities (i.e., from visual to auditory) and across numbers of modalities (i.e., from unisensory to multisensory). As illustrated in Figure 6, a 4 (Sensory modality: none, audio, visual, audiovisual) × 2 (Perceived vs. nonperceived) repeated-measures ANOVA revealed a significant main effect of Sensory modality (F(3, 57) = 44.92, p < .001), a significant main effect of Perceptual state (F(1, 18) = 40.82, p < .001), and a significant interaction between these variables (F(3, 57) = 3.21, p = .029). The main effect of Perceptual state was due to higher complexity for nonperceived stimuli (M = 0.24, SEM = 0.01) than for perceived stimuli (M = 0.19, SEM = 0.005; paired t test, t(18) = 6.32, p < .001). Regarding the main effect of Sensory modality, post hoc paired t tests (Bonferroni corrected) revealed that catch trials exhibited the most informationally complex patterns of activity, on average (M = 0.27, SEM = 0.007, all ps < .001), followed by auditory evoked potentials (M = 0.21, SEM = 0.010; contrasts to catch and audiovisual conditions significant with all ps < .03, but not the contrast to visual trials, p = .659), visual evoked potentials (M = 0.19, SEM = 0.010; contrast to audiovisual trials nonsignificant, p = .253), and, finally, multisensory evoked potentials (M = 0.18, SEM = 0.008), whose complexity did not differ significantly from that of the visual responses. The significant interaction was driven by the fact that there was a significant difference in evoked complexity between perceptual states (perceived vs. nonperceived) for catch trials (perceived: M = 0.24, SEM = 0.03; nonperceived: M = 0.30, SEM = 0.06; t(19) = 3.40, p = .003), auditory trials (perceived: M = 0.19, SEM = 0.05; nonperceived: M = 0.24, SEM = 0.04; t(19) = 6.63, p < .001), and visual trials (perceived: M = 0.17, SEM = 0.04; nonperceived: M = 0.22, SEM = 0.05; t(19) = 4.45, p < .001). In contrast, this difference was not seen for audiovisual trials (perceived: M = 0.17, SEM = 0.03; nonperceived: M = 0.19, SEM = 0.04; t(19) = 1.32, p = .203). In fact, for the multisensory condition, Bayesian statistics suggested not only an absence of evidence against the null hypothesis (as inferred via the frequentist analyses described above) but considerable evidence in its favor (BF10 = 0.298, below the 1/3 cutoff typically suggested as favoring the null hypothesis; Jeffreys, 1961). Taken together, these analyses suggest that, although EEG complexity (normalized for overall entropy) is generally decreased when unisensory stimuli are perceived versus nonperceived, this is not true for multisensory stimuli.
Interestingly, the decrease in complexity is also observed during catch trials when participants report perceiving a stimulus that is not present. Thus, the decrease in EEG evoked complexity is associated not only with physical stimulation but seemingly also with perceptual state.
A number of different neural markers of perceptual awareness have been proposed—from "neural ignition" and the presence of late evoked potentials (P3, P300, P3b; Dehaene et al., 2017; Dehaene & Changeux, 2011), to increased neural reproducibility (Schurger et al., 2010), to a high degree of information integration that can be indexed through measures such as EEG complexity (Koch et al., 2016a, 2016b; Tononi et al., 2016; Casali et al., 2013). Here, we sought to extend these various measures, posited to represent credible neural signatures of perceptual awareness for visual stimuli, to multisensory perceptual processes—as much of our perceptual Gestalt is constructed on a multisensory foundation. Collectively, our results support and extend prior work implicating neural signatures of perceptual awareness revealed in measures of EEG response strength, reproducibility, and complexity. We show, as has earlier work, that reproducibility and complexity indices of perceptual awareness are similar for visual and auditory conditions, but we also show that there are significant differences in the indices of awareness associated with multisensory stimulation, differences that likely have important implications for furthering our understanding of multisensory perceptual awareness.
Neural Response Strength as a Modality-free Indicator of Perceptual Awareness
More specifically, conditions in which visual, auditory, or both visual and auditory stimuli were presented resulted in reliable variations in EEG response strength (as indexed via GFP) that covaried with perceptual state (i.e., whether the stimulus was perceived or not). In each of these conditions, comparison of perceived versus nonperceived stimuli revealed late evoked potentials that were present only when stimuli were perceived. Thus, the presence of late evoked potentials appears to be a strong index of perceptual awareness under both unisensory and multisensory conditions. The striking absence of late EEG components to nonperceived stimuli resembles "ignition-like" single-unit responses to perceived stimuli that have been found in the temporal lobe of epileptic patients (Dehaene, 2014). This response pattern fits the assumption that conscious percepts arise late in the evolution of sensory responses, possibly because they necessitate more global brain activity (Noy et al., 2015; Dehaene & Changeux, 2011; Gaillard et al., 2009). This "ignition-like" effect, which at times has been difficult to capture in previous work (e.g., Silverstein, Snodgrass, Shevrin, & Kushwaha, 2015), likely results from several aspects of the current experiment. First, it may be argued that the lack of observable late responses in EEG signals is due to our adaptive, online method of adjusting stimulus intensity—and not reflective of the manner in which individuals become aware of stimuli. This account, however, does not fully explain the GFP effects, as EEG analyses were restricted to the last 400–500 trials, during which auditory and visual noise levels were relatively fixed in intensity, and the minimal remaining changes in stimulus intensity did not provoke a change in GFP (see Control Analyses; Figure S1 online). Second, the current experiment differs from most previous EEG studies presenting stimuli at threshold (and demonstrating the occurrence of late EEG components; e.g., see Koch, 2004) in that, here, we interleave stimuli from different modalities (see Sanchez et al., 2017, for a similar observation of abolished late evoked responses for undetected stimuli in a multisensory context). Finally, it is possible that the clear presence of late evoked potentials in perceived but not nonperceived trials arises because participants were operating below a 50% detection rate rather than at threshold (most prior work presented stimuli at threshold).
EEG Subadditivity in Multisensory Integration Is Associated with Perceived Stimuli
A second interesting observation regarding the GFP results relates to the comparison between the sum of unisensory evoked potentials (“sum”) and the multisensory response (“pair”). When stimuli were not perceived, there was no significant difference between the multisensory GFP and the GFP predicted by the sum of unisensory responses (i.e., no difference between sum and pair). In contrast, when the stimuli were perceived, the GFP of the audiovisual condition was distinctly subadditive when compared with the sum of the unisensory responses. Hence, although neural response strength (i.e., GFP) differentiates between perceptual states under both unisensory and multisensory conditions, the perceived multisensory response does not reflect a simple addition of the two unisensory responses. Indeed, subadditivity in EEG responses is often seen as a hallmark of multisensory processing (see Cappe et al., 2010, 2012, for examples), and here, it was evident only under perceived multisensory conditions, suggesting links between multisensory integration and perceptual awareness (see Baars, 2002, for a philosophical consideration arguing that conscious processing is involved in the merging of sensory modalities). Although a number of studies suggest that multisensory interactions may occur when information from a single sense is below the threshold for perceptual awareness (Salomon, Kaliuzhna, Herbelin, & Blanke, 2016; Aller, Giani, Conrad, Watanabe, & Noppeney, 2015; Lunghi, Morrone, & Alais, 2014; Lunghi & Alais, 2013), when both are presented at subthreshold levels after a period of associative learning (Faivre et al., 2014), or even when participants are unconscious (Arzi et al., 2012; Ikeda & Morotomi, 1996; Beh & Barratt, 1965), evidence for multisensory integration in the complete absence of perceptual awareness (without prior training) is conspicuously lacking (Faivre et al., 2017; Noel et al., 2015). The current results provide additional support for the absence of multisensory integration outside perceptual awareness but, as null results, must be interpreted with caution.
Across-Trial EEG Reproducibility Differentiates between Perceived and Nonperceived Unisensory but Not Multisensory Stimuli
The next putative index of perceptual awareness used in the current study was that of neural reproducibility (Schurger et al., 2015). This measure is predicated on the view that spatio-temporal neural patterns giving rise to subjective experience manifest as relatively stable epochs of neural activity (Britz et al., 2014; Fingelkurts, Fingelkurts, Bagnato, Boccagni, & Galardi, 2013). To address the stability of responses, we measured intertrial variability via a relatively straightforward metric, namely, PCA. Those results disclosed similar levels of neural reproducibility for visual and auditory conditions (although with different time courses) and a categorically distinct pattern for multisensory presentations. Specifically, there was no difference in neural reproducibility across trials for perceived versus nonperceived trials in the multisensory condition, but there were reliable differences in the unisensory conditions. The increased variability for perceived unisensory stimuli runs counter to the view that responses to perceived trials are more reproducible (Schurger et al., 2010; Xue et al., 2010). However, we did not observe late response components to nonperceived stimuli, which reduces the number of principal components needed to explain the variance of this part of the response. Indeed, the increase in the number of principal components needed to explain the trial-to-trial variability for perceived stimuli occurs very close in time to the bifurcation between perceived and nonperceived GFPs (auditory: GFP at 121 msec vs. PCA dimensionality increase at 91 msec; visual: GFP at 219 msec vs. PCA dimensionality increase at 239 msec). Thus, the relevant observation here is that both the strength (as indexed via GFP analyses) and the between-trial variability (as indexed via PCA) seen in response to perceived multisensory stimuli are reduced in comparison with the unisensory conditions, with both of these effects appearing around the same time in the neurophysiological responses. On the other hand, in contrast to the observation that late evoked potentials seemingly index perceptual awareness regardless of sensory modality, the increase in reproducibility associated with perceived stimuli (Schurger et al., 2015) is most readily evident for multisensory stimuli. That is, although the observation derived from the visual neurosciences indicating increased reproducibility for perceived stimuli (Schurger et al., 2015) may be extended to the auditory domain—the same pattern of results emerged in the auditory and visual modalities, although at different latencies—the PCA results appear categorically different when probing perceived versus nonperceived multisensory stimuli. These results highlight that, at least in the case of neural reproducibility, conclusions drawn from unisensory studies may not generalize to multisensory studies in work attempting to better understand the neural correlates of perceptual awareness.
The finding that neural variability under multisensory conditions changed little as a function of perceptual state is consistent with computational models based on Bayesian inference (e.g., Körding et al., 2007) and maximum likelihood estimation. These models have been applied to psychophysical (Ernst & Banks, 2002), neuroimaging (Rohe & Noppeney, 2015, 2016), and electrophysiological (Boyle, Kayser, & Kayser, 2017; Fetsch, Deangelis, & Angelaki, 2013) observations concerning suprathreshold multisensory performance and collectively illustrate that the combination of sensory information across different modalities tends to decrease variability (i.e., to increase signal reliability). Although the current study was not designed or analyzed to specifically pinpoint neural concomitants of multisensory integration, our findings may inform the models mentioned above by showing that, at least for the task employed here, variance in the evoked neural response is more comparable across perceptual states for multisensory than for unisensory conditions. Interestingly, stimulus-induced reduction in neural variability has been observed across a wide array of brain areas and has been posited to be a general property of cortex in response to stimulus onset (Churchland et al., 2010). In subsequent work, it will be informative to examine whether, at the level of single neurons, variability (as measured through indices such as the Fano factor; Eden & Kramer, 2010) decreases equally across perceptual states (while maintaining stimulus intensity near detection threshold) and whether these changes differ for unisensory compared with multisensory brain responses.
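For single-unit work of the kind proposed here, the Fano factor is simply the across-trial variance of the spike count divided by its mean (a one-line sketch; counts is an assumed nTrials × 1 vector of spike counts in a fixed window):

```matlab
% Fano factor (Eden & Kramer, 2010): equals 1 for a Poisson process.
fano = var(counts) / mean(counts);
```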
EEG Complexity Differentiates between Perceived and Nonperceived Unisensory but Not Multisensory Stimuli
Finally, consider the aspect of our results dealing with the neural complexity of responses evoked by visual, auditory, or audiovisual stimuli as a function of perceptual state. In previous work, a derivative of this measure successfully categorized patients along the continuum ranging from awake to asleep to minimally conscious and, finally, to comatose (see Casali et al., 2013). This work has shown that, when neural responses are evoked via TMS, they are less amenable to information compression when patients are conscious than when they are unconscious. To our knowledge, however, the present report is the first to examine EEG complexity (compressibility) as a function of perceptual state rather than as a function of level of consciousness. Our results indicate that evoked responses are less complex when either visual or auditory stimuli are perceived (compared with nonperceived). Interestingly, this difference was not evident under multisensory conditions. Furthermore, this measure was able to differentiate between catch trials that were correctly "rejected" (i.e., no stimulus reported when no stimulus was presented) and false alarms (i.e., reports of the presence of a stimulus when none was presented—a possible analog of a hallucination). The switch in effect direction between levels of consciousness (i.e., more complex when patients are conscious) and perceptual state (i.e., more complex when stimuli are not perceived) is likely due to the fact that, in the former case, neural responses are artificially evoked—thus recruiting neural networks in a nonnatural manner—whereas in the current case, neural responses are evoked by true stimulus presentations. As an example, in the case of visual stimulus presentations, the present results indicate that neural activity within the visual network architecture is more stereotyped for perceived versus nonperceived trials.
Taken together, the overall pattern of results (1) questions whether multisensory integration is possible before perceptual awareness (see Spence & Bayne, 2014, and O'Callaghan, 2017, for distinct perspectives on whether perceptual awareness may be uniquely multisensory or simply a succession of unisensory processes) and (2) questions the implicit assumption that all indices of perceptual awareness apply across all sensory modalities and conditions. Indeed, if one assumes that the search for the neural correlates of perceptual awareness must result in a set of features that are common across all sensory domains (e.g., visual awareness, auditory awareness, audiovisual awareness), then the current findings argue that the presence of late evoked potentials, as opposed to neural reproducibility or complexity, most closely tracks perceptual awareness. If one instead assumes that visual awareness, auditory awareness, and audiovisual awareness are categorically distinct (or, in the case of multisensory awareness, nonexistent; Spence & Bayne, 2014), then the current findings suggest a greater similarity between the neural correlates of perceptual awareness across the visual and auditory modalities than between unisensory and multisensory perceptual processes.
The authors thank Dr. Nathan Faivre for insightful comments on an early draft. The work was supported by an NSF Graduate Research Fellowship to J. P. N., by NIH grant HD083211 to M. T. W., and by Centennial Research Funds from Vanderbilt University to R. B.
Reprint requests should be sent to Jean-Paul Noel, 7110 MRB III BioSci Bldg., 465, 21st Ave. South, Nashville, TN 3721, or via e-mail: email@example.com.