Abstract
Although visual input arrives continuously, sensory information is segmented into (quasi-)discrete events. Here, we investigated the neural correlates of spatiotemporal binding in humans with magnetoencephalography using two tasks where separate flashes were presented on each trial but were perceived, in a bistable way, as either a single or two separate events. The first task (two-flash fusion) involved judging one versus two flashes, whereas the second task (apparent motion: AM) involved judging coherent motion versus two stationary flashes. Results indicate two different functional networks underlying two unique aspects of temporal binding. In two-flash fusion trials, involving an integration window of ∼50 msec, evoked responses differed as a function of perceptual interpretation by ∼25 msec after stimulus offset. Multivariate decoding of subjective perception based on prestimulus oscillatory phase was significant for alpha-band activity in the right middle temporal (V5/MT) area, with the strength of prestimulus connectivity between early visual areas and V5/MT being predictive of performance. In contrast, the longer integration window (∼130 msec) for AM showed evoked field differences only ∼250 msec after stimulus offset. Phase decoding of the perceptual outcome in AM trials was significant for theta-band activity in the right intraparietal sulcus. Prestimulus theta-band connectivity between V5/MT and intraparietal sulcus best predicted AM perceptual outcome. For both tasks, the phase effects could not be accounted for by concomitant variations in power. These results show a strong relationship between specific spatiotemporal binding windows and specific oscillations, linked to the information flow between different areas of the where and when visual pathways.
INTRODUCTION
Many aspects of our lives, including motion processing, speech recognition, reading, sound localization, and visuomotor coordination, require temporal or spatiotemporal integration and segregation of sensory information on the subsecond scale. This fundamental process represents a core mechanism of perception, allowing change in the flow of sensory input to be consciously represented without any experience of discontinuity (White, 2018).
After seminal neurophysiological investigations proposing that perception depends on the rhythmic sampling of sensory information (Harter, 1967; Lansing, 1957; Bishop, 1932), the neural correlates of spatiotemporal integration/segregation have been linked to ongoing neural oscillations using neurophysiological techniques in humans (Pöppel, 1997; Varela, Toro, John, & Schwartz, 1981). The main hypothesis proposes that the alpha rhythm (8–12 Hz) defines a neural computation cycle within which integration of visual input occurs (VanRullen, 2016). This idea is supported by studies showing that spikes in sensory areas are more likely to occur at a specific phase of the ongoing oscillations (such as the peak or trough) compared with opposing phases (Haegens, Nácher, Luna, Romo, & Jensen, 2011). However, we also know that different sensory modalities have different preferential rhythms and, in the case of speech processing, the sampling rhythm can vary according to the complexity and timescale (e.g., phonemes, syllables and phrases: Dikker, Assaneo, Gwilliams, Wang, & Kösem, 2020; Giraud & Poeppel, 2012; Ahissar et al., 2001). Similarly, perceiving temporal variation related to complex visual objects requires a more extensive brain network than perceiving flicker or changes in a simple stimulus, such as an oriented line or grating (e.g., De Vries & Baldauf, 2019; Baldauf & Desimone, 2014). Moreover, integrating information about a dynamic event across space and time would involve an even more complex network, and potentially a longer computation cycle, compared with a stationary stimulus.
Recent evidence supports the idea that, within the visual modality, there are multiple preferential neural rhythms for the sampling of sensory information across space and time. Specifically, the perceptual sampling of stimuli that alternate in close temporal proximity and in the same spatial position has been linked with alpha-band activity (Gulbinaite, İlhan, & VanRullen, 2017; Cecere, Rees, & Romei, 2015; Samaha & Postle, 2015), whereas stimuli separated by larger temporal intervals that also require sampling across space appear to involve slower frequencies within the theta band (Ronconi, Oosterhof, Bonmassar, & Melcher, 2017). This idea is in line with a theoretical framework of rhythmic perception suggesting that the frequency of ongoing neural oscillations determines the resolution of rhythmic sampling, such that faster oscillatory frequencies imply shorter temporal integration windows (Ronconi, Melcher, Junghöfer, Wolters, & Busch, 2022; Ronconi, Busch, & Melcher, 2018; Wutz, Melcher, & Samaha, 2018; Ronconi & Melcher, 2017; Samaha & Postle, 2015).
Although this initial evidence supports the idea that multiple neural rhythms are relevant for sampling visual information across space and time, and that these rhythms would coexist to determine our perception of a continuous sensory flow, the precise cortical networks underlying these fundamental computational principles of the visual system remain unexplored. The need for a precise mapping between spatiotemporal sampling mechanisms in human perception and the related, rhythm-based cortical network dynamics has recently been reinvigorated by some null findings that have questioned the idea that oscillatory alpha activity is both sufficient and necessary for visual temporal sampling. In a simple visual detection task, Ruzzoli, Torralba, Fernández, and Soto-Faraco (2019) found evidence for the involvement of prestimulus alpha amplitude but not phase. In another study testing the temporal segmentation of both flashes and sounds, Buergers and Noppeney (2022) found no effect of resting or prestimulus alpha frequency on visual (or audio–visual) parsing.
In the present study, we conducted an in-depth investigation of the cortical networks subserving spatiotemporal sampling of visual stimuli in humans using magnetoencephalography (MEG), which offers better spatial resolution than EEG. We employed multivariate decoding and connectivity analyses of source-reconstructed MEG data recorded during an integration/segregation task of temporal and spatiotemporal events. In the same blocks of trials, participants performed two perceptual discriminations: a two-flash fusion (TFF) and an apparent motion (AM) task, measuring temporal and spatiotemporal integration/segregation mechanisms, respectively. In both tasks, two separate flashes were physically presented on each trial, but participants perceived them in a bistable way. In the TFF condition, temporal integration would lead to the conscious report of a single stimulus as opposed to two discrete flashes, whereas in the AM condition, spatiotemporal integration would lead to a conscious report of a single moving object as opposed to two discrete flashes in different spatial positions. We hypothesized that perception of one versus two separate stimuli would reflect specific oscillatory band activity and that the two-flash (temporally segmented) responses would reflect an increased strength in functional connectivity, and thus more efficient communication (Rassi, Wutz, Müller-Voggel, & Weisz, 2019; Panzeri, Ince, Diamond, & Kayser, 2014), between early visual cortex and higher visual processing areas.
METHODS
The main steps involved in the present study—as described in detail below—developed as follows: First, we mapped the cortical regions that differentiated integration versus segregation in the two different task conditions by analyzing MEG activity evoked after the stimulus onset as a function of the subjective perceptual interpretation of the same bistable stimuli. Second, we trained a multivariate classifier to decode the perceptual outcome from the phase (or power, as a control analysis) of prestimulus oscillatory activity within these networks. Finally, we used the resulting information to characterize the network-level interactions in terms of functional connectivity.
Participants
Thirty participants (20 women), aged 18–35 years, took part in the study as paid volunteers. No participant reported a history of neurological disease or epilepsy. All of them reported normal or corrected-to-normal vision and hearing and gave informed written consent. Three participants were removed from the subsequent analyses, one because of excessive MEG artifacts, and two because they perceived AM in 90% of trials or more; thus, their perception could not be considered bistable. The experimental protocol was approved by the ethics committee of the Center for Mind/Brain Science at University of Trento and conformed to the principles of the Declaration of Helsinki of 2013.
Experimental Design—Apparatus, Stimuli, and Task Procedure
Visual stimuli were presented with a DLP projector (PROPixx, VPixx Technologies Inc.) running at a refresh rate of 100 Hz, aimed at a translucent back-projection screen (projected screen size 510 mm × 380 mm) located inside the dimly lit, magnetically shielded room at a viewing distance of 100 cm.
The stimulus presentation methodology follows the one previously used in Ronconi et al. (2017) and depicted in Figure 1A. The target stimuli (“flashes”) were luminance-defined Gaussian blobs sized 0.5° × 0.5°. Flashes were presented above the individual threshold, which was calculated before the main experiment, following the same procedure previously used (Ronconi & Melcher, 2017; Ronconi et al., 2017). The resulting average Michelson contrast value between the flash and the background was ∼15%.
Each trial began with a fixation point for a variable presentation time (ranging from 1350 to 1750 msec), and both target flashes had a duration of 10 msec (one refresh cycle).
In the TFF trials, the two target flashes appeared in the same position, aligned to the horizontal axis (left or right hemifield, randomized across trials), with an eccentricity of 6° from the fixation. They were always separated by an ISI of 40 msec (four refresh cycles). This value was chosen based on extensive pilot studies, as well as previous reports (Drewes, Muschter, Zhu, & Melcher, 2022; Battaglini et al., 2020; Ronconi & Melcher, 2017; Ronconi et al., 2017).
In the AM trials, the first of the two target flashes was displayed at 6° of eccentricity aligned to the horizontal axis (left or right hemifield, randomized across trials). The second target flash appeared after an ISI of 120 msec (12 refresh cycles) above or below the position of the first flash (at a distance of 4°) at the same eccentricity and in the same hemifield. This ISI was chosen based on pilot studies and a previous report (Ronconi et al., 2017).
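The timing values above follow directly from the 100-Hz refresh rate. The following minimal sketch makes the arithmetic explicit; the function names are hypothetical and chosen only for illustration, and the stimulus sequence is assumed to be flash, ISI, flash.

```python
REFRESH_HZ = 100  # projector refresh rate reported in the text

def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Convert a duration in msec to a whole number of refresh cycles."""
    frames = duration_ms * refresh_hz / 1000
    if abs(frames - round(frames)) > 1e-9:
        raise ValueError(f"{duration_ms} msec is not a whole number of frames")
    return round(frames)

def first_onset_to_last_offset_ms(flash_ms, isi_ms):
    """Time from onset of the first flash to offset of the second flash."""
    return flash_ms + isi_ms + flash_ms

# Flash duration and ISIs as reported for the two trial types
tff_total = first_onset_to_last_offset_ms(10, 40)    # TFF trials
am_total = first_onset_to_last_offset_ms(10, 120)    # AM trials
print(ms_to_frames(10), ms_to_frames(40), ms_to_frames(120))
print(tff_total, am_total)
```

Under these assumptions, the full stimulus sequence spans 60 msec in TFF trials and 140 msec in AM trials, measured from the onset of the first flash.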
A blank screen of 1500 msec followed the target presentation and anticipated the appearance of a response screen, in which participants had to report if they perceived one or two flashes for TFF trials, or if they perceived motion or alternation (and in which direction: upward or downward) for AM trials. No time constraints were imposed, and we stressed that only an accurate perception was important for the task and that RTs were not relevant. After a response was entered, the subsequent trial started after an intertrial interval of 1000 msec.
Each participant completed 10 MEG recording blocks of 8 min each, with an average number of trials completed equal to 751 (min.–max. range: 551–854). The different types of trials were randomly intermixed. An additional 5% of “catch” trials with longer ISI were presented for both trial types (100 msec for the TFF task and 200 msec for the AM task), with the aim of presenting clearly distinguishable targets that would reinforce bistable perception during the standard trials. Participants were unaware of the fact that bistable trials were all identical.
MEG Data Acquisition
Participants' whole-head MEG activity was recorded in a magnetically shielded room using a Neuromag 306 (Elekta) system with 102 magnetometers and 204 planar gradiometers, with a sampling rate of 1000 Hz. The sensors were arranged in 102 locations, each containing a triplet of one magnetometer and two gradiometers. To measure the head position while the participant's head was within the MEG helmet, a specific head-frame coordinate set was defined for each participant before the experiment, using predefined cardinal points of the head (i.e., nasion and left and right pre-auricular points), as well as the location of five head-position indicator coils and a minimum of 200 other head-shape samples that were digitized for motion tracking using a Polhemus FASTRAK 3-D digitizer (Fastrak Polhemus, Inc.). The participant's head position relative to the MEG sensors was estimated before each MEG recording block (see Procedures section) by activating the head-position indicator coils to ensure that no major movements occurred during the data acquisition period.
MEG Data Processing
Raw data were initially processed using MaxFilter 2.0 (Elekta Neuromag), which allows external sources of noise to be separated from head-generated signals using a spatio-temporal variant of signal space separation (Taulu & Kajola, 2005; Taulu, Simola, & Kajola, 2005). Before that, data were visually inspected and noisy channels were excluded from the spatio-temporal variant of signal space separation filtering and replaced by interpolation. Movement compensation was applied, and each run was aligned to an average head position.
After obtaining the MaxFiltered data, the subsequent data-analysis steps were performed in MATLAB with the following freeware software packages: FieldTrip for preprocessing, event-related fields (ERFs), and time–frequency analyses (Oostenveld, Fries, Maris, & Schoffelen, 2011), Brainstorm for cortical source reconstruction (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011), and CoSMoMVPA for multivariate pattern analyses (MVPAs; Oosterhof, Connolly, & Haxby, 2016).
Continuous MEG recordings were downsampled to 500 Hz and epoched from 1.5 sec before to 1 sec after the onset of the first stimulus. MEG epochs contaminated by artifacts were visually identified and manually rejected (an average of M = 21.05%, SD = 7.27% of trials for each participant were discarded after the artifact rejection procedure). In the TFF trials, after rejection of MEG epochs with artifacts, we obtained an average of 160 epochs (SD = 70) for segregation (i.e., two flashes, same position) and 237 (SD = 68) for integration (i.e., a single flash). In the AM trials, we obtained an average of 167 epochs (SD = 40) for segregation (i.e., two flashes, different positions) and 193 (SD = 38) for integration (i.e., moving flash).
ERFs and Related Statistical Analysis
ERFs were calculated from artifact-free epochs as the average amplitude across trials, after combining data from planar gradient pairs using vector addition. ERFs were baseline corrected using an interval of −200 to 0 msec relative to the first stimulus onset. Statistical comparisons between segregation and integration in the two task conditions were entirely data driven; thus, we performed permutation statistics (n = 10,000) and applied a cluster-based correction for multiple comparisons considering both time (all time points after stimulus onset) and sensor space (204 gradiometers) as dimensions to correct for, using a family-wise alpha level of .05. Temporal windows where significant cluster-corrected differences emerged in the poststimulus ERF analyses were used to temporally constrain the identification of ROIs at the cortical source level, as described in the next section.
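The logic of a cluster-based permutation test can be illustrated with a deliberately reduced sketch. It is not the FieldTrip routine used in the study: it clusters over time only (a single sensor), uses simulated paired differences, a fixed cluster-forming threshold of t = 2.2, and 500 sign-flip permutations instead of 10,000. All of these choices, and all function names, are assumptions made for illustration.

```python
import math
import random

def t_values(diffs):
    """One-sample t-value per time point; diffs[s][t] is subject s, time t."""
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [row[t] for row in diffs]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def max_cluster_mass(tvals, thresh):
    """Largest |sum of t| over runs of adjacent, same-sign suprathreshold points."""
    best, mass, sign = 0.0, 0.0, 0
    for t in tvals:
        s = (t > thresh) - (t < -thresh)      # +1, -1, or 0
        if s != 0 and s == sign:
            mass += t                          # extend the current cluster
        else:
            mass, sign = (t if s != 0 else 0.0), s
        best = max(best, abs(mass))
    return best

def cluster_permutation_p(diffs, thresh=2.2, n_perm=500, seed=0):
    """Compare the observed maximum cluster mass with a sign-flip null distribution."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), thresh)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sgn = rng.choice((-1, 1))          # flip the condition labels of one subject
            flipped.append([sgn * x for x in row])
        if max_cluster_mass(t_values(flipped), thresh) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated data: 12 subjects, 20 time points, a true effect at time points 5-9
data_rng = random.Random(1)
diffs = [[(1.0 if 5 <= t < 10 else 0.0) + data_rng.gauss(0, 0.1) for t in range(20)]
         for _ in range(12)]
mass, p_value = cluster_permutation_p(diffs)
```

Because only the maximum cluster mass of each permutation enters the null distribution, the resulting p value is corrected for the multiple time points tested; the full analysis extends the same idea to clusters spanning neighboring sensors.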
Cortical Source Reconstruction and ROI Definition
The entire source reconstruction process followed the recent guidelines for cortical source reconstruction with M/EEG data and related statistical analyses (Tadel et al., 2019). Structural magnetic resonance images (MRIs) were available for all participants except one and were all preprocessed with FreeSurfer (Fischl, 2012; Dale, Fischl, & Sereno, 1999). For the one participant for whom an MRI was not available, we used the default cortical anatomy of the Montreal Neurological Institute (MNI).
We co-registered the brain surfaces from their individual segmented MRIs (Nolte, 2003) with an overlapping sphere head model. Empty-room recordings (2 min) collected the same day as the participant's recordings were preprocessed following the same steps as participants' data, and used to calculate the noise covariance matrix.
Forward modeling of electromagnetic fields was computed through the overlapping-spheres method (Huang, Mosher, & Leahy, 1999). The estimation of distributed source amplitudes (inverse modeling) was computed using a weighted minimum-norm inverse kernel (Hämäläinen & Ilmoniemi, 1994). A z-score normalization was applied to each cortical source trace with respect to the prestimulus period (−200 to 0 msec): This standardization replaces the raw source amplitude (pAm) values with new values that are suitable for hypothesis testing and, moreover, reduces the influence of interindividual fluctuations in neural current intensity that are due to irrelevant anatomical or physiological differences (Tadel et al., 2019). Absolute values were used to compute the contrast measure between conditions regardless of the current's polarity.
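The baseline normalization described above can be sketched as follows. This is an illustration on a simulated trace, not the Brainstorm implementation; the function name, the half-open baseline interval, and the use of the sample standard deviation are assumptions.

```python
import math
import statistics

def baseline_zscore(trace, times_ms, baseline=(-200, 0)):
    """Z-score a source trace against the mean and SD of its baseline samples."""
    base = [x for x, t in zip(trace, times_ms) if baseline[0] <= t < baseline[1]]
    mu = statistics.mean(base)
    sd = statistics.stdev(base)          # sample SD (assumption)
    return [(x - mu) / sd for x in trace]

# Simulated trace sampled every 2 msec (500 Hz): an oscillating baseline
# plus an evoked deflection peaking 150 msec after stimulus onset
times_ms = list(range(-200, 400, 2))
trace = [5.0 + math.sin(0.3 * i) + 10.0 * math.exp(-((t - 150) / 30) ** 2)
         for i, t in enumerate(times_ms)]

z = baseline_zscore(trace, times_ms)
rectified = [abs(v) for v in z]          # polarity-independent values
z_baseline = [v for v, t in zip(z, times_ms) if -200 <= t < 0]
```

After normalization, the baseline samples have a mean of 0 and an SD of 1, so poststimulus values are expressed in units of baseline variability and can be compared across participants.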
After obtaining the cortical maps of source activity for each participant, cortical sources were normalized onto a standard MNI brain (Montreal, Canada; https://www.bic.mni.mcgill.ca/brainweb). Surface smoothing was applied using a circularly symmetric Gaussian kernel with a FWHM of 5 mm. These further steps make it easier to detect differential activity in a specific cortical region at the group level by reducing noise and interindividual variability.
Finally, source data were averaged over the time points of interest that emerged from the ERF analyses, blind to participant response, and compared between the different subjective perceptual outcomes (segregation vs. integration), separately for both tasks. The resulting anatomical structures that were differentially activated as a function of subjective perception were labeled according to both the Desikan-Killiany and Brodmann atlases (see Table 1) and were used as ROIs for the MVPA of time–frequency data and functional connectivity (phase coherence) analyses, both described in the next paragraphs. Specifically, for each ROI, we extracted the neural source activity of the single vertex (i.e., single cortical data point) showing the strongest difference between experimental conditions (segregation vs. integration) in each of the two tasks (TFF and AM).
| Task | ROI Label | MNI Coord. (x, y, z) | Cortical Location (AAL) |
|---|---|---|---|
| TFF | V2_L | −3, −95, −24 | Cuneus_L |
| | V2_R | 35, −86, −19 | Occipital_Inf_R |
| | MT_R | 54, −70, 4 | Temporal_Mid_R |
| | IFG_R | 53, 27, 8 | Frontal_Inf_Tri_R |
| | MFG_R | 25, 58, 16 | Frontal_Sup_R |
| | MFG_L | −37, 54, 26 | Frontal_Mid_L |
| | IFG_L | −51, 19, −1 | Frontal_Inf_Tri_L |
| | STG_L | −67, −12, 4 | Temp_Sup_L |
| AM | TPJ | 55, −47, 18 | Temporal_Sup_R |
| | IPS/SupPariet_R | 36, −63, 42 | Angular_R |
| | MTG_R | 47, −21, −8 | Temporal_Mid_R |
| | Insula_R | 45, 1, 3 | Insula_R |
| | SupFront_R | 27, −9, 69 | Frontal_Sup_R |
| | SupFront_L | −18, −8, 78 | Frontal_Sup_L |
| | IFG_R | 55, 18, −2 | Frontal_Inf_Oper_R |
| | Insula_L | −43, −1, −4 | Insula_L |
| | STG_L | −51, 18, −11 | Temporal_Pole_Sup_L |
| | TempInf_R | 45, −47, −17 | Temporal_Inf_R |
| | V1_L | −9, −95, 1 | Calcarine_L |
These cortical areas were labeled according to both the Desikan-Killiany and Brodmann atlases and were used as ROIs for the MVPA of time–frequency data in the prestimulus interval and for functional connectivity analyses. AAL = automated anatomical labeling.
Time–Frequency Decomposition and ROI-based Single-trial (Phase/Power) Decoding
Artifact-free epochs of ROI source activity were transformed into the time–frequency domain using a complex Morlet wavelet with a varying number of cycles (three at the lowest frequency and 10 at the highest) to obtain a time–frequency (complex-valued) representation in 68 frequency bins from 3 to 30 Hz and 250 time points covering the entire epoch length relative to the stimulus onset.
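As an illustration of how a complex wavelet coefficient yields phase and power, the following sketch convolves a test signal with a Morlet wavelet at a single time point and frequency. It is not the FieldTrip implementation: the linear growth of the number of cycles between 3 and 30 Hz, the truncation of the wavelet at four standard deviations, and all function names are assumptions.

```python
import cmath
import math

FS = 500.0  # sampling rate after downsampling, as reported in the text

def n_cycles(freq, f_lo=3.0, f_hi=30.0, c_lo=3.0, c_hi=10.0):
    """Cycles grow from 3 at 3 Hz to 10 at 30 Hz (linear growth is an assumption)."""
    return c_lo + (c_hi - c_lo) * (freq - f_lo) / (f_hi - f_lo)

def morlet(freq, fs=FS):
    """Complex Morlet wavelet sampled on [-4 sigma, +4 sigma]."""
    sigma = n_cycles(freq) / (2 * math.pi * freq)    # temporal SD in seconds
    half = int(round(4 * sigma * fs))
    return [cmath.exp(2j * math.pi * freq * (k / fs))
            * math.exp(-((k / fs) ** 2) / (2 * sigma ** 2))
            for k in range(-half, half + 1)]

def tf_coefficient(signal, index, freq, fs=FS):
    """Complex coefficient at one sample: convolution of signal and wavelet."""
    w = morlet(freq, fs)
    half = len(w) // 2
    return sum(signal[index - k] * w[k + half] for k in range(-half, half + 1))

# Test signal: a 10-Hz cosine lasting 1 sec
signal = [math.cos(2 * math.pi * 10 * n / FS) for n in range(500)]
coef = tf_coefficient(signal, 260, 10.0)   # sample 260 corresponds to t = 0.52 sec
phase = cmath.phase(coef)                  # instantaneous phase in radians
power = abs(coef) ** 2                     # instantaneous power
```

For this cosine, the recovered phase at t = 0.52 sec matches the phase of the signal itself (0.4π rad). Repeating the computation over all time points and frequency bins gives the complex time–frequency representation from which single-trial phase and power are derived.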
The main analysis, in which perception was decoded from prestimulus phase, followed a method similar to that used in our previous study (Ronconi et al., 2017); specifically, for each participant, we used a searchlight with a cross-validated naïve Bayes phase classifier to test whether, and at which frequencies, the prestimulus phase of ongoing ROI activity could predict subjective perception. For the cross-validation, a split-half method was used on single-trial source activity estimated for each ROI: 50% of the trials were selected pseudorandomly for training the classifier, and the remaining half were used for testing. We performed this operation twice, training on one half and testing on the other half, and vice versa. Classification accuracy was computed as the number of correctly predicted condition labels divided by the total number of predictions. In all cases, the train and test sets were both balanced across the two conditions (integration or segregation). In other words, the number of trials in each condition was the same; where necessary, (a few) trials were dropped using subsampling from the train or test set to ensure balance.
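The balanced split-half scheme can be sketched as follows. This is not the CoSMoMVPA code used in the study; the function names are hypothetical, and a nearest-mean classifier on simulated one-dimensional features stands in for the phase classifier. The trial counts in the example are the average TFF counts reported above.

```python
import random

def balanced_split_half(labels, seed=0):
    """Return two index lists with equal numbers of trials from each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    # Trials per class in each half, limited by the rarer class
    n_half = min(len(v) for v in by_class.values()) // 2
    half_a, half_b = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                        # pseudorandom assignment
        half_a += idxs[:n_half]
        half_b += idxs[n_half:2 * n_half]        # surplus trials are dropped
    return half_a, half_b

def split_half_accuracy(features, labels, fit, predict, seed=0):
    """Train on one half, test on the other, then swap; pool the predictions."""
    half_a, half_b = balanced_split_half(labels, seed)
    correct = total = 0
    for train, test in ((half_a, half_b), (half_b, half_a)):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += predict(model, features[i]) == labels[i]
            total += 1
    return correct / total

# Stand-in classifier: assign each trial to the class with the nearest mean
def fit_means(xs, ys):
    return {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c) for c in set(ys)}

def predict_nearest(model, x):
    return min(model, key=lambda c: abs(x - model[c]))

data_rng = random.Random(1)
labels = ["segregation"] * 160 + ["integration"] * 237
features = [(1.0 if lab == "segregation" else -1.0) + data_rng.gauss(0, 0.5)
            for lab in labels]
half_a, half_b = balanced_split_half(labels)
accuracy = split_half_accuracy(features, labels, fit_means, predict_nearest)
```

With 160 and 237 trials in the two conditions, each half contains 80 trials per condition, so 320 trials in total enter the classification.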
For the classifier, we used a custom re-implementation of some of the functionality present in the Circular Statistics Toolbox (Berens, 2009). We used the multivariate phase classification approach introduced in Ronconi et al. (2017). The input of the classifier was phase data from a set of trials with two conditions for a set of k features (combinations of time points and frequencies). For each condition label c (indicating integration or segregation) and feature i in the training set, the average phase $\theta_{c,i}$ and concentration parameter $\kappa_{c,i}$ were computed. For each trial in the test set, the probability $p_{i,c}$ that it belonged to class c according to feature i was computed using the von Mises circular probability density function (as implemented in circ_vmpdf.m in the CircStat toolbox). Because our classification approach was naïve Bayes (assuming independence across features), the combined class probability $P_c$ that a trial belonged to condition c was computed as $P_c = p_{1,c} \cdot p_{2,c} \cdots p_{k,c}$, integrating the information across the k features. The predicted condition label was set to the one with the highest probability. For improved numerical accuracy with very small probability values, our implementation took the logarithm of the probabilities and summed them. Because we used balanced trial counts for the two conditions, there was no need to assume different prior probabilities accounting for class frequency.
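A naïve Bayes classifier with von Mises likelihoods can be sketched in a few lines. This is not the authors' MATLAB implementation: the piecewise approximation used here to estimate the concentration parameter from the mean resultant length is a common choice but an assumption, as are the function names and the simulated phases.

```python
import math
import random

def circ_mean_kappa(phases):
    """Mean direction and approximate von Mises concentration of a phase sample."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    r = min(math.hypot(c, s), 0.999)       # mean resultant length, capped for stability
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
    return math.atan2(s, c), kappa

def log_i0(kappa):
    """Log of the modified Bessel function of order 0, by power series."""
    term, total, m = 1.0, 1.0, 0
    while term > 1e-12 * total:
        m += 1
        term *= (kappa / 2) ** 2 / (m * m)
        total += term
    return math.log(total)

def von_mises_logpdf(alpha, theta, kappa):
    """Log density of the von Mises distribution at angle alpha."""
    return kappa * math.cos(alpha - theta) - math.log(2 * math.pi) - log_i0(kappa)

def fit(train_phases, train_labels):
    """Per class and feature, store (mean phase, kappa); train_phases[trial][feature]."""
    model = {}
    for c in set(train_labels):
        rows = [x for x, y in zip(train_phases, train_labels) if y == c]
        model[c] = [circ_mean_kappa([r[i] for r in rows]) for i in range(len(rows[0]))]
    return model

def predict(model, trial):
    """Choose the class with the highest summed log-likelihood across features."""
    scores = {c: sum(von_mises_logpdf(a, th, k) for a, (th, k) in zip(trial, params))
              for c, params in model.items()}
    return max(scores, key=scores.get)

# Simulated phases: 3 features per trial, class means at 0 and pi
rng = random.Random(0)
def simulate(n, mu):
    return [[rng.vonmisesvariate(mu, 2.0) for _ in range(3)] for _ in range(n)]

train_x = simulate(100, 0.0) + simulate(100, math.pi)
test_x = simulate(100, 0.0) + simulate(100, math.pi)
y = ["integration"] * 100 + ["segregation"] * 100
model = fit(train_x, y)
accuracy = sum(predict(model, x) == lab for x, lab in zip(test_x, y)) / len(y)
```

Summing log-densities is equivalent to multiplying the per-feature probabilities but avoids numerical underflow when many features are combined. With balanced classes, no prior term is needed.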
For the time–frequency searchlight used in each ROI, each searchlight was based on radii of four time points and eight frequencies. For a given "center" feature (a combination of time point and frequency), features within a distance of four time points and eight frequencies were selected and used for cross-validated classification as described earlier. The classification accuracy was then assigned to the center feature. This process was repeated for each feature, resulting in a classification accuracy map for all time points and frequencies within each ROI. Given that for the decoding analyses we used the split-half method just described, the classification of perceptual outcome in the TFF trials was based, on average, on 320 trials (SD = 68), whereas in the AM trials, the classification was based on 332 trials (SD = 37).
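The feature selection performed by the searchlight can be sketched as follows. The rectangular shape of the neighborhood and its clipping at the edges of the grid are assumptions, as are the function names; the score function here is a placeholder that simply counts the selected features.

```python
N_TIMES, N_FREQS = 250, 68      # time-frequency grid size reported in the text
R_TIME, R_FREQ = 4, 8           # searchlight radii reported in the text

def searchlight_features(t_center, f_center):
    """(time, frequency) indices within the radii of a center feature, clipped at the edges."""
    times = range(max(0, t_center - R_TIME), min(N_TIMES, t_center + R_TIME + 1))
    freqs = range(max(0, f_center - R_FREQ), min(N_FREQS, f_center + R_FREQ + 1))
    return [(t, f) for t in times for f in freqs]

def searchlight_map(score_fn):
    """Assign to every center feature the score computed from its neighborhood."""
    return [[score_fn(searchlight_features(t, f)) for f in range(N_FREQS)]
            for t in range(N_TIMES)]

# Placeholder score: neighborhood size. A real analysis would instead return
# the cross-validated classification accuracy for the selected features.
size_map = searchlight_map(len)
```

Under these assumptions, a center feature in the interior of the grid is classified from 9 × 17 = 153 features, whereas features near the edges of the grid draw on smaller neighborhoods.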
A complementary decoding analysis in the same ROIs was performed on power values, to check that any effects found on prestimulus phase were not attributable to concomitant variations in the oscillatory amplitude within the same time/frequency/ROI, given that phase and power dimensions are significantly correlated (Nelli, Itthipuripat, Srinivasan, & Serences, 2017). In this case, decoding was performed with the same procedure described above for phase. The only difference was that features were not treated according to von Mises circular probability density function, but instead according to a normal (Gaussian) distribution, which is the standard option when dealing with continuous data.
Functional Connectivity and Related Statistical Analysis
Connectivity analysis was performed between pairs of ROIs defined at the cortical source level, using as hubs the specific ROIs where perception could be successfully decoded. Estimating functional connectivity at the source level has the advantage of reducing the effect of electromagnetic field spread and preventing spurious (non-independent) source-leakage effects, such as linear mixing or cross-talk between time series (Schoffelen & Gross, 2009). Specifically, we hypothesized that stronger connectivity states around stimulus onset would lead to better communication between lower-order and higher-order visual regions, in agreement with recent findings (Rassi et al., 2019), thus promoting a more accurate representation (i.e., temporal segregation) of visual stimuli.
To estimate the coupling between pairs of ROIs, we employed the magnitude squared coherence, a widely used measure of phase-dependent connectivity (Schoffelen & Gross, 2009), calculated in a prestimulus time period extending from 1 sec before to the onset of the first stimulus. As before, we used the same number of trials to estimate connectivity in the two conditions (integration or segregation), by subsampling the condition with more trials.
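Magnitude squared coherence at a single frequency can be computed from per-trial complex spectral coefficients of the two ROIs, as in the sketch below. How the coefficients were obtained in the study is not specified here, so the example uses simulated unit-amplitude coefficients; the function and variable names are hypothetical.

```python
import cmath
import math
import random

def magnitude_squared_coherence(x, y):
    """MSC at one frequency from per-trial complex coefficients x and y."""
    cross = sum(a * b.conjugate() for a, b in zip(x, y))    # summed cross-spectrum
    power_x = sum(abs(a) ** 2 for a in x)
    power_y = sum(abs(b) ** 2 for b in y)
    return abs(cross) ** 2 / (power_x * power_y)

rng = random.Random(0)
n_trials = 200
roi_a = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(n_trials)]
# Phase-locked partner: constant lag of 0.5 rad plus small trial-to-trial jitter
roi_locked = [a * cmath.rect(1.0, 0.5 + rng.gauss(0, 0.2)) for a in roi_a]
# Unrelated partner: independent random phase on every trial
roi_random = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(n_trials)]

coh_locked = magnitude_squared_coherence(roi_a, roi_locked)
coh_random = magnitude_squared_coherence(roi_a, roi_random)
```

The measure ranges from 0 to 1 and is high when the phase difference between the two signals is consistent across trials, whatever its value. Because its expected value under no coupling depends on the number of trials, conditions should be compared with equal trial counts, as described above.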
We focused our analysis on the frequency bands that emerged as significant predictors of subjective perception in the prestimulus phase decoding analyses. On the basis of our previous study (Ronconi et al., 2017), we expected these frequencies to be in the theta and alpha bands. Given that the frequency of alpha could play a role in determining integration versus segregation of visual stimuli (e.g., Ronconi et al., 2018; Wutz et al., 2018; Samaha & Postle, 2015), the whole alpha band was split into lower alpha (8–10 Hz) and upper alpha (11–14 Hz). Bonferroni correction for multiple comparisons was employed to correct for the different frequency bands tested.
RESULTS
Behavioral Results
Perceptual judgments of the stimuli, presented randomly in the left or right visual hemifield, were bistable in both types of trials (TFF and AM). Specifically, two distinct flashes were reported, on average, on 40.2% (SD = 17.4%) of trials in the TFF condition and 46.3% (SD = 8.2%) of trials in the AM condition. The two trial types did not differ significantly in the rate of segregation/integration trials, t(26) = −1.6, p = .12. These results suggest that ISI values effectively caused the two stimuli to be integrated on about half of the trials, confirming the estimated thresholds from our pilot studies.
ERFs and Cortical Sources Estimation
Cluster-based permutation tests allowed us to detect reliable differences between the ERFs evoked by segregation and integration in both the TFF and AM tasks. The complete set of sensors showing cluster-corrected significant differences for each comparison can be seen in Figure 1B. In the TFF condition, ERFs started to differ as a function of subjective perception as early as 84 msec after the first flash onset (around 24 msec after both stimuli had offset) and continued till the end of the time period considered (700 msec) in a large group of sensors (minimum cluster-corrected p = .002; maximum cluster-corrected p = .033). In the AM condition, ERFs started to differ in a later time window, possibly because the second stimulus here appeared 120 msec after the first one; specifically, ERFs differed significantly starting from 390 msec (250 msec after both stimuli had been presented and removed) and continued till the end of the time period considered (700 msec; minimum/maximum cluster-corrected p = .049). As a general pattern emerging from the ERF analysis visible both in the TFF and the AM tasks, in all sensors where cluster-corrected differences emerged, when the two stimuli were perceptually segregated, this elicited higher ERF amplitudes.
Cortical source estimation allowed us to identify ROIs that showed differential activity as a function of the type of percept (single/motion vs. double/alternation). They were considered as ROIs for prestimulus analyses (MVPA decoding and connectivity) only if their extent was equal to or exceeded 10 cortical vertices. The TFF and AM tasks elicited activity in two large and mostly non-overlapping sets of cortical regions (Figure 2). Specifically, the TFF task showed differential activity in visual areas including bilateral V1/V2 and the right V5/MT area, in the left superior temporal area, and bilaterally in frontal areas, that is, the inferior and mid/superior frontal gyri. In contrast, the AM task showed differential activity in the right V5/MT area, in the right intraparietal sulcus (IPS), in the right TPJ, in the right middle temporal gyrus, in the right insula, and bilaterally in the superior frontal gyrus. The complete set of sources that showed significantly different activation in the two tasks is displayed in Figure 2, and their anatomical labels are reported in Table 1 together with their MNI coordinates and cortical locations as derived from automated anatomical labeling (Tzourio-Mazoyer et al., 2002).
Decoding of Perceptual Outcome from Source-level Prestimulus Phase
Single-trial data from all ROIs were extracted, and the corresponding time–frequency transformations were obtained to evaluate whether prestimulus phase could be used to decode subjective perception and, if so, at which oscillatory frequencies. On the basis of the previous literature reviewed above, which examined phase effects in relation to binding mechanisms in visual perception, we focused our phase decoding analyses on the theta, alpha, and low beta frequency range (3–20 Hz).
The decoding accuracy (t values) for the different perceptual outcomes obtained with the naïve Bayes classifier searchlight performed on single-trial phase values is shown in Figures 3 and 4. Cluster-corrected permutation tests revealed that the time–frequency ranges in which subjective perception could be accurately decoded differed between the TFF and the AM task and were observed in different ROIs. Indeed, the highest decoding accuracy in predicting participants' perceptual outcome from the phase of prestimulus oscillations in the TFF task was found in the V5/MT area of the right hemisphere, with frequencies spanning predominantly the theta and the alpha bands (∼5–12 Hz) and around −400/−200 msec relative to the onset of the first stimulus (p = .048). In contrast, decoding of perceptual outcome in the AM condition was significant in the right IPS area for lower frequencies, specifically in the theta band (∼4–7 Hz) at and around −700/−400 msec relative to the onset of the first flash (p = .026). Notably, these findings are closely in line, in terms of time/frequency windows, with previous EEG evidence in an independent participant sample (Ronconi et al., 2017), thus representing a replication of our previous findings.
To further corroborate the distinction between the spatial and frequency features of the networks that subserve temporal integration in the TFF and AM tasks, we used raw decoding accuracy values in the alpha (8–12 Hz) and theta (4–7 Hz) bands extracted in the two time windows in which significant decoding results emerged, and performed an ANOVA with Frequency (alpha vs. theta) and Task (TFF vs. AM) as within-subject factors. The ANOVA did not reveal significant main effects, but, importantly, it revealed a significant Frequency × Task interaction, F(1, 26) = 12.32, p = .002. When this interaction was explored in further detail, we found, as predicted, that in the right IPS/Sup. parietal, the theta-based decoding accuracy was higher in the AM task as compared with the TFF task, t(26) = 3.23, p = .0015. Conversely, in the V5/MT area, the alpha-based decoding accuracy was higher in the TFF task compared with the AM task, t(26) = 1.82, p = .0405. This direct comparison between decoding accuracy in the two tasks further corroborates the idea that the two types of spatiotemporal integration were distinguishable in terms of the spatial and frequency activity that were predominantly involved.
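The structure of this test can be sketched briefly. In a design with two within-subject factors of two levels each, the interaction F test with (1, n − 1) degrees of freedom is equivalent to a one-sample t test on each participant's difference of differences, with F = t². The function name and the accuracy values below are hypothetical and serve only to illustrate the computation; they are not the data of the study.

```python
import math
import random
import statistics

def interaction_t(alpha_tff, theta_tff, alpha_am, theta_am):
    """One-sample t statistic on the per-participant difference of differences.

    Each argument holds one decoding-accuracy value per participant.
    For a 2 x 2 within-subject design, t ** 2 equals the interaction F(1, n - 1).
    """
    contrast = [
        (a_tff - t_tff) - (a_am - t_am)
        for a_tff, t_tff, a_am, t_am in zip(alpha_tff, theta_tff, alpha_am, theta_am)
    ]
    n = len(contrast)
    mean = statistics.fmean(contrast)
    sem = statistics.stdev(contrast) / math.sqrt(n)
    return mean / sem, n - 1

# Synthetic accuracies (hypothetical); 27 participants, as implied by df = 26.
rng = random.Random(0)
n_participants = 27
alpha_tff = [0.54 + rng.gauss(0, 0.02) for _ in range(n_participants)]
theta_tff = [0.51 + rng.gauss(0, 0.02) for _ in range(n_participants)]
alpha_am = [0.51 + rng.gauss(0, 0.02) for _ in range(n_participants)]
theta_am = [0.54 + rng.gauss(0, 0.02) for _ in range(n_participants)]

t_value, df = interaction_t(alpha_tff, theta_tff, alpha_am, theta_am)
f_value = t_value ** 2
print(f"t({df}) = {t_value:.2f}, equivalent F(1, {df}) = {f_value:.2f}")
```

By the same equivalence, the reported F(1, 26) = 12.32 corresponds to |t(26)| ≈ 3.51 for the difference-of-differences contrast.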
Prestimulus MEG Connectivity Is Predictive of Upcoming Perceptual Integration/Segregation
On the basis of the MVPA results that revealed significant decoding performance from prestimulus phase of the right V5/MT (for the TFF task) and of the right IPS (for the AM task) areas, we used these ROIs as hubs for the prestimulus connectivity analyses within the extended network that showed differential activation as a function of integration/segregation of visual stimuli.
For the TFF task, we predicted that increased functional connectivity would allow for more rapid and efficient processing, enabling finer temporal resolution (Figure 5A). In line with this hypothesis, we found that perceptual segregation (i.e., perception of two distinct flashes) was preceded by a significant increase of prestimulus connectivity in the upper alpha band (11–14 Hz) between the areas V5/MT and V2 of the right hemisphere (p = .0384; one-tailed, Bonferroni corrected); similarly, a tendency for a significant increment in prestimulus connectivity was found in the theta band (4–7 Hz) between the areas V5/MT and inferior frontal gyrus (IFG) of the right hemisphere (p = .0505; one-tailed, Bonferroni corrected). No other pairwise differences in connectivity to/from the area V5/MT were found to be significant (all p > .289).
For the AM task (Figure 5B), we again found that perceptual segregation (i.e., perception of two distinct flashes) was preceded by a significant increase of prestimulus connectivity, in line with our hypotheses. Specifically, there was stronger prestimulus connectivity in the theta band (4–7 Hz) between the areas IPS/superior parietal and V5/MT of the right hemisphere (p = .045; one-tailed, Bonferroni corrected). No other pairwise differences in connectivity to/from the right IPS/superior parietal were found to be significant (all p > .087).
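As a rough illustration of what a phase-based connectivity estimate between two regions captures, the sketch below computes the phase-locking value (PLV), a common measure of the consistency of phase differences across trials. The PLV is used here only as an assumed example: the connectivity measure of the study is defined in the Methods and may differ, and the function name and synthetic phases are hypothetical.

```python
import cmath
import math
import random

def phase_locking_value(phases_a, phases_b):
    """Length of the mean unit vector of single-trial phase differences (0 to 1)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))

# Synthetic single-trial phases at one time-frequency point (hypothetical data).
rng = random.Random(2)
n_trials = 200
region_a = [rng.uniform(-math.pi, math.pi) for _ in range(n_trials)]

# Coupled case: region B follows region A with a fixed lag plus small jitter.
region_b_coupled = [a - 0.5 + rng.gauss(0, 0.3) for a in region_a]
# Uncoupled case: region B phase is independent of region A.
region_b_random = [rng.uniform(-math.pi, math.pi) for _ in range(n_trials)]

plv_coupled = phase_locking_value(region_a, region_b_coupled)
plv_random = phase_locking_value(region_a, region_b_random)
print(f"PLV coupled: {plv_coupled:.2f}, PLV uncoupled: {plv_random:.2f}")
```

In an analysis of this kind, such an estimate would be computed separately for trials with each perceptual outcome and then compared across participants.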
Decoding of Perceptual Outcome from Source-level Prestimulus Power
The decoding accuracy (t values) for the different perceptual outcomes obtained with the naive Bayes classifier searchlight performed on single-trial power values is shown in Figure 6 for the TFF task and in Figure 7 for the AM task. Cluster-corrected permutation tests revealed that the time–frequency ranges in which subjective perception could be accurately decoded differed between the TFF and the AM tasks, as already seen in the analyses of the prestimulus phase. Significant decoding accuracy in predicting participants' perceptual outcome from the power of prestimulus oscillation in the TFF task was found in two frontal regions: (i) in the right IFG area, with frequencies spanning predominantly the alpha and the low-beta bands (∼7–18 Hz) and around −500/−200 msec relative to the onset of the first stimulus (p = .002), and (ii) in the left middle frontal gyrus (MFG) area, with frequencies spanning predominantly the upper alpha and low-beta bands (∼10–20 Hz) and around −200/−100 msec relative to the onset of the first stimulus (p = .045). In contrast, decoding of perceptual outcome in the AM condition was significant only in the right V5/MT area for low beta-band frequencies (∼13–20 Hz) at and around −700/−500 msec relative to the onset of the first flash (p = .034). These analyses of decoding based on oscillatory power, which showed quite a different pattern from that found with phase, strongly suggest that the phase effects we reported were not confounded by concomitant variations in power.
DISCUSSION
Starting from the idea that one cycle of low-frequency neural oscillations represents the elementary unit for sampling sensory information in different domains (VanRullen, 2016; van Wassenhove, 2016; Pöppel, 2009), in the present study, we used multivariate decoding of MEG data to shed light on the neural networks underlying the fundamental ability of the human visual system to integrate and segregate visual input. Our findings clearly point to two different functional networks underlying two aspects of visual temporal processing. The first network, involved in rapid temporal segregation of stimuli separated by just a few tens of milliseconds, was associated with early visual processing areas and visual area V5/MT. Indeed, V5/MT is sensitive to stimuli presented at high temporal frequency and has been previously associated with temporal perception (Bueti, Bahrami, & Walsh, 2008). Here, we showed that the phase of alpha oscillations localized to this area predicted integration versus segregation in the TFF task. Moreover, V5/MT also showed increased functional connectivity with early visual areas (V2) in the upper alpha band when participants segregated the two stimuli. This is consistent with our hypothesis that rapid and efficient communication within this early visual processing circuit enables high temporal resolution performance.
In contrast, for a longer temporal scale and with visual information displayed in different spatial locations, higher-order cortical areas in the parietal lobe (i.e., IPS/superior parietal cortex) were identified as the source of phase decoding in the theta band. This area also showed increased theta-band connectivity with the area V5/MT when participants segregated the two stimuli as opposed to perceiving a single object in (apparent) motion. These findings build on work showing a prominent theta band rhythm in visual processing areas (Spyropoulos, Bosman, & Fries, 2018) as well as in parietal cortex (Raghavachari et al., 2006), and suggest that active integration and segregation of sensory stimuli, at least in the visual modality, relies on a phase-dependent temporal coding at low-mid frequency oscillations. One possibility is that phase-amplitude coupling, as previously demonstrated for both alpha and theta oscillations (Köster, Martens, & Gruber, 2019; Jensen, Gips, Bergmann, & Bonnefond, 2014; Lisman & Jensen, 2013), would allow low–mid frequency oscillations to modulate gamma-band activity to organize simple perceptual representations in time. This would limit the number of representations that can be processed in each oscillatory cycle, depending on “hardware” limits, that is, the basic temporal resolution of our visual system, and also on whether they involve tracking events in a single or in different spatial locations. The coexistence of these different rhythms could theoretically account also for integration of stimuli of higher complexity than the ones employed in the present study, such as words, objects, or faces (Wang & Luo, 2017; Drewes, Zhu, Wutz, & Melcher, 2015), that would require a more complex brain network of visual regions to be tracked in their spatiotemporal dynamics (e.g., De Vries & Baldauf, 2019; Baldauf & Desimone, 2014).
Our results are among the first to elucidate cortical origins of alpha and theta activity in the context of visual temporal parsing, building on previous sensor-level EEG findings (Ronconi et al., 2017). In fact, in the present study we replicate, in a new set of participants and with different neuroimaging tools, our previous EEG finding showing that the perceptual interpretation (integration vs. segregation) depended on the phase of ongoing/prestimulus oscillations at different frequency bands (Ronconi et al., 2017). Not only were the frequencies that showed maximum decoding accuracy for the tasks closely matching between the present MEG and the previous EEG data, but there was a matching topography of the maximum decoding accuracy with the right posterior channels found previously, compatible with results obtained here at the cortical sources level. The current findings replicate and substantially extend those findings to also uncover the network connectivity that may underlie these two sampling frequencies.
Previous work (Dou, Morrow, Iemi, & Samaha, 2022; Mathewson, Gratton, Fabiani, Beck, & Ro, 2009) has found that phasic effects in the alpha band on perception are more pronounced when alpha power is strong, likely because of a more accurate phase estimate. Our control analysis indeed showed that not only phase, but also power in the alpha and beta bands, was predictive of subsequent perceptual decision making by participants, in line with previous reports (Romei et al., 2008; Van Dijk, Schoffelen, Oostenveld, & Jensen, 2008; Ergenoglu et al., 2004). Nonetheless, the two effects of power and phase showed no overlap at the source level, leading us to discard the possibility that the reported phase effects were because of differences in power.
Given the evidence we found that power in frontal areas (left MFG and right IFG) was predictive of subjective perception in the TFF task, one possibility is that such separation hints toward a possible dissociation between phase and power effects at the level of perceptual decision making. Whereas ongoing phase might modulate the perceptual outcome based on spatiotemporal binding windows, ongoing power might reflect the instantiation of a decision bias (Balestrieri & Busch, 2022; Iemi, Chaumon, Crouzet, & Busch, 2017; Limbach & Corballis, 2016; Lange, Oostenveld, & Fries, 2013). However, a better characterization of this distinction, as well as of the nature of this bias as perceptual, sensory, or decisional/idiosyncratic (Grabot & Kayser, 2020; Samaha, Iemi, Haegens, & Busch, 2020; Iemi & Busch, 2018), should be addressed in future studies because it was not within the scope of the present study. Another possibility is that this frontal power effect could reflect the ongoing activity of neural networks controlling sustained attention, which are known to include medial and ventral areas within the inferior frontal lobe (i.e., the so-called ventral attentional network; Corbetta & Shulman, 2011; Corbetta, Patel, & Shulman, 2008), areas where we found significant power decoding in the alpha/low-beta frequencies. Recent work suggests that attention allocation can influence the outcome of rapid perceptual segmentation (Sharp, Gutteling, Melcher, & Hickey, 2022; Sharp, Melcher, & Hickey, 2018), potentially playing a role in perceptual judgments in this study.
Regarding the significant decoding of perception in the AM task based on beta power in the right MT area, this effect partially overlaps in time with the theta phase effect found in right superior parietal sources (i.e., IPS/superior parietal cortex). Whether a theta–beta phase-amplitude coupling between these two important cortical nodes exists, and is relevant for this and other temporal binding tasks, should be investigated more deeply in future work. Some initial support for this possibility comes from invasive recordings in awake humans, showing that coupling between theta and beta occurs and may play an important role in the binding problem, including integration/segregation of information both within and across cortical areas (Malekmohammadi, Elias, & Pouratian, 2015).
Previous studies that have investigated the role of prestimulus neural activity on perception have tended to find phase modulation 200 msec or more before stimulus onset, as in the current study. As discussed by Brüers and VanRullen (2017) and Iemi et al. (2017), there are several reasons for such effects being “pushed back” away from the stimulus onset. One major explanation is that wavelet analyses “smear” the effect over time and so they would be contaminated by the ERPs/ERFs, making them insensitive to any real prestimulus effects around the time of stimulus onset. To ensure that our analysis was not contaminated by the ERFs, we ran a simulation in which we applied a time–frequency transformation to a synthetic signal (specifically, a sinusoid with an onset time of 0 msec). Following the guidelines of Tallon-Baudry, Bertrand, Delpuech, and Pernier (1996), we calculated for each frequency the wavelet's temporal resolution t, which is defined as twice the SD of the Gaussian envelope at a particular frequency, to determine the extent of temporal contamination caused by the wavelet. This analysis gives an estimate of −210 msec as the earliest prestimulus period for a signal at 5 Hz. This does not overlap with the significant effects reported here, suggesting that the prestimulus results that we found were not influenced by differences in the ERFs themselves. At the same time, it means that any prestimulus effects at the time of stimulus onset, if they had occurred in the last 200 msec, would not be detected in this analysis.
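The temporal resolution referred to above can be computed directly from the wavelet parameters. The following minimal sketch assumes a Morlet wavelet with a Gaussian envelope whose spectral SD is a fixed fraction of the center frequency, so that the temporal SD is 1/(2π × spectral SD) and the temporal resolution is twice that value. The function name, the `ratio` parameter, and its default of 7 are assumptions for illustration; the wavelet parameters of the study are not restated in this section, so the printed values need not match the −210 msec estimate.

```python
import math

def wavelet_temporal_resolution(freq_hz, ratio=7.0):
    """Return 2 * sigma_t (in seconds) for a Morlet wavelet centered on freq_hz.

    ratio   = freq_hz / sigma_f          (assumed constant across frequencies)
    sigma_f = freq_hz / ratio            (spectral SD of the Gaussian envelope)
    sigma_t = 1 / (2 * pi * sigma_f)     (temporal SD of the Gaussian envelope)
    """
    sigma_f = freq_hz / ratio
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    return 2.0 * sigma_t

# Temporal smearing shrinks as frequency increases (illustrative frequencies, 3-20 Hz range).
for freq in (4, 5, 7, 10, 12, 20):
    resolution_ms = 1000.0 * wavelet_temporal_resolution(freq)
    print(f"{freq:>2} Hz: 2 * sigma_t = {resolution_ms:.0f} ms")
```

Because the temporal resolution scales inversely with frequency, the lowest frequency of interest sets the most conservative bound on how close to stimulus onset a prestimulus effect can be interpreted.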
Our results are in line with previous theoretical proposals claiming that timing does not involve a single, centralized clock for the visual system, but that visual timing is dependent on the pattern of activity within distributed networks (Burr & Morrone, 2006). Indeed, they support the idea that the different temporal scales of sensory integration and segregation depend on the specific underlying networks. This hypothesis has been supported recently by studies investigating multisensory interaction using the auditory-induced and the tactile-induced double flash illusion (DFI). Cooke, Poch, Gillmeister, Costantini, and Romei (2019) and Fotia, Cooke, Van Dam, Ferri, and Romei (2021) showed network-specific oscillatory correlates, whereby the auditory DFI could be linked to occipital alpha oscillations, whereas the tactile DFI could be linked to occipital beta oscillations, a rhythm typically associated with somatosensory processes. We build on and extend these previous findings by showing that we can observe different oscillatory fingerprints of subjective perception depending on the temporal range of sensory binding even within the visual modality.
One limit to our approach is that the link between perceptual thresholds and oscillations is correlational, and future work with more causal measures (such as neurostimulation) is important to better establish the direction of causation (for a review, see Ghiani, Maniglia, Battaglini, Melcher, & Ronconi, 2021). If we consider the duty cycle (i.e., the half of an oscillatory cycle associated with increased neural excitability; Jensen, Bonnefond, & VanRullen, 2012), the shorter ISI employed in our TFF task fits into a frequency within the alpha band, whereas the longer ISI employed in the AM task fits into a frequency within the theta band. On the one hand, this is consistent with our hypothesis that it is the time frames of neural processing (oscillatory and connectivity patterns) that determine our temporal resolution and temporal binding. It is not a coincidence, by this logic, that the TFF fits into the duty cycle of alpha. On the other hand, further causal evidence (see below), and a wider variety of tasks with different temporal ranges and levels of task complexity, is needed to confirm a precise mapping between temporal window of stimulus processing and oscillatory cycles at the corresponding frequencies.
In terms of causal manipulations, it is interesting to note that the key cortical areas found in this study are in agreement with previous findings obtained with TMS. For example, timing processes in the visual domain have been causally linked to V5/MT using temporal discrimination tasks (Salvioni, Murray, Kalmbach, & Bueti, 2013; Bueti, Bahrami, & Walsh, 2008; Bueti, van Dongen, & Walsh, 2008; for a review, see Mioni, Grondin, Bardi, & Stablum, 2020). There is also evidence linking transcranial alternating current stimulation (tACS) at specific frequencies with binding processes in the visual system (for a review, see Ghiani et al., 2021). Using the TFF task, Battaglini et al. (2020) showed that participants tended to integrate two subsequent flashes more often (i.e., they tended to report just one flash) when 10-Hz tACS (i.e., alpha tACS) was applied over V5/MT of the right hemisphere and surrounding extrastriate regions. Moreover, our results are in line with studies showing distinct roles for V5/MT and parietal lobe for spatiotemporal resolution of perception and motion extrapolation (Battelli, Cavanagh, & Thornton, 2003; Battelli et al., 2001) and with theoretical models proposing the existence of a when pathway involving V5/MT and the parietal lobe of the right hemisphere (Battelli, Pascual-Leone, & Cavanagh, 2007). Interestingly, patients with right parietal damage do not have impairment in low-level temporal processing as measured by flicker detection thresholds (Battelli et al., 2003). Instead, parietal patients have shown a bilateral deficit in AM perception (Battelli et al., 2001), whereas deficits in other attentional tasks, such as multiple-object tracking, were present only in the hemifield contralateral to the parietal lesion. Such dependence of AM perception on right hemisphere areas, irrespective of stimuli presentation hemifield, closely matches our data showing the involvement of a network of right hemispheric regions.
Together, these results suggest that the parietal cortex of the right hemisphere may serve as a main control hub for theta-driven spatiotemporal integration in visual perception.
The importance of the current replication and extension of previous results using this same (Ronconi & Melcher, 2017; Ronconi et al., 2017) or similar paradigms (Milton & Pleydell-Pearce, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Wutz, Weisz, Braun, & Melcher, 2014; Mathewson et al., 2009; Varela et al., 1981) is heightened by recent null findings on alpha oscillations and perception. As already introduced, recent studies have reported no effect of alpha phase on stimuli detection (Ruzzoli et al., 2019), visual awareness and accuracy (Benwell, Coldea, Harvey, & Thut, 2022; Benwell et al., 2017), or RTs (Vigué-Guix, Morís Fernández, Torralba Cuello, Ruzzoli, & Soto-Faraco, 2022). In another study probing temporal processing of both flashes and sounds, Buergers and Noppeney (2022) found no effect of alpha frequency (both as an individual trait and as a varying state) on visual integration. Given these null findings, the relevance of the current work is twofold: First, it reinforces the idea that the phase of ongoing alpha band oscillations can shape conscious perception, and that this contribution is critical in the integration and segregation of visual stimuli. Second, in contrast to Buergers and Noppeney (2022), we provide evidence of the role of ongoing alpha oscillations in pacing visual perception, by demonstrating a pattern of connectivity between V2 and V5/MT critically specific to the upper alpha band and to the segregation of visual stimuli. Reasons for this discrepancy might include the multifaceted nature of alpha oscillations, because there is evidence that there is no “single” alpha frequency in the brain (Womelsdorf, Valiante, Sahin, Miller, & Tiesinga, 2014). Alpha can be linked to bottom–up sensory processing and, thus, to neural activity more linked to thalamo-cortical loops (Bollimunta, Mo, Schroeder, & Ding, 2011; Hughes, Lőrincz, Turmaine, & Crunelli, 2011). 
Such alpha oscillations may represent the “hard-wired” temporal precision of our visual system that can be measured even in a resting state (Drewes et al., 2022; Ronconi et al., 2022; Samaha & Postle, 2015). However, alpha oscillations are also implicated in cortical networks more connected to parieto-occipital (and potentially fronto-occipital) feedback signaling (Halgren et al., 2019; Van Kerkoerle et al., 2014). This second aspect of alpha should change, at least partly, based on the degree of top–down control required by the task (Wutz et al., 2018). The scenario is further complicated by the fact that different parameters of alpha can be analyzed (i.e., power, phase, frequency; Keitel, Ruzzoli, Dugué, Busch, & Benwell, 2022), but they are not completely independent (Nelli et al., 2017), and also by the fact that—as reviewed in the Introduction section—alpha is connected to perception in both a state-like and a trait-like fashion (resting-state vs. task-related alpha). This multifaceted and complex nature of alpha oscillations might at least in part explain the discrepancy between different studies.
We should also acknowledge potential limits of the present study. First, in testing our hypothesis that rapid visual segmentation would correlate with stronger connectivity, the results were corrected only for the different frequencies analyzed, in a hypothesis-driven approach, and so the related findings should be taken cautiously. In addition, the study focused only on two temporal intervals. Although there is variation in temporal thresholds (Drewes et al., 2022; Battaglini et al., 2020; Samaha & Postle, 2015), we used a single ISI for each task across observers. This choice prioritized maintaining the same physical stimulus characteristics across trials and observers, so that any difference in neural activity could be linked to the subjective perceptual response. The downside of this choice is that it made our paradigm insensitive to potential individual differences.
Finally, it should be noted that another limit of the specific design of our experiment, in which two different temporal integration tasks were studied in interleaved trials, was that we could not disentangle whether the specific networks involved, and the frequency of their sampling rhythm, could change as a function of stimulus presentation hemifield. This would have required a much larger number of trials. Although the involvement of right-lateralized networks in this task, irrespective of stimuli presentation hemifield, is consistent with previous studies in patients reviewed above, future work is needed to confirm this hypothesis.
To summarize, the current results demonstrate the existence of two networks for visual temporal integration in right-lateralized cortical regions that have been traditionally included in the human where and when visual pathways: a first and faster network involving early visual areas (V2 to V5/MT) that determines the basic temporal resolution of perception at the speed of the alpha oscillation, and a second slower network involving parietal regions (IPS) that has a key role in the integration of more complex spatiotemporal events at a theta speed. According to the present findings, the different sampling frequencies involved, alpha and theta, reflect the activity of different cortical networks, their different spatial extensions, and connectivity patterns. Overall, these findings help elucidate the neural mechanisms that transform the continuous inflow of sensory information into coherent and interpretable temporal sequences of events.
Corresponding author: Luca Ronconi, School of Psychology, Vita-Salute San Raffaele University, Via Olgettina 58, 20132, Milan, Italy, or via e-mail: [email protected].
Data Availability Statement
Data are available upon motivated request to the authors.
Author Contributions
Luca Ronconi: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Visualization; Writing—Original draft; Writing—Review & editing. Elio Balestrieri: Investigation; Methodology; Writing—Review & editing. Daniel Baldauf: Methodology; Resources; Software; Supervision; Writing—Review & editing. David Melcher: Conceptualization; Data curation; Funding acquisition; Methodology; Project administration; Resources; Supervision; Writing—Original draft; Writing—Review & editing.
Funding Information
H2020 European Research Council (https://dx.doi.org/10.13039/100010663), grant number: StG Agreement 313658.
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.