Abstract
Central to our understanding of how visual-evoked potentials (VEPs) contribute to visual processing is the question of where their anatomical sources are. Three well-established measures of low-level visual cortical activity are widely used: the first component (“C1”) of the transient and multifocal VEP, and the steady-state VEP (SSVEP). Although primary visual cortex (V1) activity has often been implicated in the generation of all three signals, their dominant sources remain uncertain due to the limited resolution and methodological heterogeneity of source modelling. Here, we provide the first characterisation of all three signals in one analytic framework centred on the “cruciform model”, which describes how scalp topographies of V1 activity vary with stimulus location due to the retinotopy and unique folding pattern of V1. We measured the transient C1, multifocal C1, and SSVEPs driven by an 18.75 Hz and 7.5 Hz flicker, and regressed them against forward models of areas V1, V2, and V3 generated from the Benson-2014 retinotopy atlas. The topographic variations of all four VEP signals across the visual field were better captured by V1 models, explaining between 2 and 6 times more variance than V2/V3. Models with all three visual areas improved fit further, but complementary analyses of temporal dynamics across all three signals indicated that the bulk of extrastriate contributions occur considerably later than V1. Overall, our data support the use of peak C1 amplitude and SSVEPs to probe V1 activity, although the SSVEP contains stronger extrastriate contributions. Moreover, we provide elaborated heuristics to distinguish visual areas in VEP data based on signal lateralisation as well as polarity inversion.
1 Introduction
Visual evoked potentials (VEPs)—electrical scalp potentials emanating from the brain in response to visual stimuli—are a central tool for electrophysiological work in both basic and clinical research domains to study the mechanisms of visual-cognitive processes and how they differ in a variety of clinical disorders (Creel, 2019; Kothari et al., 2016). Three main VEP-eliciting techniques are widely used to probe low-level visual processing in distinct, complementary ways. First, transient VEPs elicited by abruptly presented, isolated stimuli reveal a full cascade of processing stages in response to a discrete sensory event, with the earliest, “C1” component (60–90 ms), drawing particular interest for questions regarding how soon visual processing can be influenced by cognitive factors such as attention (Luck, 2014). Second, the steady-state VEP (SSVEP), a narrow-band spectral component generated by rhythmically flickering stimuli, can be used to trace evoked response amplitude continuously over time for a prolonged stimulus (Regan, 1966). Third, multifocal VEPs (mfVEP) are generated using rapid, statistically precise stimulation protocols across the full visual field (Baseler et al., 1994; James, 2003; see Lalor et al., 2006 for a related technique), providing a temporal profile of aggregate, rudimentary processing of pulses within a long, rapid sequence. While the transient VEP offers a view of more processing stages, mfVEPs and SSVEPs offer high signal-to-noise ratios (SNR) from shorter recording times, as well as the simultaneous probing of multiple visual locations.
Given the prominence and complementarity of these three distinct VEP signals, it is important to know what their cortical sources are, not only to expand the insights that they can provide individually but also to understand how they relate to one another. Source analysis studies have modelled the cortical origins of early visual components such as the C1 component of both the transient (Di Russo et al., 2002, 2005) and multifocal (James, 2003; Slotnick et al., 1999) VEP, as well as the SSVEP (Di Russo et al., 2007), with the implicated areas typically including primary visual/striate cortex (V1). However, much uncertainty remains regarding the dominant sources of these signals due to the ill-posed nature of the inverse problem in source analysis—that many dipole configurations can explain a given scalp potential distribution. This is particularly problematic for the visual system where many distinct areas lie in close proximity or have similar cortical orientation, which can lead to “crosstalk”, where signals from one area masquerade as those from another (Bradley et al., 2016; Cottereau et al., 2015; Dale & Sereno, 1993; Felleman & Essen, 1991; Grech et al., 2008; Hauk et al., 2022; Zorzos et al., 2021). What is more, there exist many different source analysis methods, with different settings and assumptions to constrain them, which can yield different source estimates (Jonmohamadi et al., 2014; Mahjoory et al., 2017).
However, the fact that visual brain regions are retinotopically organised (Felleman & Essen, 1991) can provide crucial additional traction on the problem of distinguishing their scalp-projected activity. This is particularly true at early stages of the visual hierarchy where receptive fields are small (Dumoulin & Wandell, 2008), allowing characteristic cortical positions and folding patterns to make clear predictions of retinotopically varying scalp topographies. Indeed, this forms the basis of recent, improved source analysis methods that constrain estimates using functional retinotopic maps (J. M. Ales et al., 2010; Hagler, 2014; Hagler & Dale, 2013; Hagler et al., 2009). The primary example of this kind of geometric means of source identification is the “cruciform model” of V1 morphology. V1 extends over a medial-occipital area encompassing the Calcarine sulcus, with the upper visual field projecting to its ventral floor and lower field to its dorsal ceiling, and the medially-facing banks of the sulcus where visual field locations near the vertical meridian are represented (Benson et al., 2014; Halliday & Michael, 1970; Wandell et al., 2009). This geometry predicts that a V1 signal should invert in polarity when stimuli appear in the upper versus the lower visual field and become increasingly lateralised as stimuli are presented closer to the vertical meridian. This logical principle formed the basis of the original claims that the C1 component of the transient VEP is generated in V1 (Clark et al., 1994; Jeffreys & Axford, 1972). The same logic has also been applied to the C1 component of the mfVEP (Lalor et al., 2012; Slotnick et al., 1999), but not yet to the SSVEP, the sources of which have thus far been indicated only by source analysis (Di Russo et al., 2007).
The validity of relying on the cruciform model to identify V1 as a source for scalp potentials has, however, been debated (J. M. Ales et al., 2013; Kelly, Schroeder, et al., 2013; Kelly, Vanegas, et al., 2013). Some studies had focussed on an abbreviated form of the cruciform model centred on polarity reversal across the horizontal meridian, but a problem with this is that extrastriate areas V2 and V3 also reverse in cortical surface orientation between the upper and lower visual field (Edwards & Drasdo, 1987; Halliday & Michael, 1970; Maier et al., 1987). J. M. Ales et al. (2010) corroborated this with fMRI-informed topographic simulations, arguing therefore that polarity inversion across the horizontal meridian is not a unique identifier of V1. J. M. Ales et al. (2013) further pointed out that some animal neurophysiology work indicates considerable overlap in the latencies of responses in V1 and many extrastriate areas (Bullier & Nowak, 1995; Chen et al., 2007; Maunsell & Gibson, 1992; Nowak et al., 1995; Raiguel et al., 1989; Robinson & Rugg, 1988; Schroeder et al., 1998). These points together imply that early, polarity-inverting VEP components like the C1 might be as consistent with V2/V3 as they are with V1. However, while this indeed undermines the sole use of polarity reversal to identify a V1 source (J. M. Ales et al., 2010), it fails to take account of additional topographic shifts predicted by the full cruciform model for stimuli near the vertical meridian, where foci should become more ipsilateral near the upper vertical meridian and more contralateral near the lower vertical meridian (Kelly, Vanegas, et al., 2013). Since V2 and V3 do not mimic this aspect of V1 cruciform geometry, a full consideration of geometry across the entire visual field should generate topographic predictions unique to V1.
The goal of this study was to carry out a full “geometric analysis” of the transient VEP C1, the mfVEP C1, and the SSVEP to establish the extent to which retinotopically varying topographies can distinguish between sources in V1 and V2/V3 due to their distinct cortical morphology. Further, we leverage all three signals to estimate response latency differences across the three visual areas. To do this, we recorded transient VEPs, mfVEPs, and SSVEPs at fast and slow flicker rates, and analysed their topographic variation across the visual field with respect to forward-model predictions derived from the publicly available Benson-2014 retinotopy atlas (Benson et al., 2014) for V1, V2, and V3. This study complements recent work using retinotopy-constrained source estimation (RCSE) methods to distinguish V1, V2, and V3 activity (J. Ales et al., 2010; Hagler, 2014; Hagler & Dale, 2013; Hagler et al., 2009) by taking a distinct viewpoint and approach. Rather than using individual functional retinotopic maps to fit individual VEP data, we used grand-average retinotopic maps to fit grand-average VEP data. Generally speaking, there is an inherent trade-off between individual-based and grand-average-based approaches from a source-modelling quality perspective. Using individual data unsmoothed by averaging offers potentially more distinctive topographic variation across locations and areas to constrain the models and distinguish those areas, but reaping this benefit depends on how accurately those individual variations are measured despite noise in individual fMRI, which in practice usually calls for additional model adjustments to improve fit. It has been found that source localisation based on a grand-average can generally perform favourably compared to individual localisation owing to higher signal-to-noise ratio in the grand-average (Poncet & Ales, 2023). Ultimately, modelling at the grand-average level allowed us to focus on the characteristic topography features across the visual field for sources in different visual areas, allowing us to provide a pathway to understand the VEP topographical trends indicative of different visual areas and thus enable a degree of reliable source inference in the absence of individual fMRI data. Through this approach, we established that V1 makes the dominant contribution to all four VEP signals. Although extrastriate areas were able to explain some variance both on their own and alongside V1, when included with V1 in the same regression models of the full visual field, their weights did not build substantively until distinctly beyond the C1 peak in the transient VEP, and built little if at all in the mfVEP. Meanwhile, modelling full sinusoidal time courses for the fundamental SSVEP frequencies revealed frequency-dependent relative extrastriate contributions and possible response lags for V2 and V3 on the order of tens of milliseconds, relative to V1. Overall, our topographic analyses offer a comprehensive information source for embellishing and strengthening the heuristics that are used for distinguishing V1 sources from V2/V3 for any number of stimulus locations, emphasising not just gross polarity reversals but also highly diagnostic topographic shifts within and across quadrants.
2 Methods
2.1 Participants
Ten healthy young adults took part in this experiment (5 Females, 10 right handed) with a mean age of 23.5 years (SD = 3.3). They were compensated for their participation with a lump sum of €20. All participants gave written informed consent to be included in the study, were over the age of 18, had normal or corrected to normal vision and reported no neurological or psychiatric conditions. All operations were approved by the Human Research Ethics for Sciences board of University College Dublin and adhered to the guidelines set out in the Declaration of Helsinki.
2.2 Stimuli
Stimuli were presented using Psychtoolbox-3 (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) in a dark, sound-attenuated chamber on a 1024 x 768 Dell E771p CRT monitor (32.5 X 24.5 cm) operating at a refresh rate of 75 Hz. Participants were seated at a chin rest at a distance of 57 cm from the monitor. They maintained a stable gaze on a white fixation cross, which was displayed against a uniform grey field (64 cd/m2, half the monitor’s maximum luminance). Stimuli were constructed from an annular “dartboard” pattern of alternating black and white checks, and had an inner and outer radius of 2.75o and 7.25o (see Fig. 1A). This was divided angularly into 16 wedges of 22.5o polar angle, each containing a 4 x 4 check pattern (angle x eccentricity, equally spaced), which spanned the visual field. This full-field stimulation allowed us to measure the retinotopic patterns of visually evoked EEG topographies and compare them to predictions made by the Benson-2014 retinotopic atlas (Benson et al., 2014). The retinotopic layout of V1 and V2 in this average brain is shown in Figure 1B-C, demonstrating the cruciform morphology of V1 as it passes through the Calcarine sulcus. This morphology motivates the cruciform model of V1 (centre of Fig. 1D) as a means of determining striate origin for scalp signals, which should exhibit a specific sequence of topography shifts as a function of retinotopic stimulus position. Figure 1D also outlines the cortical morphology of V2 and V3 (non-cruciform) along with V1, all of which predict polarity reversal across the horizontal meridian. Crucially however, V1 predicts a pattern of topography shifts within quadrants of the visual field that is not shared by V2 or V3 since the latter do not have cruciform morphologies. This is outlined in Figure 1D by illustrating typical dipole orientations (arrows) that shift from aligned to opposed between the horizontal and vertical meridian when comparing V1 and V2, while V3 dipole orientation does not change within quadrants.
2.3 Procedure
Participants observed a series of visual stimulation protocols designed to elicit a transient VEP, a pattern-pulse mfVEP, and an SSVEP, for each of the 16 wedges. They were given no task other than to maintain steady eye gaze on a central fixation cross. Across the three protocols, overall stimulation time was approximately 70 minutes (45, 10, and 15 minutes for the transient VEP, mfVEP, and SSVEP, respectively) and participants had the opportunity to take regular breaks. Although the absence of a task may have allowed participants to be inattentive or low in arousal, such states would likely result in signal amplitude changes globally (Eason et al., 1969) and so were not of concern for the retinotopic variations of interest.
The transient VEP protocol consisted of 200 trials for each location in which stimuli were flashed for 50 ms in random sequence, separated by 500 ms with the fixation cross always on the screen. This was divided into seven blocks of approximately 7 minutes each. In the mfVEP protocol, the 16 wedges followed orthogonal rapid pulse sequences that were governed by 16 time-lagged copies of a zero-autocorrelation series of binary numbers derived from a 4095 frame m-sequence (Baseler et al., 1994; James, 2003). Stimulus pulses lasted two frames (26.6 ms) and occurred at transitions from -1 (the “off digit”) to 1 (the “on digit”). To limit the overall pulse speed, three out of every four pulses were discarded, as has been done elsewhere (Vanegas et al., 2013), which yielded a protocol with 256 stimulus pulses for each location extending across a time period of 54.6 seconds. This was repeated 10 times with different m-sequences yielding 2560 pulses per location. The fixation cross appeared on the screen one second prior to the first pulse in each repetition. The SSVEP protocol included a fast (18.75 Hz) and a slow (7.5 Hz) subdivision. The 18.75 Hz flicker was produced by a sequence of two “on” and two “off” frames. The 7.5 Hz flicker was produced by a sequence of 10 frames (5 on and 5 off). Each wedge flickered at both frequencies for six unbroken periods of 5 seconds. To reduce overall stimulation time, two locations 180o apart flickered simultaneously, one at each frequency. Locations were stimulated in random order, separated by a self-paced gap (the participant clicked the mouse to proceed to the next location). The fixation cross appeared on the screen 1 second prior to the start of each trial.
2.4 Data acquisition
EEG data were recorded at 512 Hz by an ActiveTwo Biosemi system with 128 scalp electrodes following the Biosemi ABC layout (Biosemi, The Netherlands) and six external flat-faced electrodes placed above and below the left eye, on the left and right outer canthi, and on the left and right mastoids. Eye gaze and blinks were monitored via the four electrooculograms and using an Eyelink Plus 1000 Tower system (SR Research, ON, Canada) recording at 1000 Hz.
2.5 Data processing
EEG processing was carried out using a combination of in-house Matlab scripts (Mathworks, The United Kingdom) and EEGLAB routines (Delorme & Makeig, 2004). These processes were similar across the three stimulation protocols except that trials containing artifacts were not rejected for the mfVEP or the SSVEP as this would complicate the analysis due to their continuous stimulation streams. This is justified since both signals are robust to artifacts because the SSVEP has a much narrower spectral band than do artifacts and the mfVEP has a very large number of trials making it robust to occasional artifacts.
Across the board, continuous data were low pass filtered by convolution with a 77-tap hanning-windowed sinc function with a 3-dB corner frequency of 35.3 Hz and 50 Hz attenuation (mains) of 83.5 dB. Long segments of persistently high noise in individual channels were linearly interpolated using EEGLAB’s eeg_interp function. To identify these, the data were partitioned into short segments of 5 seconds and low- and high-frequency standard deviation (following a 5 Hz low pass filter and subtraction of this filtered data, respectively) were calculated and subjected to thresholds of 20 μV and 10 μV. Only long segments with a consistent sequence of seven or more such segments were interpolated, and short gaps between such long segments (two or fewer segments) were also interpolated.
The data were then re-referenced to the average of all scalp channels, and low frequency trends were removed. To avoid distortions that can result from typical high pass filters (Acunzo et al., 2012), a linear detrending approach was taken instead. This entailed partitioning the data into segments and detrending each segment by subtracting the line of best fit for each electrode. The segment durations were adjusted slightly so that the partition edges did not coincide with any epoch window. These durations differed across the stimulation protocols to take their individual requirements into account. For the transient VEP, a duration of 4 seconds was used in order to span multiple trials and avoid biasing the fitted line with any trial order effects. For the SSVEP, where unbroken epochs of 4 seconds were needed, the duration was set to 5 seconds. Finally, for the mfVEP, where at any moment in time a large number of trials are at various stages of their time course and avoiding partition points inside epochs is impossible, a shorter duration of 500 ms was used.
At this point, for the transient VEP, continuous data were then subjected to in-house artifact detection routines to identify and label time points with electrode “pops”, slow wave drift, muscle activity and blinks, removing the affected channels from analysis during the affected time periods. Pops were identified by the absolute value of a 20-ms-lag amplitude difference exceeding 50 μV. Slow drift and muscle activity artifacts were detected using discrete Fourier transforms of successive 1-second partitions. Slow drift was identified if a ratio exceeding 5:1 was found between the maximal frequency-domain amplitudes in the 0–3 Hz and the 3–7 Hz ranges, and muscle activity was identified if a similar ratio between the 20–40 Hz and 3–7 Hz ranges exceeded 2:1. Blinks were identified both with the eye-tracking data (for full blinks) and via the difference between the upper and lower vertical electrooculograms (for partial blinks), which was transformed by taking its first derivative (to remove any slow baseline shifts) and applying a 200 ms moving average (to extend blink detection to the periods immediately before and after the main blink where EEG may still be affected). Data points where this signal exceeded 1 μV were labelled as blinks. Eye tracking data were used to identify saccades, which were defined as gaze displacements over a 10 ms period in excess of 0.2o. Trials were rejected if either saccades or blinks took place within a window of -50 ms to 200 ms surrounding stimulus onset. The average trial-loss rate across participants was 11.3%, (3.3% specifically due to saccades), with a maximum of 28.7% (leaving a minimum of 142 trials per participant per location).
Trials for the mfVEP and SSVEP were not removed based on eye tracking data because they included a small number of continuous data streams. Nevertheless, to quantify fixation performance, we calculated the average number of saccades per trial. For this purpose, we used the saccades identified by Eyelink as for each one of these events, our algorithm often found several smaller events that Eyelink had merged. Despite this, agreement between the two methods was high as, across participants, between 94% and 99% of Eyelink-defined saccade events were within 50 ms of one found by our algorithm (measured as distances between the start of events). Thus, the average number of saccades in excess of 0.2o per 5-second SSVEP trial was found to be between 0.4 and 4.8 across participants (M=1.4). In mfVEP trials, which were longer (54.6 seconds), these figures were 1.3 and 35.8 (M=16). In the transient VEP trials, they were 0.04 and 0.29 (M = 0.1) for an epoch window of -100 ms to 400 ms centred on stimulus onset. Although eye movements influence the retinotopic position of stimuli of central relevance in this paper, the impact of this is simply to add noise to the retinotopic position of stimuli rather than introduce any systematic bias. To ensure that no systematic bias in gaze behaviour across stimulus locations was present in SSVEP or transient VEP trials, we calculated the correlation between gaze and stimulus position along the horizontal and vertical plane for each participant, finding that none were significant. This kind of fixation bias would not be feasible in mfVEP trials due to the fast pace of stimulus pulsing. Instead, as a measure of gaze stability, we calculated the interquartile range (IQR) of horizontal and vertical gaze deviation in each 54.6-seconds run. Across participants, these were between 0.28o and 1.01o (M=0.55o), and between 0.18o and 0.93o (M=0.49o) respectively. For the SSVEP, the IQRs for horizontal and vertical gaze displacement during stimulus flicker were between 0.22o and 0.82o (M=0.48o), and between 0.31o and 0.74o (M=0.49o). For the transient VEP, the IQRs for horizontal and vertical gaze displacement within -100s and 400 ms of stimulus onset were between 0.27o and 0.97o (M=0.43o), and between 0.19o and 0.7o (M=0.38o).
2.6 VEP signal derivation
For the transient VEP and the mfVEP, data were divided into stimulus-locked epochs ranging from -100 ms to 400 ms and averaged for each of the 16 wedges separately. Note that for the mfVEP, the resultant waveforms were almost identical (r=0.98) when the continuous data were regressed against the stimulus pulse matrix. Grand-average waveforms for each signal and wedge were then produced by averaging across participants. Topographies were calculated between 70 and 80 ms to correspond with the C1 timeframe.
For the SSVEP, the first 500 ms after flicker onset were discarded in order to exclude the onset-related VEP, and a 4-second epoch was extracted from that point onwards. Each of these 4-second epochs (six per location and SSVEP frequency) was transformed to the frequency domain by means of a discrete Fourier transform, and the flicker frequencies of 18.75 Hz and 7.5 Hz were extracted from the spectrograms. These were then averaged across repetitions to yield participant averages, which were in turn averaged across participants. Fourier coefficients are complex numbers, whose angle in the complex plane represents phase and whose magnitude (distance from origin) represents amplitude. Whereas it is common practice in SSVEP studies to analyse magnitude only, here our analysis had to remain in the complex domain to preserve phase differences, so that oscillatory components that are opposite in polarity, and hence 180o apart in phase, could be correctly captured as amplitudes of opposite sign in the topographies in a similar way to the C1 above. Assuming that SSVEP oscillations evoked by different visual field locations have a similar phase across a given visual brain region (Hagler, 2014), two oppositely-oriented cortical surface sections of that region will evoke oscillations of opposite phase on the scalp (Vanegas et al., 2013, 2015). We thus determined a dominant reference phase by computing the “principal phase axis”—the line through the origin in the complex plane to which all electrodes and retinotopic locations best align—and projected each complex FFT value onto this axis. In this way, components with opposite polarity would land on opposite ends of the principal phase axis and have opposite signs. We assigned positive and negative polarity to the two ends of the axis in such a way that upper field locations close to the horizontal meridian produced negative topographic foci in accordance with the C1. Importantly, because the same axis is applied to all locations, this choice of polarity assignment can have no influence on the nature of polarity changes across the visual field or the within-quadrant topography shifts that are differentially characteristic of V1, V2, and V3. It therefore does not bias the results to demonstrate striate generation of the SSVEP or topographic similarity between the SSVEP and the C1. The principal phase axis was estimated and applied at the grand-average level. Although this could also be done at the single-subject level to accommodate individual differences in the SSVEP phase, we opted for the grand-average estimate to avoid noisy measurements of the phase across stimulus locations within individual subjects.
In all cases, epoch times were adjusted to account for display timing differences across stimulus locations resulting from the raster scan of CRT monitors, which for our monitor incurred a presentation delay of 11.5 ms between the pixels at the very top versus bottom of the monitor. Assuming a constant raster scan speed, this was remedied by applying a correction to stimulus presentation times proportional to the distance of the stimulus from the top of the monitor. This was done based on the average vertical position of all pixels within the stimulus.
2.7 Forward modelling
To predict EEG topographies for sources in V1, V2, and V3 based on the stimuli used to elicit the empirical VEPs, the cortical surface area corresponding to each wedge was estimated in an MNI-ICBM152 non-linear average brain (Grabner et al., 2006). To do this, the Benson-2014 retinotopic atlas (Benson et al., 2014) was fit to the MNI-152 average brain, which facilitated the estimation of retinotopic maps for the visual areas. Following reconstruction and volumetric segmentation of the MNI-152 average brain using Freesurfer (http://surfer.nmr.mgh.harvard.edu/; Dale et al., 1999; Fischl & Dale, 2000; Fischl et al., 1999; Salat et al., 2004), the Benson-2014 atlas was applied using the Neuropythy Python module (Benson & Winawer, 2018) and regions of interest (ROIs) were defined for the segments of the visual field encapsulated by each of the 16 wedges for each visual area. This was used to generate a list of voxel coordinates corresponding to the grey matter surface area for each wedge in each visual area.
The Matlab toolbox Fieldtrip (Oostenveld et al., 2010) was then used to generate a conductivity head-model based on this brain (boundary element model; BEM). To generate a forward model, this was co-registered with the Biosemi electrode coordinates by manually rotating the electrode coordinates to align with the head model. The forward model was then used to derive scalp topographies resulting from dipoles located at each of the voxels determined by the above ROIs, the orientations of which were constrained by the surface normal of the corresponding cortical surface area. To find the surface normal at a given voxel, we applied linear regression to the 5 x 5 x 5 cube of voxels (5 mm3) centred on it to find the line that best predicted voxel brightness values from their position coordinates, or in other words, best discriminated white from grey matter. These single-voxel topographies were then averaged over all voxels corresponding to each wedge (assuming uniform activation across all voxels) to yield the final predicted topographies for each wedge. This whole process was repeated for V1, V2, and V3 to yield predictions of retinotopically mapped topographies for each wedge and each of the three visual areas.
3 Results
3.1 V1 provides the best fit to empirical VEPs
Figure 2 shows the grand-average topographies of each empirical VEP signal (Fig. 2A-D) and the topographies predicted for V1-3 based on the Benson-2014 retinotopy atlas (Fig. 2E-G). Each of the VEPs demonstrated a pattern of topographies that inverted in polarity between the upper and lower visual field. In addition to this, they tended to be ipsilateral in the upper visual field and contralateral in the lower field, shifting laterally in these same directions as stimuli came closer to the vertical meridian within each quadrant. The only exception to this trend was the lower field of the 7.5 Hz SSVEP, which showed a surprising pattern that we will address later (see section “Phase shift in the 7.5 Hz SSVEP”).
The MRI-predicted topographies for V1 had a similar pattern, also showing polarity inversion across the horizontal meridian, ipsilateral upper-field topographies, and contralateral lower-field topographies, as well as within-quadrant patterns of upper-ipsilateral and lower-contralateral shifts (Fig. 2E). This is because in V1, locations near the vertical meridian are represented along the medial wall whereas those near the horizontal meridian are represented inside the Calcarine sulcus (Fig. 2E, also see Fig. 1B). V2 (Fig. 2F) and V3 (Fig. 2G) predicted polarity inversion across the horizontal meridian as well but they tended to predict contralateral topographies in the upper field, and V2 topographies tended to be ipsilateral in the lower visual field. Within quadrants, V2 topographies shifted in the opposite direction to those of V1 and for V3 they shifted little if at all (see also Fig. 6).
To measure these qualitative patterns quantitatively, correlations (Pearson’s r) were calculated between the empirical and MRI-predicted topographies (treating electrodes as observations). To capture changes in similarity across visual field locations, these correlations were calculated for each location separately (Fig. 3). The qualitative patterns outlined above were largely recapitulated. V1 showed positive correlations at all locations for all VEPs except for the 7.5 Hz SSVEP in the lower field; V2 showed negative correlations near the horizontal meridian and positive correlations near the vertical meridian reflecting its prediction of an opposite pattern of within-quadrant topography shifts; and V3 showed negative correlations at most, but not all, locations.
In order to quantify how well the predicted topographies captured the overall pattern of retinotopic topography shifts in the empirical VEP signals, topographic correlations were also calculated across all locations together by stacking observations into a single 2048 long vector (128 electrodes x 16 locations). To assess statistical significance for these correlations, location was shuffled for the predicted topographies (within each visual area) and the correlations recalculated 1,000,000 times to generate null distributions. These distributions, along with their corresponding point estimates and p-values, are shown in Figure 4A. V1 significantly correlated with all four VEPs positively, whereas V2 correlated only with the 7.5 Hz SSVEP (positively), and V3 correlated negatively with all VEPs except the 7.5 Hz SSVEP.
To specifically assess whether the within-quadrant topographic variations were captured by each MRI prediction, new null distributions were generated by shuffling locations only within their respective quadrants so that correlations that relied solely on between-quadrant topographic variations would no longer be significant. These distributions and the p-values for their corresponding point estimates are shown in Figure 4B. V1 continued to show significant positive correlations across the board, while neither V2 nor V3 showed correlations that significantly differed from the new null distributions. These within-quadrant-shuffled null distributions themselves served a second purpose. They captured the component of topographic correlations due purely to between-quadrant topographic variations of the VEP signals. These between-quadrant variations are the most relevant for typical perceptual/cognitive VEP studies in which it is important to infer whether VEP signals originate in V1, because most studies do not fully map retinotopy and only have a single stimulus location in a given quadrant. As such, we compared these distributions for V1, V2, and V3, using permutation tests to generate confidence intervals of the pair-wise differences between the means (Fig. 4C), to determine which visual area best accounts for the between-quadrant topographic variations in the VEP signals. These indicated that V1 provided a better match than V3 (p < .0001) and V2 (p < .0001), in all cases except the 7.5 Hz SSVEP for which V2 provided a better match than V1 (p < .0001). V2 also provided a better match for the transient C1 than V3 (p < .0001).
Since the above analysis demonstrated that V1 captures the between-quadrant topographic variation of most VEP signals better than V2/V3, we sought to identify the characteristic topographic features of each visual area and VEP signal that underlie this effect. As can be seen in Figure 2, the most prominent topographic features that varied across visual field locations were polarity and left-right lateralisation. To facilitate comparisons of these patterns across visual areas and VEP signals, we calculated a metric of left-right lateralisation (see Fig. 5 for calculation details). This demonstrated that topographic foci for all VEP signals were ipsilateral in the upper visual field and contralateral in the lower visual field, in accordance with previously observed characteristics of the C1 component (Di Russo et al., 2002). This was also true of V1 topographies whereas the reverse was true of V2 topographies, and V3 topographies were closer to midline. The topographies underlying these distinct patterns, along with the within-quadrant topography shifts for each visual area, are highlighted in Figure 6, which further highlights locations for each visual area that are the most divergent from the other two visual areas. Figures 5 and 6 illustrate a valuable extension to heuristics that are applied to identify a VEP signal or component as having a source in V1 as opposed to extrastriate areas such as V2/V3 in studies that do not measure VEPs from the entire visual field. Previously, such judgements have often been made solely on the basis of polarity reversal between the upper and lower visual field, which is an insufficient criterion since V2 and V3 also do this (J. M. Ales et al., 2010). However, the patterns outlined here reveal that by considering polarity reversal, within-quadrant shifts in topography, and slight overall differences in topography lateralisation across upper/lower hemifields, more robust criteria can be determined (see Fig. 6), especially when making judgements based on multiple stimulus locations.
3.2 Modelling signal contributions from V1, V2 and V3
Individually, V1 provided a more complete account of the pattern of VEP signal topographies observed around the visual field than did V2/V3. However, this does not rule out the possibility that all three visual areas contribute to some degree to generating these signals. To address the extent of such contributions to each signal, we modelled various combinations of visual areas using multiple linear regression. Stacking all 16 visual field locations, each regression model had 2048 observations (16 x 128) and the outcome variable was signal amplitude at each observed channel. The predictor variables were the MRI-predicted amplitudes for each visual area. All variables were converted to z-scores prior to carrying out the regressions. Seven such models were carried out for each VEP signal, one for each subset of the three visual areas: three models based on each visual area alone, three models based on each combination of two areas, and the model including all three visual areas.
The results of these models are shown in Figure 7, with details provided in Table 1. Among the single-area models, V1 explained the most variance (in line with the earlier analysis), explaining 52% for the transient C1, 53% for the multifocal C1, 40% for the 18.75 Hz SSVEP, and 25% for the 7.5 Hz SSVEP (compared with 6%, 8%, 25%, and 14% for the next best fitting visual area in each case). Including all three visual areas explained additional variance (Fig. 7A) and produced lower BIC values across the board (Fig. 7B). This resulted in an extra 2% variance explained for the transient C1, 3% for the multifocal C1, 19% for the 18.75 Hz SSVEP, and 6% for the 7.5 Hz SSVEP, compared with the V1-only model. In the three-area model, the beta coefficient for V1 was largest in all cases, being more than twice the magnitude of that for V2/V3 for all four VEPs (Table 1). The lower BIC values for the three-area model in each case indicated that these improvements in model fit justified the increased complexity. A follow-up model with area MT additionally included is given in the Supplementary Material. The increased complexity of this four-area model was also justified based on BIC compared with the above three-area model.
. | V1 . | V2 . | V3 . | V1&V2 . | V1&V3 . | V2&V3 . | V1-3 . | |
---|---|---|---|---|---|---|---|---|
Transient C1 | βV1 | 0.72 | - | - | 0.76 | 0.7 | - | 0.7 |
βV2 | - | 0.2 | - | -0.09 | - | 0.38 | -0.01 | |
βV3 | - | - | -0.25 | - | -0.14 | -0.42 | -0.14 | |
R2 | 0.52 | 0.04 | 0.06 | 0.53 | 0.54 | 0.18 | 0.54 | |
BIC | 4317 | 5735 | 5681 | 4294 | 4239 | 5412 | 4246 | |
Multifocal C1 | βV1 | 0.73 | - | - | 0.78 | 0.7 | - | 0.72 |
βV2 | - | 0.17 | - | -0.13 | - | 0.36 | -0.04 | |
βV3 | - | - | -0.28 | - | -0.17 | -0.44 | -0.15 | |
R2 | 0.53 | 0.03 | 0.08 | 0.54 | 0.56 | 0.18 | 0.56 | |
BIC | 4269 | 5757 | 5650 | 4214 | 4153 | 5409 | 4155 | |
SSVEP (18.75 Hz) | βV1 | 0.63 | - | - | 0.78 | 0.57 | - | 0.67 |
βV2 | - | -0.09 | - | -0.39 | - | 0.16 | -0.22 | |
βV3 | - | - | -0.5 | - | -0.41 | -0.57 | -0.3 | |
R2 | 0.4 | 0.01 | 0.25 | 0.53 | 0.56 | 0.27 | 0.59 | |
BIC | 4778 | 5799 | 5234 | 4286 | 4145 | 5187 | 4017 | |
SSVEP (7.5 Hz) | βV1 | 0.5 | - | - | 0.42 | 0.54 | - | 0.49 |
βV2 | - | 0.37 | - | 0.21 | - | .38 | 0.11 | |
βV3 | - | - | 0.15 | - | 0.23 | -0.02 | 0.18 | |
R2 | 0.25 | 0.14 | 0.02 | 0.29 | 0.31 | 0.14 | 0.31 | |
BIC | 5222 | 5511 | 5771 | 5122 | 5078 | 5517 | 5066 |
. | V1 . | V2 . | V3 . | V1&V2 . | V1&V3 . | V2&V3 . | V1-3 . | |
---|---|---|---|---|---|---|---|---|
Transient C1 | βV1 | 0.72 | - | - | 0.76 | 0.7 | - | 0.7 |
βV2 | - | 0.2 | - | -0.09 | - | 0.38 | -0.01 | |
βV3 | - | - | -0.25 | - | -0.14 | -0.42 | -0.14 | |
R2 | 0.52 | 0.04 | 0.06 | 0.53 | 0.54 | 0.18 | 0.54 | |
BIC | 4317 | 5735 | 5681 | 4294 | 4239 | 5412 | 4246 | |
Multifocal C1 | βV1 | 0.73 | - | - | 0.78 | 0.7 | - | 0.72 |
βV2 | - | 0.17 | - | -0.13 | - | 0.36 | -0.04 | |
βV3 | - | - | -0.28 | - | -0.17 | -0.44 | -0.15 | |
R2 | 0.53 | 0.03 | 0.08 | 0.54 | 0.56 | 0.18 | 0.56 | |
BIC | 4269 | 5757 | 5650 | 4214 | 4153 | 5409 | 4155 | |
SSVEP (18.75 Hz) | βV1 | 0.63 | - | - | 0.78 | 0.57 | - | 0.67 |
βV2 | - | -0.09 | - | -0.39 | - | 0.16 | -0.22 | |
βV3 | - | - | -0.5 | - | -0.41 | -0.57 | -0.3 | |
R2 | 0.4 | 0.01 | 0.25 | 0.53 | 0.56 | 0.27 | 0.59 | |
BIC | 4778 | 5799 | 5234 | 4286 | 4145 | 5187 | 4017 | |
SSVEP (7.5 Hz) | βV1 | 0.5 | - | - | 0.42 | 0.54 | - | 0.49 |
βV2 | - | 0.37 | - | 0.21 | - | .38 | 0.11 | |
βV3 | - | - | 0.15 | - | 0.23 | -0.02 | 0.18 | |
R2 | 0.25 | 0.14 | 0.02 | 0.29 | 0.31 | 0.14 | 0.31 | |
BIC | 5222 | 5511 | 5771 | 5122 | 5078 | 5517 | 5066 |
3.3 Temporal dynamics of model fits for the transient and multifocal VEP
The models for the transient and multifocal VEPs have thus far been static in the sense that they have modelled a single measurement window of 70–80 ms following stimulus onset to correspond with the C1 peak latency. However, to demonstrate that the results generalise beyond this chosen time window, we applied the modelling procedure to a series of 10-ms VEP measurement windows in 5-ms steps from -100 to 400 ms. This produces time courses of activation for V1, V2, and V3, analogous to those derived previously using RCSE (J. Ales et al., 2010; Hagler, 2014; Hagler & Dale, 2013; Hagler et al., 2009) and they help to inform judgements about what time window to use to measure activity in a particular visual area. For example, it has been suggested that the rising slope of the C1 may be the most appropriate to measure V1 activity without overlap from other visual areas (Foxe & Simpson, 2002; Plomp et al., 2010; Vanni et al., 2004). Tracing the relative dominance of the areas over time will provide further insight into this.
Figure 8 shows the time courses of R2 values and β-coefficients for both the single-area and three-area models of the transient and multifocal VEPs. Note that the scale of the β-coefficients is different to the earlier models simply because the signals were z-scored based on means and standard deviations from different measurement windows (70 to 80 ms before and -100 to 400 ms here). The time courses of R2 values indicated that the transient VEP began with a peak at 70-80 ms, consistent with the C1 component, that was mostly explained by V1 (Fig. 8A). This was also true of the mfVEP (Fig. 8D), although there were additional pre-stimulus R2 peaks, reflecting a small amount of auto-correlation within the multifocal pulse streams (r=0.1) incurred by the use of one out of every four pulses (see Methods). The initial peaks at 70–80 ms were followed by a sequence of recurring peaks, variably explained by striate and extrastriate areas (Fig. 8A&D). Correspondingly, the time courses of β-coefficients for both the single-area models (Fig. 8B&E) and full models (Fig. 8C&F) exhibited a sequence of positive and negative peaks in each visual area, the latencies of which are provided in Supplementary Tables S1-4. Although the dominance of V1 can be observed to persist through the C1 timeframe across the board, the nature and extent of contributions from areas V2 and V3 differ in the full models compared with the single-area models (compare Fig. 8B&E with Fig. 8C&F). In the single-area models, in which any variance possible must be attributed to that one area, the V1 model’s initial positive peak coincided with a negative peak in the V3 model of almost half the magnitude and a delayed positive peak in the V2 model (Fig. 8B&E). In the full model, where variance is free to be attributed to any area to achieve best fit, V2 showed an initial negative peak alongside V3, and these were approximately one fifth the size of the concurrent V1 peak (Fig. 8C&F). By contrast, the size of the V1 peak did not undergo any major change between the single-area and full models. Inferring the true degree of extrastriate contributions is complicated by the fact that the predicted topographies for the different visual areas are correlated to varying extents (Table 2), which can allow one area to account for explained variance that is really due to another area, and this “cross-talk” can in principle occur in both the single-area and full models. However, here, a comparison across signal types can provide valuable insight. In the mfVEP, V1 remains the dominant contributor throughout the waveform (Fig. 8D) and there exists at least one moment (e.g., approximately 120 ms) where all model coefficients pass through zero because there is no variance to pick up on (Fig. 8D-F; Fig. 10B). These features are highly inconsistent with a cascade of activity across multiple visual areas, and much more consistent with a multiphasic response emanating from a sole generator in V1 (Lalor et al., 2012; Slotnick et al., 1999). It would follow that any periods of non-zero amplitude in the extrastriate waveforms estimated from the mfVEP purely result from cross-talk due to the partial topographic correlations (Table 2). Consistent with this account, the non-zero extrastriate contributions during the C1 window (70–90 ms) in the single-area models of the mfVEP have the same sign as these correlations among predicted topographies, positive for V2 and negative for V3 (Table 2; Fig. 8E). Together, this all suggests that the extrastriate contributions in the mfVEP can serve as a benchmark for the degree of cross-talk that stands to occur in the transient VEP (Fig. 8C). Notably, the extrastriate areas make similar initial contributions to the transient VEP as to the mfVEP, suggesting that this too is the product of cross-talk (Fig. 8B&E). In both VEPs, these initial extrastriate contributions are also diminished in the multi-area model while V1 coefficients remain just as strong. Taken together, these comparisons suggest that cross-talk, whereby extrastriate areas account for V1-driven variations, occurs strongly in the single-area models where V1 is not itself included to account for them, and more weakly in the full models. Moreover, the estimated extrastriate contribution in the transient VEP appears commensurate with that expected from cross-talk according to the mfVEP during the C1 timeframe. However, in the transient VEP unlike the mfVEP, extrastriate contributions become clearly dominant beyond the C1 timeframe (Fig. 8A-C).
In light of the dominance of V1 over V2 and V3 during the C1 time-frame, both in terms of variance explained and the magnitude of β-coefficients, we calculated the difference in absolute β-coefficients between V1 and each other area to determine the C1-measurement window where V1 contributions are maximally stronger. These are plotted along with V1 β-coefficients in Figure 9 for both the single-area models and full models, and both the transient and multifocal VEP. These plots suggest that V1’s superiority over other areas hits its peak at approximately the same time as V1 itself hits its peak, at approximately 70–90 ms in line with C1 peak latency. Assuming that the bulk of EEG noise does not scale with amplitude, this suggests that strategies of measuring V1 activity during the rising slope of the C1 to avoid overlap with other signals (Foxe & Simpson, 2002; Plomp et al., 2010; Vanni et al., 2004) may not achieve the best possible SNR of V1 responses.
3.4 Phase shift in the 7.5 Hz SSVEP
The only signal that did not qualitatively match the pattern of topographies predicted by V1 across the full visual field was the 7.5 Hz SSVEP, which deviated from the V1 pattern in the lower visual field (Fig. 2D). This break in pattern was accompanied by a near-orthogonal shift in its phase between upper and lower visual field locations (see peak latency variability in the time domain, Figure 10D, and rotations in the complex plane, Fig. 10E). This stood in contrast to the relatively stable phase of the 18.75 Hz SSVEP (Fig. 10C&E) and a lack of clear variation in C1 latency across locations for the transient C1 (Fig. 10A) and the multifocal C1 (Fig. 10B). Topographies for the 7.5 Hz SSVEP at this orthogonal phase (calculated by projecting electrodes onto the plane orthogonal to the principal phase plane in Fig. 10E) appeared more aligned to V2 or V3 (compare the inner circle of Fig. 10F with Fig. 2E-G). Thus, although the static models establish that V1 is the dominant generator of this slower SSVEP, a further analysis accounting for temporal dynamics in the SSVEP was warranted to explore the nature of secondary contributions from V2/V3.
In what follows, we will first present simulations designed to provide a mechanistic intuition for how this phase shift might have emerged. In short, it will outline how this could happen if the SSVEP consists of oscillations from multiple visual areas operating at different phases, which could in principle stem from response latency delays across visual areas. In the subsequent section, we will then extend the earlier regression model to include a sinusoidal time component in order to explicitly model different phases across visual areas and demonstrate that this can indeed produce phase shifts of the sort seen empirically.
To explore how this phase shift in the scalp-recorded signal may have emerged, we considered how oscillations originating in V1 and V2/V3 that project to the same scalp electrode would sum together. We considered scenarios in which oscillations of opposite polarity operate at slightly different phases because V1 and V2/V3 likely produce signals of opposite polarity on the scalp due to their opposing geometry and because V2/V3 responses likely do lag V1 (Givre et al., 1994; Mitzdorf, 1987; Schroeder et al., 1991, 1998; Xing et al., 2009). In short, summing oscillations of this kind does produce orthogonal phase shifts. Without any transmission delays among visual areas at the cortical level, the summation would simply lead to destructive interference on the scalp with the larger signal prevailing. However, a transmission delay introduces a phase shift in the sum according to a classic trigonometric identity: . In other words, this states that the phase of the sum is the midpoint of the constituent phases (the “+π” element of the operand in the first term represents opposing polarity in the summed oscillations). If the cortical phase delay, d, is small, then the resulting phase shift at the scalp level is indeed close to 90o because (π±d)/2 is close to 90o. Because this fundamental mathematical principle offers an elegant way to produce an orthogonal phase shift, we reasoned that it could be at the heart of the observed phase shift. However, it assumes the summed oscillations have the same magnitude, whereas in reality, the signals in V1 versus V2/V3 likely differ in amplitude depending on visual field location (e.g., Ming et al., 2020), and even simply due to varying scalp proximity; for example, upper field stimuli project to ventral sites further from the scalp (Fig. 1B-C; Fig. 11A). What is more, the empirical phase shift was observed in a specific case only: the lower visual field for the 7.5 Hz SSVEP, not the upper visual field or the 18.75 Hz SSVEP. Therefore, we continued to consider how phase shifts may depend not only on latency differences among V1, V2, and V3, but also on magnitude differences and oscillation frequency.
To explore how, we simulated sums of sinusoids. These simulations do not model topography; they model only how oscillations that arrive at the same electrode from different visual sources would sum together. Two oscillating signals of slightly different phase and opposite polarity are summed together in the simulations (i.e., V1 and V2). In short, they demonstrate that the resultant phase shift depends on the frequency of oscillation and the relative magnitude of the constituent signals. This is shown in both time-domain plots (Fig. 11B) and phasor plots (Fig. 11C). At the extremes of complete dominance of V1 or V2, the phase of the sum is entirely aligned to that of the dominant area. But as the relative magnitude shifts from one area to the other, the phase of the sum shifts monotonically from one to the other extreme, suggesting that phase can shift when the relative magnitude of the constituent signals changes. At phase delays close to 0o (right-most plot in Fig. 11C), the bulk of this phase shift takes place within a tight range of relative magnitudes close to equality, with little additional change in phase as one oscillation becomes much larger than the other. By contrast, at phase delays further from 0o (the left-most panel in Fig. 11C), phase shifts more gradually across a broader range of relative magnitudes. The same principle is shown in a different way in the time plots (Fig. 11B). Here, it can be seen that for a given time delay, phase shifts more gradually across a broader range of relative magnitudes for the 18.75 Hz oscillation than for the 7.5 Hz oscillation. This is because for faster oscillations, a given time lag is a larger phase delay.
These dynamics offer a potential explanation for the pattern of phase shifts observed in the SSVEP because the relative magnitude of signals originating in V1, V2 and V3 may change somewhat across visual field locations, either due to magnitude changes at the cortical level or simply due to changing scalp proximity. A dramatic scalp proximity change takes place at the transition point between the upper and lower visual field where V2 and V3 flip from below V1 to above it (in a coronal view); this transition point was where the empirical phase shift was seen. It was also seen specifically in the 7.5 Hz SSVEP where a given time lag constitutes a smaller phase lag than the 18.75 Hz SSVEP. According to the above simulations, smaller phase lags produce larger phase shifts for a given change in relative magnitude. Thus, these simulations demonstrate a plausible mechanism for how the surprising phase shift in the 7.5 Hz SSVEP may have emerged. In the subsequent section, we present an extension of the earlier regression analysis that models the SSVEP as a sum of sinusoidal oscillations from V1, V2, and V3, allowing for time lags across visual areas to assess whether this mechanism could indeed reproduce the observed phase shift specifically in the 7.5 Hz SSVEP.
3.5 Modelling SSVEP phase shifts
The simulations above suggested that the phase shift in the 7.5 Hz SSVEP between the upper and lower visual field could, in principle, be explained by time-lagged oscillations among V1, V2, and V3. To test this idea, we extended the earlier regression models with a sinusoidal component to capture SSVEP phase. In this model, V1, V2, and V3 followed sinusoidal time courses, each with its own phase, allowing them to lag one another. In order to investigate whether the same time lags among visual areas could produce the observed phase variations in the 7.5 Hz SSVEP simultaneously with the lack of phase variation in the 18.75 Hz SSVEP, the two SSVEPs were modelled in tandem with shared time lags. We did, however, allow magnitude to vary between the SSVEP frequencies to allow for the possibility that response magnitude from visual areas may vary as a function of stimulation rate. Since SSVEPs are not strictly sinusoidal (Norcia et al., 2015), we used the Fourier transform estimates of the SSVEPs to transform them into their equivalent sinusoidal form for the purpose of model fitting. This is justified since the purpose of this modelling effort was not to explain deviations in the SSVEPs from sinusoidal oscillation and so this simply serves to remove a source of noise that is not centrally relevant to the modelling goal.
The structure of the data for these regressions is outlined in Figure 12. As before, all regressors and dependent variables were converted to z-scores before carrying out the regressions. To construct the dependent variable, SSVEP cycles for each electrode, retinotopic location, and SSVEP frequency were stacked in a single vector to model them together (see Fig. 12A-B). Note that the precise ordering of data within this vector is not important, so long as it is done in the same way for the predictor variables. A predictor variable was then generated in the same way for each visual area and both SSVEPs (6 total), with values for a given predictor set to 0 for observations related to the other SSVEP (Fig. 12B). These regressors estimated overall magnitudes for each visual area in each SSVEP separately. The impact of time lags across visual areas was assessed by repeating this regression analysis for various different temporal lags and comparing the resultant R2 values. The regression was repeated for a series of time lags for V2 and V3 relative to V1 in steps equal to the EEG sample rate (approximately 2 ms) in the range ±66 ms (half the cycle duration of the 7.5 Hz SSVEP). Because the “baseline phase” of V1 itself is unknown a priori and has no predictable relationship across different oscillatory frequencies, we also manipulated V1’s baseline phase independently for the two SSVEP frequencies across separate regression models, spanning a full cycle in both cases. Thus, each regression model had one instantiation of four parameters: baseline phase for each SSVEP frequency, a V2 time lag, and a V3 time lag. This formulation allowed the baseline phases of the two SSVEPs to differ while still constraining them to have the same time lags across visual areas.
Because baseline phases, V2/3 time lags, and V1-3 magnitudes all varied independently, models could emerge with different combinations of these variables that were actually equivalent, as reflected in the identity . For ease of interpretation therefore, only models with all-positive magnitudes were considered. The distribution of R2 values, time lags, and magnitudes among the top 1% best-fitting of these models is shown in Figure 13. The overall best-fitting model is shown by larger circles within these distributions. This model had a lag of 20 ms for V2 and 24 ms for V3. Its weights were highest for V1 for both the 18.75 Hz SSVEP (V1: β=0.84, V2: β=0.33, V3: β=0.38) and for the 7.5 Hz SSVEP (V1: β=0.68, V2: β=0.13, V3: β=0.41). Overall, this model had an R2 of 0.39, and breaking this down by SSVEP gave an R2 of 0.5 for the 18.75 Hz SSVEP and an R2 of 0.35 for the 7.5 Hz SSVEP (note, the skew here is because 71.5% of the observations to be fit came from the 7.5 Hz SSVEP due to the different cycle lengths). By comparison, the best model out of those where V2 and V3 lags were 0 ms (analogous to the earlier static models) had an overall R2 of 0.31, which broke down as 0.25 for the 7.5 Hz SSVEP and 0.45 for the 18.75 Hz SSVEP, reflecting improvements in the overall best model of 0.1 and 0.05, respectively. As shown in Figure 14, this improvement in model fit, which was particularly high for the 7.5 Hz SSVEP, was achieved by capturing the qualitative phase dynamics of the empirical SSVEPs. The model generated phase shifts in the lower visual field for the 7.5 Hz SSVEP (Fig. 14B&D), culminating in a delayed signal (Fig. 14B), while no major phase shifts appeared in the 18.75 Hz SSVEP (Fig. 14D). There were some discrepancies however. In the empirical data, the bulk of the phase shifts took place in the 2nd and 3rd locations from the lower vertical meridian (Fig. 14A) whereas in the model they took place in the 3rd and 4th locations (Fig. 14B). Moreover, orthogonal to the principal SSVEP phase, the empirical data demonstrated a disparity in magnitudes between the upper and lower visual fields (Fig. 14E), being approximately 3 times larger in the lower field, while this difference was less obvious in the model (Fig. 14F). In fact, the difference in the model was comparable in magnitude to upper-lower field differences seen in the empirical 18.75 Hz SSVEP (approximately 2:1) and so may simply reflect general magnitude differences across hemifields rather than the exaggerated difference brought about by phase shifts in the empirical 7.5 Hz SSVEP.
4 Discussion
Visual-evoked potentials (VEPs) are widely used in Cognitive Neuroscience to gain insight into a range of brain operations from low-level visual encoding to higher-order cognitive processes. These various operations are studied using a mix of three complementary VEP techniques, and it is vital to determine the shared and distinct cortical sources of these VEP signals if we are to fully understand those operations and how they relate to one another. The goal of this study was to shed new light on the cortical origins of three low-level visual signals, the C1 component of the transient and multifocal VEP and the steady-state VEP (SSVEP), elicited at a fast (18.75 Hz) and slow (7.5 Hz) contrast flicker rate. Using regressions of grand-average data against MRI-predicted scalp topographies for stimuli across the visual field, we found that V1 provided a substantially better account of scalp data than V2 or V3 for all four signals, and outlined a set of key V1-predicted topographic features which underlie this result. While the explained variance of the VEP data was incrementally improved with the addition of each of a series of extrastriate regions, multi-area models indicated that the bulk of extrastriate activations come substantially later in time, with those contributions emerging distinctly after the C1 peak in the transient VEP, with a most probable time lag of 20–30 ms in the SSVEP, and potentially not emerging at all in the multifocal VEP. In the following, we recap and integrate the findings across the VEP signals and analyses to lay out the basis for these conclusions.
Our forward-modelled topographies for V1, V2, and V3 predicted polarity inversion across the horizontal meridian in all cases, consistent with the forward models of J. M. Ales et al. (2010), thus bolstering their conclusion that polarity inversion alone is an insufficient criterion for identifying a V1 source. However, above and beyond this polarity inversion, all four of the empirical VEP signals showed components of topographic variation both between and within visual field quadrants that each strongly distinguished V1 from extrastriate sources. Specifically, topographic foci were ipsilateral in the upper visual field and contralateral in the lower visual field, becoming more so as stimuli approach the vertical meridian. This is consistent with previous C1 reports (Clark et al., 1994; Di Russo et al., 2002), including in the multifocal VEP (Zhang & Hood, 2004), and also with simulations carried out by J. M. Ales et al. (2013). Only V1 could account for the within-quadrant topography shifts because its extension from the Calcarine sulcus out onto the medial wall is uniquely consistent with them (Kelly, Vanegas, et al., 2013). V1 also captured the between-quadrant variations of the VEP signals better than did V2 or V3. The V1-predicted topographies were ipsilateral in upper visual field locations and contralateral in lower visual field locations, like the VEP signals, and this pattern of lateralisation was reversed in V2 predictions while V3 predictions showed little change in lateralisation across visual field quadrants. These broad lateralisation trends across visual quadrants offer an important amendment to the polarity-inversion heuristic that is often used to diagnose a V1 source for VEP signals. Because V2 and V3 also predict this polarity inversion (J. M. Ales et al., 2010), to make the heuristic truly V1-specific, it should further assert that signal polarity be slightly ipsilateral in the upper visual field and contralateral in the lower visual field, becoming more so as stimuli approach the vertical meridian. Further heuristics to identify V2 and V3 sources were also suggested by the results, with V2 topographies expected to be slightly contralateral in the upper field and slightly ipsilateral in the lower field, with little if any difference expected for V3.
These simple heuristics offer a means to distinguish sources in V1, V2, and V3 with some confidence based on a single stimulus location, as is commonly available in cognitive and perceptual studies. However, the heuristics are stronger when more stimulus locations are included since the most striking distinctions between these visual areas are seen in the within-quadrant topography shifts, which are opposite in direction between V1 and V2 and absent in V3. At this juncture, a few limitations of the scope of the analyses in this study should be noted to understand the contexts in which they apply. Firstly, the results apply to the grand-average and not to individual subjects, the topographies of whom are much more heterogeneous. They also apply only to vision in the periphery as the stimuli did not extend into the foveal region. As such, they do not apply to signals resulting from central stimulation such as the foveal C1 (Dougherty et al., 2003; Hansen et al., 2016), which do not follow the cruciform model in the same way because the activated areas lie at the occipital pole, away from the calcarine sulcus (Zhang & Hood, 2004). Finally, while our primary interest in areas V1, V2, and V3 was motivated by their specific geometry in relation to the well-known polarity reversal of the C1 component across the horizontal meridian, it is important to bear in mind that these analyses did not include all visual areas that could reasonably be expected to contribute to early visual EEG signals (Cottereau et al., 2015).
Overall, V1 explained approximately 5 times as much variance in the transient and multifocal C1 components and twice as much variance in the SSVEPs as V2/V3 and had regression weightings that were approximately 4 and 3 times stronger when visual areas were modelled together. Thus, among areas V1, V2, and V3, V1 clearly makes the dominant contribution to these VEP signals. However, V2, V3, and also MT did explain significant additional variance when included alongside V1, raising the question of whether any of these signals are sufficiently isolated from extrastriate contributions to serve as a proxy for V1 activity measurements. This question has been of longstanding concern, particularly for the C1 component, and comes down to the relative strength and latencies of activation across visual areas. Although many animal electrophysiology studies find overlapping neural response latencies across visual areas, they tend to find an average temporal delay between responses in V1 and extrastriate areas (Bullier & Nowak, 1995; Chen et al., 2007; Maunsell & Gibson, 1992; Nowak et al., 1995; Raiguel et al., 1989; Robinson & Rugg, 1988; Schroeder et al., 1998). Previous attempts to model activation time courses of visual areas in humans using retinotopically constrained source estimation (RCSE) have yielded inconsistent results, with some finding simultaneous activation of V1 and V2 (J. Ales et al., 2010) and others finding that V2 and V3 both lag V1 (Hagler, 2014; Hagler & Dale, 2013). This discrepancy may be in part due to cross-talk that persists despite rich retinotopic constraints that are applied to each visual area. Our own time-resolved models of transient and multifocal VEPs found simultaneous initial activations across visual areas but there are a number of reasons to suspect that this is due to cross-talk. In fact, converging results across these and the SSVEP models are instead suggestive of extrastriate activations that peak at a considerable lag to V1. In support of the cross-talk account for the initial extrastriate activations, they were stronger in single-area models than multi-area models and were also congruent with the signs of correlations among predicted topographies of the visual areas. Moreover, key features of the multifocal VEP were strongly indicative of a sole source in V1 rather than a cascade of multiple areas, including the presence of time points following initial activation where all measures of activity passed through zero and the absence of recurrent V2 and V3 activations, suggesting that any estimated non-zero extrastriate contributions could be attributed to cross-talk. The transient VEP showed estimated extrastriate contributions that matched these inferred cross-talk contributions in polarity and magnitude during the C1 timeframe, whereas the later, first positive peaks of V2, V3, and MT were more consistent with true responses as they did not change between single- and multi-area models and were absent in the multifocal VEP. These positive-polarity peaks indicated response lags of 35 ms, 55 ms, and 70 ms, respectively, relative to V1. Long V2 and V3 lags were also predicted by models of the SSVEP (20 ms and 24 ms), where they selectively accounted for a shift in phase that was specific to the lower visual field of the 7.5 Hz SSVEP and substantially improved on a zero-lag model. Although these lag estimates are not particularly close (35 vs. 20 and 55 vs. 24 ms), the shorter lags for the SSVEP are consistent with its greater improvement in model fit with inclusion of extrastriate areas and it is possible that the steady-state nature of the SSVEP facilitates shorter delays across visual areas. Moreover, it is difficult to say what impact, if any, offset-evoked responses in the transient VEP may have had on the estimated latencies of activation across visual areas. Nevertheless, the overall level of agreement between the transient and steady-state models in pointing to a substantial extrastriate lag is remarkable because these were different signals evoked by different kinds of stimulation and the V2 and V3 lags explained an empirical phenomenon in the SSVEP that did not pertain to the transient or multifocal VEP (the phase shift). Thus, even allowing for some uncertainty in the precise estimation of these activation lags, our data from multiple stimulation protocols converge on extrastriate lags that are considerably far from 0 ms. The implication for the transient VEP is that there is a stretch of time on the order of tens of milliseconds during which V1 is the dominant contributor. Moreover, an analysis of the relative strength of beta weights over time indicated that the C1 peak represents the best measurement window for V1 activity in terms of signal-to-noise ratio.
However, the above considerations suggest a considerable temporal delay between V1 and extrastriate visual areas in both the transient VEP and the SSVEP, V1 is not as easy to isolate in the SSVEP. Whereas in the transient VEP one can simply take the C1 time window to measure V1 activity, it is not clear that one can choose a particular phase of the SSVEP to achieve the same thing. The nature of time-lagged oscillations to combine into a single oscillation when summed may fundamentally limit our ability to separate visual sources in the SSVEP even if they oscillate at different phases as indicated in this study. Consistent with this, the “static” regression models suggested considerable contributions from V2/V3 in the SSVEP (between half and a third of the amplitude of V1 and adding between 6 and 19% explained variance). That extrastriate areas contribute to the SSVEP is also strongly implicated by our finding that time-lagged oscillations across visual areas could account for phase shifts between visual field locations and stimulation frequencies in the SSVEP. Di Russo et al. (2007) also found in their source modelling work that the SSVEP was generated in both striate and extrastriate sources such as MT. One notable aspect of our results to compare with this is that we also found MT contributions in the 7.5 Hz SSVEP, a frequency almost as low as used in that study (6 Hz), but not in the 18.75 Hz SSVEP. This may reflect differing levels of MT involvement at different SSVEP frequencies but further investigation will be needed to address this question more comprehensively. Compared with the SSVEP, we found more modest extrastriate contributions to the C1 (explaining an additional 2% variance in the transient and 3% in the multifocal VEP). With a comparably greater extrastriate contribution to the SSVEP than the C1, comparisons between these two signals should always be made with this caveat in mind. One example in the literature that underscores this point is the divergence between the C1 and the SSVEP in terms of spatial attention. While SSVEP magnitude routinely increases with spatial attention (Morgan et al., 1996; Müller et al., 1998; Toffanin et al., 2009), similar effects in the C1 are much more rare (Hillyard & Anllo-Vento, 1998; Mohr et al., 2020; Qin et al., 2022). In addition to potential differences in recurrent activity contributions, the stronger extrastriate contribution to the SSVEP may partially explain this discrepancy, since higher level areas tend to show more robust spatial attention effects (Treue, 2001).
The modelling approach taken in this study was rooted in the cruciform model, determining cortical sources for visual signals based on the geometry of visual cortical areas. This approach shares similarities with a recently developed method—retinotopically constrained source estimation (RCSE)—that uses individual structural and/or functional retinotopic maps derived from MRI to constrain dipole models (J. M. Ales et al., 2010; Cottereau et al., 2015; Hagler, 2014; Hagler & Dale, 2013; Hagler et al., 2009; Inverso et al., 2016). This method was developed to address the issue in source analysis of non-unique source estimates and, to our knowledge, represents the state of the art of source analysis in the visual cortex. It does this by addressing the central reason why estimated sources are uncertain: neighbouring cortical areas that have similar aggregate surface orientation can masquerade as one another, making it impossible to distinguish them fully. By leveraging the assumption that visual responses in a visual area will be similar at different locations in the visual field, one can model a single visual response across multiple retinotopic visual field locations. Because there are systematic differences in the cortical folding patterns of different areas, this reduces the ability of neighbouring areas to mimic one another. By using individual MRI, these previous RCSE approaches further leverage individual-specific, finer variations in these cortical folding patterns for a potentially richer set of area-specific constraints. However, this key advantage of the approach is also its main challenge because small undulations in individual cortical surfaces mean that small misspecifications in the retinotopy (due to noisy fMRI measurements) can lead to large variability in dipole orientation, which impacts heavily on forward model estimates. This challenge has been addressed in a number of ways. J. M. Ales et al. (2010) slightly repositioned equivalent current dipoles based on EEG data fit to correct for fMRI misspecification. Hagler and Dale (2013) argued that this gave dipoles too much freedom to vary in orientation and instead opted to model extended cortical patches that would be less sensitive to position misspecification. Because this alone did not eliminate the problem of fMRI noise, they implemented an iterative reweighting approach to dampen the influence of visual locations whose residual error stood as outliers. Hagler (2014) developed the approach further by allowing the extended cortical patches to reposition slightly to better match grand average source estimates, yielding more consistent estimates.
In this study, we adopted a distinct grand-average based approach with the main advantage that it eliminates the need for EEG-informed adjustments of model parameters such as generator location and orientation and the weighting of visual field locations in models, reducing flexibility which, in principle, may contribute to cross-talk to some degree. Despite this benefit, there are some drawbacks to the grand-average approach that are important to highlight. The cortical surface of grand-average MRI smooths out the finer details of cortical folding, which may reduce the number of area-specific constraints available. Another issue is that the topography predicted from a grand-average brain is not the same as the grand average of topographies generated by individual brains. This is because when topographies are averaged, all the individual topographic variability remains present in the average, producing a diffuse scalp distribution. By contrast, when the cortical surface is averaged first, then this variability is removed before the topography is produced and therefore does not feature in it. Thus, the use of grand-average MRI retinotopy may misestimate grand-average topographies to some extent. However, the consequences of this are unlikely to be prohibitive as it is difficult to see how it would lead to predicted topographies that match VEP topographies artifactually. It is more likely that they would fail to predict some aspects of VEP topographies that would have been predicted by accurately specified individual retinotopy. Therefore, we would contend that this limitation affects the sensitivity but not the specificity of the approach. In relation to the results presented in this study, we therefore suggest that the dominance of V1 in explaining the VEP signals is robust, that the latency separation of V1 from extrastriate areas is robust (if perhaps uncertain in degree), but, as always, that there are no extrastriate contributions at all during the C1 time window cannot be concluded with certainty. Although these limitations could be mitigated to some extent by using individual retinotopy, particularly the second limitation, our choice of approach was ultimately motivated by the goal of the analysis. This was to highlight the topographical features that are uniquely diagnostic of early visual areas so that EEG researchers can make informed source judgements in the absence of fMRI data. In so doing, we identified a heuristic that can distinguish a V1 source from V2/V3 sources—reversing polarity between the upper and lower visual field along with a tendency towards increasingly ipsilateral topographies as stimuli are placed closer to the upper vertical meridian and increasingly contralateral topographies as stimuli are placed closer to the lower vertical meridian. Our results suggest similar heuristics to identify V2 and V3 sources as well. In both cases, signals should reverse in polarity between the upper and lower visual field, as with V1, but in the case of V2, topographies should be slightly contralateral in the upper field and ipsilateral in the lower field, while for V3, topographies should be closer to midline but slightly contralateral at all visual field locations. By highlighting these heuristics, our study allows for certain inferences about visual area sources to be made using EEG alone, broadening the scope of source inferences to a wider range of studies.
Data and Code Availability
Task scripts and MATLAB code for carrying out analyses are available at https://osf.io/gx8vm/?view_only=b014821842564e28812a11a33d72934c.
Author Contributions
Kieran S. Mohr: Conceptualisation, Data curation, Methodology, EEG analysis & forward modelling, Visualisation, Writing—first draft, Writing—review & editing, Funding acquisition, and Project administration. Anna C. Geuzebroek: Retinotopy analysis, and Writing—review & editing. Simon P. Kelly: Conceptualisation, Methodology, Writing—review & editing, Funding acquisition, and Supervision.
Declaration of Competing Interest
The authors declare no competing financial interests.
Acknowledgements
This work was funded by the Irish Research Council (GOIPG/2016/123 - to Kieran S. Mohr) and The Wellcome Trust (219572/Z/19/Z - to Simon P. Kelly). The funders had no involvement in the study.
Supplementary Materials
Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00152.