Many aspects of perception and cognition are supported by activity in neural populations that are tuned to different stimulus features (e.g., orientation, spatial location, color). Goal-directed behavior, such as sustained attention, requires a mechanism for the selective prioritization of contextually appropriate representations. A candidate mechanism of sustained spatial attention is neural activity in the alpha band (8–13 Hz), whose power in the human EEG covaries with the focus of covert attention. Here, we applied an inverted encoding model to assess whether spatially selective neural responses could be recovered from the topography of alpha-band oscillations during spatial attention. Participants were cued to covertly attend to one of six spatial locations arranged concentrically around fixation while EEG was recorded. A linear classifier applied to EEG data during sustained attention demonstrated successful classification of the attended location from the topography of alpha power, although not from other frequency bands. We next sought to reconstruct the focus of spatial attention over time by applying inverted encoding models to the topography of alpha power and phase. Alpha power, but not phase, allowed for robust reconstructions of the specific attended location beginning around 450 msec postcue, an onset earlier than previous reports. These results demonstrate that posterior alpha-band oscillations can be used to track activity in feature-selective neural populations with high temporal precision during the deployment of covert spatial attention.
The visual system is composed of interacting populations of neurons that are specialized for processing distinct stimulus features, such as orientation or spatial location (Riesenhuber & Poggio, 1999). An emerging hypothesis is that oscillatory mechanisms have a role in routing information through networks of these feature-selective neural populations to effect goal-directed behavior (Akam & Kullmann, 2014; Zumer, Scheeringa, Schoffelen, Norris, & Jensen, 2014; Saalmann, Pinsk, Wang, Li, & Kastner, 2012; Jensen & Mazaheri, 2010; Fries, 2005). During covert spatial attention, the scalp distribution of power in the alpha band (8–13 Hz) of human M/EEG recordings has been observed to covary with attended locations (Capotosto, Babiloni, Romani, & Corbetta, 2009; Van Gerven & Jensen, 2009; Rihs, Michel, & Thut, 2007; Kelly, Lalor, Reilly, & Foxe, 2006; Sauseng et al., 2005; Worden, Foxe, Wang, & Simpson, 2000). Furthermore, stronger phase synchronization between alpha oscillations over frontal sensors and posterior sensors contralateral to the attended hemifield has also been observed (Sauseng et al., 2005), suggesting that specific topographic phase relationships may code the attended location.
Given the tight correspondence between these alpha-band dynamics and the focus of spatial attention, we investigated whether topographic patterns of alpha power and/or phase could be used to reconstruct activity in neural populations tuned to spatial location using a multivariate inverted encoding model (IEM). Multivariate IEMs model the relationship between neural activity and stimulus features using hypothesized response profiles, which can then be used to reconstruct the neural representation of novel stimuli that vary with respect to the trained feature (Sprague & Serences, 2015). Previous work applying IEMs to fMRI data has been able to reconstruct the perception of and STM for basic visual features, such as color, orientation, and spatial location (Ester, Sprague, & Serences, 2015; Sprague, Ester, & Serences, 2014; Ester, Anderson, Serences, & Awh, 2013; Kok, Brouwer, van Gerven, & de Lange, 2013; Sprague & Serences, 2013; Ho et al., 2012; Scolari, Byers, & Serences, 2012; Brouwer & Heeger, 2009, 2011). Applying this approach to EEG provides estimates of neural representation on a finer temporal scale (Garcia, Srinivasan, & Serences, 2013) and can be used to assess hypotheses about the functional specificity of oscillatory activity within particular frequency bands.
To test whether neural activity in the alpha band tracks the focus of spatial attention, we took two approaches. First, we trained a linear support vector machine (SVM) to decode which of six possible spatial locations participants were attending using the topography of different frequency bands. Decoding was successful only for the alpha band. Next, using the time-resolved scalp topography of alpha power (but not phase), we were able to reconstruct the temporal evolution of the deployment of selective attention to six different visual field locations. These results demonstrate the utility of IEMs to measure feature-selective neural representations encoded over multivariate population-level activity in near real time.
Data were taken from the publicly available BNCI Horizon 2020 research consortium database (www.bnci-horizon-2020.eu) and were first reported by Treder, Bahramisharif, Schmidt, van Gerven, and Blankertz (2011). Eight healthy volunteers (seven men; aged 18–27 years) participated in this study. All had normal or corrected-to-normal vision. All participants gave written consent, and the study was performed in accordance with the Declaration of Helsinki.
Stimuli and Procedures
Each trial began with a central fixation dot surrounded by six white disks (eccentricity: 9°, size: 3.27° of visual angle) lasting 1000 msec. A cue indicating the location to attend appeared in the center of the screen for 200 msec. After a variable duration of 500–2000 msec, a 200-msec target appeared in either the attended circle (80% of trials) or a different, randomly selected location (20% of the trials). To increase difficulty, a 200-msec mask (“*”) was presented after each target (see Figure 1). Stimuli were presented on a 24-in. thin-film transistor LCD screen with a refresh rate of 60 Hz and a resolution of 1920 × 1200 pixels. The experiment was implemented in Python using the open-source BCI framework Pyff (Venthur et al., 2010) with Pygame (www.pygame.org).
Targets were either a “×” or a “+” and were determined randomly on each trial. To reduce lateralized preparatory motor activity, participants had to indicate which target they saw via a thumb press on a button box in their right or left hand. To ensure that the cue did not have an orientation that covaried with the orientation of the attended location (otherwise decoding and reconstructions could be based on cue orientation, rather than spatial attention, per se), a symbolic, omnidirectional color cue was used. This was a hexagon with each of the six faces pointing to one of the target discs. Three of the faces were gray, and the other three were colored blue, red, and green. One of these colors was assigned as the cue (color counterbalanced across participants). In 50% of trials, the cue–target interval was 2000 msec; it varied randomly between 500 and 2000 msec on the remaining trials. This helped ensure that participants' attention was sustained throughout the trial. The task comprised 600 trials spread over six blocks of 100 trials each, with ∼2-min rests between blocks. Participants were instructed to maintain fixation throughout the trial and to respond as quickly and accurately as possible.
EEG Acquisition and Analysis
EEG was recorded at 1000 Hz from a Brain Products (Munich, Germany) 64-channel actiCAP. Electrodes were referenced against a nose electrode, and impedances were kept below 20 kΩ. Eye movements were monitored with an EOG electrode placed below the right eye. Referencing Fp2 against the EOG and F10 against F9 created vertical and horizontal bipolar EOG channels, respectively. Data were downsampled offline to 200 Hz.
Only trials where the cue–target interval was 2000 msec were analyzed. Data from these trials were cut into −800:2500 msec epochs, relative to cue onset. Each epoch was carefully screened for eye-related or muscle-related artifacts, leaving an average of 241 artifact-free trials per participant, and an average of 40 observations for each attended location per participant. Time–frequency decomposition was performed using EEGLAB (Delorme & Makeig, 2004) and custom code, running in a MATLAB environment (The Mathworks, Natick, MA). Single trial data were convolved with a family of wavelets, spanning 4–30 Hz, in 1-Hz steps, with wavelet cycles increasing linearly from three to eight cycles as a function of frequency. Power and phase were extracted from the resulting complex time series by squaring the absolute value of the time series (μV²) in the case of power and by taking the phase angle (MATLAB function angle.m) in the case of phase. Alpha power was defined by averaging power between 8 and 13 Hz. The phase time series was computed as the circular average of phase across this frequency range. Power data were normalized by dividing power at each time point by the average power from the precue baseline period of −350 to −50 msec. Because the general linear model used in the IEM may not optimally capture relationships with circular variables such as phase, we also conducted the same IEM phase analysis (see Multivariate IEM below) using the complex values from the wavelet decomposition after normalizing the amplitude components by dividing each complex number by the absolute value of the real component of the number. This analysis resulted in channel responses virtually identical to those we report based on phase angles only (see Results).
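The decomposition steps described above can be sketched in Python with NumPy. This is a minimal illustration on a simulated single-channel trial, not the EEGLAB/MATLAB code used in the study. The function name `morlet_tf`, the Gaussian width convention (SD = cycles / 2πf), the wavelet scaling, and the toy 10-Hz signal are all assumptions introduced for the example.

```python
import numpy as np

def morlet_tf(signal, srate, freqs, cycles):
    """Convolve one trial with complex Morlet wavelets (hypothetical helper).

    Returns a complex array of shape (n_freqs, n_samples).
    """
    out = np.empty((len(freqs), signal.size), dtype=complex)
    for i, (f, c) in enumerate(zip(freqs, cycles)):
        sigma = c / (2 * np.pi * f)                    # Gaussian SD in seconds (assumed convention)
        half = int(np.ceil(4 * sigma * srate))
        t = np.arange(-half, half + 1) / srate         # odd-length window centered on zero
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.abs(wavelet).sum()               # scale so the envelope sums to 1 (assumption)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

srate = 200                                            # Hz, after downsampling
freqs = np.arange(4, 31)                               # 4-30 Hz in 1-Hz steps
cycles = np.linspace(3, 8, freqs.size)                 # three to eight cycles, linear in frequency

# Toy trial: a 10-Hz oscillation whose amplitude doubles at cue onset (t = 0).
times = np.arange(-0.8, 2.5, 1 / srate)
trial = np.where(times < 0, 1.0, 2.0) * np.sin(2 * np.pi * 10 * times)

tf = morlet_tf(trial, srate, freqs, cycles)
power = np.abs(tf) ** 2                                # squared magnitude
phase = np.angle(tf)                                   # phase angle in radians

alpha = (freqs >= 8) & (freqs <= 13)
alpha_power = power[alpha].mean(axis=0)
# Circular mean across alpha frequencies: average unit vectors, then take the angle.
alpha_phase = np.angle(np.exp(1j * phase[alpha]).mean(axis=0))

# Divide by mean power in the precue baseline (-350 to -50 msec).
base = (times >= -0.35) & (times <= -0.05)
alpha_power_norm = alpha_power / alpha_power[base].mean()
```

Because amplitude doubles after the simulated cue, normalized alpha power in the sustained interval approaches a fourfold increase over baseline; it falls slightly short because the wavelet smears some postcue signal into the end of the baseline window.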
To assess whether the spatial distribution of power during the cue–target interval could be used to decode the attended location, we applied an SVM classifier to our data (implemented in the MATLAB Machine Learning Toolbox). SVMs were constructed with a linear kernel and default parameters supplied by the software (box constraint = 1, sequential minimal optimization separating algorithm). For each electrode, data were averaged into different frequency bands between 3 and 28 Hz (see Figure 2B) and then averaged over the cue–target interval 1000–1900 msec. We focused on this interval to avoid cue-evoked responses and temporal smearing of target-evoked responses because of wavelet decomposition. This procedure resulted in one power value per channel, per frequency band, as input to the classifier. Leave-one-trial-out cross-validation, iterated over every trial, ensured that training and testing data were independent. We evaluated the statistical significance of classification against theoretical chance (1/6). Because this comparison can overestimate significance with a small number of observations (Combrisson & Jerbi, 2015), we also compared performance against classification obtained from a prestimulus baseline (−350 to −50 msec).
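The cross-validation logic and the two reference points (theoretical chance and an empirical baseline) can be illustrated as follows. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the linear SVM; it is a swapped-in technique, not the classifier used in the study. The simulated topographies, the noise level, and the electrode count of 60 are hypothetical.

```python
import numpy as np

def loo_accuracy(X, y):
    """Leave-one-trial-out accuracy with a nearest-centroid stand-in classifier."""
    n_trials = X.shape[0]
    classes = np.unique(y)
    correct = 0
    for test in range(n_trials):
        train = np.arange(n_trials) != test            # hold out exactly one trial
        # Class means are estimated from the training trials only.
        centroids = np.array([X[train & (y == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[test], axis=1)
        correct += int(classes[np.argmin(dists)] == y[test])
    return correct / n_trials

rng = np.random.default_rng(0)
n_per_loc, n_elec, n_loc = 40, 60, 6                   # ~40 trials per location; 60 electrodes is arbitrary
y = np.repeat(np.arange(n_loc), n_per_loc)

# Hypothetical location-specific alpha topographies plus trial noise.
patterns = rng.normal(size=(n_loc, n_elec))
X_attend = patterns[y] + rng.normal(scale=3.0, size=(y.size, n_elec))
# Prestimulus baseline: same noise, no location information.
X_base = rng.normal(scale=3.0, size=(y.size, n_elec))

acc_attend = loo_accuracy(X_attend, y)
acc_base = loo_accuracy(X_base, y)
chance = 1 / n_loc                                     # theoretical chance for six locations
```

In this simulation, `acc_attend` should be well above `chance`, whereas `acc_base` should hover near it. Comparing against the baseline accuracy instead of the nominal 1/6 guards against the bias that small samples introduce.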
To statistically assess the robustness of the reconstructed tuning functions, we compared the amplitude of the on-channel response (i.e., the attended location) averaged over the covert attention interval 1000–1900 msec to both the amplitude of the response during a prestimulus baseline (−350 to −50 msec) and to the amplitude of the most distant nonattended channel (i.e., 180°) during attention using a nonparametric permutation procedure. Condition labels were randomly shuffled, and condition differences were computed and saved. This was repeated 20,000 times, forming a distribution of difference scores under the null hypothesis that there is no amplitude difference between conditions. If the true difference between conditions exceeded the 95th percentile of this distribution, it was considered statistically significant at α = .05. The selectivity of the tuning functions was also analyzed across time (Figures 3A and 4A) by using linear regression to estimate the slope of responses of channels tuned to equidistant locations at each time point (Foster, Sutterer, Serences, Vogel, & Awh, 2015). More positive slopes indicate greater selectivity to the attended location, whereas a large negative slope would indicate selectivity to the location opposite the one cued. Confidence intervals (99%) were generated empirically with a bootstrap analysis in which data from all participants were pooled and sampled with replacement 80,000 times, generating a distribution of slopes around the group mean slope. If zero (the null hypothesis of no selectivity) fell outside the lower 99th percentile of this distribution, it was considered statistically significant at p < .01, one-tailed.
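A minimal sketch of the permutation comparison and the slope measure is given below. It assumes that labels are shuffled within participant, which for paired data amounts to randomly flipping the sign of each participant's difference score; the text does not specify this detail. The function names, the simulated tuning profile, and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def paired_permutation_p(a, b, n_perm=20000):
    """One-tailed p value for mean(a - b) > 0.

    Shuffling condition labels within participant (an assumption) is
    equivalent to randomly flipping the sign of each paired difference.
    """
    diff = a - b
    observed = diff.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)                 # null distribution of mean differences
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

def tuning_slope(ctf, offsets):
    """Slope of channel response after averaging channels equidistant from the attended location."""
    dist = np.abs(offsets)
    levels = np.unique(dist)                           # 0, 60, 120, 180 deg
    folded = np.array([ctf[dist == d].mean() for d in levels])
    # Regress on negative distance so a positive slope means a peak at 0 deg.
    return np.polyfit(-levels, folded, 1)[0]

# Simulated channel responses for eight participants (hypothetical values).
offsets = np.array([-120, -60, 0, 60, 120, 180])
profile = np.array([0.05, 0.15, 0.30, 0.15, 0.05, 0.0])
attend = profile + rng.normal(scale=0.05, size=(8, 6))
baseline = rng.normal(scale=0.05, size=(8, 6))

on = offsets == 0
obs_diff, p_value = paired_permutation_p(attend[:, on].ravel(), baseline[:, on].ravel())
slope = tuning_slope(attend.mean(axis=0), offsets)
```

With eight participants there are only 2⁸ = 256 distinct sign assignments, so the smallest attainable one-tailed p value under this scheme is roughly 1/256 regardless of the number of random permutations drawn.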
Mutual Information Analysis
To quantify which electrodes contained information about the attended location, we computed the mutual information (MI) between each electrode and the cued location using code provided in Cohen (2014). MI describes the extent to which uncertainty about the state of one variable (e.g., cued location) is reduced by knowledge of the state of another variable (e.g., electrical brain activity; Shannon, 1948). When MI equals 0 bits, the two variables are statistically independent. MI was computed for each electrode using a number of bins determined by the rule of Freedman and Diaconis (1981).
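The MI computation can be sketched as a plug-in estimate from a joint histogram, with the number of bins for the continuous variable set by the Freedman–Diaconis rule. This is an illustration under stated assumptions and not the code from Cohen (2014); the function names and the simulated electrode values are hypothetical.

```python
import numpy as np

def fd_nbins(x):
    """Number of histogram bins from the Freedman-Diaconis rule."""
    q75, q25 = np.percentile(x, [75, 25])
    width = 2 * (q75 - q25) / np.cbrt(x.size)          # bin width = 2 * IQR / n^(1/3)
    return max(1, int(np.ceil((x.max() - x.min()) / width)))

def mutual_information(x, labels):
    """Plug-in MI (in bits) between a continuous variable and discrete labels."""
    edges = np.linspace(x.min(), x.max(), fd_nbins(x) + 1)
    x_bin = np.clip(np.digitize(x, edges) - 1, 0, edges.size - 2)
    classes, lab_idx = np.unique(labels, return_inverse=True)
    joint = np.zeros((edges.size - 1, classes.size))
    np.add.at(joint, (x_bin, lab_idx), 1)              # joint histogram of (bin, label) counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)              # marginal over bins
    py = joint.sum(axis=0, keepdims=True)              # marginal over labels
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
cued = np.repeat(np.arange(6), 40)                     # six locations, 40 trials each
informative = 1.0 * cued + rng.normal(size=cued.size)  # electrode whose power tracks location
uninformative = rng.normal(size=cued.size)             # electrode unrelated to location

mi_info = mutual_information(informative, cued)
mi_null = mutual_information(uninformative, cued)
```

Histogram-based MI estimates are biased upward with finite data, so `mi_null` will typically be small but not exactly 0 bits. Only the relative difference between electrodes is meaningful in this sketch.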
RTs to targets were significantly shorter on validly versus invalidly cued trials (t(1, 7) = 5.62, p < .001; see Figure 1B), which also held true when using a log transform (t(1, 7) = 5.92, p < .001) and when no trials were excluded based on oculomotor artifacts (t(1, 7) = 4.49, p < .01). Mean discrimination accuracy was 95% correct (SEM = 1.8%) following valid cues and 79% correct (SEM = 11%) following invalid cues. Although this difference did not reach statistical significance (p = .23), it suggests that there was no speed–accuracy tradeoff. These results indicate that participants were indeed allocating attention to the cued location.
Decoding Attended Location
The topography of alpha power varied systematically with attended location (Figure 2A), which allowed for significant classification, as compared against both theoretical chance (i.e., 1/6, t(1, 7) = 3.80, p = .006) and an empirical baseline (t(1, 7) = 3.46, p = .010; Figure 2B). Extending prior work (Treder et al., 2011; Bahramisharif, van Gerven, Heskes, & Jensen, 2010; Van Gerven & Jensen, 2009), we found that successful decoding was specific for the alpha band; no other frequency range analyzed allowed for significant classification (ps > .177).
Reconstructing the Focus of Attention
To determine whether patterns of alpha-band oscillations could be used to reconstruct population-level, feature-selective tuning profiles, we implemented two IEMs, one based on the topography of alpha power and one based on alpha phase topographies. IEMs can be thought of as a form of targeted data reduction that allows for quantification of characteristics of the underlying feature-specific neural population, such as its response selectivity (Sprague & Serences, 2015). Figure 3A shows the temporal evolution of reconstructed channel tuning functions, with time points with significant selectivity marked with black squares. The response shows a characteristic Gaussian profile, with maximal amplitude for the attended location (0°) and a steady reduction in responses from neural populations tuned to locations further away, with this profile beginning around 450 msec and persisting throughout the sustained attention interval. To further quantify these reconstructions, we compared the amplitude of the response in the attended channel with the response in that channel during the baseline interval using a permutation test (see Multivariate IEM above). This revealed significantly higher amplitude responses from neural populations tuned to the attended location during attention as compared with baseline (Mdiff = 0.125 μV², p = .017) as well as significantly higher responses in the attended channel than the channel tuned 180° away (Mdiff = 0.279 μV², p = .001), indicating that the reconstructions tracked the deployment and the location of spatial attention (Figure 3B). An MI analysis revealed a posterior distribution of informative electrodes (Figure 3B), consistent with the topographies in Figure 2A.
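The general logic of an IEM can be sketched in two steps: estimate the weights that map hypothesized channel responses onto electrodes from training data, then invert those weights to recover channel responses from independent test data. The example below follows the standard linear formulation (e.g., Brouwer & Heeger, 2009) on simulated data. The raised-cosine basis function, the odd/even train–test split, the noise level, and the electrode count are assumptions for illustration and may differ from the implementation used here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_chan, n_elec, n_per_loc = 6, 60, 40
centers = np.arange(n_chan) * 60                       # six locations, 60 deg apart

def basis(loc_deg):
    """Hypothesized channel responses (assumed raised-cosine tuning), trials x channels."""
    d = np.deg2rad(loc_deg[:, None] - centers[None, :])
    return ((1 + np.cos(d)) / 2) ** 3

y = np.repeat(np.arange(n_chan), n_per_loc)            # attended-location index per trial
C = basis(centers[y])                                  # predicted channel responses
W_true = rng.normal(size=(n_chan, n_elec))             # hypothetical channel-to-electrode weights
B = C @ W_true + rng.normal(size=(y.size, n_elec))     # simulated alpha power, trials x electrodes

# Independent training and test sets (odd/even trials; an arbitrary choice).
idx = np.arange(y.size)
train, test = idx % 2 == 0, idx % 2 == 1

# Step 1: estimate weights by least squares from the model B = C W.
W_hat, *_ = np.linalg.lstsq(C[train], B[train], rcond=None)
# Step 2: invert the weights to estimate channel responses for test trials.
C_hat = B[test] @ np.linalg.pinv(W_hat)

# Shift each trial so the attended channel sits at index 0; after the shift,
# index j holds the channel tuned j * 60 deg away from the attended location.
aligned = np.array([np.roll(c, -loc) for c, loc in zip(C_hat, y[test])])
ctf = aligned.mean(axis=0)                             # average channel tuning function
```

In this simulation, `ctf` peaks at index 0 (the attended location) and is lowest at index 3 (the channel tuned 180° away), which is the pattern that the on-channel versus 180° comparison is designed to detect.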
Because spatial attention may also be supported by long-distance phase synchronization in the alpha band (Sauseng et al., 2005), we determined whether phase topographies could be used to reconstruct the attended location. If, for example, the phase relationship between frontal and posterior regions reliably covaries with attended location, reconstructions should be successful. In contrast to this proposal, however, we did not observe robust reconstructions based on alpha-band phase topography. Figure 4A shows the reconstructed tuning functions over time, which suggests a transient, cue-evoked response but no sustained tuning during the attention interval. This was supported statistically via comparison with both the baseline attended channel response (p = .54) and the response in the channel tuned 180° away from the attended location (p = .78; Figure 4B). These results were virtually identical when using amplitude-normalized complex values as input to the phase-based IEM (see Methods). As anticipated, the MI analysis revealed no clear scalp topography of location-informative electrodes.
To assess whether eye movements may have been a confound in these results, we tested for a correspondence between EOG activity and cued location in several ways. Horizontal EOG amplitudes during sustained attention did not differentiate attention to horizontal from vertical locations (t(1, 7) = 1.8, p = .12). SVM classifiers trained on horizontal and vertical EOG activity at each sample during the epoch resulted in above-baseline classification at only one sample (out of 200) during the sustained attention interval. Because this sample was not neighbored by other significant samples, it is likely spurious. Lastly, EOG-based classification accuracy did not reliably predict EEG-based alpha decoding accuracy (r = 0.37, p = .36), in line with a previously published decoding analysis of this same data set (Treder et al., 2011) that also found no correlation (r = 0.029, p = .75) when examining EEG and EOG accuracies to the best classifiable pairs of locations. These analyses suggest that eye movements do not explain the EEG results.
We present evidence that the topography of alpha-band power allows for tracking the focus of spatial attention with relatively fine temporal resolution. Our decoding analysis revealed that spatial location information was predominantly carried in the alpha band. Our IEM revealed that population level neural activity in the alpha band showed robust spatial tuning during covert attention. Mechanistically, it has been proposed that alpha oscillations play a role in coordinating information flow between feature-selective neural populations (Saalmann et al., 2012; Siegel, Donner, & Engel, 2012; Jensen & Mazaheri, 2010). The fact that the topography of our MI analysis was maximal over occipital and parietal sensors is consistent with the idea that alpha rhythms may be responsible for coordinating activity between spatial receptive field maps in visual and parietal cortex (Saygin & Sereno, 2008; Silver, Ress, & Heeger, 2005), rather than within only a single feature-selective region.
That the IEM based on alpha-band phase topography failed to reconstruct the focus of sustained attention suggests that phase relationships across electrodes may not contain feature specific information. Prior findings of long-distance phase coupling during spatial attention (Sauseng et al., 2005) may reflect a more general signature of the engagement of top–down control mechanisms in frontoparietal attentional systems (Kundu, Johnson, & Postle, 2014; Capotosto et al., 2009). Although alpha phase may be controlled in other ways (Samaha, Bauer, Cimaroli, & Postle, 2015; Bonnefond & Jensen, 2012), these findings suggest a privileged role for the dynamics of alpha-band power in the implementation of covert spatial attention.
Multivariate, information-based techniques, such as IEMs, are offering new insight about the mechanisms that underlie core cognitive functions (Naselaris & Kay, 2015; Serences & Saproo, 2012). For covert spatial attention, results from univariate analyses have been interpreted as evidence for multistage accounts of attentional control, with early bilateral “alpha desynchronization” (emerging within 700 msec postcue) reflecting a reorienting operation and a later pattern of sustained asymmetric “alpha synchronization,” beginning around 700 msec postcue, corresponding to sustained attention (Rihs, Michel, & Thut, 2009). Results from our IEM, however, do not follow this pattern. Instead, the time course of the power-based reconstructions in Figure 3A showed a short-lived window of significant selectivity concurrent with cue onset (likely driven by a cue-evoked response, rather than spatial attention) with relatively sustained tuning emerging around 450 msec. This estimate of 450 msec is congruent with the well-characterized behavioral time course of cued shifts of spatial attention, which reveals benefits around 400–500 msec postcue (Müller, Teder-Sälejärvi, & Hillyard, 1998; Cheal, Lyon, & Gottlob, 1994). Future work will be needed to reconcile the interpretations arising from univariate versus multivariate analyses. These results demonstrate that IEMs provide a useful new technique to track the deployment of spatial attention in near real time from extracranial signals.
We would like to thank Dr. Barry Giesbrecht and Dr. Javier Garcia for code and analytic guidance. This study was funded by MH095984 to Bradley R. Postle.
Reprint requests should be sent to Jason Samaha, 1202 W. Johnson Street, Madison, WI 53703, or via e-mail: email@example.com.