Abstract
How does the human brain encode visual object categories? Our understanding of this has advanced substantially with the development of multivariate decoding analyses. However, conventional electroencephalography (EEG) decoding predominantly uses the mean neural activation within the analysis window to extract category information. Such temporal averaging overlooks the within-trial neural variability that is suggested to provide an additional channel for encoding information about the complexity and uncertainty of the sensory input. The information content of these temporal variabilities, however, has not been systematically compared with that of the conventional mean activity. Here we compare the information content of 31 variability-sensitive features against the mean of activity, using three independent, highly varied data sets. In whole-trial decoding, the classical event-related potential (ERP) components of P2a and P2b provided information comparable to that provided by the original magnitude data (OMD) and wavelet coefficients (WC), the two most informative variability-sensitive features. In time-resolved decoding, the OMD and WC outperformed all the other features (including the mean), which were sensitive only to limited and specific aspects of temporal variability, such as phase or frequency. The information was more pronounced in the theta frequency band, previously suggested to support feedforward visual processing. We conclude that the brain might encode information simultaneously in multiple aspects of neural variability, such as phase, amplitude, and frequency, rather than in the mean activity per se. In our active categorization data set, we found that more effective decoding of the neural codes corresponds to better prediction of behavioral performance. Therefore, incorporating temporal variabilities in time-resolved decoding can provide additional category information and improved prediction of behavior.
1 Introduction
How does the brain encode information about visual object categories? This question has been studied for decades using different neural recording techniques, including invasive neurophysiology (Hung, Kreiman, Poggio, & DiCarlo, 2005) and electrocorticography (ECoG; Majima et al., 2014; Watrous, Deuker, Fell, & Axmacher, 2015; Rupp et al., 2017; Miyakawa et al., 2018; Liu, Agam, Madsen, & Kreiman, 2009), as well as noninvasive neuroimaging methods such as functional magnetic resonance imaging (fMRI; Haxby et al., 2001), magnetoencephalography (MEG; Contini, Wardle, & Carlson, 2017; Carlson, Tovar, Alink, & Kriegeskorte, 2013), and electroencephalography (EEG; Kaneshiro, Guimaraes, Kim, Norcia, & Suppes, 2015; Simanova, Van Gerven, Oostenveld, & Hagoort, 2010), or a combination of them (Cichy, Pantazis, & Oliva, 2014). There has been great success in reading out, or decoding, neural representations of semantic object categories from neuroimaging data. However, it is still unclear whether conventional decoding analyses effectively detect the complex neural codes. Critically, one potential source of neural codes in high-temporal-resolution data (e.g., EEG) is the "within-trial/window temporal variability" of EEG signals, which is generally ignored through temporal averaging in decoding. The use of such summarized "mean" activity can hide the true spatiotemporal dynamics of neural processes such as object category encoding, which is still debated in cognitive neuroscience (Grootswagers, Robinson, & Carlson, 2019; Majima et al., 2014; Karimi-Rouzbahani, Bagheri, & Ebrahimpour, 2017b; Isik, Meyers, Leibo, & Poggio, 2014; Cichy et al., 2014; Karimi-Rouzbahani, 2018).
Here, we quantitatively compare the information content and the temporal dynamics of a large set of features from EEG time series, each sensitive to a specific aspect of within-trial temporal variability. We then evaluate the relevance of these features by measuring how well each one predicts behavioral performance. Sensory neural codes are multiplexed structures containing information on different timescales and about different aspects of the sensory input (Panzeri, Brunel, Logothetis, & Kayser, 2010; Wark, Fairhall, & Rieke, 2009; Gawne, Kjaer, & Richmond, 1996). Previous animal studies have shown that the brain encodes sensory information not only in neural firing rates (i.e., the average number of spikes within specific time windows) but also in more complex patterns of neural activity, such as millisecond-precise activity and phase (Kayser, Montemurro, Logothetis, & Panzeri, 2009; Victor, 2000; Montemurro, Rasch, Murayama, Logothetis, & Panzeri, 2008). For example, stimulus contrast was shown to be represented by latency coding at a temporal precision of about 10 ms, whereas stimulus orientation and spatial frequency were encoded at coarser temporal precisions (30 ms and 100 ms, respectively; Victor, 2000). Likewise, spike rates on 5 ms to 10 ms timescales carried information about epochs of a naturalistic movie that was complementary to that carried by the phase of firing relative to low-frequency (1–8 Hz) local field potentials (LFPs; Montemurro et al., 2008). Therefore, the temporal patterns and variabilities of neural activity are enriched platforms of neural codes.
Recent computational and experimental studies have proposed that neural variability provides a separate and additional channel to the mean activity for encoding general aspects of the sensory information, such as its "uncertainty" and "complexity" (Orbán, Berkes, Fiser, & Lengyel, 2016; Garrett, Epp, Kleemeyer, Lindenberger, & Polk, 2020). Specifically, uncertainty about stimulus features (e.g., the orientations of lines in an image) was directly linked to neural variability in the monkey visual cortex (Orbán et al., 2016) and in human EEG (Kosciessa, Lindenberger, & Garrett, 2021): a wider inferred range of possible feature combinations in the input stimulus corresponded to a wider distribution of neural responses. This applied to both within- and across-trial variability (Orbán et al., 2016). Moreover, temporal variability was directly related to the complexity of input images, with higher neural variability for house (i.e., more varied) than for face (i.e., less varied) images (Garrett et al., 2020), and provided a reliable measure of perceptual performance in behavior (Waschke, Tune, & Obleser, 2019). The uncertainty- and complexity-dependent modulation of neural variability, which is linked to the category of the input information, has been suggested to facilitate neural energy saving and the adaptive, effective encoding of sensory inputs in changing environments (Garrett et al., 2020; Waschke, Kloosterman, Obleser, & Garrett, 2021).
Despite the richness of information encoded by neural variabilities, the unclear transformation of such neuronal codes into EEG activity has led to divergent approaches for decoding information from EEG. For example, the information in neural firing rates might appear in the phase patterns rather than the amplitude of EEG oscillations (Ng, Logothetis, & Kayser, 2013). Generally, three families of features have been extracted from EEG time series to detect neural codes from temporal variabilities (Waschke et al., 2021): variance-, frequency-, and information theory-based features, each detecting specific aspects of variability. In whole-trial decoding, components of event-related potentials (ERPs) such as N1, P1, P2a, and P2b, which quantify time-specific variabilities of within-trial activation, have provided significant information about object categories (separately and in combination; Chan, Halgren, Marinkovic, & Cash, 2011; Wang, Xiong, Hu, Yao, & Zhang, 2012; Qin et al., 2016). Others successfully decoded information from more complex variance- and frequency-based features such as signal phase (Behroozi, Daliri, & Shekarchi, 2016; Watrous et al., 2015; Torabi, Jahromy, & Daliri, 2017; Wang, Wang, & Yu, 2018; Voloh, Oemisch, & Womelsdorf, 2020), signal power across frequency bands (Rupp et al., 2017; Miyakawa et al., 2018; Majima et al., 2014), time-frequency wavelet coefficients (Hatamimajoumerd & Talebpour, 2019; Taghizadeh-Sarabi, Daliri, & Niksirat, 2015), interelectrode temporal correlations (Karimi-Rouzbahani, Bagheri, & Ebrahimpour, 2017a), and information-based features (e.g., entropy; Joshi, Panigrahi, Anand, & Santhosh, 2018; Torabi et al., 2017; Stam, 2005). Therefore, neural codes are generally detected from EEG activity using a wide range of features sensitive to temporal variability.
While insightful, previous studies have also posed new questions about the relative richness, temporal dynamics, and behavioral relevance of different features of neural variability. First, can the features sensitive to temporal variabilities provide category information additional to the conventional mean feature? While several of the above studies compared multiple features (Chan et al., 2011; Taghizadeh-Sarabi et al., 2015; Torabi et al., 2017), none of them compared their results against the conventional mean activity, which is the dominant feature, especially in time-resolved decoding (Grootswagers, Wardle, & Carlson, 2017). This comparison will not only validate the richness of each feature of neural variability but will also show whether the mean activity detects a large portion of the neural codes produced by the brain. We predicted that the informative neural variabilities, if properly decoded, should provide information additional to the mean activity, which overlooks the variability within the analysis window.
Second, do the features sensitive to temporal variabilities evolve over similar time windows to the “mean” feature? Among all the studies mentioned above, only a few investigated the temporal dynamics of features, other than the mean in time-resolved decoding (Majima et al., 2014; Stewart, Nuthmann, & Sanguinetti, 2014; Karimi-Rouzbahani et al., 2017a), where the temporal evolution of information encoding is studied (Grootswagers et al., 2017). As distinct aspects of sensory information (e.g., contrast versus spatial frequency) are represented on different temporal scales (Victor, 2000; Montemurro et al., 2008) and different variability features are potentially sensitive to distinct aspects of variability, we might see differential temporal dynamics for different features.
Third, do the features sensitive to temporal variabilities explain the behavioral recognition performance more accurately than the mean feature? One important question, which was not covered in the above studies, was whether the extracted information was behaviorally relevant or just epiphenomenal to the experimental conditions. One way of validating the relevance of the extracted neural codes is to check if they could predict the relevant behavior (Williams, Dang, & Kanwisher, 2007; Grootswagers, Cichy, & Carlson, 2018; Woolgar, Dermody, Afshar, Williams, & Rich, 2019). We previously found that the decoding accuracies obtained from mean signal activations could predict the behavioral recognition performance (Ritchie, Tovar, & Carlson, 2015). However, it remains unknown whether (if at all) the information obtained from temporal variabilities can explain more variance of the behavioral performance. Our prediction was that as the more informative features access more of the potentially overlooked neural codes, they should also explain the behavioral performance more accurately.
In this study, we address the above questions to provide additional insights into what aspects of neural variabilities might reflect the neural codes more thoroughly and how we can extract them most effectively using multivariate decoding analyses.
2 Methods
The data sets used in this study and the code are available online at https://osf.io/wbvpn/. The EEG and behavioral data are available in Matlab .mat format and the code in Matlab .m format. All the open-source scripts used in this study were compared against other implementations of identical algorithms in simulations and were used only if they produced identical results. To evaluate the different implementations, we tested them using 1000 random time series (normally distributed with zero mean and unit variance), each including 1000 samples.
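A minimal sketch of this validation procedure might look like the following, where implA and implB are hypothetical handles to two open-source implementations of the same algorithm (they stand in for the actual scripts compared in this study):

```matlab
% Sketch of the implementation check described above: two implementations
% of the same algorithm must agree on 1000 random test time series.
implA = @(x) 0; implB = @(x) 0;   % placeholders; substitute real implementations
nSeries  = 1000;                  % number of random test time series
nSamples = 1000;                  % samples per series
for s = 1:nSeries
    x = randn(1, nSamples);       % normally distributed, zero mean, unit variance
    assert(abs(implA(x) - implB(x)) < 1e-10, ...
           'Implementations disagree; do not use.');
end
```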
2.1 Overview of Data Sets
We chose three previously published EEG data sets for this study, which differed across a wide range of parameters, including the recording setup (e.g., amplifier, number of electrodes, preprocessing steps), the characteristics of the image set (e.g., number of categories and exemplars within each category, colorfulness of images), and the task (e.g., presentation length, order, and the participants' task; see Table 1). All three data sets have previously provided object category information using multivariate analyses.
Table 1: Details of the three EEG data sets.

| Data Set | Reference | # Electrodes | Bandpass Filtering | Notch Filtering | # Object Categories | # Stimulus Repetitions | Stimulus Presentation Time | Stimulus Size (Periphery) | Task | Participants' Accuracy | Participants' Age (Median) | Participants' Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Karimi-Rouzbahani et al., 2017a | 31 | 0.03–200 Hz | 50 Hz | 4 | 3 | 50 ms | 2° (0.7°) | Color matching (passive) | 94.68% | 22.1 | 7 male, 3 female |
| 2 | Karimi-Rouzbahani, Vahab, Ebrahimpour, & Menhaj, 2019 | 31 | 0.03–200 Hz | 50 Hz | 4 | 6 | 900 ms | 8° × 8° (0°) | Object category detection (active) | 94.65% | 26.4 | 6 male, 4 female |
| 3 | Kaneshiro et al., 2015 | 128 | 0.03–50 Hz | No | 6 | 12 | 500 ms | 7.0° × 6.5° (0°) | No task (fixation) | N/A | 30.5 | 7 male, 3 female |
2.1.1 Data Set 1
We collected data set 1 in a passive color-matching experiment (Karimi-Rouzbahani et al., 2017a; see Figure 1, top row); its recording and stimulus parameters are summarized in Table 1.
2.1.2 Data Set 2
We collected data set 2 in an active categorization experiment, in which participants pressed a button if the presented object image was from a target category (go/no-go), which was cued at the beginning of each block of 12 stimuli (Karimi-Rouzbahani, Vahab, Ebrahimpour, & Menhaj, 2019; see Figure 1, middle row). The object images, which were cropped from photographs, were part of the well-established benchmark image set for object recognition developed by Kiani, Esteky, Mirpour, and Tanaka (2007). This image set has been previously used to extract object category information from both human and monkey brain using MEG (Cichy et al., 2014), fMRI (Cichy et al., 2014; Kriegeskorte et al., 2008), and single-cell electrophysiology (Kriegeskorte et al., 2008; Kiani et al., 2007).
2.1.3 Data Set 3
We also used another data set (data set 3), which was not collected in our lab. This data set was collected by Kaneshiro et al. (2015) over six sessions for each participant. We used the first session only, because it could represent the whole data set (the later sessions were repetitions of the same stimuli to increase the signal-to-noise ratio) and we preferred to avoid a potential effect of extended familiarity with the stimuli on neural representations. The EEG data were collected during passive viewing (participants had no task other than keeping their eyes on the central fixation cross; see Figure 1, bottom row) of six categories of objects, with stimuli chosen from Kiani et al. (2007) as explained above. We used a preprocessed (i.e., bandpass-filtered in the range 0.03 to 50 Hz) version of the data set, which was available online.1
All three data sets were collected at a sampling rate of 1000 Hz. For data sets 1 and 2, only the trials that led to correct responses by participants were used in the analyses. Each data set consisted of data from 10 participants, and each object category in each data set included 12 exemplars. To make the three data sets as consistent as possible, we preprocessed them differently from their original papers. Specifically, data sets 1 and 2 were bandpass-filtered in the range 0.03 to 200 Hz before the data were split into trials, and we applied 50 Hz notch filters to them to remove line noise; the bandpass filtering range of data set 3 was 0.03 to 50 Hz, and we did not have access to the raw data to increase the upper cutoff frequency to 200 Hz. Next, we generated different versions of the data by bandpass-filtering the data in the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–16 Hz), and gamma (16–200 Hz) bands to see if there is any advantage for the previously suggested theta or delta frequency bands (Watrous et al., 2015; Behroozi et al., 2016; Wang, Wang, & Yu, 2018). We used finite-impulse-response (FIR) filters with 12 dB roll-off per octave for the bandpass-filtering of data sets 1 and 2 and when evaluating the sub-bands of the three data sets. All the filters were applied before splitting the data into trials.
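A sketch of this band-splitting step follows, assuming continuous data arranged as an electrodes x time matrix named eeg and the 1000 Hz sampling rate shared by all three data sets; the FIR order is our assumption for illustration, not a reported parameter:

```matlab
% Sketch: split continuous EEG into the five frequency bands used here.
fs    = 1000;                                   % sampling rate (Hz)
bands = {'delta',[0.5 4]; 'theta',[4 8]; 'alpha',[8 12]; ...
         'beta',[12 16]; 'gamma',[16 200]};
ord   = 1000;                                   % assumed FIR order
for k = 1:size(bands, 1)
    b = fir1(ord, bands{k,2} / (fs/2), 'bandpass');  % linear-phase FIR design
    % filtfilt applies zero-phase filtering; transpose so time runs down columns
    filtered.(bands{k,1}) = filtfilt(b, 1, eeg')';
end
```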
We did not remove artifacts (e.g., eye related and movement related) from the signals, as we and others have shown that sporadic artifacts have minimal effect in multivariate decoding (Grootswagers et al., 2017). To increase signal-to-noise ratios in the analyses, each unique stimulus was presented to the participants 3, 6, and 12 times in data sets 1, 2, and 3, respectively. Trials were defined in the time window from 200 ms before to 1000 ms after the stimulus onset to cover most of the range of event-related neural activations. The average prestimulus (200–0 ms relative to the stimulus onset) signal amplitude was removed from each trial of the data. For more information about each data set, see Table 1 and the references to their original publications.
2.2 Features
EEG signals are generated by inhibitory and excitatory postsynaptic potentials of cortical neurons. These potentials extend to the scalp surface and are recorded through electrodes as amplitudes of voltage in units of microvolts. Researchers have been using different aspects of these voltage recordings to obtain meaningful information about human brain processes. The main focus of this study is to compare the information content of features that are sensitive to temporal variabilities of neural activations against the mean of activity within the analysis window, which is conventionally used in decoding analysis (Grootswagers et al., 2017). Below we explain the mathematical formulas for each feature used in this study. We also provide brief information about potential underlying neural mechanisms that can lead to the information content provided by each feature.
We classified the features into five classes based on their mathematical similarity to simplify the presentation of the results and their interpretation: moment, complexity, ERP, frequency-domain, and multivalued features. However, the classification of the features is not strict, and the features might be classified based on other criteria and definitions. For example, complexity itself has different definitions (Tononi & Edelman, 1998), such as the degree of randomness or the degrees of freedom in a large system of interacting elements. Recent studies have also split variability features into the three categories of variance-, frequency-, and information theory-based features (Waschke et al., 2021). Therefore, each definition may exclude or include some of our features in a given class. Note that we used only features that have previously been used to decode categories of evoked potentials from EEG signals through multivariate decoding analysis. Nonetheless, other features are certainly available, especially those extracted from EEG time series collected during long-term monitoring of human neural activity in health and disorder (Fulcher & Jones, 2017). In presenting the features' formulas, we avoided repeating terms from one feature to the next; readers might therefore need to go back a few features to find the definitions of some terms. Note that in this study, the analyses are performed in either 1000 ms time windows (i.e., number of samples used for feature extraction: n = 1000) in the whole-trial analysis or 50 ms time windows (n = 50) in the time-resolved analysis.
2.2.1 Moment Features
These features are the most straightforward and intuitive ones from which we might be able to extract information about neural processes. Mean, variance, skewness, and kurtosis are the first to fourth moments of the EEG time series and can provide information about the shape of the signals and their deviation from stationarity, which is the case in evoked potentials (Rasoulzadeh et al., 2017; Wong, Galka, Yamashita, & Ozaki, 2006). These moments have been shown to be able to differentiate visually evoked responses (Pouryzdian & Erfanian, 2010; Alimardani, Cho, Boostani, & Hwang, 2018). The second to fourth moments are also categorized as variance-based features in recent studies (Waschke et al., 2021).
2.2.2 Complexity Features
There can potentially be many cases in which simple moment statistics such as mean, median, variance, skewness, and kurtosis, which rely on distributional assumptions, provide equal values for distinct time series. For example, series A (10, 20, 10, 20, 10, 20, 10, 20) and series B (20, 20, 20, 10, 20, 10, 10, 10) contain the same samples in different orders, so all five statistics return identical values for both. Therefore, we need more complex, and possibly nonlinear, measures that can detect subtle but meaningful temporal patterns in time series. The analysis of nonlinear signal features has recently been growing, following findings showing that EEG reflects weak but significant nonlinear structures (Stam, 2005; Stępień, 2002). Importantly, many studies have shown that the complexity of EEG time series can significantly alter during cognitive tasks such as visual (Bizas et al., 1999) and working memory tasks (Sammer, 1999; Stam, 2000). Therefore, it was necessary to evaluate the information content of nonlinear features for our decoding of object categories. As mentioned above, the grouping of these nonlinear features as "complexity" here is not strict, and the features included in this class are those that capture complex and nonlinear patterns across time series. Although the accurate detection of complex and nonlinear patterns generally needs more time samples than that of linear patterns (Procaccia, 1988), it has been shown that nonlinear structures can be detected from short EEG time series as well (i.e., through fractal dimensions; Preissl, Lutzenberger, Pulvermüller, & Birbaumer, 1997). Nonetheless, we extracted these features from both time-resolved (50 samples) and whole-trial (1000 samples) data to ensure we did not miss potential information represented on longer temporal scales.
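The point is easy to verify numerically: the two toy series above have identical sample distributions, so every distribution-based statistic coincides, while order-sensitive measures (such as the complexity features below) can still tell them apart (skewness and kurtosis require the Statistics and Machine Learning Toolbox):

```matlab
% The toy series above: identical sample distributions, different dynamics.
A = [10 20 10 20 10 20 10 20];
B = [20 20 20 10 20 10 10 10];
moments = @(x) [mean(x), median(x), var(x), skewness(x), kurtosis(x)];
disp(moments(A) - moments(B))   % all zeros: moments cannot separate A and B
```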
Lempel-Ziv complexity (LZ Cmplx). Lempel-Ziv complexity measures the complexity of time series (Lempel & Ziv, 1976). Basically, the algorithm counts the number of unique sub-sequences within a larger binary sequence. Accordingly, a sequence of samples with a certain regularity does not lead to a large LZ complexity. However, the complexity generally grows with the length of the sequence and its irregularity. In other words, it measures the generation rate of new patterns along a digital sequence. In a comparative work, it was shown that compared to many other frequency metrics of time series (e.g., noise power, stochastic variability), LZ complexity has the unique feature of providing a scalar estimate of the bandwidth of time series and the harmonic variability in quasi-periodic signals (Aboy, Hornero, Abásolo, & Álvarez, 2006). It is widely used in biomedical signal processing and has provided successful results in the decoding of visual stimuli from neural responses in primary visual cortices (Szczepański, Amigó, Wajnryb, & Sanchez-Vives, 2003). We used the code by Quang Thai2 implemented based on “exhaustive complexity,” which is considered to provide the lower limit of the complexity as explained by Lempel and Ziv (1976). We used the signal median as a threshold to convert the signals into binary sequences for the calculation of LZ complexity. The LZ complexity provided a single value for each signal time series.
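A minimal sketch of this computation (median binarization followed by the Lempel-Ziv production parsing) is given below; it is a simplified stand-in for, not a reproduction of, the exhaustive-complexity implementation referenced above:

```matlab
function c = lz_complexity(x)
% Lempel-Ziv complexity of a time series, binarized at its median.
% Counts the phrases produced by the LZ76 production process; a simplified
% sketch, not the exhaustive-complexity code used in the study.
    s = char('0' + (x > median(x)));      % binary sequence as characters
    c = 1; S = s(1); Q = '';
    for r = 2:numel(s)
        Q = [Q, s(r)];                    %#ok<AGROW> extend candidate phrase
        if ~contains([S, Q(1:end-1)], Q)  % Q not seen before: new phrase
            c = c + 1;
            S = [S, Q]; Q = '';           %#ok<AGROW>
        end
    end
    if ~isempty(Q), c = c + 1; end        % count the final, unfinished phrase
end
```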
Fractal dimension. In signal processing, fractal is an indexing technique that provides statistical information about the complexity of time series. A higher fractal value indicates more complexity for a sequence as reflected in more nesting of repetitive sub-sequences at all scales. Fractal dimensions are widely used to measure two important attributes: self-similarity and the shape of irregularity. A growing set of studies has been using fractal analyses for the extraction of information about semantic object categories (such as living and nonliving categories of visual objects; Ahmadi-Pajouh, Ala, Zamanian, Namazi, & Jafari, 2018; Torabi et al., 2017), as well as simple checkerboard patterns (Namazi, Ala, & Bakardjian, 2018) from visually evoked potentials. In this study, we implemented two of the common methods for the calculation of fractal dimensions of EEG time series, which have been previously used to extract information about object categories as explained below. We used the implementations by Jesús Monge Álvarez3 for fractal analysis.
Entropy. Entropy measures the perturbation, or irregularity, of a time series (Waschke et al., 2021): a higher entropy value indicates a more irregular time series. Precise calculation of entropy usually requires a considerable number of samples and is also sensitive to noise. Here we used two methods for calculating entropy, each of which has advantages over the other.
Approximate entropy (Apprx Ent) was initially developed for medical data analysis (Pincus & Huang, 1992), such as heart rate, and was later extended to other areas such as brain data analysis. It has the advantage of requiring low computational power, which makes it suitable for real-time applications on small sample sizes (such as the 50-sample windows used here), although the quality of this entropy measure degrades for shorter data lengths. This metric detects changes in episodic behavior that are not represented by peak occurrences or amplitudes (Pincus & Huang, 1992). We used an open-source code5 for calculating approximate entropy. We set the embedding dimension to 2 and the tolerance parameter to 20% of the standard deviation of the data, to roughly follow a previous study (Shourie, Firoozabadi, & Badie, 2014), which compared approximate entropy in visually evoked potentials and found differential effects across artist versus nonartist participants when looking at paintings.
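A sketch of the approximate entropy computation with these parameter choices (embedding dimension m = 2, tolerance r = 0.2 times the standard deviation of the window) follows; it is a minimal textbook-style implementation, not the open-source code used in the study:

```matlab
function ae = apprx_ent(x, m, r)
% Approximate entropy: phi(m) - phi(m+1), following Pincus (1992).
% Typical call for this study: apprx_ent(x, 2, 0.2 * std(x)).
    ae = phi(x, m, r) - phi(x, m + 1, r);
end

function p = phi(x, m, r)
    N = numel(x);
    X = zeros(N - m + 1, m);
    for i = 1:N - m + 1
        X(i, :) = x(i:i + m - 1);             % embedded m-length vectors
    end
    C = zeros(N - m + 1, 1);
    for i = 1:N - m + 1
        d = max(abs(X - X(i, :)), [], 2);     % Chebyshev distances to vector i
        C(i) = mean(d <= r);                  % fraction of similar vectors
    end
    p = mean(log(C));                         % self-match keeps C(i) > 0
end
```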
Sample entropy (Sample Ent), a refinement of approximate entropy, is frequently used to calculate the regularity of biological signals (Richman & Moorman, 2000). Basically, it is the negative natural logarithm of the conditional probability that two subsequences that are similar for m points remain similar at the next point. A lower sample entropy therefore reflects a higher self-similarity in the time series. It has two main advantages over approximate entropy: it is less sensitive to the length of the data and is simpler to implement. However, it does not focus on self-similar patterns in the data. We used the Matlab entropy function for the extraction of this feature, which has already provided category information in a previous study (Torabi et al., 2017). (See Richman & Moorman, 2000, and Subha, Joseph, Acharya, & Lim, 2010, for the details of the algorithm.)
2.2.3 ERP Components (N1, P1, P2a, and P2b)
An ERP is a measured brain response to a specific cognitive, sensory, or motor event that provides an approach to studying the correlation between the event and neural processing. Based on latency and amplitude, the ERP is split into specific subwindows called components. Here, we extracted ERP components by calculating the mean of the signal in specific time windows to obtain the P1 (80–120 ms), N1 (120–200 ms), P2a (150–220 ms), and P2b (200–275 ms) components, which have been shown to provide significant amounts of information about visual object and face processing in univariate (Rossion et al., 2000; Rousselet, Husk, Bennett, & Sekuler, 2007) and multivariate analyses (Chan et al., 2011; Jadidi, Zargar, & Moradi, 2016; Wang et al., 2012). As these components are calculated in limited and specific time windows, in the whole-trial analysis they reflect the mean of activity in their specific time windows rather than across the whole poststimulus window. They will also be absent from the time-resolved analyses by definition.
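Because the sampling rate is 1000 Hz, poststimulus milliseconds map directly onto sample indices, so each component reduces to a within-window mean; a sketch (assuming trial is an electrodes x time matrix whose time axis starts at stimulus onset):

```matlab
% Sketch: ERP components as within-window means of the poststimulus signal.
windows = {'P1',[80 120]; 'N1',[120 200]; 'P2a',[150 220]; 'P2b',[200 275]};
for k = 1:size(windows, 1)
    w = windows{k, 2};                                    % window edges in ms = samples
    erp.(windows{k, 1}) = mean(trial(:, w(1):w(2)), 2);   % one value per electrode
end
```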
2.2.4 Frequency-Domain Features
Neural variability is commonly analyzed in frequency domain by calculating spectral power across frequency bands. Specifically, as data transformation from time to frequency domain is almost lossless using Fourier transform, oscillatory power basically reflects frequency-specific variance (with the total power reflecting the overall variance of the time series; Waschke et al., 2021). Motivated by previous studies showing signatures of object categories in the frequency domain (Behroozi et al., 2016; Rupp et al., 2017; Iranmanesh & Rodriguez-Villegas, 2017; Joshi et al., 2018; Jadidi et al., 2016) and the representation of temporal codes of visual information in the frequency domain (Eckhorn et al., 1988), we also extracted frequency-domain features to see if they could provide additional category-related information to time-domain features. It is of note that while the whole-trial analysis allows us to compare our results with previous studies, the evoked EEG potentials are generally nonstationary (i.e., their statistical properties change along the trial) and potentially dominated by low-frequency components. Therefore, the use of time-resolved analysis, which looks at more stationary subwindows of the signal (e.g., 50 samples here), will allow us to detect subtle high-frequency patterns of neural codes.
Median frequency (Med Freq). This is the median normalized frequency of the power spectrum of a time-domain signal. It is calculated similarly to the signal median in the time domain, except that here the values are the power intensities in the different frequency bins of the power spectral density (PSD). This feature was calculated using the Matlab medfreq function.
Power and phase at median frequency (Pw MdFrq and Phs MdFrq). Interestingly, apart from the median frequency itself, which reflects the frequency aspect of the power spectrum, the power and phase of the signal at the median frequency have also been shown to be informative about aspects of human perception (Joshi et al., 2018; Jadidi et al., 2016). Therefore, we also calculated the power and phase of the frequency-domain signals at the median frequency as features.
Average frequency (Avg Freq). Evoked potentials show a few positive and negative peaks after the stimulus onset, and they might deviate in the positive or negative direction depending on the information content (Mazaheri & Jensen, 2008). Therefore, we also evaluated the average (zero-crossing) frequency of the ERPs by counting the number of times the signal changed sign during the trial. Note that each trial is baselined according to the average amplitude of the same trial in the 200 ms immediately before the stimulus onset. We calculated the average frequency on the poststimulus time window.
Spectral edge frequency (SEF 95%). This is a common feature used in monitoring the depth of anesthesia and stages of sleep using EEG (Iranmanesh & Rodriguez-Villegas, 2017). It measures the frequency below which X percent of the total power of the signal resides. X is usually set in the range of 75% to 95%; here we set X to 95%. Therefore, this feature reflects the frequency below which 95% of the signal's power spectrum is contained.
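Sketches of these three frequency-domain features for a single window x (fs = 1000 Hz) follow; the periodogram is our assumed PSD estimator, chosen for brevity rather than reported in the text:

```matlab
fs = 1000;
medF = medfreq(x, fs);                    % median frequency (Signal Processing Toolbox)

zeroCross = sum(abs(diff(sign(x))) > 0);  % average (zero-crossing) frequency:
                                          % number of sign changes in the window

[pxx, f] = periodogram(x, [], [], fs);    % PSD estimate (assumed estimator)
cumPow   = cumsum(pxx) / sum(pxx);
sef95    = f(find(cumPow >= 0.95, 1));    % spectral edge frequency (95%)
```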
2.2.5 Multivalued Features
The main hypothesis of this study is that we can potentially obtain more information about object categories, as well as behavior, if we take into account the temporal variability of neural activity within the analysis window (i.e., the trial) rather than averaging the samples, as in conventional decoding analyses. While the above variability-sensitive features return a single value from each individual time series (analysis window), a more flexible feature would allow multiple informative patterns to be detected from an individual time series. Therefore, we extracted additional features that provide more than one value per analysis window, so that we could select the most informative values across electrodes and time points simultaneously (see "Dimensionality reduction" below). We also included the original magnitude data as our reference feature, so that we would know how much (if at all) our feature extraction and selection procedures improved decoding.
Amplitude and phase locking (Amp Lock and Phs Lock). Although interelectrode correlated variability (Cross Corr), which is interpreted as inter-area connectivity, has successfully provided object category information (Majima et al., 2014; Karimi-Rouzbahani et al., 2017a), previous studies suggested that neural communication is realized through amplitude and phase locking and coupling (Bruns, Eckhorn, Jokeit, & Ebner, 2000; Siegel, Donner, & Engel, 2012; Engel, Gerloff, Hilgetag, & Nolte, 2013). More recently, researchers have quantitatively shown that amplitude and phase locking detect distinct signatures of neural communication across time and space from neural activity (Siems & Siegel, 2020; Mostame & Sadaghiani, 2020). Therefore, in line with recent studies, which successfully decoded object categories using inter-area-correlated variability as neural codes (Tafreshi et al., 2019), we extracted amplitude and phase locking as two major connectivity features that might contain object category information as well. Briefly, amplitude locking refers to the coupling between the envelopes of two signals (electrodes) and reflects the correlation of activation amplitude. To estimate the amplitude locking between two signals, we extracted the envelopes of the two signals using Hilbert transform (Gabor, 1946; explained below), then estimated the Pearson correlation between the two resulting envelopes as amplitude locking.
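A sketch of the amplitude-locking computation for one electrode pair is below. The text does not fix an estimator for phase locking, so the phase-locking value (PLV) shown here is our assumption, included only to illustrate one common choice:

```matlab
% Sketch: amplitude and phase locking between two electrodes' signals xi, xj
% within one analysis window (corr requires the Statistics Toolbox).
ai = abs(hilbert(xi(:)));   phi_i = angle(hilbert(xi(:)));   % envelope and phase
aj = abs(hilbert(xj(:)));   phi_j = angle(hilbert(xj(:)));

ampLock = corr(ai, aj);                            % Pearson correlation of envelopes
phsLock = abs(mean(exp(1i * (phi_i - phi_j))));    % phase-locking value (assumed metric)
```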
Original magnitude data (Orig Mag). We also used the poststimulus original magnitude data (1000 or 50 samples for the whole-trial and sliding time windows, respectively) to decode object category information without any feature extraction. This provided a reference to compare the information content of the mean and variability features to see if the former provided any extra information.
2.3 Multivariate Decoding
We used multivariate decoding to extract information about object categories from our EEG data sets. Basically, multivariate decoding, which has been dominating neuroimaging studies recently (Haynes & Rees, 2006; Grootswagers et al., 2017; Hebart & Baker, 2018), measures the cross-condition dissimilarity or contrast to quantify information content in neural representations. We used linear discriminant analysis (LDA) classifiers in multivariate analysis to measure the information content across all possible pairs of object categories within each data set. Specifically, we trained and tested the classifiers on animal versus car, animal versus face, animal versus plane, car versus plane, face versus car and plane versus face categories, then averaged the six decoding results and reported them for each participant.
The LDA classifier has been shown to be robust when decoding object categories from M/EEG (Grootswagers et al., 2017, 2019) and has provided higher decoding accuracies than Euclidean distance and correlation-based decoding methods (Carlson et al., 2013). In our initial analyses, LDA produced results similar to those of the more complex support vector machine (SVM) classifier while being around 30 times faster to train, so we used LDA to save time. We used a 10-fold cross-validation procedure in which we trained the classifier on 90% of the data and tested it on the left-out 10% of the data, repeating the procedure 10 times until all trials from the pair of categories participated once in the training and once in the testing of the classifiers. We repeated the decoding across all possible pairs of categories within each data set, which were 6, 6, and 15 pairs for data sets 1, 2, and 3, which consisted of 4, 4, and 6 object categories, respectively. Finally, we averaged the results across all combinations and reported them as the average decoding for each participant.
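A sketch of the pairwise decoding step is shown below, where X (trials x feature values) and y (numeric labels for the two categories) are assumed inputs, and MATLAB's fitcdiscr (linear discriminant by default) stands in for the LDA used in the study:

```matlab
% Sketch: 10-fold cross-validated LDA decoding for one pair of categories.
cv  = cvpartition(y, 'KFold', 10);      % stratified 10-fold partition of trials
acc = zeros(cv.NumTestSets, 1);
for f = 1:cv.NumTestSets
    mdl    = fitcdiscr(X(cv.training(f), :), y(cv.training(f)));  % train LDA
    yhat   = predict(mdl, X(cv.test(f), :));                      % test on left-out fold
    acc(f) = mean(yhat == y(cv.test(f)));
end
pairAccuracy = mean(acc);   % averaged over folds, then over category pairs
```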
In the whole-trial analyses, we extracted the above-mentioned features from the 1000 data samples after the stimulus onset (from 1 to 1000 ms). In the time-resolved analyses, we extracted the features from 50 ms sliding time windows in steps of 5 ms across the time course of the trial (−200 to 1000 ms relative to stimulus onset). Therefore, in the time-resolved analyses, the decoding rate at each time point reflects the result for the 50 ms window around that time point, from −25 to +24 ms relative to it. Time-resolved analyses allowed us to evaluate the evolution of object category information across time as captured by different features.
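The time-resolved loop can be sketched as follows (data: electrodes x time x trials sampled at 1000 Hz from −200 to 1000 ms; extract_feature and decode_pair are hypothetical helpers standing in for the feature extraction above and the cross-validated LDA decoding, respectively):

```matlab
% Sketch: 50 ms windows sliding in 5 ms steps across the trial.
winLen = 50; step = 5;
starts = 1:step:size(data, 2) - winLen + 1;
accuracy = zeros(1, numel(starts));
for w = 1:numel(starts)
    idx  = starts(w) : starts(w) + winLen - 1;    % current 50-sample window
    feat = extract_feature(data(:, idx, :));      % hypothetical: feature values
                                                  % per electrode per trial
    accuracy(w) = decode_pair(feat, labels);      % hypothetical LDA decoding
end
```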
2.4 Dimensionality Reduction
The multivalued features (e.g. interelectrode correlation, wavelet, Hilbert amplitude and phase, amplitude and phase locking, and original magnitude data) resulted in more than a single feature value per trial per sliding time window. This could provide higher decoding values compared to the decoding values obtained from single-valued features merely because of including a higher number of features. Moreover, when the features outnumber the observations (trials here), the classification algorithm can overfit to the data (Hart, Stork, & Duda, 2000). Therefore, to obtain comparable decoding accuracies across single-valued and multivalued features and to avoid potential overfitting of classifier to the data we used principal component analysis (PCA) to reduce the dimension of the data in multivalued features. Accordingly, we reduced the number of the values in the multivalued features to one per time window per trial, which equaled the number of values for the single-valued features.
To avoid potential leakage of information from testing to training (Pulini, Kerr, Loo, & Lenartowicz, 2019), we applied the PCA algorithm to the training data (folds) only and used the training PCA parameters (eigenvectors and means) for the dimension reduction of both the training and testing sets in each cross-validation run separately. We applied the dimension-reduction procedure only to the multivalued features. Note that we did not reduce the dimension of the neural space (columns in the dimension-reduced data matrix) to below the number of electrodes, "e" (in contrast to Hatamimajoumerd, Talebpour, & Mohsenzadeh, 2020), as we were interested in qualitatively comparing our results with the vast literature currently using multivariate decoding with all sensors (Grootswagers et al., 2017; Hebart & Baker, 2018). Also, we did not aim to find more than one feature per trial per time window, as we wanted to compare the results of the multivalued features with those of the single-valued features, which had only a single value per trial per time window.
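A sketch of this fold-wise dimension reduction follows, where Xtrain and Xtest (trials x concatenated feature values) are assumed inputs and nElectrodes plays the role of "e":

```matlab
% Sketch: fit PCA on the training fold only, then project both folds with
% the training eigenvectors (coeff) and means (mu) to avoid train-test leakage.
[coeff, ~, ~, ~, ~, mu] = pca(Xtrain);
k = nElectrodes;                                 % keep e dimensions, as in the text
XtrainRed = (Xtrain - mu) * coeff(:, 1:k);
XtestRed  = (Xtest  - mu) * coeff(:, 1:k);       % same projection for the test fold
```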
One critical point here is that we applied the PCA to the concatenated data from all electrodes and all values obtained from each individual feature (e.g., wavelet coefficients for the wavelet feature) within each analysis window (e.g., 50 ms in time-resolved decoding). Accordingly, for the multivalued features, the "e" selected dimensions were the most informative spatial and temporal patterns detected across both electrodes and time samples. It could therefore be the case that, within a given time window, two of the selected dimensions came from the same electrode (because two elements from that electrode were more informative than those from another electrode), which would leave some electrodes without any representatives among the selected dimensions. This is in contrast to the single-valued features (e.g., mean), from which we obtained only one value per analysis window per electrode, limiting those features to the spatial patterns within the analysis window rather than both spatial and temporal patterns.
2.5 Statistical Analyses
2.5.1 Bayes Factor Analysis
As in our previous studies (Grootswagers et al., 2019; Robinson, Grootswagers, & Carlson, 2019), to determine the evidence for the null and the alternative hypotheses, we used Bayes analyses as implemented by Bart Krekelberg based on Rouder, Morey, Speckman, and Province (2012). We used standard rules of thumb for interpreting levels of evidence (Lee & Wagenmakers, 2005; Dienes, 2014): Bayes factors above 10 and below 1/10 were interpreted as strong evidence for the alternative and null hypotheses, respectively, and Bayes factors above 3 and below 1/3 were interpreted as moderate evidence for the alternative and null hypotheses, respectively. We considered Bayes factors that fell between 3 and 1/3 as suggesting insufficient evidence either way. In the whole-trial decoding analyses, we asked whether there was a difference between the decoding values obtained from all possible pairs of features and also across frequency bands within every feature. Accordingly, we performed the Bayes factor analysis and calculated the Bayes factors as the probability of the data under the alternative (i.e., difference) relative to the null (i.e., no difference) hypothesis between all possible pairs of features and also across frequency bands within every feature and data set separately. The same procedure was used to evaluate evidence for difference (i.e., the alternative hypothesis) or no difference (i.e., the null hypothesis) in the maximum and average decoding accuracies and the times of maximum and first above-chance decoding across features for each data set separately.
We also evaluated evidence for the alternative hypothesis of above-chance decoding accuracy versus the null hypothesis of no difference from chance. For that purpose, we performed Bayes factor analysis between the distribution of actual accuracies obtained and a set of 1000 accuracies obtained from random permutation of class labels across the same pair of conditions (null distribution) at every time point (or only once for the whole-trial analysis), for each feature and data set separately. No correction for multiple comparisons was performed when using Bayes factors, as they are much more conservative than frequentist analysis in providing false claims with confidence (Gelman & Tuerlinckx, 2000; Gelman, Hill, & Yajima, 2012). The reason for the lower susceptibility of Bayesian analysis compared to classical statistics is the use of priors, which, if chosen properly (here, using the data-driven approach developed by Rouder et al., 2012), significantly reduce the chance of making type I (false-positive) errors.
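As a sketch of how such a test might be run with the toolbox referenced above, the following uses a simplified one-sample variant; the bf.ttest call and its signature are our assumption based on that toolbox's documentation, not a detail reported in the text:

```matlab
% Sketch: Bayes factor for above-chance decoding across participants,
% assuming Bart Krekelberg's bayesFactor toolbox is on the Matlab path.
chance = 50;                            % pairwise decoding chance level (%)
bf10 = bf.ttest(accuracies - chance);   % accuracies: one value per participant
if bf10 > 10
    fprintf('strong evidence for above-chance decoding\n');
elseif bf10 > 3
    fprintf('moderate evidence for above-chance decoding\n');
elseif bf10 < 1/3
    fprintf('moderate or strong evidence for the null\n');
end
```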
The priors for all Bayes factor analyses were determined based on Jeffrey-Zellner-Siow priors (Jeffreys, 1998; Zellner & Siow, 1980), which are from the Cauchy distribution based on the effect size that is initially calculated in the algorithm (Rouder et al., 2012). The priors are data driven and have been shown to be invariant with respect to linear transformations of measurement units (Rouder et al., 2012), which reduces the chance of being biased toward the null or alternative hypotheses.
2.5.2 Random Permutation Testing
To evaluate the significance of correlations between decoding accuracies and behavioral reaction times, we calculated the percentage of a set of 1000 randomly generated correlations that the actual correlation exceeded (when positive) or fell below (when negative). These random correlations were generated by randomizing the order of participants' data in the behavioral reaction time vector (null distribution) for every time point and feature separately. The true correlation was considered significant if it surpassed 95% of the randomly generated correlations in the null distribution in either the positive or negative direction (p < 0.05), and the p-values were corrected for multiple comparisons across time using Matlab's mafdr function, which is based on a fixed rejection region (Storey, 2002).
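A sketch of this procedure for one feature is given below (acc: participants x time decoding accuracies; RT: participants x 1 reaction times); a two-sided empirical test is used here for brevity, whereas the text evaluates the positive and negative directions separately:

```matlab
% Sketch: permutation test of decoding-behavior correlations, FDR-corrected.
nPerm = 1000;
nTime = size(acc, 2);
trueCorr = zeros(1, nTime); pval = zeros(1, nTime);
for t = 1:nTime
    trueCorr(t) = corr(acc(:, t), RT, 'type', 'Spearman');
    nullCorr = zeros(nPerm, 1);
    for p = 1:nPerm                                  % shuffle participants' RTs
        nullCorr(p) = corr(acc(:, t), RT(randperm(numel(RT))), 'type', 'Spearman');
    end
    pval(t) = mean(abs(nullCorr) >= abs(trueCorr(t)));  % two-sided empirical p
end
q   = mafdr(pval);          % Storey's FDR correction (Bioinformatics Toolbox)
sig = q < 0.05;             % significant time points after correction
```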
3 Results
3.1 Can the Features Sensitive to Temporal Variabilities Provide Additional Category Information to the Conventional Mean Feature?
To answer this first question, we compared decoding accuracies in the whole-trial time span (0 to 1000 ms relative to stimulus onset) across all features and for each data set separately (see the complete results in Supplementary Figures 1 and 2 and summary results in Figure 2, black bars). There was not enough (BF < 3) evidence for above-chance decoding for the majority of features (e.g., moment, complexity, and frequency-domain features; see Supplementary Figure 1, black bars and their Bayesian analyses). However, consistently across the three data sets, there was moderate (3 < BF < 10) or strong (BF > 10) evidence for above-chance decoding for all ERP components (N1, P1, P2a, and P2b), wavelet coefficients (Wavelet), and original magnitude data (Orig Mag), which were either targeted at specific time windows within the trial (ERPs) or could detect temporal variabilities within the trial (Wavelet and Orig Mag; see Figure 2A, black bars).
Importantly, in all three data sets, there was moderate (3 < BF < 10) or strong (BF > 10) evidence that the ERP components N1 and P2a provided higher decoding values than the mean (see Figure 2B, black boxes in Bayes matrices). There was also strong evidence (BF > 10) that the wavelet and Orig Mag features outperformed the mean feature in data sets 2 and 3 (see Figure 2B, blue boxes in Bayes matrices). This shows that simply using the earlier ERP components N1 and P2a can provide more information than using the mean activity across the whole trial. This was predictable, as the mean across the whole trial simply ignores within-trial temporally specific information. Interestingly, even the ERPs were outperformed by the wavelet and Orig Mag features in data set 3 (but not the opposite in any of the three data sets; see Figure 2B, violet boxes in Bayes matrices). This suggests that further targeting the most informative elements (wavelet) and/or data samples (Orig Mag) within the trial can lead to improved decoding. Note that the wavelet and Orig Mag features provided the most informative temporal patterns and samples through the dimension-reduction procedure applied to their extracted values (see section 2).
Following previous observations about the advantage of the delta (Watrous et al., 2015; Behroozi et al., 2016) and theta (Wang et al., 2018) frequency bands, we compared the information content in the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–16 Hz), gamma (16–200 Hz), and broad frequency bands. We predicted the domination of the theta frequency band, following suggestions that the theta band dominates feedforward visual processing (Bastos et al., 2015). For our top-performing ERP, wavelet, and Orig Mag features, we saw consistent domination of the theta followed by the alpha frequency band (see Figure 2A). Interestingly, for the ERP components, the decoding in the theta band even outperformed that in the broad band (BF > 3 for P2b), which contained the whole frequency spectrum. Note that, as opposed to previous suggestions (Karakaş, Erzengin, & Başar, 2000), the domination of the theta frequency band in the ERP components could not be trivially predicted from their timing relative to the stimulus onset. If that were the case here, the P2b component (200–275 ms) should have elicited its maximum information in the delta (0.5–4 Hz) and theta (4–8 Hz), rather than the theta and alpha (8–12 Hz), frequency bands. For the mean feature, the delta band provided the highest information level, comparable to the level of the broadband activity. This confirms that broadband whole-trial mean activity reflects the general trend (low-frequency component) of the signal.
Together, we observed that the features targeted at informative windows of the trial (ERP components), and those sensitive to informative temporal variabilities (wavelet and Orig Mag), could provide category information additional to the conventionally used mean of activity. We also observed that the theta frequency band, which has been suggested to support feedforward information flow, carried the most information in our data sets, which are potentially dominated by feedforward processing of visual information during object perception. Next, we compare the temporal dynamics of information encoding across our features.
3.2 Do the Features Sensitive to Temporal Variabilities Evolve over Similar Time Windows to the Mean Feature?
One main insight that EEG decoding can provide is to reveal the temporal dynamics of cognitive processes. However, the mean activity, which has dominated the literature (Grootswagers et al., 2017), might hide or distort the true temporal dynamics as it ignores potentially informative temporal variabilities (codes) within the analysis window. Therefore, we systematically compared the information content of a large set of features that are sensitive to temporal variabilities using time-resolved decoding (50 ms sliding time windows in steps of 5 ms; see the rationale for choosing the 50 ms windows in Supplementary Figure 3A). By definition, we do not have the time-resolved decoding results for the ERP components here.
Time-resolved decoding analyses showed that for all features, including the complexity features, which were suggested to need large sample sizes (Procaccia, 1988), there was moderate (3 < BF < 10) or strong (BF > 10) evidence for above-chance decoding at some time points consistently across the three data sets (see Supplementary Figure 5A). However, all features showed temporal dynamics distinct from one another and across data sets. The dissimilarities between data sets could be driven by many data set–specific factors, including the duration of image presentation (Carlson et al., 2013). However, there were also similarities between the temporal dynamics of different features. For example, the time points of first strong (BF > 10) evidence for above-chance decoding ranged from 75 ms to 195 ms (see Supplementary Figures 5A and 5E), and the decoding values reached their maxima in the range between 150 ms and 220 ms (see Supplementary Figures 5A and 5D) across features. This is consistent with many decoding studies showing the temporal dynamics of visual processing in the brain (Isik et al., 2014; Cichy et al., 2014; Karimi-Rouzbahani, Woolgar, & Rich, 2021). No feature consistently preceded or followed the others, which would have suggested the existence of very early or late neural codes (see Supplementary Figures 5D and 5E). More information was decoded from the mean, median, and variance features and several multivalued features, especially wavelet and Orig Mag, than from the other features across the three data sets (see Supplementary Figure 5A). These features dominated the others in terms of both average and maximum decoding accuracies (see Supplementary Figures 5B and 5C). A complementary analysis suggested a potential overlap between the neural codes that different features detected (see Supplementary Figure 6).
We then directly compared the mean and the most informative variability-sensitive features (wavelet and Orig Mag). Consistently across the data sets, there was moderate (3 < BF < 10) or strong (BF > 10) evidence for higher decoding obtained by wavelet and Orig Mag compared to the mean feature at time points before 200 ms poststimulus onset (see Figure 3A). After 200 ms, this advantage was sustained (data set 3), disappeared (data set 1), or turned into a disadvantage (data set 2). Except for a few very short continuous intervals, during which wavelet provided higher decoding values than Orig Mag, the two features provided almost the same results (see Figure 3, yellow dots on bottom panels). Comparing the parameters of the decoding curves, we found moderate (3 < BF < 10) or strong (BF > 10) evidence for higher maximum decoding for the wavelet and Orig Mag features than for the mean feature in data sets 1 and 3 (see Figure 3B), and moderate (3 < BF < 10) evidence for higher maximum decoding accuracy for wavelet versus Orig Mag (see Figure 3B). There was also strong (BF > 10) evidence for higher average decoding accuracy for the wavelet and Orig Mag features over the mean feature in data set 3, and moderate (3 < BF < 10) evidence for higher average decoding for wavelet versus Orig Mag in data sets 2 and 3 (see Figure 3C). These results show that the wavelet feature provides the highest maximum (in data set 3) and average (in data sets 2 and 3) decoding accuracies among the three features, followed by the Orig Mag feature. The measures of maximum and average decoding accuracy were calculated in the poststimulus window (0–1000 ms) for each participant separately. We also compared the timing parameters of the decoding curves (i.e., the time to the first above-chance decoding and the time to maximum decoding relative to stimulus onset) obtained for the three features (see Figures 3D and 3E) but found insufficient evidence (1/3 < BF < 3) for any difference.
Together, these results suggest that the inclusion of temporal variabilities of activity can provide additional information about object categories to what is conventionally obtained from the mean of activity. Note that the advantage of wavelet and Orig Mag features cannot be explained by the size or dimensionality of the feature space, as the numbers of dimensions were equalized across features. Importantly, however, the decoding of information from temporal variabilities did not lead to different temporal dynamics of information decoding. This can be explained by either the common cognitive processes producing the decoded neural codes (i.e., object categorization), the overlap between the information (neural codes) detected by our features, or a combination of both.
3.3 Do the Features Sensitive to Temporal Variabilities Explain the Behavioral Recognition Performance More Accurately than the Mean Feature?
Although we observed an advantage for the features that were sensitive to temporal variability (e.g. wavelet) over other, more summarized features (e.g. mean), this can all be a by-product of more flexibility (e.g. inclusion of both temporal and spatial codes) in the former over the latter, and not read out by downstream neurons that support behavior. To validate the behavioral relevance of the detected neural codes we calculated the correlation between the decoding accuracies of features and the reaction times of participants (Vidaurre, Myers, Stokes, Nobre, & Woolrich, 2019; Ritchie et al., 2015). Participants' reaction times in object recognition have been previously shown to be predictable from decoding accuracy (Ritchie et al., 2015). We expected to observe negative correlations between the features' decoding accuracies and participants' reaction times in the poststimulus span (Ritchie et al., 2015). This suggests that greater separability between neural representations of categories might lead to categorizing them faster in behavior, supporting that the decoded neural codes might be used by neurons that drive behavior. We used only data set 2 in this analysis, as it was the only data set with an active object detection task, so relevant reaction times were available. The (Spearman's rank-order) correlations were calculated across the time course of the trials between the 10-dimensional vector of neural decoding accuracies obtained on every time point and the 10-dimensional vector of behavioral reaction times, both obtained from the group of 10 participants (Cichy et al., 2014). This resulted in a single correlation value for each time point for the whole group of participants.
All features except Katz FD showed negative trends after the stimulus onset (see Figure 4A). The correlations showed more sustained negative values for the multivalued than for the single-valued features (p < 0.05). There were also larger negative peaks (generally below −0.5) for the multivalued features, especially wavelet, compared to the other features (generally above −0.5). Specifically, while the higher-order moment features (variance, skewness, and kurtosis), as well as many complexity features, showed earlier negative peaks at around 150 ms, the mean, median, frequency-domain, and multivalued features showed later negative peaks after 300 ms. Therefore, the multivalued features, especially wavelet, which were sensitive to temporal variabilities of the signals, showed the most sustained and significant correlations to behavior.
Visual inspection suggests that features that provided higher decoding accuracy (e.g., wavelet; Figure 3) also did better at predicting behavioral performance (e.g., wavelet; Figure 4). To test quantitatively whether such a relationship exists, we calculated the correlation between the parameters of the decoding curves (introduced in Figures 3B to 3E) and the average correlation to behavior obtained by the same features (see Figure 4A). Specifically, we used the average and maximum decoding accuracies, which we hypothesized would predict the average correlation to behavior. The rationale behind this hypothesis was that more effective decoding of neural codes, as reflected in higher average and maximum decoding accuracies (see Figure 3), should facilitate better prediction of behavior by detecting subtle but otherwise overlooked behavior-relevant neural codes. As a control, we also evaluated the time of first above-chance decoding and the time of maximum decoding, which we hypothesized would not correlate with the average correlation to behavior. Our rationale behind this prediction was that we had already observed relatively similar temporal dynamics for the more and less informative features of neural activity (see Figures 3D and 3E), suggesting that all of those features detect some aspects of the codes produced by similar neural mechanisms.
To obtain the parameter of average correlation to behavior, we simply averaged the correlation to behavior over the poststimulus time span for each feature separately (see Figure 4A). Results showed (see Figure 4B) that while the temporal parameters of time of first above-chance decoding and time of maximum decoding (our control parameters) failed to predict the level of average correlation to behavior, the parameters of maximum and average decoding accuracy significantly predicted the average correlation to behavior (p < 0.0001; Pearson's correlation). Note the difference between the Spearman's correlation to behavior calculated in Figure 4A and the correlations reported in Figure 4B: while the Spearman's correlation to behavior is obtained by correlating the time-resolved decoding accuracies with the corresponding reaction times across participants, the correlations in Figure 4B are obtained by correlating the poststimulus average of those Spearman correlations with the corresponding decoding parameters across features rather than participants. This result suggests that the more effective the decoding of the neural codes, the better the prediction of behavior. This is not a trivial result; higher decoding values for the more informative features do not necessarily lead to higher correlation to behavior, as "correlation" normalizes the absolute values of the input variables.
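The across-feature test in Figure 4B then reduces to a single Pearson correlation. A minimal sketch, assuming `max_decoding` and `avg_corr_to_behavior` are vectors with one entry per feature (these names are ours):

```python
from scipy.stats import pearsonr

# correlate one decoding parameter with the average correlation to behavior,
# computed across features rather than across participants
r, p = pearsonr(max_decoding, avg_corr_to_behavior)
print(f"Pearson r = {r:.2f}, p = {p:.4g}")
```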
4 Discussion
Temporal variability of neural activity has been suggested to provide an additional channel, beyond the mean of activity, for encoding several aspects of the sensory input, including its complexity (Garrett et al., 2020), uncertainty (Orbán et al., 2016), and variance (Hermundstad et al., 2014). It has been suggested that the brain optimizes neuronal activation and variability to avoid overactivation (energy loss) for simple, familiar, and less informative categories of sensory inputs. For example, face images, which have less variable compositional features, evoked less variable responses in fMRI than house images, which were more varied, even in a passive viewing task (Garrett et al., 2020). This automatic and adaptive modulation of neural variability can result in more effective and accurate encoding of sensory inputs in changing environments, for example, by suppressing uninformative neuronal activation for less varied (more familiar) stimuli such as face versus house images (Garrett et al., 2020). Despite the recent evidence for the richness of information in temporal variability, which is modulated by the category of the sensory input (Garrett et al., 2020; Orbán et al., 2016; Waschke et al., 2021), the majority of EEG studies still ignore variability in decoding. Specifically, they generally either extract variability measures (e.g., entropy and power) from the whole-trial activity (e.g., for brain-computer interfaces) or use the simple mean (average) of the magnitude data within subwindows of the trial (e.g., for time-resolved decoding; Grootswagers et al., 2017). The former can miss informative within-trial variabilities and fluctuations in the highly dynamic and nonstationary evoked potentials; the latter may overlook informative variabilities within the sliding time windows as a result of temporal averaging.
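To make this contrast concrete, the sketch below shows the three kinds of within-window features discussed here for a single electrode. It is an illustration under our own assumptions (Daubechies-4 wavelet, decomposition level 2), not the study's exact feature-extraction code.

```python
import numpy as np
import pywt  # PyWavelets

def mean_feature(window):
    return np.array([window.mean()])   # temporal averaging discards within-window variability

def orig_mag_feature(window):
    return window.copy()               # keep every sample; no assumptions about the code's shape

def wavelet_feature(window):
    coeffs = pywt.wavedec(window, 'db4', level=2)   # multiscale decomposition
    return np.concatenate(coeffs)      # retains amplitude, phase, and scale information
```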
Here, we quantified the advantage of the features sensitive to temporal variabilities over the conventional mean activity. In whole-trial analysis, we observed that the features that targeted informative subwindows and samples of the trial (e.g., ERP components, wavelet coefficients (wavelet), and original magnitude data (Orig Mag)) could provide more category information than the mean feature, which ignored temporal variabilities. Interestingly, ERP components (N1, P2a, and P2b) provided results comparable to those obtained from informative samples (Orig Mag) or the wavelet transformation (except for data set 3). This could explain the remarkable decoding results achieved in previous studies that used ERPs (Wang et al., 2012; Qin et al., 2016) and wavelets (Taghizadeh-Sarabi et al., 2015). These results also suggest that we might not need to apply complex transformations (e.g., wavelet) to the data in whole-trial analysis (Taghizadeh-Sarabi et al., 2015), as comparable results can be obtained using simple ERP components or the original magnitude data. However, including more dimensions of the features in decoding, or combining them (Karimi Rouzbahani & Daliri, 2011; Qin et al., 2016), could potentially yield higher decoding accuracies for multivalued features (e.g., wavelet; Taghizadeh-Sarabi et al., 2015) than for ERP features (note that we equalized the dimensions across features here).
In time-resolved decoding, the wavelet and original magnitude data (Orig Mag) features outperformed not only the conventional mean feature but also all the other variability-sensitive features. Importantly, while features such as Hilbert phase and amplitude, phase- and amplitude-locking, and interelectrode correlations also had access to all the samples within the sliding analysis window, they failed to provide information comparable to the wavelet and Orig Mag features. The reason for the success of the original magnitude data seems to be that it makes essentially no assumptions about the shape or pattern of the potential neural codes, as opposed to Hilbert phase (Hilb Phs), amplitude (Hilb Amp), and correlated variability (Cross Corr), each of which is sensitive to one specific aspect of neural variability (phase, amplitude, or correlation, respectively). The reason for the success of the wavelet feature seems to be its reasonable balance between flexibility in detecting potential neural codes contained in the amplitude, phase, and frequency/scale and a relatively lower susceptibility to noise as a result of the filtering applied to different frequency bands (Guo et al., 2009). Together, these observations support the idea that neural codes are complex structures reflected in multiple aspects of EEG data, such as amplitude, phase, and frequency/scale (Panzeri et al., 2010; Waschke et al., 2021).
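For comparison, single-aspect features such as Hilbert phase and amplitude can be sketched as follows (again an illustration, with `window` a 1-D array of samples from one electrode):

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_features(window):
    analytic = hilbert(window)                   # analytic signal of the real EEG trace
    return np.angle(analytic), np.abs(analytic)  # instantaneous phase and amplitude
```

Each output captures only one facet of the variability that the wavelet and Orig Mag features retain jointly, which may explain the weaker performance of these single-aspect features.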
The advantage of the theta band over the broad band in our data (see Supplementary Figures 1 and 3) is consistent with previous monkey studies suggesting that the theta and gamma frequency bands play major roles in feedforward processing of visual information in the brain (Bastos et al., 2015), which also seemed dominant here (see Supplementary Figure 4). One potential reason for the encoding of feedforward information in the theta band is that bottom-up sensory signals transfer information about ongoing experiences, which might need to be stored in long-term memory for future use (Zheng & Colgin, 2015). Long-term memories are suggested to be encoded by enhanced, long-lasting synaptic connections, and the optimal patterns of activity for inducing such changes in synaptic weights were suggested to be successive theta cycles carrying contents in fast gamma rhythms (~100 Hz; Larson, Wong, & Lynch, 1986). While the direct correspondence between invasive and noninvasive neural data remains unclear (Ng, Logothetis, & Kayser, 2013), this study provides additional evidence for the major role of the theta frequency band in human visual perception (Wang et al., 2012; Qin et al., 2016; Jadidi et al., 2016; Taghizadeh-Sarabi et al., 2015; Torabi et al., 2017). It also suggests that the BCI community might benefit from concentrating on the frequency bands relevant to the cognitive or sensory processing under way in the brain, for example, by investigating the theta band when stimulating the visual system.
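Restricting decoding to the theta band is straightforward in practice; a sketch assuming a 4–8 Hz definition of theta and a zero-phase Butterworth filter (the filter order and band edges are our illustrative choices):

```python
from scipy.signal import butter, filtfilt

def theta_band(signal, fs):
    # 4th-order Butterworth band-pass; filtfilt gives zero-phase filtering
    b, a = butter(4, [4.0, 8.0], btype='bandpass', fs=fs)
    return filtfilt(b, a, signal)
```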
One critical question for cognitive neuroscience has been whether, and to what extent, neuroimaging data can explain behavior (Williams et al., 2007; Ritchie et al., 2015; Woolgar et al., 2019; Karimi-Rouzbahani et al., 2019; Karimi-Rouzbahani, Ramezani, Woolgar, Rich, & Ghodrati, 2021). We extended this question by asking whether more optimal decoding of object category information can lead to better prediction of behavioral performance. We showed in data set 2 that this can be the case. Critically, for the same data set, we observed what appears to be a linear relationship between the obtainable decoding accuracy and the explanatory power of the features. This implies that to bring neuroimaging observations closer to behavior, we may need to work on reading out the neural codes more effectively.
It has been suggested that neural variability is modulated not only by sensory information (as focused on here) but also by top-down cognitive processes such as attention, expectation, memory, and task demands (Waschke et al., 2021). For example, attention decreases low-frequency neural variability/power (2–10 Hz, referred to as "desynchronization") while increasing high-frequency neural variability/power (Wyart & Tallon-Baudry, 2009). Therefore, in the future, it will be interesting to see which features best detect the modulation of neural variability in other cognitive tasks. Moreover, it would be interesting to know whether a combination of the features used in this study could provide additional information about object categories or behavior. In other words, although each of the individual features evaluated here covered some variance of object category information, it might be helpful to combine multiple features using supervised and unsupervised methods to detect the neural information more effectively (Karimi Rouzbahani & Daliri, 2011; Qin et al., 2016).
The cross-data-set, large-scale analysis methods implemented in this study align with the growing trend toward meta-analysis in cognitive neuroscience. Recent studies have also adopted and compared several data sets to support more rigorous conclusions about how the brain performs different cognitive processes, such as sustained attention (Langner & Eickhoff, 2013) and working memory (Adam, Vogel, & Awh, 2020). Our results support the idea that neural variability provides an additional channel for information encoding in EEG, one that should not simply be ignored.
Acknowledgments
H.K.-R. was funded by the Royal Society's Newton International Fellowship (SUAI/059/G101116) and the MRC Cognition and Brain Sciences Unit.