Cognitive processes, such as learning and cognitive flexibility, are difficult both to measure and to sample continuously with objective tools because they arise from distributed, high-dimensional neural activity. For both research and clinical applications, that dimensionality must be reduced. To reduce dimensionality and measure underlying cognitive processes, we propose a modeling framework in which a cognitive process is defined as a low-dimensional dynamical latent variable, called a cognitive state, that links high-dimensional neural recordings and multidimensional behavioral readouts. This framework allows us to decompose the hard problem of modeling the relationship between neural and behavioral data into separable encoding-decoding approaches. We first use a state-space modeling framework, the behavioral decoder, to articulate the relationship between an objective behavioral readout (e.g., response times) and the cognitive state. The second step, the neural encoder, uses a generalized linear model (GLM) to identify the relationship between the cognitive state and neural signals, such as the local field potential (LFP). We then combine the neural encoder model with a Bayesian filter to estimate the cognitive state from neural data (LFP power); this constitutes the neural decoder. We provide goodness-of-fit analyses and model selection criteria in support of the encoding-decoding result. We apply this framework to estimate an underlying cognitive state from neural data in human participants (n = 8) performing a cognitive conflict task. The state estimated from neural data fell within the 95% confidence intervals of the state estimated from the behavioral readout on an average of 90% of task trials across participants. In contrast to previous encoder-decoder models, our proposed modeling framework incorporates LFP spectral power to encode and decode a cognitive state.
The framework allowed us to capture the temporal evolution of the underlying cognitive processes, which could be key to the development of closed-loop experiments and treatments.
Recent technological and experimental advances have improved our capability to stimulate and record simultaneous neural activity from multiple brain areas in humans (Stevenson & Kording, 2011; Lebedev & Nicolelis, 2017; Sani et al., 2018; Ezzyat et al., 2017). To meet the potential of online recording and stimulation across the brain, new technological advances must be matched by complementary advances in analytical methods that can characterize complex dynamics of high-dimensional neural data and uncover relationships among neural data, behavior, and other physiological signals (Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017; Jorgenson et al., 2015).
Classically, neural representations have been investigated through stimulus-response experiments, which typically examine the effect of simple, fixed stimuli on a particular feature of the neural response (Sidman, 1960; Zarahn, Aguirre, & D'Esposito, 1997). These experiments have been amenable to standard statistical modeling methods, like thresholding, averaging, and linear regression. For instance, classic experiments in motor cortical coding measured the number of spikes fired by M1 neurons during arm-reaching movements in a discrete set of directions (Schwartz, Kettner, & Georgopoulos, 1988). Simple statistical models relating the spiking in each neuron to the movement direction were then used for both encoding and decoding analyses (Hochberg et al., 2012; Serruya, Hatsopoulos, Paninski, Fellows, & Donoghue, 2002; Flint, Lindberg, Jordan, Miller, & Slutzky, 2012; Nuyujukian et al., 2018). While such brain region–specific statistical models may capture coding in certain neural systems, many other aspects of brain function, such as cognition, planning, and attention, are shaped by the interaction and cooperation of networks across brain areas. These functions are encoded in complex and distributed neural activity that spans multiple time and frequency scales (Haynes & Rees, 2006; Haxby, Hoffman, & Gobbini, 2000; Voytek & Knight, 2015; Avena-Koenigsberger, Misic, & Sporns, 2018). Furthermore, the neural representation of some cognitive processes might be understood only as a complex function of other behavioral or physiological signals. For instance, learning—as a cognitive process—typically is not directly measurable on its own and must be assessed through features of behavioral responses to a task (Brouwer et al., 2015; Martinez-Rubio, Paulk, McDonald, Widge, & Eskandar, 2018). 
Therefore, a model of the neural representation of learning might relate the joint neural activity across multiple brain regions and multiple recording modalities to specific attributes of a complicated task. Characterizing such relationships requires statistical and analytical tools to integrate neural recordings from many brain areas at varying temporal and spatial scales and to associate those recordings with relevant, low-dimensional features of the cognitive process being modeled. Finally, cognitive processes and their associated neural activity patterns evolve over time; thus, the dynamics of neural activity and behavior must be properly incorporated into the models (Davidson, Putnam, & Larson, 2000; Bressler & Menon, 2010; Braun et al., 2015).
Classically, encoding models focus on characterizing how external stimuli are encoded in the brain through patterns of neural activity. Neural decoding methods then aim to reconstruct behavioral intentions from the observed neural activity patterns alone. Neural encoding and decoding are challenging modeling and analysis problems, given the complexity of brain dynamics and the diverse functions of brain networks. The encoding and decoding problem becomes even more challenging in the cognitive domain, as cognitive processes are typically not directly measurable and often are only loosely defined (Widge et al., 2017).
We present an encoder-decoder modeling paradigm to overcome some of the limitations associated with applying existing neural decoder models to estimate hidden cognitive states from behavior and multidimensional neural data. We define a cognitive state process as a dynamic measure of cognitive function during a behavioral task. We then provide tools to estimate the cognitive state at each instant and associate its value with high-dimensional neural activity. The modeling framework accounts for the dynamics of the cognitive process and the associated temporal changes in the neural and behavioral signals. The proposed framework is scalable to high-dimensional neural and behavioral data and is applicable to many types of cognitive processes. Here, we applied this framework to the problem of estimating cognitive flexibility in participants performing a multisource interference task (MSIT; Bush, Shin, Holmes, Rosen, & Vogt, 2003). The results support the ability of the framework to infer a consistent association between neural and behavioral data and to estimate accurately the underlying cognitive state at each moment. This modeling framework not only provides a tool to study biomarkers of changes in the brain function(s) but also constructs an analytical platform for brain stimulation systems, such as closed-loop deep brain stimulation (DBS), where a reliable and real-time estimate of a cognitive state may be needed for cognitive control (Widge et al., 2017).
Cognition is expressed in many aspects of behavior, and these behaviors are complex and interrelated (Barrett & Satpute, 2013; LeDoux, 2000; Davidson et al., 2000; Bressler & Menon, 2010). Although the behavioral impact of cognition is multifold, features of cognition, or a cognitive state, can often be modeled on a low-dimensional manifold relative to the complex and distributed neural activity that generates the response to an ongoing cognitive demand (Shine et al., 2018; Vyas et al., 2018). Under this modeling assumption, the cognitive state becomes a low-dimensional latent variable that links brain and behavior, both of which adapt dynamically in response to task demands (Glaser & Kording, 2016; Skrondal & Rabe-Hesketh, 2007; Suzuki & Brown, 2005). Latent variable analysis is widely applied in statistics and machine learning, and its theory and applications extend naturally to the analysis of neural and behavioral data (Loehlin & Beaujean, 2016).
We define a cognitive state variable through its influence on observable behavior. Typically, this involves having a subject perform a controlled cognitive task designed to assess aspects of cognitive processing over multiple trials (Crandall, Klein, Klein, & Hoffman, 2006; Cohen, 2014). Analysis of data from such a task involves characterizing changes in behavior using a low-dimensional cognitive state process, which can be addressed with a state-space modeling framework. State-space modeling has previously been shown to successfully model dynamical behavioral signals to estimate underlying cognitive or emotional processes (Schöner, 2008; Wang et al., 2005; Smith et al., 2004; Yousefi et al., 2015). State-space models can optimally process multimodal, complex, dynamical behavioral signals, such as reaction time and decision choice, to estimate cognitive processes such as learning or attention (Prerau et al., 2008).
Under the state-space modeling framework, the unobserved cognitive state is modeled as a low-dimensional dynamical latent variable, and the behavioral signals are defined as functions of both the latent variable and trial-to-trial factors relevant to the task (Bentler & Weeks, 1980; Wang et al., 2005; Bishop, 2006). The state-space estimation problem involves simultaneously fitting the models and computing the dynamics of the cognitive state from the observed behavior. Its general solution is a combination of a Kalman-type filter/smoother with an expectation-maximization (EM) algorithm (Smith et al., 2004; Dempster, Laird, & Rubin, 1977), which have been developed for a large class of behavioral signals with different distribution models (Yousefi et al., 2015; Prerau et al., 2008). The state-space modeling framework therefore can be used for a low-dimensional behavioral decoder, combining fine and slow temporal dynamics that capture the essential features of the behavior (Smith et al., 2004; Yousefi et al., 2015).
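To make this machinery concrete, the following is a minimal numpy sketch of the filtering step for a one-dimensional random-walk cognitive state observed through noisy log reaction times. The noise variances Q and R are fixed illustrative values here; in the full framework they would be estimated with the EM algorithm, and a smoother would be run alongside the filter.

```python
import numpy as np

def kalman_filter_1d(y, Q=0.005, R=0.05, x0=0.0, P0=1.0):
    """Scalar Kalman filter for the state-space model
    x_k = x_{k-1} + w_k,  w_k ~ N(0, Q)   (state equation)
    y_k = x_k + v_k,      v_k ~ N(0, R)   (observation, e.g., log RT)."""
    x, P = x0, P0
    xs, Ps = [], []
    for obs in y:
        P = P + Q                      # one-step prediction variance
        K = P / (P + R)                # Kalman gain
        x = x + K * (obs - x)          # posterior mean update
        P = (1.0 - K) * P              # posterior variance update
        xs.append(x)
        Ps.append(P)
    return np.array(xs), np.array(Ps)

# simulate a slowly drifting cognitive state observed through noisy log RTs
rng = np.random.default_rng(0)
true_x = np.cumsum(rng.normal(0.0, np.sqrt(0.005), 200))
log_rt = true_x + rng.normal(0.0, np.sqrt(0.05), 200)
xs, Ps = kalman_filter_1d(log_rt)
```

In an EM iteration, the smoothed state estimates from a pass like this would be used to re-estimate Q and R before filtering again.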
Importantly, this formulation makes a core assumption about the cognitive state: that it exists at all times when neural data are measured, but that its ground truth (behavioral output) can be observed only intermittently. That is, in a standard trial-structured behavioral task, we can update the value of the cognitive state only once we observe the participant's response to each trial. The neural data, however, are usually sampled hundreds to thousands of times per second. We can assume that the observed decision was influenced by the state even before we knew the state value, in that the preceding neural activity effectively determined and encoded the behavior. Thus, once the value of the state on a given trial is known, it can be regressed against neural data that occurred before that decision to understand where and when these cognitive variables are encoded. We used this approach in a recent paper where a learning process was formulated as a latent state variable (Martinez-Rubio et al., 2018). Alternatively, if we wish to predict the behavior before it occurs, we can directly infer the cognitive state from neural activity, as in the decoding analyses presented here and in recent related work by other groups (Ezzyat et al., 2017; Sani et al., 2018).
Cognitive function emerges from ongoing and distributed brain activity, and thus, a full description of the cognitive state(s) should capture the link between cognitive state and neural data (Haynes & Rees, 2006; Haxby et al., 2000; Mitchell et al., 2004; Deco, Jirsa, & McIntosh, 2011; Poldrack, 2011). Taking into account the fact that brain activity is dynamic and stochastic, we use dynamical and statistical modeling to characterize the neural activity relative to cognition (Li, O'Doherty, Lebedev, & Nicolelis, 2011; Fischer & Peña, 2011; Kloosterman, Layton, Chen, & Wilson, 2013; Koyama, Eden, Brown, & Kass, 2010; Poldrack, 2011; Turner et al., 2013; Mukamel et al., 2005; Lawhern, Wu, Hatsopoulos, & Paninski, 2010; Shlens, 2014). A variety of statistical modeling approaches like stochastic dynamical modeling, Bayes theory, and joint and conditional probabilities of neural features are widely used in the analysis of neural data (Knill & Pouget, 2004; Wu et al., 2003; Wu, Kulkarni, Hatsopoulos, & Paninski, 2009; Paninski, Pillow, & Lewi, 2007; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005). The core assumption behind these approaches is that changes in the cognitive state(s) are encoded in a subset of features of neural activity, and thus these states can be decoded from multidimensional neural recordings (Pillow, 2007; Calabrese et al., 2011). A common class of statistical models used to relate neural activity and behavioral signals is the generalized linear model (GLM) (McCullagh & Nelder, 1989). GLMs provide an extension of linear regression that can be flexibly applied to a variety of different neural data modalities, including spiking activity, functional magnetic resonance imaging (fMRI), and local field potential (LFP; Pillow, 2007; Calabrese et al., 2011; Mukamel et al., 2005; Lawhern et al., 2010; Shlens, 2014). Here, we use GLMs to characterize the relationship between the cognitive state (estimated from behavior) and neural data recorded as LFP.
Together, this modeling framework comprises three models: one that characterizes the dynamics of the central cognitive state process, one that captures the influence of the cognitive state on features of the behavioral task, and one that describes the neural representation of the cognitive state as it evolves. Once these models are defined and fit, we use multiple goodness-of-fit metrics to assess their quality and use statistical inference methods to determine the statistical significance of any neural and behavioral associations with the cognitive state process (Brown, Barbieri, Ventura, Kass, & Frank, 2002; Wilcox, 2005; Venables & Ripley, 2013; Gordon, Salmond, & Smith, 1993). In addition to defining the model framework and deriving the associated estimation and inference algorithms, we illustrate its application in decoding a hidden cognitive state by using recordings of neural activity in human participants performing a cognitive flexibility task. Application of the encoding-decoding framework to these data allows us to solve the problem of estimating a low-dimensional hidden cognitive state from both behavior and LFP power (as decomposed into multiple spectral components) from multiple brain regions.
This framework is scalable and can be applied in many other domains of cognitive state estimation to link high-dimensional behavioral and neural data. Most important, the modeling framework is defined independent of the modality of neural activity and corresponding neural features; thus, other neural features, including dynamic functional connectivity and spatiotemporal neural dynamics (Sporns, Chialvo, Kaiser, & Hilgetag, 2004; Makeig, Debener, Onton, & Delorme, 2004; Buzsáki et al., 2012), can also be studied in building the relationship between brain functions and behavior.
In section 3, we first provide the theoretical foundations of the encoding-decoding modeling framework. We then describe how each of the encoder and decoder models is built and propose corresponding model identification procedures. We describe the behavioral decoder model and discuss how the underlying cognitive state(s) can be estimated using behavioral signals. We then describe the neural encoder model and propose how the neural features representing the cognitive state are selected. We continue by describing the neural decoder model structure and how the decoder's performance in estimating the cognitive state can be evaluated. We also discuss a pruning procedure that selects a parsimonious subset of neural features to build a more robust neural decoder model.
In section 4, we demonstrate an application of the proposed encoder-decoder modeling framework by decoding the cognitive state of participants performing a cognitive task, using simultaneous recordings of their behavior and intracranial LFP. We describe the multisource interference task (MSIT) as performed by participants undergoing intracranial neural recordings for clinical purposes. We discuss the specific behavioral decoder model developed for MSIT. We then discuss the encoder model structure and the model selection procedure for the cognitive state estimated using the behavioral decoder. We describe how the identified neural encoders representing cognitive states are validated and study the performance of the neural decoder in estimating the cognitive state. We conclude this section by describing the consistency of the neural features identified by the encoder model across patients.
In section 5, we further elaborate on different aspects of the proposed modeling framework, including its similarities to and differences from other frameworks used to characterize the relationship between brain and behavior. We also discuss the limitations of the framework and directions for future research that can build on it.
We developed an encoding-decoding (E-D) modeling framework linking behavioral and neural signals under the hypothesis that there exists an underlying cognitive state conditioned on which the neural activity and behavioral signals can be independently characterized. In other words, the behavior can be predicted by the cognitive state without knowing the neural activity, and, similarly, the neural activity can be predicted using estimates of the cognitive state without observing behavior (see Figure 1). In the E-D modeling framework, the cognitive state is modeled as a dynamic process, and the conditional distribution of both behavioral signals and neural activity is modeled as functions of both the cognitive state and any additional explanatory variables. Explanatory variables include factors of the behavioral experiments designed to assess cognitive state, like how hard or easy a task is, and current and previous features of the neural activity.
Under this modeling framework, we assume that both neural and behavioral data (the “observed signals”) are recorded during a cognitive task. The cognitive task runs over many trials, and the task trials are designed to both evoke and capture changes in one or multiple cognitive processes. We further assume that the cognitive process of interest might evolve over time and that its progression correlates with changes in behavior and brain activity.
The model development starts by characterizing the underlying cognitive process as a function of the behavioral readout and then using the estimated cognitive process to find the neural correlates that encode changes in the cognitive state (see Figure 1). The modeling goal is to estimate low-dimensional cognitive processes by capturing the essential features of neural or behavioral data. Our proposed modeling framework accommodates high-dimensional neural and behavioral signals; thus, we can combine information from different modalities of behavior to better capture the cognitive state and draw on neural activity from many brain areas to build the neural encoder-decoder model. In section 4, we discuss a multisource interference task (MSIT), which is designed to assess the task participants' cognitive flexibility (Bush et al., 2003; Bush & Shin, 2006; Sheth et al., 2012; Van Veen, Cohen, Botvinick, Stenger, & Carter, 2001). Many cognitive tasks are designed with similar objectives, and our proposed modeling framework would be applicable to estimating the underlying cognitive state in these types of experiments as well.
2.1 Model of Behavior
We utilized a state-space modeling approach to describe the dynamical features of the behavioral signal. Here, we focus on modeling behavioral signals recorded on trial-structured cognitive tasks, where the task participant's response is captured per task trial. The model consists of a state equation and an observation equation. The state equation defines the temporal evolution of the cognitive state from one trial to the next, and the observation equation defines the influence of the cognitive state on the observed behavioral signals.
2.2 Neural Encoding Model
In building these neural encoder models, we used generalized linear modeling (GLM), an extension of the classical linear regression model for observed variables that may not be correctly described by a normal distribution (McCullagh & Nelder, 1989). For instance, for a coherence feature, a beta distribution is often preferable to the normal distribution (Miranda de Sá, 2004), and power features are typically better described by log-normal or gamma distributions than by a normal distribution (Barbieri, Quirk, Frank, Wilson, & Brown, 2001; Yousefi et al., 2015). GLM generalizes the linear regression model to these distributions, and thus we used it in building the encoder model.
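As an illustration, a log-normal power model reduces to linear regression on log-transformed power. The following sketch, with synthetic data and illustrative coefficients, fits such an encoder for a single neural feature; a full gamma or beta GLM would be fit analogously with the appropriate link and variance functions.

```python
import numpy as np

def fit_lognormal_encoder(power, state):
    """Fit log(power) = a + b * state + eps, eps ~ N(0, s2).
    This is the log-normal GLM-style encoder for one power feature."""
    X = np.column_stack([np.ones_like(state), state])
    z = np.log(power)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return coef, resid.var()

# synthetic LFP band power modulated by a cognitive state (a=0.5, b=0.8)
rng = np.random.default_rng(1)
state = rng.normal(0.0, 1.0, 300)
power = np.exp(0.5 + 0.8 * state + rng.normal(0.0, 0.2, 300))
coef, s2 = fit_lognormal_encoder(power, state)
```

The fitted coefficients and residual variance define the per-feature conditional distribution that the decoder later inverts.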
Generally there are multiple neural features per trial index; thus, the conditional probability defined in equation 3.5 extends over the whole feature set as the product of the per-feature conditional probabilities, under the assumption that the features are conditionally independent given the cognitive state.
2.2.1 Model Selection
Neural data were recorded from over 200 electrode contacts and transformed into multiple frequency-band power and coherence features (Widge et al., 2017). Not every feature encodes the cognitive state, and for those that do, the encoding relationship might be unique to each neural feature. We used GLMs to build the encoder models and then assessed the statistical significance of each model.
We ran an analysis of deviance to identify the set of neural features with a significant relationship to the estimated cognitive state process. For practical reasons, such as reducing the computational cost of neural feature extraction and building a more robust decoder, we sought a decoder model with a small number of neural features. To address this, we used a secondary model selection step, which winnows the features identified in the first step according to their performance in decoding the cognitive state. We discuss this step of the model selection after describing the decoding procedure.
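The analysis of deviance compares the fitted encoder against a null model without the state term. A minimal numpy sketch under a Gaussian model of log power follows; the critical value 3.84 is the chi-square cutoff for one degree of freedom at p = 0.05, and the data and names are illustrative.

```python
import numpy as np

def deviance_test(feature, state, crit=3.84):
    """Likelihood-ratio (analysis-of-deviance) test for whether a neural
    feature's log value depends linearly on the cognitive state."""
    z = np.log(feature)
    n = len(z)
    rss0 = np.sum((z - z.mean()) ** 2)            # null: intercept only
    X = np.column_stack([np.ones(n), state])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss1 = np.sum((z - X @ coef) ** 2)            # alternative: state included
    stat = n * np.log(rss0 / rss1)                # -2 * log-likelihood ratio
    return stat, stat > crit

# one feature that encodes the state, one that does not
rng = np.random.default_rng(2)
state = rng.normal(0.0, 1.0, 250)
encoding = np.exp(0.3 * state + rng.normal(0.0, 0.3, 250))
noise = np.exp(rng.normal(0.0, 0.3, 250))
```

In practice this test would be run for every candidate feature, with a multiple-comparison correction applied across features.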
2.3 Stochastic Cognitive State and Encoder Model
2.3.1 Neural Decoder
2.3.2 Pruning the Neural Features
In the model selection step, we use a significance test from the analysis of deviance with a fixed p-value threshold to identify an initial set of candidate features. One option would be simply to select all features significantly associated with the cognitive state after correcting for multiple comparisons. Instead, we minimize a cross-validated objective function, such as the root-mean-square error (RMSE) or model deviance, to prune the encoder model's neural features. To do so, we use a backward elimination model selection process. Starting with all candidate neural features, we examine every decoder model with one feature removed and identify the one with the lowest cross-validated RMSE. We then repeat this procedure with the remaining features, dropping features sequentially. Under this selection process, the optimal number of features is the one with the lowest cross-validated RMSE. This process produces a constrained encoder model with a subset of the neural features used to build the full encoder model. Given the one-at-a-time nature of dropping neural features, backward elimination can miss the optimal subset of neural features. Stepwise model selection techniques, including forward and backward methods, also tend to inflate the statistical significance of the variables that remain in the model, and thus the selection carries a higher type I error rate. For a more robust pruning step, more advanced subset selection techniques could be adopted (Chipman et al., 2001; Miller, 2002). For instance, regularization methods such as the Lasso and group Lasso (Friedman, Hastie, & Tibshirani, 2010b) and the group bridge (Huang, Ma, Xie, & Zhang, 2009), widely used for model selection in high-dimensional data, can be applied in the pruning step.
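The backward elimination loop can be sketched as follows, using a held-out RMSE of a simple linear read-out as a stand-in for the cross-validated decoder RMSE; all names and data are illustrative.

```python
import numpy as np

def decoder_rmse(X, state):
    """Held-out RMSE of a linear read-out of the state from feature set X
    (a cheap stand-in for the full Bayesian decoder's cross-validated RMSE)."""
    n = len(state)
    half = n // 2
    A = np.column_stack([np.ones(half), X[:half]])
    coef, *_ = np.linalg.lstsq(A, state[:half], rcond=None)
    B = np.column_stack([np.ones(n - half), X[half:]])
    return np.sqrt(np.mean((B @ coef - state[half:]) ** 2))

def backward_eliminate(X, state):
    """Drop one feature at a time; keep the visited subset with lowest RMSE."""
    keep = list(range(X.shape[1]))
    best = (decoder_rmse(X[:, keep], state), list(keep))
    while len(keep) > 1:
        scores = []
        for j in keep:
            sub = [k for k in keep if k != j]
            scores.append((decoder_rmse(X[:, sub], state), sub))
        rmse, keep = min(scores, key=lambda t: t[0])
        if rmse < best[0]:
            best = (rmse, list(keep))
    return best

# two informative features plus three irrelevant ones
rng = np.random.default_rng(3)
state = rng.normal(0.0, 1.0, 400)
informative = state[:, None] * np.array([0.9, 0.7]) + rng.normal(0, 0.3, (400, 2))
junk = rng.normal(0.0, 1.0, (400, 3))
X = np.hstack([informative, junk])
rmse, selected = backward_eliminate(X, state)
```

Because features are dropped one at a time, the loop visits only a greedy chain of subsets, which is exactly the limitation that motivates the regularization alternatives mentioned above.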
The neural decoder provides the posterior distribution of the cognitive state for each trial; thus, other measures of decoder performance built on this distribution, such as the coverage area or the skewness of the estimate, can also be used in the feature pruning step.
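To make the decoding step concrete, the following is a grid-based sketch of the neural decoder: each trial's posterior combines a random-walk prediction with the product of per-feature log-normal encoder likelihoods (features treated as conditionally independent given the state). The grid resolution, noise variances, and encoder coefficients are illustrative, not those of the study.

```python
import numpy as np

def decode_state(features, coefs, s2s, grid, q=0.01):
    """Grid-based Bayesian filter. features: (n_trials, n_features) array;
    coefs: per-feature (a_i, b_i); s2s: per-feature residual variances."""
    # Gaussian random-walk transition kernel on the state grid
    kernel = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / q)
    prior = np.ones_like(grid) / len(grid)
    means = []
    for f in features:
        pred = kernel @ prior                     # one-step prediction
        pred /= pred.sum()
        loglik = np.zeros_like(grid)              # product over features
        for fi, (a, b), s2 in zip(f, coefs, s2s):
            loglik += -0.5 * (np.log(fi) - (a + b * grid)) ** 2 / s2
        post = pred * np.exp(loglik - loglik.max())
        post /= post.sum()
        prior = post
        means.append(np.sum(grid * post))         # posterior mean per trial
    return np.array(means)

# simulate a latent state and two state-modulated power features
rng = np.random.default_rng(4)
T = 100
x = np.cumsum(rng.normal(0.0, 0.1, T))
coefs = [(0.0, 1.0), (0.2, -0.8)]
s2s = [0.09, 0.09]
features = np.stack([np.exp(a + b * x + rng.normal(0.0, 0.3, T))
                     for a, b in coefs], axis=1)
grid = np.linspace(-4, 4, 161)
est = decode_state(features, coefs, s2s, grid)
```

The full posterior over the grid, not just its mean, is what feeds the coverage- and skewness-based pruning criteria described above.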
2.3.3 Decoder Performance Analysis
The decoder performance can be analyzed using different metrics. The first metric we used is the 95% highest-density region (HDR; Hyndman, 1996): we check how many points of a specific trajectory of the cognitive state lie in the 95% HDR of the neural decoder's posterior estimate. The 95% HDR gives a good sense of how similar the cognitive state's distribution derived from behavioral signals is to its neural estimate. The other measure we use is the RMSE between the mean of the cognitive state estimated from the behavioral data and the mean of the decoder posterior over the test neural data set. Together, the 95% HDR and RMSE provide a collection of metrics to assess an individual decoder model. We can also use the Kolmogorov-Smirnov (K-S) statistic, which tests whether the neural features in the test data set are samples from their corresponding trained encoder model; here, the null hypothesis is that the test-set neural features are samples of the trained encoder model. Using this test, we can check whether the encoder model assumption holds beyond the training set. We can likewise use the K-S test to check whether the cognitive state estimated using the behavioral model is a sample of the neural decoder's posterior distribution.
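A minimal sketch of two of these metrics follows, approximating the 95% HDR by a central Gaussian interval around the posterior mean; the synthetic data and the interval construction are illustrative, not those of the study.

```python
import numpy as np

def coverage_and_rmse(behavior_state, post_mean, post_sd):
    """Fraction of trials where the behavior-derived state falls inside the
    decoder's 95% interval (a central-interval stand-in for the HDR), plus
    the RMSE between the behavioral estimate and the posterior mean."""
    lo = post_mean - 1.96 * post_sd
    hi = post_mean + 1.96 * post_sd
    inside = (behavior_state >= lo) & (behavior_state <= hi)
    rmse = np.sqrt(np.mean((behavior_state - post_mean) ** 2))
    return inside.mean(), rmse

# synthetic behavior-derived state and a well-calibrated decoder output
rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(0.0, 0.1, 300))
post_mean = x + rng.normal(0.0, 0.2, 300)
post_sd = np.full(300, 0.2)
cov, rmse = coverage_and_rmse(x, post_mean, post_sd)
```

For a calibrated decoder, the coverage fraction should sit near the nominal 0.95; values far below that flag an overconfident posterior.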
2.4 Further Note on the Model Structure
In section 3, after deriving the mathematical solution of the encoder-decoder model, we discuss a series of goodness-of-fit analysis methods to identify a parsimonious decoder model and assess the prediction result. In section 4, we demonstrate the methodology through application to a specific cognitive task performed by human participants with intractable epilepsy undergoing clinically indicated intracranial recordings. We demonstrate the decoding of a cognitive state using brain activity.
3 Application: Decoding Cognitive State from Behavior and Local Field Potential Data during Performance of a Cognitive Task
We applied this modeling framework to neural and behavioral data recorded from eight human participants while they performed the multisource interference task (MSIT; Bush et al., 2003; Bush & Shin, 2006). The objective was to build an E-D model to estimate the dynamics of a set of underlying cognitive processes related to baseline task difficulty and to the effect of interference stimuli using recorded behavioral and neural data. We first built a behavioral model to estimate these state processes and then identified a neural E-D model expressing their neural correlates. We estimated the mapping from each neural feature to the estimated cognitive states. We then performed feature pruning to determine a subset of neural features that are ultimately used to build a neural decoder model to predict the cognitive state (see Figure 2). For each step, we report goodness-of-fit analyses and performance results and validate the modeling result using assessment criteria.
3.1 Experimental Data
Human participants consisted of eight patients with long-standing pharmaco-resistant complex partial seizures who voluntarily participated after fully informed consent according to NIH and Army HRPO guidelines as monitored by the Partners Institutional Review Board. Intracranial EEG (iEEG) recordings were made over the course of clinical monitoring for spontaneous seizures. The decision to implant electrodes, and the number, types, and location of the implantations were all determined on clinical grounds by a team of caregivers independent of this study. Participants were informed that their involvement in the study would not alter their clinical treatment in any way and that they could withdraw at any time without jeopardizing their clinical care.
Depth electrodes (Ad-Tech Medical, Racine, WI, or PMT, Chanhassen, MN) had diameters of 0.8 to 1.0 mm and consisted of 8 to 16 platinum/iridium contacts, each 2.4 mm long. Electrodes were localized using a volumetric image coregistration procedure. Using Freesurfer scripts (Reuter, Rosas, & Fischl, 2010; Reuter, Schmansky, Rosas, & Fischl, 2012; http://surfer.nmr.mgh.harvard.edu), the preoperative T1-weighted MRI (showing the brain anatomy) was aligned with a postoperative CT (showing electrode locations). Electrode coordinates were manually determined from the coregistered CT (Dykstra et al., 2012). Mapping to brain regions was performed using an electrode labeling algorithm (Peled et al., 2017). Intracranial recordings were made using a recording system with a sampling rate of 2 kHz (Neural Signal Processor, Blackrock Microsystems). At the time of acquisition, depth recordings were referenced to an EEG electrode placed on the skin (cervical vertebra 2 or Cz). LFP analysis was performed using custom analysis code in Matlab (MathWorks) and Fieldtrip, an open-source toolbox implemented in Matlab (http://www.ru.nl/neuroimaging/fieldtrip; Oostenveld, Fries, Maris, & Schoffelen, 2011). All LFP data were subsequently decimated to 1000 Hz and de-meaned relative to the entire recording, and line noise and its harmonics up to 200 Hz were removed by subtracting the band-passed, notch-filtered signals from the raw signal. All LFP channels were bipolar re-referenced to account for volume conduction (Bastos & Schoffelen, 2016).
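The last two preprocessing steps, de-meaning and bipolar re-referencing, can be sketched in numpy as follows; the data are synthetic, and the decimation and notch-filtering steps are omitted.

```python
import numpy as np

def preprocess_lfp(data):
    """De-mean each channel over the whole recording, then bipolar
    re-reference adjacent contacts on the same lead (ch_i - ch_{i+1})
    to suppress volume-conducted common signals.
    data: (n_channels, n_samples) array."""
    demeaned = data - data.mean(axis=1, keepdims=True)
    return demeaned[:-1] - demeaned[1:]

# four synthetic channels sharing a volume-conducted signal and a DC offset
rng = np.random.default_rng(6)
common = rng.normal(0.0, 1.0, 5000)
local = rng.normal(0.0, 0.1, (4, 5000))
data = common + local + 3.0
bip = preprocess_lfp(data)
```

After subtraction, the shared component and offset cancel, leaving only the channel-specific activity that the spectral features are computed from.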
Participants performed the MSIT with simultaneous recordings of behavior and LFPs from both cortical and subcortical brain structures. The MSIT was designed to reliably and robustly activate the dorsal anterior cingulate cortex (dACC), which plays a critical role in cognitive processing in healthy individuals. It combines sources of cognitive interference (Stroop, 1935; Eriksen & Eriksen, 1974; Simon & Berbaum, 1990) with factors known to increase dACC activity so as to maximally activate the dACC within individuals (Bush et al., 2003; Bush & Shin, 2006). The MSIT has been used to identify the cognitive/attention network in normal individuals and in those with neuropsychiatric disorders (Bush & Shin, 2006).
MSIT trials consist of the presentation of images containing three numbers from 0 to 3, where two of the numbers have the same value and one differs (see Figure 3A). Images are presented for 1.75 seconds, with the interval between images jittered between 2 and 4 seconds, on a computer screen running either Presentation software (Neurobehavioral Systems) or the Psychophysics Toolbox (Matlab; Brainard, 1997; Kleiner et al., 2007; Pelli, 1997). The participant has to identify, via button press, the identity of the unique number, not its position. The trials contained two levels of cognitive interference: in high-conflict (incongruent or interference) trials, the position of the unique number differed from its corresponding position on the keyboard, inducing cognitive interference, whereas in low-conflict (congruent or noninterference) trials, the unique number appeared in its matching position. Trials were presented in a pseudo-random order such that the identity of the images was random but the congruence changes were balanced.
The behavioral data per trial consist of the reaction time (RT) (see Figure 3B) and response accuracy. For the interference or incongruent trials, participants responded around 200 ms more slowly on average than on noninterference or congruent trials (see Figure 3C), which was in agreement with previous reports (Bush & Shin, 2006). Furthermore, when the effect of noninterference trials is controlled for, RT showed a slow change over time, and this reflected slow improvement in the participants' moment-to-moment cognitive control (Yousefi et al., 2015).
The neural features used in the E-D model were extracted from LFP recorded across electrode pairs that were localized to cortical structures comprising the dorsal anterior cingulate cortex (dACC), dorsolateral prefrontal cortex (dlPFC), rostral anterior cingulate cortex (rACC), lateral orbitofrontal cortex (OFC), dorsomedial prefrontal cortex (dmPFC), and ventrolateral prefrontal cortex (vlPFC). The choice of these regions was based on brain regions activated in fMRI studies (see Figure 3D) during cognitive conflict processing (Bush et al., 2003; Bush & Shin, 2006; Sheth et al., 2012).
3.2 Behavioral Model of the Cognitive State
Notably, in this example case, the participant got faster (decreased RTs) as the task proceeded over 242 trials (see Figure 4A). A simple way of estimating this trend would be to use a moving-average filter to obtain a smoother estimate of the RT (see Figure 4A). Since incongruent trials increased reaction time and added noise to the RT (see Figure 4B), we estimated the conflict component by subtracting the smoothed log RT (see Figure 4A) from the log RT of the incongruent trials. When we modeled the RT using equation 4.1, we found that the baseline state (see Figure 4C) had a trend similar to the smoothed RT (see Figure 4A), while the conflict state (see Figure 4D) showed a rapid rise between trials 30 and 50, a trend that is more difficult to detect in the noisy smoothed conflict RT (see Figure 4B). Thus, using the behavioral model, we could combine different behavioral trial types (for instance, baseline and interference components) to estimate the underlying cognitive state. We next focused on finding the neural correlates that could also describe these changes in the cognitive state.
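The behavioral decoding step can be illustrated with a minimal sketch. The example below simulates log RTs under a random-walk baseline plus a conflict offset on incongruent trials and tracks the baseline with a scalar Kalman filter. All parameter values, and the fixed, known conflict coefficient `beta`, are simplifying assumptions for illustration, not the paper's actual procedure, which estimates the states of equation 4.1 jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate log RTs: a slowly drifting baseline plus a fixed conflict
# effect on incongruent trials (all values here are hypothetical).
n_trials = 242
incongruent = rng.integers(0, 2, n_trials)          # 1 = incongruent trial
x_true = 6.0 + np.cumsum(rng.normal(0.0, 0.01, n_trials))
log_rt = x_true + 0.2 * incongruent + rng.normal(0.0, 0.05, n_trials)

# Scalar Kalman filter for a random-walk baseline state; the conflict
# effect enters as a known regressor rather than a second latent state.
beta, q, r = 0.2, 1e-4, 0.05 ** 2
x, p = log_rt[0] - beta * incongruent[0], 1.0
x_filt = np.zeros(n_trials)
for k in range(n_trials):
    p_pred = p + q                                  # one-step prediction
    gain = p_pred / (p_pred + r)                    # Kalman gain
    x = x + gain * (log_rt[k] - (x + beta * incongruent[k]))
    p = (1.0 - gain) * p_pred
    x_filt[k] = x

rmse = float(np.sqrt(np.mean((x_filt - x_true) ** 2)))
```

Because the filter pools information across trials, the recovered trajectory is much smoother than the raw log RTs while still following the slow drift.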
3.3 Model of Neural Encoding and Decoding
We constructed a collection of neural features consisting of spectral power in the theta (4–8 Hz), alpha (8–15 Hz), and high gamma (65–200 Hz) bands extracted from cortical LFPs, bipolar re-referenced to account for volume conduction (Bastos & Schoffelen, 2016). We considered these frequency bands because theta and alpha bands from scalp EEG have been shown to modulate with RT during MSIT (Gonzalez-Villar & Carrillo-de-la-Peña, 2017), and we found a high correlation between LFP high gamma power and RT in some of our preliminary data analysis. For each MSIT trial, the spectral power in the theta, alpha, and high gamma frequency bands was calculated using a Morlet wavelet transform over a 2 s time window starting at the MSIT image onset. This procedure produced, for each trial and each participant data set, a total number of neural features three times the number of channel pairs included. The number of neural features and the number of MSIT trials ranged from 216 to 321 and 242 to 447, respectively, across eight data sets. Thus, we arrive at a large matrix with trials as rows and neural features as columns (see Figure 5).
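As a rough sketch of this feature-extraction step, the snippet below computes band-averaged Morlet wavelet power over a 2 s window for a synthetic single-channel LFP. The sampling rate, the wavelet cycle count, and the test signal (a 6 Hz theta rhythm in noise) are assumptions for illustration only.

```python
import numpy as np

fs = 1000.0                          # sampling rate in Hz (an assumption)
t = np.arange(0.0, 2.0, 1.0 / fs)    # 2 s window from image onset
rng = np.random.default_rng(1)
# Synthetic LFP: a 6 Hz theta rhythm plus noise stands in for a channel pair.
lfp = np.sin(2 * np.pi * 6.0 * t) + 0.5 * rng.standard_normal(t.size)

def morlet_band_power(sig, fs, freqs, n_cycles=6):
    """Mean power over a band via unit-energy complex Morlet wavelets."""
    powers = []
    for f in freqs:
        sigma = n_cycles / (2 * np.pi * f)            # Gaussian envelope width
        wt = np.arange(-3 * sigma, 3 * sigma, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * wt - wt ** 2 / (2 * sigma ** 2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        powers.append(np.mean(np.abs(np.convolve(sig, wavelet, "same")) ** 2))
    return float(np.mean(powers))

# One power value per band per channel; three values per channel pair.
features = {
    "theta": morlet_band_power(lfp, fs, np.arange(4, 9)),
    "alpha": morlet_band_power(lfp, fs, np.arange(9, 16)),
    "high_gamma": morlet_band_power(lfp, fs, np.arange(65, 201, 5)),
}
```

Stacking such per-trial band powers across all channel pairs yields the trials-by-features matrix described above.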
Using the modified F-test (see equation 3.9), we selected candidate features in our example data set that showed a significant association with the estimated state (neural features circled in orange in Figure 6A). Since we used a linear model, we also examined the R² values for each neural feature and found that they were higher for the features selected by the modified F-test (see Figure 6A). The neural feature exhibiting the highest R² (green asterisks in Figures 6A and 6B) demonstrated that there can be clear predictive power, or at least a significant correlation, between even a single neural feature and the cognitive state estimate (see Figure 6C). In contrast, a feature with a low R² (red asterisk in Figure 6A) did not show any correlation with the cognitive state estimate (see Figure 6D).
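A minimal version of this per-feature screening can be sketched with an ordinary F-test on a single-regressor linear encoder (for one regressor this coincides with the t-test on the slope). The paper's modified F-test of equation 3.9, which additionally accounts for state uncertainty, is not reproduced here; the state and features below are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
state = np.cumsum(rng.normal(0.0, 0.05, n))   # stand-in cognitive state

# Two hypothetical neural features: one encodes the state, one is pure noise.
informative = 0.8 * state + rng.normal(0.0, 0.3, n)
noise_only = rng.normal(0.0, 0.3, n)

def encoder_fit(feature, state):
    """R^2 and p-value of a single-regressor linear encoder model."""
    fit = stats.linregress(state, feature)
    return fit.rvalue ** 2, fit.pvalue

r2_info, p_info = encoder_fit(informative, state)
r2_noise, p_noise = encoder_fit(noise_only, state)
# Keep only features whose encoder is significant at the 0.01 level.
selected = [name for name, p in [("informative", p_info), ("noise", p_noise)]
            if p < 0.01]
```

As in Figure 6, the feature with the higher R² is the one that passes the significance screen.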
To avoid the multiple comparison problem, we tested whether the encoder selected neural features that had a significant functional relationship with the baseline state by shuffling the trial order and fitting encoder models to the baseline state of these shuffled trials. We performed 100 shuffles and determined the number of neural features selected as having significant relationships (F-test, p < 0.01) with the shuffled trials. We found that the number of features selected before shuffling was significantly higher than after shuffling (see Figure 7). This showed that the features selected using the modified F-test have a significant encoding relationship with the baseline state and were not selected by chance. To address the multiple comparison issue, we could also use the Bonferroni correction procedure by setting the significance cutoff at α/N, where N is the number of neural features (Nakagawa, 2004). However, depending on the correlation structure of the tests, the Bonferroni correction can be extremely conservative, leading to a high rate of false negatives. Note that the neural features are extracted from different brain areas and different frequency bands; thus, they tend to be weakly correlated. This suggests that the significance level of 0.01 chosen here is a reasonable choice for avoiding the multiple comparison problem.
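The shuffle analysis can be sketched as a standard permutation test on the number of selected features. The feature matrix, effect sizes, and shuffle count below are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_trials, n_features = 250, 60
state = np.cumsum(rng.normal(0.0, 0.05, n_trials))

# Hypothetical feature matrix: only the first 10 columns encode the state.
X = rng.normal(0.0, 0.3, (n_trials, n_features))
X[:, :10] += 0.5 * state[:, None]

def n_selected(features, state, alpha=0.01):
    """Number of features with a significant linear encoder at level alpha."""
    return sum(stats.linregress(state, features[:, j]).pvalue < alpha
               for j in range(features.shape[1]))

observed = n_selected(X, state)
# Shuffling the trial order breaks the feature-state alignment while
# preserving each feature's marginal distribution.
null_counts = [n_selected(X[rng.permutation(n_trials)], state)
               for _ in range(100)]
p_perm = float(np.mean([c >= observed for c in null_counts]))
```

If the observed selection count exceeds nearly all shuffled counts, the selected features are unlikely to be chance associations.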
To identify candidate features for final neural decoding, we performed cross-validation with a training set comprising either the first half or the last half of the trials in the participant data set. The training set was chosen as the half that showed a greater range of the baseline state decoded from RT. Thus, the neural features identified via the modified F-test were those that showed significant predictive power for the baseline cognitive state during the training-set trials of the experiment. The same trials were used for neural feature pruning. For this second step, we used an RMSE measure between the mean cognitive state estimated from RT and that decoded from the neural features of the training set. The first step of the feature selection procedure applied to a single-participant example data set (see Figures 4 and 6) using the modified F-test (as described in section 3.4.1) identified 36 out of 249 neural features. The pruning step was then applied to these 36 features, which produced a subset of 14 neural features with the lowest RMSE (see Figure 8A).
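The pruning idea can be illustrated with a greedy search over feature subsets scored by held-out decode RMSE. Note that this sketch uses forward selection with a plain least-squares decoder on synthetic data, which is a simplified stand-in for the paper's exact pruning procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_feat = 300, 12
state = np.cumsum(rng.normal(0.0, 0.05, n))
X = rng.normal(0.0, 0.2, (n, n_feat))
X[:, :4] += state[:, None]              # only 4 informative features

half = n // 2
train, test = slice(0, half), slice(half, n)

def decode_rmse(cols):
    """Held-out RMSE of a least-squares decode of the state from `cols`."""
    A = np.column_stack([X[train][:, cols], np.ones(half)])
    coef, *_ = np.linalg.lstsq(A, state[train], rcond=None)
    B = np.column_stack([X[test][:, cols], np.ones(n - half)])
    return float(np.sqrt(np.mean((B @ coef - state[test]) ** 2)))

# Greedy search: add the feature that most lowers held-out RMSE; stop
# as soon as no remaining feature improves it.
selected, remaining, best = [], list(range(n_feat)), np.inf
while remaining:
    scores = {j: decode_rmse(selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    if scores[j_best] >= best:
        break
    best = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)
```

Scoring subsets on held-out trials, rather than on the training fit, is what keeps the pruned feature set from simply absorbing noise features.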
We then used the resulting encoder models to decode the cognitive baseline state at each trial using only neural activity. In doing so, we computed the instantaneous likelihood of the cognitive state as a function of the neural features for each trial (see equation 3.10). The instantaneous likelihood is the product of the individual likelihoods of the selected encoder models over the corresponding neural features (see Figures 8B and 8C). We then used equations 3.11 and 3.12 to find the filter estimate of the cognitive state. We estimated the mean baseline state using the encoder models corresponding to both the extended feature set (selected via the modified F-test) and the smaller constrained set selected after pruning (see Figure 8D).
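A discretized (grid-based) analog of this Bayesian filter, with Gaussian encoder models and a random-walk state, might look as follows. The encoder parameters are synthetic, and the grid filter is an illustrative stand-in for the paper's filter equations 3.11 and 3.12, whose exact form is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_feat = 200, 5
state = np.cumsum(rng.normal(0.0, 0.05, n))

# Hypothetical Gaussian encoder models: y_j = a_j + b_j * x + noise.
a = rng.normal(0.0, 0.1, n_feat)
b = rng.uniform(0.5, 1.0, n_feat)
sigma = 0.3
Y = a + b * state[:, None] + rng.normal(0.0, sigma, (n, n_feat))

# Discretize the state space and precompute the random-walk transition.
grid = np.linspace(state.min() - 1.0, state.max() + 1.0, 400)
dx = grid[1] - grid[0]
q = 0.05 ** 2
transition = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * q))

prior = np.exp(-(grid - state[0]) ** 2 / (2 * 0.5 ** 2))
prior /= prior.sum() * dx
x_hat = np.zeros(n)
for k in range(n):
    pred = transition @ prior                     # one-step prediction
    # Instantaneous likelihood: product over the encoder models.
    ll = np.ones_like(grid)
    for j in range(n_feat):
        ll *= np.exp(-(Y[k, j] - a[j] - b[j] * grid) ** 2 / (2 * sigma ** 2))
    post = pred * ll                              # Bayes update
    post /= post.sum() * dx
    x_hat[k] = float(np.sum(grid * post) * dx)    # posterior mean
    prior = post

rmse = float(np.sqrt(np.mean((x_hat - state) ** 2)))
```

Multiplying the per-feature likelihoods, as in equation 3.10, lets each encoder contribute evidence about the state while the transition term enforces trial-to-trial smoothness.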
The mean baseline state decoded from the constrained model had a lower RMSE and a higher correlation coefficient with that decoded from RT when compared to the one decoded from the full model (RMSE 0.081 versus 0.149; correlation 0.803 versus 0.698). Furthermore, the former had a higher HPD coverage than the latter (0.974 versus 0.707). Thus, the constrained model produced a more robust and accurate estimate of the baseline state than the full model, which included more neural features.
To compare decoder performance using the full encoder model and the constrained encoder model, we used three metrics: (1) the RMSE between the mean decoded states using RT and neural features selected by the modified F-test and pruning procedure, divided by the range of the mean decoded state from RT; (2) the ratio of trials for which the neural decoded state was within the confidence bounds of the state estimated from the behavioral data; and (3) the correlation coefficient between the mean decoded states using RT and neural features. For reliable decoder performance, we aimed for a low RMSE, a high ratio of decoded states within the confidence interval, and a high correlation between the decoded states. The constrained model, using neural features selected by the stepwise identification procedure, performed significantly better than the full model, which used the bigger pool of neural features selected by the modified F-test: it achieved a lower RMSE, a higher correlation with the behavior-decoded mean state, and a higher within-bounds ratio (see Figure 9A). Visual inspection of the neural decoded states also showed that their trajectories closely followed the behavior-decoded state (see Figure 9B).
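The three comparison metrics are straightforward to compute once the two decoded trajectories are in hand. The sketch below evaluates them on synthetic behavior- and neural-decoded trajectories; the constant confidence half-width is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
# Synthetic behavior-decoded mean state with a 95% confidence half-width.
behavior_mean = np.cumsum(rng.normal(0.0, 0.05, n))
ci_half = 0.15
# A hypothetical neural-decoded state that tracks the behavioral one.
neural_mean = behavior_mean + rng.normal(0.0, 0.05, n)

# Metric 1: RMSE normalized by the range of the behavior-decoded state.
nrmse = float(np.sqrt(np.mean((neural_mean - behavior_mean) ** 2))
              / np.ptp(behavior_mean))
# Metric 2: fraction of trials inside the behavioral confidence bounds.
in_ci = float(np.mean(np.abs(neural_mean - behavior_mean) <= ci_half))
# Metric 3: correlation between the two decoded trajectories.
corr = float(np.corrcoef(neural_mean, behavior_mean)[0, 1])
```

A good neural decoder drives the normalized RMSE down while pushing the within-bounds ratio and correlation toward one.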
In online appendix B, we show the decoding result in another data set. While the baseline state in Figure 8 has a monotonic trend, the one shown in the appendix has a nonmonotonic trend, and thus it further validates the neural decoding result.
3.4 Neural Features Used in Decoding the Flexibility State
We based our choice of the candidate neural features on regions known to be involved in interference tasks like the MSIT (Bush et al., 2003; Bush & Shin, 2006). We considered three frequency bands: theta, alpha, and high gamma. Using the model pruning procedure, we found that dlPFC neural features were correlated with the baseline (reaction time) state in six of eight participant data sets, while vlPFC neural features were used in five of eight data sets (see Table 1). High gamma power in dACC was selected in four of eight data sets.
Table 1: | Region | Low (4–8 Hz, 8–15 Hz) | High (65–200 Hz) |
We have developed a new encoding-decoding framework to estimate the dynamics of cognitive processes using combinations of behavioral and neural signals. This framework differs from classical state-space paradigms, which directly relate behavior and neural activity in low-dimensional, observed data. It is also fundamentally different from the state-space models used in neural motor decoders. In that domain, state-space models are generally used to improve decoder accuracy, and potentially robustness, by "denoising" the neural spike trains; there, the state-space model builds a low-dimensional representation of the spike trains, motivated by the observation that recording a larger number of cells' activity does not necessarily improve decoding accuracy (Kao et al., 2015; Aghagolzadeh & Truccolo, 2016). Our framework scales to neural encoding-decoding problems in which both behavioral and neural data become high-dimensional and multifaceted, settings where developing neural encoder-decoder models with classical methods becomes a challenging modeling problem. For instance, in many cognitive tasks, both reaction time and accuracy carry information about brain function; as a result, neither a neural decoder model for reaction time alone nor one for accuracy alone can properly capture the behavior. Appendix A shows an example where both RT and accuracy carry mutual information about the behavior and the underlying cognitive process. Building mathematical models between neural features and multidimensional behavioral signals, each with different characteristics, is a hard modeling problem. Using the proposed modeling framework, we demonstrated how a cognitive state can be estimated using the cohort of neural features identified through the neural encoding step.
Under this modeling framework, the behavior can be estimated using distributed neural activity without any explicit low-dimensionality assumption on brain dynamics; thus, we can estimate those components of behavior that are shaped by distributed neural activity. The modeling framework has been successfully applied to a data set of eight participants performing the MSIT.
There are multiple challenges in building neural decoder models for cognitive state processes. Cognitive states are generally abstract features that influence measurable signals rather than being directly measurable themselves. This contrasts with decoder models of signals such as those in motor cortex, where the associated movements can be measured (Serruya et al., 2002; Flint et al., 2012). The outcomes of a cognitive state may be observed across a range of behavioral signals, and our proposed behavioral encoder-decoder model demonstrates how different signals can be optimally combined in decoding the underlying cognitive state. Indeed, the emergence of different cognitive processes involves different brain networks and time courses; thus, the neural decoder model must be defined as a function of collective neural activity across multiple brain regions rather than a specific neural feature of an individual brain structure. For instance, cognitive flexibility is linked to the frontal cortical and anterior cingulate network (Sheth et al., 2012; Kim et al., 2011). Building a model to articulate behavior and neural activity is a hard problem given the multimodal, dynamic nature of the behavior and the large dimension of the neural features (Paninski et al., 2007; Koyama et al., 2010).
The idea of using a low-dimensional dynamical variable that captures the essential features of the behavior has certain advantages that can be applied to other domains of cognitive research, including learning, affective processes, and decision making. Previous work has sought to define static latent variables to link neural and behavioral data (Turner et al., 2013), whereas in our framework, we use a dynamical latent variable to link changes between behavior and neural activity over time, which could be useful in other behavioral domains. Sarma and Sacré (2018) applied a state-space modeling framework to link the behavioral strategies observed during a gambling task to internal cognitive state processes and subsequently link the estimated values of this state to spiking activity in orbitofrontal cortex and insula. The goal of linking neural activity across multiple brain areas to complex behaviors is similar to ours, but we extend this approach by developing techniques to identify a subset of neural features from a candidate set that optimizes cross-validated decoding accuracy. Ezzyat et al. (2017) used patterns of spectral power across electrodes to train a penalized logistic regression classifier to discriminate encoding activity during subsequently recalled words from subsequently forgotten words. There, they demonstrated the possibility of using high-dimensional neural data to encode brain states; however, their internal state is observable and categorical rather than continuous valued. We built the neural encoder model using spectral power features of LFP recorded from the human brain; however, the proposed modeling framework is applicable to other modalities of neural data, including fMRI and EEG. It is also applicable to other neural features, such as coherence measures representing coupling of different brain areas and measures representing spatio-temporal neural dynamics.
The other modality of neural activity that can be used in building the neural decoder model is cell ensemble spiking activity. The neural decoder model can be built using a point-process modeling framework and GLMs on the encoder side (Truccolo et al., 2005; Kass & Ventura, 2001; Deng et al., 2015). The dynamical behavioral model proposed here can be extended to include other physiological signals including heart rate, skin conductance, pupil dilation, movement activity, or voice (Munia, Islam, Islam, Mostafa, & Ahmad, 2012; Sun et al., 2010).
Though the main scope of this work focuses on linking multidimensional and multimodal behavioral and neural data using low-dimensional dynamical latent variables, the idea of using state-space modeling frameworks to factorize different cognitive processes and build a low-dimensional dynamical representation of those processes can also be applied to a one-dimensional behavioral readout, like RT. This is because most cognitive tasks incorporate different stimuli to trigger a specific cognitive process, and thus the RT is not merely the outcome of a single cognitive process but a complex combination of various processes, such as attention or adaptation. An accurate estimation of a desired cognitive process requires removing the other confounding cognitive components present in the readout signal. In other words, even a low-dimensional behavioral readout like RT can be a complex function of multiple cognitive processes, and we can use the state-space modeling framework to regress out unnecessary and confounding cognitive processes to build a more accurate estimate of the desired cognitive process. This suggests that smoothing the behavioral readout is not necessarily the optimal choice for characterizing the relationship between brain and behavior, since the smoothed metric would include the processes of no interest. Indeed, we argue that the framework proposed here can be used to build a more accurate representation of behavior and its connection to neural activity by separating out these hidden states as features in the modeling framework.
In section 4, when we discussed the MSIT behavioral signal, we built the behavioral decoder using two cognitive process components, baseline and conflict, and focused on decoding the baseline state using neural data. In online appendix C, we demonstrate how these two states, the conflict and baseline states, can be combined to predict RT. This combined state estimate model could better predict RT than a decoder model built solely on (smoothed) RT, showing how the state-space modeling framework may better predict behavior from the underlying component cognitive processes.
In developing the E-D framework, we applied different goodness-of-fit analyses to build and validate the optimal encoder model, assess the decoding performance, and select neural correlates with significant statistical power. In future work, we could also use other model selection processes, including stepwise regression or regularized GLM estimation such as GLMNet (Claeskens & Hjort, 2008; Friedman, Hastie, & Tibshirani, 2010a). Similarly, the framework has the flexibility to examine different behavioral models and to check the encoding results of other state variables of the model, providing more insight into the model and leading to a more robust neural decoder.
The analysis of the decoding results on the MSIT data set of eight participants suggested the involvement of dACC, vlPFC, and dlPFC. These results support previous studies suggesting that performance on interference/conflict tasks correlates with function within the frontal cortical region (Perret, 1974; Sheth et al., 2012; Cavanagh & Frank, 2014) and the anterior cingulate cortex (Carter et al., 1997; Gonzalez-Villar & Carrillo-de-la-Peña, 2017). The encoder selection procedure not only helped create a decoder model but could also be used to parsimoniously select features for real-time applications. Though the results are consistent with the prior literature (Bush & Shin, 2006), there are other questions that might be answered, such as whether the neural decoder model is consistent across sessions separated by days and how changes in the neural model over time can be addressed. Another question is whether we can apply the framework to the more general problem of mental and cognitive state estimation, where the behavioral outcome is not conditioned on a specific cognitive task such as the MSIT. Another important question is whether the E-D framework can be used as a diagnostic tool to assess healthy and abnormal brain dynamics and how the results can be validated using established psychometric assessment tools (Nasreddine et al., 2005). Finally, a trending effort in neuroscience is to intervene in brain dynamics to alleviate neurological problems (Ezzyat et al., 2017; Deadwyler et al., 2017); therefore, it will be important to understand how neuromodulatory effects can be addressed in this encoding-decoding modeling framework.
The modeling framework proposed here may allow investigators to characterize a causal relationship between neural activity and behavior, particularly when the behavior or the neural activity is dynamic and high-dimensional. We envision that the modeling framework will play a future role in the development of novel brain-computer interfaces for modulating behavior. The proposed methodology allows investigators to identify and manipulate a low-dimensional correlate of cognitive function, which can lead to altering neural activity related to behavior. This is an important step in treating mental illnesses such as posttraumatic stress disorder and obsessive-compulsive disorder.
This research was funded in part by the Defense Advanced Research Projects Agency (DARPA) under Cooperative Agreement Number W911NF-14-2-0045, issued by the Army Research Office contracting office in support of DARPA's SUBNETS program. The views, opinions, and findings expressed are our own and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. government. We thank Gio Piantoni, J. B. Eichenlaub, Pariya Salami, Nir Nossenson, Erica Johnson, Mia Borzello, Kara Farnes, Deborah Vallejo-Lopez, Gavin Belok, Rina Zelman, Sam Zorowitz, and Britni Crocker for their help in recording the human data and task preparation.
A copy of the source code used in this research, along with sample data, can be found at the following GitHub link: https://github.com/TRANSFORM-DBS/Encoder-Decoder-Paper.