Abstract
An important approach for studying the human brain is to use functional neuroimaging combined with a task. In electrophysiological data, this often involves a time-frequency analysis, in which recorded brain activity is time-frequency transformed and epoched around task events of interest, followed by trial-averaging of the power. While this simple approach can reveal fast oscillatory dynamics, the brain regions are analysed one at a time. This causes difficulties for interpretation and a debilitating number of multiple comparisons. In addition, it is now recognised that the brain responds to tasks through the coordinated activity of networks of brain areas. As such, techniques that take a whole-brain network perspective are needed. Here, we show how the oscillatory task responses from conventional time-frequency approaches can be represented more parsimoniously at the network level using two state-of-the-art methods: the HMM (Hidden Markov Model) and DyNeMo (Dynamic Network Modes). Both methods reveal frequency-resolved networks of oscillatory activity with millisecond resolution. Comparing DyNeMo, HMM, and traditional oscillatory response analysis, we show DyNeMo can identify task activations/deactivations that the other approaches fail to detect. DyNeMo offers a powerful new method for analysing task data from the perspective of dynamic brain networks.
1 Introduction
Non-invasive neuroimaging techniques, such as functional magnetic resonance imaging and magneto/electroencephalography (M/EEG), provide a means for observing the human brain in action. These imaging techniques are often used in a task paradigm, in which a participant is asked to perform a particular task while their brain activity is recorded. The task is designed to isolate a particular cognitive process; studying the brain activity in response to the task can therefore reveal insights into how the brain performs the cognition necessary for the task.
A well-known property of the brain is the emergence of oscillatory activity from neuronal populations, which can occur both at rest and in response to a task (Buzsáki, 2006). M/EEG is a particularly useful modality for studying these oscillations because of its high temporal resolution and the fact that it is a direct measure of neuronal activity.
A very popular method for analysing task M/EEG data is to compute oscillatory responses using time-frequency (TF) decompositions (Luck, 2014). This approach looks at the trial-averaged, oscillatory response that occurs in a task at a single spatial location, for example, a brain region. However, such an approach neglects the interaction of a particular region with the rest of the brain.
More recently, the widespread adoption of large-scale neural recordings and whole-brain imaging techniques has prompted neuroscientists to develop new frameworks for understanding brain activity at the level of inter-regional interactions, rather than focusing solely on region-specific responses (Lin et al., 2022; Urai et al., 2022). A potentially more powerful approach for studying M/EEG data would be to take a large-scale network perspective. After all, it is widely believed that the human brain performs cognition via large-scale functional networks (Bressler & Menon, 2010). Therefore, identifying the functional networks that activate in response to a task may yield a better understanding of the cognitive processes occurring in the brain. This also alleviates the burden of doing separate analyses on each region, in terms of both the difficulties of interpretation and the multiple comparison correction.
A popular hypothesis for the mechanism of communication in the brain is via the phase locking of oscillatory activity in different neuronal populations (‘communication via coherence’ (Fries, 2015)). Given this hypothesis, we expect the brain to exhibit large-scale functional networks of coherent activity. Indeed, this has been shown to be the case: fast transient networks of coherent activity have been identified in task (Quinn et al., 2018) and at rest (Vidaurre et al., 2018). Both Quinn et al. (2018) and Vidaurre et al. (2018) used an HMM (Hidden Markov Model) (Rabiner & Juang, 1986) to identify transient networks in M/EEG data. This is an unsupervised method, meaning the model learns the timing of network activations directly from the data without any input from the user.
An important assumption made by the HMM is that only one network is active at a given time point. More recently, a more flexible unsupervised dynamic network model called DyNeMo (Dynamic Networks Modes) has been introduced (Gohil et al., 2022). Crucially, this model allows co-activations of multiple networks to occur. Multiple cognitive processes in the brain are likely to occur simultaneously. Each cognitive process may be underpinned by its own large-scale functional network (Bressler & Menon, 2010). Therefore, we hypothesise that DyNeMo could provide a better model for understanding cognitive processes in the brain, and could be particularly useful in the context of task data analysis.
Here, we show how we can use the network descriptions of the HMM and DyNeMo to analyse task M/EEG data. This can be thought of as doing a task response analysis at the network level rather than computing event-related responses for the activity in every brain area separately. We compare conventional TF methods for computing oscillatory responses when analysing task data, to the dynamic network-level approaches of the HMM and DyNeMo. Using simulations with overlapping network activations, we show that DyNeMo can identify co-activations/deactivations that the HMM cannot. Using a publicly available real MEG task dataset, we show qualitatively that the network-based perspective provided by the HMM and DyNeMo can identify task responses that conventional oscillatory response methods fail to detect. Finally, we show that DyNeMo can provide a more precise dynamic network description compared to the HMM, due to its ability to infer brain networks with temporally overlapping dynamics.
2 Methods
2.1 Datasets
2.1.1 Simulation data
To evaluate the performance of the HMM and DyNeMo in identifying oscillatory networks, we trained both models on simulated data containing bursting oscillatory activity. Bursts were simulated using a two-state Markov chain with the transition probability matrix

$$P = \begin{pmatrix} p_{\text{on}} & 1 - p_{\text{on}} \\ 1 - p_{\text{off}} & p_{\text{off}} \end{pmatrix}, \qquad (1)$$

where one state corresponds to an ‘on’ state and the other to an ‘off’ state. The burst time course for each network was independently sampled from this transition probability matrix. During burst (‘on’) periods, the observed data were simulated at each region of interest (parcel) in source space as

$$x(t) = A \sin(2 \pi f t + \phi) + \epsilon(t),$$

where $A$ is the amplitude, $f$ is the oscillatory frequency of the network, $t$ is time, $\phi$ is a random phase, and $\epsilon(t)$ is Gaussian noise with variance $\sigma^2$. A random phase lag was simulated between regions, and symmetric orthogonalisation (Colclough et al., 2015) was applied to remove zero-lag correlations. At a sampling frequency of 250 Hz, with an ‘on’-state self-transition probability of $p_{\text{on}} = 0.9$ in Eq. (1), this leads to an average burst lifetime of approximately 40 ms. The amplitude of the bursts alternated between a lower value of $A$ for low-amplitude events and a higher value for high-amplitude events.
Two networks were simulated: a visual network with activity at 10 Hz in six parcels in the occipital lobe and a motor network with activity at 20 Hz in four parcels in the motor cortex. There were 38 parcels in total, and we simulated 25600 time points. We applied time-delay embedding and principal component analysis (PCA) to reduce the dimensionality down to 120 channels (approximately 85% explained variance); see Section 2.3 for further details. Finally, we standardised (z-transform) the data before training a model.
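The burst simulation described above can be sketched as follows. This is a minimal illustration, not the exact simulation code: the self-transition probability of 0.9 is chosen to reproduce the stated ~40 ms mean lifetime at 250 Hz, while the off-state probability, amplitude, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 250           # sampling frequency (Hz)
n_samples = 25600  # as in the simulation described above

# Assumed transition matrix: 0.9 self-transition for the 'on' state
# gives a mean burst lifetime of 1 / (1 - 0.9) = 10 samples = 40 ms.
P = np.array([[0.9, 0.1],   # on -> on, on -> off
              [0.1, 0.9]])  # off -> on, off -> off

def sample_markov_chain(P, n, rng):
    """Sample a state sequence (0 = 'on', 1 = 'off') from transition matrix P."""
    states = np.zeros(n, dtype=int)
    for t in range(1, n):
        states[t] = rng.choice(2, p=P[states[t - 1]])
    return states

def simulate_parcel(freq, n, rng, amplitude=1.0, noise_std=0.1):
    """Bursting sinusoid at `freq` Hz, gated by the Markov chain, plus noise."""
    on = sample_markov_chain(P, n, rng) == 0
    t = np.arange(n) / fs
    phase = rng.uniform(0, 2 * np.pi)  # random phase per parcel
    signal = amplitude * np.sin(2 * np.pi * freq * t + phase)
    return on * signal + rng.normal(0, noise_std, n)

x_visual = simulate_parcel(10, n_samples, rng)  # 10 Hz 'visual' parcel
x_motor = simulate_parcel(20, n_samples, rng)   # 20 Hz 'motor' parcel
```

In the full simulation, each network's burst time course would gate several parcels simultaneously (with random inter-region phase lags), producing the coherent networks the models are asked to recover.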
2.1.2 Real task MEG data
In this report, we study a simple visual perception task. Nineteen participants (11 male, 8 female, 23–37 years old) were presented with an image of a famous face, an unfamiliar face, or a scrambled image. The participants were asked to judge the symmetry of the image via a button press, which was included to ensure the participants focused on the images presented. Figure 1 shows a schematic of a single trial. Each participant completed six sessions that contained approximately 200 trials evenly split between famous, unfamiliar, and scrambled images. See Wakeman and Henson (2015) for further details regarding the experimental paradigm and dataset.
Single-trial schematic of the task. Participants were presented with an image of a face (famous or unfamiliar) or a scrambled image and pressed a button in response to indicate their judgement on the symmetry of the image. See Wakeman and Henson (2015) for further details.
The public dataset provides the MaxFiltered data. The following steps were applied to the data using the OHBA Software Library (OSL) (Quinn et al., 2022).
2.1.2.1 Preprocessing
Starting from the MaxFiltered data, we apply some basic preprocessing. This includes filtering the data, downsampling, and automated artefact removal. The exact preprocessing steps we applied are summarised in Supplementary Information (SI) Figure S3A.
2.1.2.2 Source reconstruction
Starting from the preprocessed sensor-level data, we perform source reconstruction using a Linearly Constrained Minimum Variance (LCMV) beamformer (Hillebrand & Barnes, 2005). This involves projecting the preprocessed sensor data onto an 8 mm dipole grid inside the inner skull. We use the brain/skull surfaces extracted from each subject’s structural MRI for the head model. The noise covariance matrix used to calculate the beamformer was estimated using the sensor-level data for each subject and regularised to a rank of 60 using PCA. Note, the MaxFiltering step reduces the rank of the sensor-level data to approximately 64, and artefact removal with independent component analysis (ICA) reduces the rank by a further 1–4. This motivated the choice of 60 for the rank of the sensor-level data covariance. We use a unit-noise-gain invariant LCMV beamformer (Westner et al., 2022), which normalises the beamformer weights to ensure depth does not impact the amplitude of source signals.
2.1.2.3 Parcellation
Using the dipole time courses from source reconstruction, parcel time courses were calculated using a 38 region of interest (non-binary, probabilistic) parcellation. First, the dipole time courses are weighted (multiplied) by the probability of belonging to a parcel. Then, PCA is performed on the weighted dipole time courses for each parcel and we take the first principal component as the parcel time course. Following this, we apply symmetric orthogonalisation to the parcel time courses (Colclough et al., 2015) to minimise spatial leakage and dipole sign flipping to align the sign of each parcel time course across subjects. The exact steps we applied are summarised in SI Figure S3B.
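The weighted-PCA step described above (probability-weighting the dipole time courses, then taking the first principal component per parcel) can be sketched as follows; the function name and toy data are illustrative, not the OSL implementation.

```python
import numpy as np

def parcel_time_course(dipoles, weights):
    """First principal component of probability-weighted dipole time courses.

    dipoles : (n_dipoles, n_samples) source-space data within one parcel
    weights : (n_dipoles,) probability of each dipole belonging to the parcel
    """
    X = dipoles * weights[:, None]         # weight dipoles by parcel probability
    X = X - X.mean(axis=1, keepdims=True)  # centre each dipole time course
    # SVD of the (dipoles x time) matrix: the first right singular vector,
    # scaled by its singular value, is the first principal component over time.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s[0] * Vt[0]                    # (n_samples,) parcel time course

# Toy example: 5 dipoles, 1000 samples
rng = np.random.default_rng(1)
dipoles = rng.standard_normal((5, 1000))
weights = np.array([0.9, 0.8, 0.5, 0.2, 0.1])
tc = parcel_time_course(dipoles, weights)
```

Symmetric orthogonalisation and sign flipping would then be applied across the resulting set of parcel time courses.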
2.2 Conventional single-channel methods
2.2.1 TF response
A common approach for studying task MEG data is to compute oscillatory responses using TF transformations (Luck, 2014). While the ordering of steps can vary somewhat, this involves calculating the TF response as shown in Figure 2: first, the continuous data are epoched around the occurrence of an event, a TF transformation (typically a wavelet (Luck, 2014)) is calculated using the epoched data, and the absolute value is taken. Finally, we trial average to give the full TF response. The full TF oscillatory response contains both the fixed latency (task-phase-locked) response, known as the evoked response, and the jittered latency (non-task-phase-locked) response, known as the induced response (Tallon-Baudry & Bertrand, 1999). We can estimate the evoked TF response by TF transforming the trial-averaged time series rather than each trial individually. Subtracting the evoked TF response from the full TF response, we can estimate the induced TF response. This calculation can be done at the sensor (outside of the brain) or source (within the brain) level. The TF response is calculated for each channel separately. Note that in this paper, channels can correspond to sensors or to parcels in brain space. The result of such an analysis is a 2D array (time by frequency) for the evoked/induced TF response for each subject and channel.
Calculation of a time-frequency (TF) response in conventional single-channel analyses. (A) Individual trials (calculated by epoching the data around a particular event). The individual trials can contain both an evoked (phase-locked to the task, i.e., fixed latency) and induced (not phase-locked to the task, i.e., jittered latency) response. (B) TF transform and calculation of oscillatory power at each time and frequency (e.g., a wavelet transform) of each trial. (C) Full TF oscillatory response calculated by averaging the trial-specific TF responses; note that this contains both the evoked and induced response. (D) Average response over the trials of the data prior to the TF transform. (E) Evoked TF response calculated by TF transforming the trial average from (D). (F) Induced TF response calculated by subtracting the evoked TF response from the full TF response.
In this paper, we epoch the data using a window from −100 ms to 1 s, where $t = 0$ s is when an event occurs and $t$ is time. We use a Morlet wavelet (Luck, 2014) with a length of four cycles to calculate the TF transformation for the frequency range 6–30 Hz with a frequency bin width of 0.5 Hz. To reduce memory requirements, we slide the Morlet wavelet with a spacing of three samples, that is, we use a decimation factor of three, which lowers the temporal resolution of the TF transformation by a factor of three.
Often, it is necessary to baseline correct the TF response to isolate the change in activity due to the event (Luck, 2014). In this paper, we subtract the mean TF response from −100 ms to 0 s for each frequency separately. This is the last step, applied after calculating the evoked/induced TF response.
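The full TF computation and baseline correction can be sketched with a hand-rolled Morlet convolution. `morlet_tf` and `baseline_correct` are illustrative names and this is not the exact implementation used; the Gaussian-envelope width is one common way to realise a four-cycle wavelet.

```python
import numpy as np

fs = 250
freqs = np.arange(6, 30.5, 0.5)  # 6-30 Hz in 0.5 Hz steps
n_cycles = 4

def morlet_tf(x, fs, freqs, n_cycles=4, decim=3):
    """Full TF response of a 1D signal via Morlet wavelet convolution.

    Returns |wavelet transform| with shape (n_freqs, ceil(n_samples / decim)).
    """
    out = []
    for f in freqs:
        # Gaussian envelope std chosen so the wavelet spans ~n_cycles cycles
        sigma_t = n_cycles / (2 * np.pi * f)
        t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        power = np.abs(np.convolve(x, wavelet, mode="same"))
        out.append(power[::decim])  # decimation factor 3
    return np.array(out)

def baseline_correct(tf, times, t_min=-0.1, t_max=0.0):
    """Subtract the mean TF response in [t_min, t_max) per frequency."""
    mask = (times >= t_min) & (times < t_max)
    return tf - tf[:, mask].mean(axis=1, keepdims=True)
```

Applying `morlet_tf` to each epoch and averaging over trials gives the full TF response; applying it to the trial-averaged time series gives the evoked TF response.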
2.2.2 Statistical significance testing
Once we have calculated a TF response for each subject and channel, we need to test for statistical significance to check whether the response we have observed could simply be due to chance. We test whether the average TF response across subjects (group average) is significant for each time point and channel. The null hypothesis is that the event does not evoke/induce a TF response, that is, a TF response of zero. We calculate a p-value for the TF response we have observed given this null hypothesis, and if the response is sufficiently unlikely (has a p-value < 0.05), we conclude it is significant. In this report, we use non-parametric permutation testing with a GLM (General Linear Model) (Winkler et al., 2014) to do this. We use the maximum COPE (Contrast of Parameter Estimate) across time and channels to control for multiple comparisons (familywise error rate). See SI Section 1.1 for further details.
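Maximum-statistic permutation testing for the simplest case here, a one-sample (group mean) contrast, can be sketched with sign flipping of subject-level COPEs, which is one standard permutation scheme under the zero-response null. The function name and details are ours, not the exact GLM permutation implementation of Winkler et al. (2014).

```python
import numpy as np

def max_stat_perm_test(copes, n_perm=1000, seed=0):
    """Non-parametric sign-flip permutation test for a group-mean contrast.

    copes : (n_subjects, n_channels, n_times) subject-level contrasts
            (e.g., baseline-corrected TF or network responses).
    Returns p-values for the observed group mean, corrected across
    channels and time points via the maximum statistic.
    """
    rng = np.random.default_rng(seed)
    n_subjects = copes.shape[0]
    observed = copes.mean(axis=0)  # group-level COPE, (n_channels, n_times)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # Under the null (zero response), each subject's sign is exchangeable
        flips = rng.choice([-1, 1], size=n_subjects)
        perm = (flips[:, None, None] * copes).mean(axis=0)
        null_max[i] = np.abs(perm).max()  # max over channels and time
    # p-value: fraction of permutations with a larger maximum statistic
    pvalues = (null_max[None, None, :] >= np.abs(observed)[..., None]).mean(axis=-1)
    return pvalues
```

Comparing each observed value against the null distribution of the maximum statistic is what provides familywise error rate control across channels and time.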
2.3 Dynamic network models and post-hoc analysis
Here, we introduce the two state-of-the-art methods we will use for identifying dynamic networks in the task MEG data: the HMM (Vidaurre et al., 2016) and DyNeMo (Gohil et al., 2022). Both methods infer networks without any knowledge of the task. We used the OHBA Software Library Dynamics toolbox (osl-dynamics) (Gohil et al., 2024) implementation of these models in this work.
2.3.1 HMM
Here, we use an HMM with a multivariate normal observation model. This model assumes the data are generated by a finite set of states, each with its own characteristic covariance matrix (which represents a network). We assume the mean vector is zero. Only one state may be active at a particular time point. Based on this, the model learns a time course for the probability of a state being active at each time point in the input data. Both the state covariance matrices (i.e., functional networks) and probabilities are learnt directly from the data, without any input from the user. We use the “TDE-HMM” variant of this model, which was introduced in Vidaurre et al. (2018). See Gohil et al. (2024) for further details describing this generative model.
2.3.2 DyNeMo
Here, we also use a multivariate normal observation model. However, DyNeMo assumes the data are generated by a set of modes (Gohil et al., 2022), each with its own characteristic covariance matrix (assuming zero mean). Crucially, these modes can overlap. The model learns a coefficient for how much each mode contributes to the total (instantaneous) covariance of the data at a given time point. In Section 3.1, we will show this model outperforms the HMM on simulated data when the ground truth includes co-activating network bursts. Both the mode covariance matrices (i.e., functional networks) and mode time courses are learnt directly from the data, without any input from the user. See Gohil et al. (2022, 2024) for a detailed description of DyNeMo and a comparison with the HMM. SI Figure S2 compares the generative model for DyNeMo to the HMM.
2.3.3 Time-delay embedding
Before training each model, we prepare the data by performing time-delay embedding (Gohil et al., 2024). This is a process of augmenting the data with additional channels that contain time-lagged versions of the original channels. By doing this, we encode the spectral properties of the original data into the covariance matrix of the time-delay embedded data. See SI Figure S1 for a description of how this works. This step allows us to model frequency-specific phase coupling (coherence) between pairs of parcels (as encoded by the cross-correlation function in the covariance matrix). Note, phase coupling between parcels can only occur at a particular frequency here. In DyNeMo, cross-frequency coupling would manifest as a co-activation of modes with activity at different frequencies.
In this paper, we embed 14 additional channels for each original channel, corresponding to lags of −7 to 7 samples. This corresponds to an embedding window of ±28 ms with 250 Hz data. We also reproduce our analysis using a different number of lags to ensure our results are robust to this choice (see SI Fig. S8). After time-delay embedding, we are often left with a very high-dimensional time series. We use PCA to reduce the number of channels to 80 to reduce the computational load of training a model. Eighty PCA components were chosen because they explain a high amount of variance (approximately 70%) in the time-delay embedded data. The final step before training a model is to standardise (z-transform) the data.
The data preparation and dynamic network modelling are summarised in SI Figure S3C. The use of ±7 lags in time-delay embedding is a good choice for 250 Hz data for researchers interested in 1–45 Hz activity in the original data. We recommend using enough PCA components to explain approximately 70% of the variance.
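The preparation pipeline above (time-delay embedding, PCA, standardisation) can be sketched in NumPy; this is an illustrative sketch, not the osl-dynamics implementation.

```python
import numpy as np

def time_delay_embed(X, n_lags=7):
    """Augment each channel with lagged copies (lags -n_lags..n_lags).

    X : (n_samples, n_channels) parcel data
    Returns (n_samples - 2*n_lags, n_channels * (2*n_lags + 1)).
    """
    n_samples, n_channels = X.shape
    cols = [np.roll(X, -lag, axis=0) for lag in range(-n_lags, n_lags + 1)]
    Y = np.concatenate(cols, axis=1)
    return Y[n_lags:n_samples - n_lags]  # drop samples without a full window

def pca_reduce(Y, n_components=80):
    """Project onto the top principal components (via SVD), then z-transform."""
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    Z = Yc @ Vt[:n_components].T
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardise each channel
```

For 38 parcels and ±7 lags, the embedded data have 38 × 15 = 570 channels, which PCA reduces to 80 before model training.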
2.3.4 Post-hoc spectra
After training the HMM/DyNeMo, we calculate state/mode spectra using the inferred state/mode time courses and the source (parcel) data. For the HMM, we use the multitaper approach described in Vidaurre et al. (2016), and for DyNeMo, we use the GLM-spectrum approach described in the SI of Gohil et al. (2022). Both of these methods provide high-resolution power/cross spectral densities (P/CSDs) for each parcel and pair of parcels for each state/mode and subject.
2.3.5 Power maps
We calculate power maps from the group-average state/mode PSDs. Power maps are calculated by taking the average value across a frequency band in the PSD for each parcel. For the HMM, we use the non-negative matrix factorisation (NNMF) approach introduced in Vidaurre et al. (2018) to select the frequency range. We take the first NNMF component from fitting two components to the stacked coherence spectra from each subject and state. This typically covers the frequency range 1–22 Hz. For DyNeMo, we average the mode PSDs across the entire frequency range (1–45 Hz). When visualising the power maps, we show them relative to the mean across states/modes without any thresholding.
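The power map calculation reduces to a mean over a frequency band of the PSD, followed (for visualisation) by subtracting the mean across states/modes. An illustrative sketch:

```python
import numpy as np

def power_map(psd, freqs, f_min, f_max):
    """Average PSD over a frequency band for each parcel.

    psd   : (n_parcels, n_freqs) state/mode PSDs
    freqs : (n_freqs,) frequency axis (Hz)
    Returns (n_parcels,) band power per parcel.
    """
    band = (freqs >= f_min) & (freqs <= f_max)
    return psd[:, band].mean(axis=1)

def relative_power(power_per_state):
    """Power relative to the mean across states/modes.

    power_per_state : (n_states, n_parcels)
    """
    return power_per_state - power_per_state.mean(axis=0)
```

For the HMM the band would be the first NNMF component's range (approximately 1–22 Hz); for DyNeMo it would be the full 1–45 Hz range.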
2.3.6 Coherence networks
We use the coherence averaged over a frequency range as our measure of functional connectivity. To calculate coherence networks, we first calculate the coherency spectrum from the group-average P/CSDs using

$$C_{jk}(f) = \frac{S_{jk}(f)}{\sqrt{S_{jj}(f)\, S_{kk}(f)}},$$

where $S_{jk}(f)$ is the CSD for parcels $j$ and $k$, and $S_{jj}(f)$ ($S_{kk}(f)$) is the PSD for parcel $j$ ($k$). We then take the average of the magnitude of the coherency across a frequency range to give a single value for the coherence of the edge connecting parcels $j$ and $k$. For the HMM, we use the first NNMF component (same as the power maps, approximately 1–22 Hz) to specify the frequency range. For DyNeMo, we use the full frequency range (1–45 Hz). For visualisation, we show the HMM coherence network relative to the mean across states, whereas for DyNeMo we retain the original values. We threshold the edges by selecting the top 2% of edges irrespective of sign.
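As a sketch of this coherence calculation for a single edge, here using Welch-style spectral estimates from `scipy.signal.csd` rather than the multitaper/GLM spectra used in the paper:

```python
import numpy as np
from scipy.signal import csd

fs = 250

def band_coherence(x, y, fs, f_min=1.0, f_max=45.0):
    """Coherence between two parcel time courses, averaged over a band.

    Coherency C_jk(f) = S_jk(f) / sqrt(S_jj(f) * S_kk(f)); we average its
    magnitude over [f_min, f_max] to get one value per edge.
    """
    f, Sxy = csd(x, y, fs=fs, nperseg=fs)  # cross spectral density
    _, Sxx = csd(x, x, fs=fs, nperseg=fs)  # power spectral densities
    _, Syy = csd(y, y, fs=fs, nperseg=fs)
    coh = np.abs(Sxy) / np.sqrt(Sxx.real * Syy.real)
    band = (f >= f_min) & (f <= f_max)
    return coh[band].mean()
```

Repeating this for every pair of parcels, within each state/mode, gives the coherence networks that are then thresholded for visualisation.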
2.3.7 Network response
Post-hoc, we perform an event-related response analysis on the inferred network activity (state/mode time courses for the HMM/DyNeMo). We calculate the network response by epoching and trial-averaging the state/mode time courses. For the HMM, the state time course corresponds to the Viterbi path (binarised state probability time course). For DyNeMo, the mode time course corresponds to the maximum a posteriori estimate for the mixing coefficient at each time point. Importantly, since each state/mode describes a network with specific oscillatory activity, a network response represents a set of underlying oscillatory responses that occur across that network. This is shown in Figure 3D and E. The result of a dynamic network analysis is a 1D time course for the network response for each subject and state/mode.
Comparison of a conventional parcel-based and network approach for oscillatory response analysis. (A) Parcel time courses epoched around events of interest (trials). (B) Full TF response of individual trials at the parcel level. (C) Trial-averaged full TF response at the parcel level. (D) Network activity (state/mode time course) epoched around events of interest. (E) Network response (trial-averaged state/mode time course).
Similar to the TF response, we baseline correct the network response by subtracting the average value over the 100 ms before $t = 0$ s. We use the same approach as for the TF response to test for statistical significance (non-parametric permutation testing with a GLM). Note, the network response contains both the evoked and induced oscillatory response; the inability of the dynamic network models to distinguish between the two is a limitation of the network approach.
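Epoching and baseline-correcting a state/mode time course can be sketched as follows (illustrative code, with the −100 ms to 1 s window assumed from the TF analysis):

```python
import numpy as np

def network_response(stc, event_samples, fs=250, t_min=-0.1, t_max=1.0):
    """Epoch a state/mode time course around events and trial-average.

    stc           : (n_samples, n_states) Viterbi path (HMM) or mixing
                    coefficients (DyNeMo)
    event_samples : sample indices of event onsets
    Returns (n_epoch_samples, n_states) baseline-corrected network response.
    """
    pre, post = int(-t_min * fs), int(t_max * fs)
    epochs = []
    for s in event_samples:
        if s - pre >= 0 and s + post <= len(stc):  # keep complete epochs only
            epochs.append(stc[s - pre:s + post])
    response = np.mean(epochs, axis=0)      # trial average
    baseline = response[:pre].mean(axis=0)  # mean over the 100 ms before t=0
    return response - baseline
```

The resulting 1D time course per state/mode is what is then tested for significance with the permutation GLM.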
3 Results
We will first compare the dynamic network perspective provided by the HMM and DyNeMo using simulated data in Section 3.1. Following this, we will compare the different methods for analysing task data using the real MEG dataset (described in Section 2.1).
3.1 Simulation
To illustrate DyNeMo’s ability to identify overlapping network activations, we trained both dynamic network models on the simulated data described in Section 2.1.1. The ground truth simulated power map and PSD for each network are shown in Figure 4A.II and A.III, respectively. The corresponding ground truth parcel-level brain activity is shown in Figure 4A.I. The dashed vertical lines in Figure 4A.I show the onset of visual (green) and motor (blue) network activations.
The assumption of mutual exclusivity harms the HMM’s ability to model the data when there are co-activating bursts of oscillatory activity. (A) Ground truth simulation of parcel-level data: a subset of the parcel time courses (I) and properties of the networks (II-IV). (B) Conventional parcel-level analysis applied to the simulated data. (C) HMM network analysis applied to the simulated data. (D) DyNeMo network analysis applied to the simulated data. (C.VI) and (D.VII) were calculated by multiplying the state/mode PSDs by the network response for the HMM and DyNeMo respectively.
Using the simulated parcel-level data (shown in Fig. 4A), we performed a conventional TF response analysis selecting a parcel in the occipital cortex (Fig. 4B.I, top) and motor cortex (B.I, bottom). We observe that the conventional single-parcel full TF analysis correctly identifies the oscillatory frequency of each burst. Note, there is also a wide-band activation across many frequencies at the onset of the genuine 10 Hz (or 20 Hz) power increase at approximately 100 ms, which is an artefact of the spectral estimation method (wavelet transform).
Next, we trained the HMM on the simulated parcel-level data. The perspective provided by the HMM is shown in Figure 4C. We can see from the network response (C.II) that there is a clear response to the onset of visual and motor bursts. However, multiple states are activated in response to each event type. We also see from the power maps (C.III) that activity from both event types is present in each state. Looking at the model-projected full TF response (C.V), which is calculated by multiplying the network response time course by the PSD for each state and summing, we can see the HMM is still a good model for the underlying oscillatory dynamics, but its decomposition of activity into mutually exclusive states prevents it from identifying the ground-truth description of the data. Note, increasing the number of states in the HMM does not help it identify the ground-truth networks in the simulation.
Finally, we trained DyNeMo on the simulated parcel-level data. The perspective provided by DyNeMo is shown in Figure 4D. We see from the network response (D.II) that DyNeMo is better able to isolate bursts of oscillatory activity into separate modes. We also see from the power maps (D.IV) that DyNeMo correctly identifies the simulated networks, and the model-projected full TF response (D.VI) is a good representation of the ground-truth oscillatory content of the simulated data. In addition to correctly identifying event timings, DyNeMo is also able to learn the amplitude of each event type via the value of the mixing coefficients. Figure 4D.III shows the distribution of mode mixing coefficient values when high- and low-amplitude visual events occur (left) and when motor events occur (right). We see that for low-amplitude events the mean of the mode mixing coefficient distribution is much lower than for high-amplitude events. Note, using fewer modes than the number of simulated networks leads to the visual and motor networks combining into a single mode. When more than three modes are specified, DyNeMo learns that only three modes are needed to describe the data, and the extra modes have virtually no activation.
3.2 Real task MEG dataset
There are five unique events in this dataset: the presentation of a famous face, unfamiliar face, scrambled image, left button press, and right button press (see Fig. 1). We will study the average response (across trials, subjects, and sessions) for different event types. Specifically, we will look at:
What is the response to all visual stimuli? (Section 3.2.1.)
What is the difference in response between faces and scrambled images? (Section 3.2.2.)
What is the difference in response between famous and unfamiliar faces? (Section 3.2.3.)
What is the response to a button press (left or right)? (Section 3.2.4.)
3.2.1 Visual stimuli: all
Turning to the real MEG dataset, first, we look at the average response over all visual stimuli.
3.2.1.1 Single-channel analysis reveals a fast oscillatory response to visual stimuli in the α-band in the occipital lobe and the θ-band in the frontal lobe
Figure 5 shows the TF response calculated using conventional single-channel analysis. To show such a parsimonious result, we have pre-selected a single sensor/parcel in the occipital lobe, as this is an area known to be involved in visual processing (Baars & Gage, 2013), and in the frontal lobe. At the sensor level (Fig. 5A), we can see a short-lived increase in α-band (8–12 Hz) activity followed by a decrease relative to baseline, and a long-lived increase in θ-band activity in the frontal lobe. We see a consistent description in source space (Fig. 5B). Both analyses indicate a very fast response to the task, on the order of 100 ms, which is consistent with existing literature (Carlson et al., 2013).
Conventional TF analysis reveals a fast oscillatory response to the task. For all visual stimuli: (A) Conventional sensor-level analysis. (B) Conventional parcel-level analysis. The black outline in TF plots indicates clusters with p-value < 0.05. Note, we had to pre-select the sensors/parcels of interest in this analysis.
3.2.1.2 Functional brain networks of oscillatory activity are identified by the HMM and DyNeMo directly from the data
Figure 6 shows the networks inferred by the HMM (A) and DyNeMo (B). Figure 6A.I and B.I show the spatial distribution of oscillatory power for each network as well as the phase-locking connectivity (coherence). We identify well-known functional networks found in MEG data (Gohil et al., 2024). Of particular interest in this work are HMM states 2 and 5, which represent visual networks with α-band activity; state 3, which represents a frontotemporal network, often associated with higher-order cognition (Baars & Gage, 2013), with δ/θ-band (1–8 Hz) activity; and state 4, which is a sensorimotor network with β-band (13–30 Hz) activity. HMM state 1 is identified as the default mode network with wide-band (1–45 Hz) activity (Vidaurre et al., 2018). The remaining state (6) represents a suppressed wide-band power network, which has often been ignored in previous HMM analyses. In this work, we will explore the relevance of this network to the task and its connection to the DyNeMo networks. The DyNeMo networks of particular interest are: mode 2, which is a visual network with α-band activity; mode 3, which is a frontotemporal network with δ/θ-band activity; and modes 4 and 5, which respectively represent right- and left-lateralised sensorimotor networks with β-band activity. The remaining networks are mode 1, which is a left temporal network with θ/α-band (4–12 Hz) activity often associated with language (Fedorenko & Thompson-Schill, 2014), and mode 6, which is a suppressed-power background network that does not show any additional oscillatory activity compared to the static (time-averaged) PSD (Gohil et al., 2022). Figure 6C shows the average values of the (renormalised) DyNeMo mode time course when each HMM state is active. This signifies how much each DyNeMo mode contributes to an HMM state. We see that multiple DyNeMo modes combine to form an HMM state. Note, the networks presented in Figure 6 can be reliably found in this dataset; see SI Figure S5.
DyNeMo and the HMM provide a fundamentally different network perspective. (A.I) Network states inferred by the HMM: power maps (top), coherence networks (middle), and PSDs (solid coloured line) with the static PSD (dashed black line) (bottom). (A.II) State probabilities for the first 8 s of the first session. The width of each colour represents the probability of a particular state. (B.I) Network modes inferred by DyNeMo: power maps (top), coherence networks (middle), and PSDs (bottom). (B.II) Renormalised mode time course (weighted by the trace of each mode covariance (Gohil et al., 2022)) for the first 8 s of the first session. (C) Time-averaged DyNeMo renormalised mode time course when each HMM state is active, minus the average across all time points.
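The quantity plotted in Figure 6C, the average renormalised DyNeMo mode activation while each HMM state is active relative to the time average, can be sketched in a few lines of NumPy. This is a minimal illustration with random stand-in data, not the authors' implementation; the array names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: in practice these come from trained models.
n_samples, n_states, n_modes = 10_000, 6, 6
state_probs = rng.dirichlet(np.ones(n_states), size=n_samples)  # HMM posterior probabilities
mode_tc = rng.random((n_samples, n_modes))                      # renormalised DyNeMo mode time course

# Hard-assign each time point to its most probable HMM state
state_tc = state_probs.argmax(axis=1)

# For each state: mean mode activation while that state is active,
# minus the mean over all time points (as in Fig. 6C)
overall_mean = mode_tc.mean(axis=0)
contribution = np.array([
    mode_tc[state_tc == k].mean(axis=0) - overall_mean
    for k in range(n_states)
])  # shape: (n_states, n_modes)
print(contribution.shape)  # (6, 6)
```

Each row of `contribution` then describes one HMM state as a weighted mixture of DyNeMo modes.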
3.2.1.3 Network responses to task are identified with millisecond temporal resolution
Figure 7A.II and B.II show network responses in the form of trial-averaged epoched state/mode time courses. Importantly, since each state/mode describes a network with specific oscillatory activity, a network response represents a set of underlying oscillatory responses that occur across that network. Both the HMM and DyNeMo were able to identify networks with millisecond-level dynamics directly from the data. DyNeMo provides a much simpler network response than the HMM. We see an initial visual network activation, followed by a frontotemporal network activation combined with a visual network deactivation. The many significant network activations and deactivations seen with the HMM are likely due to the mutual exclusivity constraint, under which, for a network to activate, all other networks must deactivate. Lifting this constraint leads to the more parsimonious description provided by DyNeMo. Note, DyNeMo's network response looks smoother than the HMM's because DyNeMo can model both the amplitude and timing of network activations via the mode time course (see Fig. 4D.III), whereas the HMM can only model the timing with a binary state time course. Comparing the network responses (Fig. 7A.II and B.II) with a conventional single-parcel analysis (full TF response in Fig. 5B), we see that the oscillatory activity in the occipital lobe parcel is actually part of a wider visual network. Using the HMM/DyNeMo network description, we can project a model estimate for the oscillatory activity at a single parcel. This is shown in SI Figure S6. We can see both the HMM and DyNeMo provide a good description of the oscillatory response to visual stimuli for a single parcel.
Both the HMM and DyNeMo reveal a millisecond resolution dynamic network-level perspective of brain activity in response to the task, with DyNeMo providing a more parsimonious decomposition. For all visual stimuli: (A) Parcel-based HMM analysis: (I) The inferred state probabilities for the first 8 s of the first session. The dashed vertical lines indicate when a particular visual stimulus was presented; (II) Network response. The colour of each line indicates its state; (III) Properties of the networks that show a significant response (p-value < 0.05). (B) Parcel-based DyNeMo analysis: (I) The inferred (renormalised) mode time course; (II) Network response; (III) Properties of the networks that show a significant response (p-value < 0.05). The horizontal bars above each time course in (A.II) and (B.II) indicate the time points with p-value < 0.05.
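The trial-averaged epoched state/mode time courses described above can be computed with a few lines of NumPy. The following is a minimal sketch with random stand-in data; the sampling rate, epoch window, baseline choice, and array names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for an inferred state/mode time course (samples x networks)
# and event onset indices.
fs = 250                       # assumed sampling rate (Hz)
n_samples, n_networks = 50_000, 6
tc = rng.random((n_samples, n_networks))
events = rng.integers(1_000, n_samples - 1_000, size=200)  # stimulus onsets (samples)

pre, post = int(0.1 * fs), int(1.0 * fs)   # -100 ms to +1 s window

# Epoch around each event and average over trials: the "network response"
epochs = np.stack([tc[e - pre:e + post] for e in events])  # (trials, time, networks)
baseline = epochs[:, :pre].mean(axis=1, keepdims=True)     # pre-stimulus baseline
response = (epochs - baseline).mean(axis=0)                # (time, networks)
print(response.shape)  # (275, 6)
```

Significance of each time point can then be assessed with, e.g., a permutation test across subjects, as is standard for event-related analyses.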
3.2.2 Visual stimuli: faces versus scrambled
Next, we examine a more subtle task effect by looking at the difference in response between the presentation of a face versus a scrambled image (the response to faces (famous and unfamiliar) minus the response to scrambled images). Studying this process allows us to understand the cognitive processes that occur in the detection and identification of a face in contrast to non-facial images.
3.2.2.1 Dynamic network approaches (HMM and DyNeMo) can identify a late network response that conventional methods fail to detect
Figure 8A and B show the difference in TF response for faces versus scrambled images, calculated using conventional methods at the sensor and parcel level respectively. Using conventional methods, we fail to identify any significant induced TF responses in posterior sensors/parcels (Fig. 8A and B). We do observe a significant increase in the evoked and full TF response, but only at the parcel level (Fig. 8B). SI Figure S7 shows we fail to observe any significant TF responses in frontal sensors/regions. Turning to the dynamic network models, which describe the full TF response to the task, we see that both the HMM and DyNeMo show a larger activation of the visual network for faces over scrambled images (Fig. 8C and D, right). Additionally, the HMM is able to identify a late activation of a suppressed power network (state 6, Fig. 8C, right). Figure 8D (middle) shows the time-averaged value of each DyNeMo (renormalised) mode time course for time points in the data when HMM state 6 is active. We see that HMM state 6 can be represented as a mixture of DyNeMo modes 2, 3, and 6, and that the observed late response seen with the HMM (state 6 in Fig. 8C, right) is the combination of a visual network (mode 2) deactivation and a frontotemporal network (mode 3) activation in DyNeMo (Fig. 8D, right).
Unique combinations of DyNeMo modes are modelled as distinct states in the HMM. For faces (famous and unfamiliar) versus scrambled images: (A) Conventional single-channel sensor-level analysis. (B) Conventional single-region parcel-level analysis. (C) Parcel-based HMM network analysis. (D) Parcel-based DyNeMo network analysis. The black outline in the TF plots in (A) and (B) indicates clusters with p-value < 0.05. The horizontal bars above each time course in (C) and (D) indicate the time points with p-value < 0.05. The bar chart in (D) shows the activation of each DyNeMo mode when HMM state 6 is active.
3.2.2.2 Dynamic network analyses (HMM and DyNeMo) reveal a sequence of oscillatory network activations related to the cognitive processing of faces
Studying the difference between the network response to the presentation of a face (famous and unfamiliar) and scrambled images allows us to understand the network activations that occur during the processing of faces. First turning our attention to the network description provided by the HMM (Fig. 8C, right), we see there is an early visual network response (state 2, occipital α-band activity) that lasts approximately 100 ms, which could represent a bottom-up response to the detection of a face. This is followed by an activation of a frontotemporal network (state 3, frontotemporal δ/θ-band activity) that lasts approximately 100 ms. Following this, there is a long-lived (300–400 ms) activation of the suppressed visual network (state 6, suppressed occipital α-band activity). The DyNeMo perspective (Fig. 8D, right) shows something similar; however, it is able to identify that the late suppression of the visual network (deactivation of mode 2) occurs simultaneously with an activation of a frontotemporal network (mode 3). This provides a new insight that the suppressed activity in the occipital cortex may be linked to the activity in frontotemporal regions. Such activity could represent a top-down feedback response that occurs in the identification of faces. This description is consistent with existing literature on face recognition in the primate visual system (Moeller et al., 2008).
3.2.3 Visual stimuli: famous versus unfamiliar
Next, we look at an even more subtle effect by comparing the difference in response between famous and unfamiliar faces (the response to famous faces minus the response to unfamiliar faces).
3.2.3.1 Only DyNeMo reveals a deactivation of the visual network and activation of the frontotemporal network for famous versus unfamiliar faces
Figure 9A and B show the results of a conventional TF analysis at the sensor and parcel level respectively for the famous versus unfamiliar contrast. We see no significant TF response. Looking at the HMM analysis for this contrast in Figure 9C, we see a hint that there may be a response in the suppressed power network (state 6). However, this response is unconvincing. Comparing this to the network response identified by DyNeMo in Figure 9D, we see a much clearer response with a deactivation of the visual network (mode 2) and activation of the frontotemporal network (mode 3).
Modelling the data using DyNeMo can reveal subtle differences in brain activity missed by conventional methods and the HMM. For famous versus unfamiliar faces: (A) Conventional single-channel sensor-level analysis. (B) Conventional single-region parcel-level analysis. No significant time points or frequencies were found with conventional analysis. (C) Parcel-based HMM network analysis. (D) Parcel-based DyNeMo network analysis. The horizontal bars above each time course in (C) and (D) indicate the time points with p-value < 0.05.
3.2.3.2 The identification of faces (famous vs. unfamiliar) occurs in the late network response
Focusing on the network response provided by DyNeMo (Fig. 9D), we do not see the early visual network (mode 2) activation that we saw in the faces versus scrambled contrast (Fig. 8D). This suggests the detection of a face occurs early on (around 100 ms after the presentation of the image). Following this, the network response shown in Figure 9D suggests the identification of the face (whether it is a famous or an unfamiliar face) occurs after 400 ms and involves the frontotemporal network (mode 3) and visual network (mode 2). This description is consistent with existing literature (Moeller et al., 2008).
3.2.4 Button press: left versus right
The final contrast we study is the response to a button press. Note, the experiment involves two types of button presses: a left button press executed with the left index finger and a right button press executed with the right index finger.
3.2.4.1 DyNeMo identifies unilateral components of the sensorimotor network, providing a better model of the data than the HMM
Figure 10A and B, respectively, show the results of a conventional single-channel (sensor/parcel) analysis. Here, we epoch around all button presses (left or right). We selected a sensor/parcel near the motor cortex as we expect this area to be involved in executing a button press (Baars & Gage, 2013). We observe the expected post-movement β-rebound a few hundred milliseconds following the button press (Cheyne, 2013). We see a consistent network description with both the HMM (Fig. 10C) and DyNeMo (Fig. 10D). Note, the activation of the visual networks after the button press is due to the start of the next trial. An interesting feature of the network description provided by DyNeMo is that the sensorimotor network has been split into two unilateral components (modes 4 and 5). Epoching the mode time courses for these two modes around left and right button presses, which were executed with the left and right index fingers respectively (Fig. 10E), we see the contralateral network activates for each button press. This is in contrast to the HMM, where a single bilateral sensorimotor network activates in response to both left and right button presses.
DyNeMo offers a more precise dynamic network description, inferring significant unilateral sensorimotor networks that are missing from the HMM. For the button press events: (A) Conventional single-channel sensor-level analysis. (B) Conventional single-region parcel-level analysis. (C) Parcel-based HMM network analysis. (D) Parcel-based DyNeMo network analysis. (E) Network responses for the HMM/DyNeMo epoched around left and right button presses. The black outline in the TF plots in (A) and (B) indicates clusters with p-value < 0.05. The horizontal bars above each time course in (C), (D), and (E) indicate the time points with p-value < 0.05.
4 Discussion
4.1 Sensor versus source-space analysis
In Figures 5, 8–10 we compared a conventional single-channel TF analysis in sensor and source (parcel) space. We observed that both methods provided consistent descriptions of the timing and oscillatory content of the response to the task. The main advantage of performing the analysis in source space is the ability to isolate the response to a particular region in the brain, which can aid interpretability. For example, in Figure 5B, we were able to show that the oscillatory response to visual stimuli occurs in the occipital lobe. An additional benefit of performing the analysis in source space is that the process of source reconstruction incorporates information regarding the head position relative to the sensors, which could provide a useful denoising effect.
4.2 Disadvantages of a conventional analysis
The conventional event-related oscillatory response analysis of task data uses region-by-region TF decompositions. It is simple, quick, and easy to compute. However, it is limited in what it can achieve: it only looks at a single sensor or region of interest at a time. Additionally, when we cannot sensibly pre-specify a small set of sensors/parcels of interest, testing for statistical significance across the large number of sensors/parcels entails a large number of multiple comparisons. This can drastically reduce our sensitivity to subtle effects. For example, we could not identify any differences in the response to famous versus unfamiliar faces in Figure 9A and B.
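To make the sensitivity argument concrete, the following sketch contrasts the number of statistical tests in a mass-univariate TF analysis with that of a six-network response. The parcel, frequency-bin, and time-point counts are illustrative assumptions, not the counts used in this study.

```python
# Illustrative multiple-comparisons burden: one test per parcel/frequency/time
# point for a mass-univariate TF analysis versus one test per network/time
# point for a 6-network response.
n_parcels, n_freqs, n_times = 52, 40, 250    # assumed counts
n_networks = 6

n_tests_tf = n_parcels * n_freqs * n_times   # mass-univariate TF analysis
n_tests_network = n_networks * n_times       # network response analysis

alpha = 0.05
print(n_tests_tf, alpha / n_tests_tf)        # Bonferroni threshold becomes tiny
print(n_tests_network, alpha / n_tests_network)
```

In practice cluster-based permutation tests are used rather than Bonferroni correction, but the same intuition applies: fewer, more targeted comparisons preserve sensitivity.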
4.3 Advantages of a dynamic network analysis
Adopting a dynamic network approach can provide a useful perspective on how the brain responds to a task. Rather than performing many parcel-based TF analyses of oscillatory responses, we can instead identify how large-scale networks respond to the task. Since each state (with the HMM) or mode (with DyNeMo) describes a network with specific oscillatory activity, these network responses represent a set of underlying oscillatory responses that occur across that network, which we can interrogate. For example, with a dynamic network approach we can understand which large-scale networks underpin single-region oscillatory changes. Also, once we know which networks activate or deactivate in response to a task, we can look up existing literature relating to the network and understand its function in a wider context. In Figure 7, we identified that visual and frontotemporal networks are involved in the task we studied.
Another useful advantage is that there is no need to specify the oscillatory frequencies of interest. Both the HMM and DyNeMo can learn networks with distinct oscillatory activity directly from the data. We also show in Figure 7 that we can resolve dynamics with the same resolution as event-related TF analyses, meaning we do not sacrifice any temporal precision with the dynamic network perspective. Note, the network response reflects both the evoked and induced TF response.
Decomposing brain activity into a low number of states (with the HMM) or modes (with DyNeMo) acts as a dimensionality reduction technique, which can aid interpretation. In this work, we chose six states/modes to give us a low-dimensional network description. This also allows us to reduce the number of multiple comparisons we perform, which can increase our sensitivity to subtle effects. For example, only with the network models did we see a difference in the response between faces and scrambled images (Fig. 8).
4.4 The dynamic networks identified by the HMM and DyNeMo can be reliably inferred
As with many modern machine-learning models, the HMM and DyNeMo can suffer from a local optima issue, where slightly different networks can be inferred from the same data. With good initialisation, improved training techniques, and larger datasets, this has been minimised (Gohil et al., 2024). Empirically, we find selecting the best run (the one with the lowest loss) from a set of 10 runs robustly finds reproducible network descriptions of the data. We also find group-level analyses, such as the network responses, are robust. In Figure S5, we show the same analysis repeated on 3 sets of 10 runs and demonstrate that the same networks and conclusions are obtained with each.
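The best-of-10-runs selection amounts to an argmin over the final losses of the independently trained models. A minimal sketch follows; the loss values are made up for illustration, and in practice each value would be the final variational free energy of one training run.

```python
def select_best_run(losses):
    """Return the index of the run with the lowest final loss."""
    return min(range(len(losses)), key=lambda i: losses[i])

# e.g. final free energies from 10 independently initialised runs (made up)
losses = [1023.4, 1019.8, 1021.1, 1020.5, 1019.2,
          1024.0, 1022.7, 1019.9, 1021.8, 1020.1]
best = select_best_run(losses)
print(best, losses[best])  # 4 1019.2
```

Because the free energy penalises both poor fit and overly complex posteriors, the lowest-loss run is a principled choice among the candidates.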
The DyNeMo networks (Fig. 6B) appear to be more localised than the HMM networks (Fig. 6A). This raises the question of whether the localised connectivity patterns we observe are due to volume conduction. The measure of connectivity that we use is coherence, which quantifies the level of phase stability between two oscillations irrespective of the value of the phase difference. Volume conduction results in zero phase-lag activity, which we remove in the symmetric orthogonalisation step following parcellation. Therefore, we conclude the coherence we observe is not due to volume conduction.
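As a toy check of the point that coherence quantifies phase stability irrespective of the phase difference, the following sketch (using SciPy; the signal parameters are arbitrary assumptions) shows that two oscillations with a constant non-zero phase lag, which zero phase-lag removal would not discard, remain highly coherent:

```python
import numpy as np
from scipy.signal import coherence

# Two 10 Hz oscillations with a constant 90 degree lag (non-zero phase lag,
# so not attributable to volume conduction) plus independent noise.
fs = 250
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 10 * t - np.pi / 2) + 0.5 * rng.standard_normal(t.size)

# Magnitude-squared coherence via Welch's method (1 s segments)
f, coh = coherence(x, y, fs=fs, nperseg=fs)
print(coh[np.argmin(np.abs(f - 10))])  # high coherence at 10 Hz
```

The constant 90 degree lag leaves the coherence at 10 Hz near one, while frequencies carrying only independent noise show coherence near zero.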
4.5 DyNeMo provided the most parsimonious network description of the methods tested
Both the HMM and DyNeMo are state-of-the-art methods for identifying dynamic networks in neuroimaging data. However, we showed in simulation (Fig. 4) that the assumption of mutually exclusive states may compromise the network description provided by the HMM. DyNeMo overcomes this limitation by allowing networks to overlap at a given time point. The network description provided by DyNeMo appears to be much more straightforward to interpret. We saw in Figure 7B that the network response to visual stimuli was isolated to just a couple of networks (modes 2 and 3) with localised power activations. However, with the HMM (Fig. 7A) many networks (states 2–6) show a significant response to the task. This is likely an artefact of the assumption of mutual exclusivity, under which, for a state to activate, all other states must deactivate.
4.6 DyNeMo offered the most precise network descriptions of the methods tested
In this task, participants perform a button press with either their left or right index finger. When we compared the network response to left and right button presses separately, we saw DyNeMo was able to identify unilateral sensorimotor networks contralateral to movements (modes 4 and 5) whereas the HMM did not (Fig. 10E). This network description was learnt directly from the data, without any information regarding the button press being fed into the model. Such a description could be useful for studying lateralisation of function (Mutha et al., 2012) or inter-hemispheric connectivity (Krupnik et al., 2021).
4.7 Alternative approaches
DyNeMo is not the only approach that can model overlapping spatial patterns. For example, Hyvärinen et al. proposed 'Fourier-ICA', which applies ICA to the stacked short-time Fourier transform of each channel in M/EEG data (Hyvärinen et al., 2010). This approach is able to find overlapping patterns of time-frequency modes across channels. However, it does not model the interaction between channels or brain regions. DyNeMo, in contrast, is an explicit model for the interaction between channels or brain regions; in the latter case providing a task response in terms of functional connectivity or graphical network dynamics. This is achieved via the cross-spectral density modelling in the covariance matrix of time-delay embedded data, and DyNeMo is free to use this information to help infer the best spatiotemporal modes that describe the data. DyNeMo also benefits from an explicit temporal model (the 'model RNN' (Gohil et al., 2022)), which helps in inferring the mode time courses, whereas the sources in Fourier-ICA (analogous to the mode time courses in DyNeMo) are determined only by the maximisation of non-Gaussianity. In both approaches, choices need to be made in preparing the training data. For Fourier-ICA, it is the parameters used to calculate the short-time Fourier transform. For DyNeMo, it is the choice of time-delay embedding and PCA parameters. Another, more recent, approach was proposed by Power et al. (2023), who used convolutional dictionary learning on MEG data to identify repeated 'atoms' of temporally overlapping spatial patterns across channels in response to a task. One advantage of this approach is its ability to learn the waveform shape of the response in the raw training time series, which is not possible with Fourier-ICA or DyNeMo. However, unlike DyNeMo, this approach does not model the interaction between channels or brain regions.
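For intuition, the Fourier-ICA idea can be sketched by applying ICA to stacked short-time Fourier coefficients across channels. The sketch below is a simplified, real-valued variant run on random stand-in data (the original method operates on the complex STFT; using amplitudes here is an assumption made for simplicity):

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
fs, n_channels, n_samples = 250, 4, 5_000
data = rng.standard_normal((n_channels, n_samples))  # stand-in M/EEG data

# Short-time Fourier transform each channel: (channels, freqs, windows)
f, t, Z = stft(data, fs=fs, nperseg=fs)

# Stack channels and frequencies, so each STFT window is one observation
X = np.abs(Z).reshape(n_channels * len(f), len(t)).T  # (windows, channels*freqs)

# Unmix into independent time-frequency components
ica = FastICA(n_components=3, random_state=0, max_iter=500)
sources = ica.fit_transform(X)  # (windows, components)
print(sources.shape)
```

Each component's mixing vector spans all channels and frequencies, which is how overlapping time-frequency patterns emerge, but, as noted above, no explicit between-channel interaction model is involved.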
5 Conclusions
Conventional event-related methods can be a useful technique for understanding the oscillatory response at a single region. However, a potentially more powerful technique that can give a whole-brain perspective is a dynamic network approach. We have shown that two state-of-the-art methods, the HMM and DyNeMo, can learn frequency-specific large-scale functional networks that activate in response to a task with millisecond resolution. Comparing the HMM with DyNeMo, we found DyNeMo provides a particularly useful network description for studying task data. We provide the scripts for reproducing the analysis presented in this report here: github.com/OHBA-analysis/Gohil2024_NetworkAnalysisOfTaskData.
Ethics Statement
The original study that collected the data (Wakeman & Henson, 2015) was approved by Cambridge University Psychological Ethics Committee. Written informed consent was obtained from each participant prior to and following each phase of the experiment. Participants also gave separate written consent for their anonymised data to be freely available on the internet.
Data and Code Availability
The data are publicly available (Wakeman & Henson, 2015). Code to reproduce analysis presented in this manuscript is available here: https://github.com/OHBA-analysis/Gohil2024_NetworkAnalysisOfTaskData.
Author Contributions
C.G.: Conceptualisation; Methodology; Software; Validation; Formal Analysis; Data Curation; Writing—Original Draft; Writing—Reviewing & Editing; and Visualisation. O.K.: Conceptualisation; Writing—Reviewing & Editing. R.H.: Software. M.W.J.E.: Conceptualisation. O.P.J.: Writing—Review & Editing. L.T.H.: Writing—Review & Editing. A.J.Q.: Conceptualisation; Methodology. M.W.W.: Conceptualisation; Methodology; Data Curation; Supervision; Writing—Original Draft; and Writing—Reviewing & Editing.
Funding
This research was supported by the National Institute for Health Research (NIHR) Oxford Health Biomedical Research Centre. The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust (203139/Z/16/Z). C.G. is supported by the Wellcome Trust (215573/Z/19/Z). O.K. is supported by the Marie Sklodowska-Curie Innovative Training Network “European School of Network Neuroscience (euSNN)” (860563). R.H. is supported by the EPSRC Centre for Doctoral Training in Health Data Science (EP/S02428X/1). M.W.J.E. is supported by the Wellcome Trust (106183/Z/14/Z, 215573/Z/19/Z), the New Therapeutics in Alzheimer’s Diseases (NTAD) supported by the MRC and the Dementia Platform UK (RG94383/RG89702). O.P.J. is supported by the MRC (MR/X00757X/1), Royal Society (RG/R1/241267), NSF (2314493), NFRF (NFRFT-2022-00241), and SSHRC (895-2023-1022). L.T.H. is supported by Henry Dale Fellowship (208789/Z/17/Z) from the Wellcome Trust; a NARSAD Young Investigator Grant from the Brain and Behavior Research Foundation; and the NIHR Oxford Health Biomedical Research Centre. M.W.W. is supported by the Wellcome Trust (106183/Z/14/Z, 215573/Z/19/Z), the New Therapeutics in Alzheimer’s Diseases (NTAD) study supported by UK MRC, the Dementia Platform UK (RG94383/RG89702), and the NIHR Oxford Health Biomedical Research Centre (NIHR203316). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
Declaration of Competing Interest
No competing interests.
Supplementary Materials
Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00226.
References
Footnotes
1. We often refer to the data collected during a task as task data.
2. This number of lags is sufficient in this simulation example because of the relatively low and narrowband oscillatory frequencies contained in the data. In real data, we typically use a larger number of lags because many more, and broader, oscillatory frequencies are present.
3. We also simulate the scenario of increased oscillatory power without the phase coupling by narrowband filtering white noise; see SI Figure S4.
4. In the accompanying Python scripts, each session is referred to as a 'run'.
5. Note, in this work we reweight and renormalise the inferred mode coefficients using the trace of each mode covariance. This ensures the (renormalised) mode time courses account for the magnitude of each covariance. See Gohil et al. (2022) for further details.
6. This is done by multiplying the network response by the state/mode PSD.
7. A run is a single model trained from scratch.
8. In our case, the loss is the variational free energy (Bishop, 2006) for both DyNeMo and the HMM.