## Abstract

The frequency spectrum is a central method for representing the dynamics within electrophysiological data. Some widely used spectrum estimators make use of averaging across time segments to reduce noise in the final spectrum. The core of this approach has not changed substantially since the 1960s, though many advances in the field of regression modelling and statistics have been made during this time. Here, we propose a new approach, the General Linear Model (GLM) Spectrum, which reframes time averaged spectral estimation as multiple regression. This brings several benefits, including the ability to do confound modelling, hierarchical modelling, and significance testing via non-parametric statistics. We apply the approach to a dataset of EEG recordings of participants who alternate between eyes-open and eyes-closed resting state. The GLM-Spectrum can model both conditions, quantify their differences, and perform denoising through confound regression in a single step. This application is scaled up from a single channel to a whole head recording and, finally, applied to quantify age differences across a large group-level dataset. We show that the GLM-Spectrum lends itself to rigorous modelling of within- and between-subject contrasts as well as their interactions, and that the use of model-projected spectra provides an intuitive visualisation. The GLM-Spectrum is a flexible framework for robust multilevel analysis of power spectra, with adaptive covariate and confound modelling.

## 1 Introduction

Frequency-domain analyses of oscillations in electrophysiological recordings of brain activity contain information about the underlying neuronal activity. Both the peaks of specific oscillations and the broader spectral shape are informative about brain function and have inspired a wide literature across neuroscience (Buzsaḱi & Draguhn, 2004; Kopell et al., 2014). The time-averaged periodogram is the predominant method for spectrum estimation in neuroscience. It computes the average Fourier spectrum across a set of sliding window segments (Bartlett, 1948, 1950; Welch, 1967) based on the premise that the data are comparable over time and that the effect of noise will be attenuated when averaging across segments. This algorithm produces a statistical estimate of a spectrum and has remained largely the same for many decades. Statistical methods have greatly progressed in this time and many newer approaches can be directly applied to the windowed periodogram.

Here, we propose the General Linear Model Spectrum (GLM-Spectrum) framework for analysing time-averaged periodogram estimates of frequency spectra. This reframes the method of averaged periodograms as a regression problem by modelling frequency spectra over successive windows as a linear mixture of a set of user-specified regressors. This links linear spectrum estimation to the GLM analyses that have been developed for a broad range of neuroimaging applications, including structural and functional MRI (Friston et al., 1994; Woolrich et al., 2009), event-related fields (Smith & Kutas, 2014), and induced responses (Litvak et al., 2013). Specifically, we demonstrate the utility of multilevel models (Friston, 2007; Woolrich et al., 2004), non-parametric permutation testing (Nichols & Holmes, 2001; Winkler et al., 2014), contrast coding, and confound regression in the context of spectrum estimation. GLM-Spectrum can be applied to analyse any time series, from Local Field Potentials to multichannel Electro- and Magnetoencephalography (EEG & MEG). It is not dependent on any specific preprocessing methods in sensor- or source- space analyses, beyond what would apply to a typical frequency spectrum analysis. The GLM-Spectrum could also be configured to perform task analyses on the timescale of the sliding windows used in the STFT. For example, a set of “boxcar” regressors could be defined to contrast different task states. More broadly, the method is applicable to any time series analysis that looks to estimate a Fourier-based frequency spectrum.

We illustrate the GLM-Spectrum by analysing EEG recordings alternating between eyes-open and eyes-closed resting-state conditions from a freely available dataset (Babayan et al., 2019). First, the GLM-Spectrum is used to analyse time-series data from a single channel of one individual. The spectrum for the two resting conditions and their difference are computed, whilst a set of covariate and confound regressors account for linear trends over time and a diverse set of potential artefact sources. This approach is generalised to the whole head recording of a single subject to describe the spatial patterns associated with each regressor. Finally, a group-level, whole head analysis explores the GLM-Spectra of specific regressors and contrasts before quantifying how they differ between younger and older participants.

## 2 Methods

### 2.1 Time-averaged periodogram estimation

Time-averaged periodogram methods start by estimating a windowed short-time Fourier transform across time series $y$ using the windowing function $w$

The Fourier transform above is computing the k-th segment of the continuous input $y(t)$, which we denote with $y(t,k)$. The output matrix $Y(f,k)$ contains the STFT, which describes how the spectrum changes in power across the $K$ segments. A time-varying magnitude spectrum $Sy$ or power spectrum $Py$ can be computed from the STFT.

Where N is the length of the sliding window segments. Finally, the time-averaged periodogram is then the average of the time-varying power spectral density across segments.

If the previous computations included a windowing function $w(t)$ and overlapping time segments, then this is Welch’s power spectral density estimate (Welch, 1967). Welch’s time-averaged periodogram has the property that the noise level of the estimate decreases with increased data length, since more input data provide a larger number of segments for the central averaging step. It is still an imperfect estimator that has been subject to criticism (Prerau et al., 2017; Thomson, 1982) but it is practical, straightforward to compute, and in wide use across science and engineering. A detailed description of these equations is provided in Supplementary Section A. A description of how parameters such as window length and sample rate affect the spectrum is provided in Supplementary Section B.

### 2.2 General linear model spectrum

The GLM-Spectrum replaces the averaging step in the time-averaged spectrum estimation methods with a General Linear Model (also known as multiple regression). The GLM is widely used in neuroimaging analyses (Friston, 2007; Woolrich et al., 2009) and the same principles around analysis, model validation, and statistics apply here. The objective is to model the spectrum across the K sliding window segments as a linear function of a set of regressor variables. The magnitude GLM-Spectrum is defined as:

where $Sy(f)$ is the (Kx1) time-varying spectrum estimated at frequency ($f$) across all $K$ segments/windows (the STFT computed in 2) computed from a single channel (time series) of data, X is a (KxP) design matrix containing the P regressors of interest as they vary over time, and e(f) is a (Kx1) vector of residual errors. We model the whole spectrum using a mass-univariate modelling approach that fits a separate GLM for each frequency bin in the FFT. The resulting (Px1) vector B(f) contains the estimated regression parameters. We refer to the whole vector of estimates across frequency as the GLM “beta-spectrum.”^{1}

### 2.3 Estimating the GLM parameters

Once the design matrix has been specified and the data have been transformed into the STFT, we are ready to fit the regression parameters B in equation 4. Under the assumptions specified above, this can be achieved using Ordinary Least Squares (OLS) to estimate the regression parameters (also known as beta-estimates), $B(f)$, as

Alternatively, we can pre-multiply the data by the Moore-Penrose pseudo-inverse (MPPI) (Penrose, 1956) of the design matrix, which performs well even when there are multi-collinearities in $X$ (see Section 2.4):

where the superscript $+$ denotes the MPPI. More complex fitting routines could be used if the assumptions underlying OLS are inappropriate for a particular application. For example, the rest of the GLM-Spectrum framework would work in the same way if $B(f)$ were estimated using a robust or regularised regression. Similarly, it would be possible to extend the approach to Bayesian regression methods. Here, we use the pseudo-inverse model fitting approach (equation 6) for all GLM estimation.

### 2.4 Assumptions of the GLM-spectrum

The theory of linear regression underlying the GLM-Spectrum uses assumptions that simplify the problem and specify the conditions under which the solution is valid. Typically, five different assumptions are defined: validity, linearity, independence of errors, homoscedasticity of errors, and normality of errors. There remains debate about their relative importance (Gelman & Hill, 2007; Knief & Forstmeier, 2021).

The first two assumptions are relatively general. *Validity* states that the data being analysed should be an appropriate match to the research question. This apparently simple point is frequently overlooked by researchers (Gelman & Hill, 2007). *Linearity* is the assumption that the dependent variable can be described as a linear function of the predictors in the model design matrix. This is the central mathematical feature of linear regression models. We cover assumptions about the residuals and the distribution of variables in more detail in the next two sections.

#### 2.4.1 Distribution of the residuals

Three commonly reported assumptions relate to the residuals of the fitted model $e(f)$. *Independence of errors* states that the residuals of the model fit are independent and identically distributed (IID) over observations (over time in the case of GLM-Spectrum). *Homoscedasticity of errors* states that the variance of the error is consistent across all values of the predictor. Finally, *normality of errors* states that the residuals should have a normal, Gaussian distribution. Violating these assumptions affects the validity of inferential statistics computed from the model, limiting our ability to generalise results from our data to the population. Parameter estimates are more robust to violations of these assumptions. In most cases, we anticipate that inferential statistics will not be performed on first-level GLM-Spectrum results. Rather, the parameter estimates or t-values of many first-level models will be combined into a group model.

GLMs are relatively robust to violations of *homoscedasticity of errors* and *normality of errors* (Williams et al., 2013). The p-values computed from models with violations of these errors tend to be robust at moderate to large sample sizes, except in datasets with substantial outliers (Knief & Forstmeier, 2021).

A specific issue for first-level statistics is the likely presence of temporal autocorrelation in $e(f)$, indicating a violation of the *independence of errors* assumption. This issue is commonly encountered in other time-series models such as first-level fMRI analyses (Friston et al., 2000; Woolrich et al., 2001; Worsley and Friston, 1995). Future work that requires valid inferential statistics on first-level GLM-Spectra may develop explicit models for this temporal autocorrelation similar to the approach taken in fMRI (Friston et al., 2000; Woolrich et al., 2001).

#### 2.4.2 Distribution of the data and predictor variables

The ordinary least-squares GLM does not make any formal assumptions about the distributions of the data or predictor variables (Williams et al., 2013). Non-normal predictor variables are commonplace in regression analyses. For example, binary variables that “dummy code” for individual groups of observations are commonly used as predictors. Strongly skewed, fat-tailed or non-normal distributions can still negatively impact the fit by a greater likelihood of influential outlier observations. This may be increasingly problematic with smaller sample sizes, or with increasingly extreme outlier values in the data.

Distribution checking is a critical factor in the choice of whether to use the complex, magnitude, power or log-power spectrum as the dependent variable in a GLM-Spectrum. Power is most commonly used for spectrum analysis but power estimates are strictly positive-valued and tend to have distributions with a strong positive skew. The magnitude ($Sy(f)$) and log-power ($log(Py(f))$) spectra are likely to be more Gaussian and both result in similar GLM-Spectrum results (See Supplementary Section C). Though either of these forms are appropriate, here we use the magnitude spectrum as it is a good compromise between reducing the impact of outliers and maintaining simple visual interpretability of the results.

Future work could equally use the log-power spectrum or consider expanding the GLM-Spectrum approach to use Generalised Linear Modelling (Nelder & Wedderburn, 1972) to account for specific differences in the distribution of the data being modelled. Finally, we do not consider the complex spectrum due to variability in phase across time segments, leading to significant cancellation of the signal. If phase information is critical, and expected to be consistent across time segments, then future work may generalise these statistics to the complex spectrum (for example, Baker, 2021).

### 2.5 Design matrix specification

#### 2.5.1 Regressor selection

The regressors in the design matrix $X$ will typically be secondary time series that are recorded simultaneously with the main data or known *a priori*. The regressors must be prepared in the same manner as the main data, including any filtering and segmentation, to ensure correspondence between the design matrix and data. All regressors used in this paper are segmented following the modelled time-series data and summed within each segment to create a vector of values to use as a covariate.

The GLM is a highly general method as the design matrix, $X$, can be adapted depending on the application in question. However, this flexibility can also make the specification and interpretation of the regressors challenging. The addition of a new regressor to an existing GLM design matrix can change the parameter estimates and standard errors of the previous regressors. Therefore, the final choice and interpretation of any regressors is necessarily specific to each individual analysis.

Standard time-averaged spectrum estimation methods (such as Welch’s Periodogram) model the mean spectrum across time segments. Similarly, most GLM-Spectrum analyses will also want to include regressors that quantify this average. In the simplest case, a single, constant regressor of ones is directly equivalent to the standard method. However, the flexibility of the GLM allows us to build on this and define more sophisticated models with multiple covariates if required.

One extension enabled by the GLM-Spectrum is to use confound regression to model the effect of an artefact source and attenuate its contribution to the estimate of the overall mean. The amount of denoising applied by the model is proportional to the effect size of the confound regressor in question. This makes the confound regression adaptive to each individual model; the same potential noise source may be highly predictive of the STFT in one dataset but not the next. Researchers could consider running a formal model comparison to remove ineffective confound regressors from first-level analyses altogether. Here, we have taken the approach of maintaining all first-level regressors to simplify group analysis. Further group-level permutation testing assesses whether a noise source has a “significant” effect on the STFT. This is a flexible alternative to removing the artefactual time periods altogether. Confound regression can be performed by including a non-zero mean regressor alongside a constant regressor in the design matrix. With this specification, the constant regressor models the intercept (the average where the value of the artefact regressor is zero) whilst the confound regressor quantifies the artefact effect. This example is explored in more detail in Supplementary Section F.

Covariates can be included into the GLM in several ways. We can use indicator regressors (containing zeros and ones) which assume that the covariates effect will be the same each time it is present. Otherwise, we can use dynamic covariates to model phenomena that dynamically change over time in a continuous way. For example, this might include pupil size, heart rate, or respiration rate. When we include these types of continuous regressors, their regression parameters capture the “slope” effect; in other words, how much does the spectrum change with each increment in the value of the regressor. For example, when including a pupil-size regressor, the spectrum resulting from its regression parameter estimates would indicate how much the power in a particular frequency bin increases or decreases as the pupil-size changes by a certain amount.

Another decision is whether to demean a given covariate regressor in the design matrix. Counterintuitively, the interpretation of the regression parameter estimate is unchanged when a covariate is demeaned; in both cases, it is modelling the “slope” effect that quantifies how much the spectrum changes with each increment of the regressor. In contrast, the interpretation of a constant regressor in the same model will change depending on whether a covariate is demeaned or not. A constant regressor will model the mean over all time points if the other covariates are demeaned and will model the intercept if non-zero mean regressors are included. As a result, confound regressors that are intended to remove a given effect from the estimate of the mean will typically have a non-zero mean whilst dynamic covariates that model changes around the mean will be demeaned or z-transformed prior to model fitting.

#### 2.5.2 Multicollinearity

Finally, while it is not a violation of the model assumptions, one should take care when regressors in X can be expressed, to any extent, as a linear combination of other regressors. This is referred to as multicollinearity and means that there are infinite equally good solutions to the regression equation. Using the MPPI to fit the model parameters can overcome this limitation. If multiple solutions to equation 4 exist, the MPPI will return the regression parameters with the minimum Euclidean norm (Penrose, 1956). Note that when there is partial multicollinearity, the MPPI uses the component of the regressor that is uncorrelated with the rest of the design matrix (i.e., corresponding to any unique variability in that regressor) to find each parameter estimate. This property means the MPPI solution quantifies the unique effect of each regressor that cannot be accounted for by the others. Therefore, it is frequently desirable to proceed with the MPPI solution for a GLM whose design contains some degree of multicollinearity that we wish to eliminate from the results.

In addition, the impact of any multicollinearity is naturally accounted for in the variance of the affected regression parameter estimates. For example, when the presence of multicollinearity increases uncertainty about the value of a regression parameter, then that parameter estimate’s variance (as computed in Section 2.6) will be appropriately increased. Nonetheless, even when using the MPPI, we recommend assessing the correlation and singular value spectrum of the design matrix prior to model fitting as well as the variance of the regression parameters (Smith et al., 2007). This ensures that one is aware of the potential impact of multicollinearity on finding a significant result. If these checks identify unexplainable or unintended multicollinearity, perhaps from including too many or inappropriate regressors, then the design should be re-assessed prior to further analysis.

### 2.6 Contrasts and t-statistics

Once the design matrix is specified and the model parameters have been estimated, the GLM-Spectrum consists of a beta-spectrum for each regressor. This beta-spectrum contains the regression parameter estimates quantifying the linear effect of that regressor across the frequency range.

Next, we can compute simple linear combinations of regression parameter estimates, known as contrasts. Contrasts can be defined to ask questions about the size of these linear combinations, including whether they are significantly different to zero (using t-tests). This approach is commonly applied in neuroimaging applications (Friston, 2007; Woolrich et al., 2009).

Each contrast is defined as a vector of values that define the relative weights used to compare different parameter estimates. For example, we could define the following contrasts for a model that contains three regressors in its design matrix:

where C is a (P x $Nc$) matrix containing all $Nc$ contrasts. Using terminology common in neuroimaging, these contrasts define a Contrast Of Parameter Estimates, or a cope, which is computed from a matrix multiplication between the contrast and the model parameter estimates:

Here, we refer to the resulting frequency resolved vector of cope values, cope(f), as the GLM cope-spectrum. The individual contrasts are designed to ask specific experimental questions. Using the examples in equation 7, the first contrast asks whether $c1B^(f)=0$. This specifies a t-test that quantifies whether each value in the beta-spectrum of the first regressor is different from zero; regressors two and three are weighted to zero in this specific contrast, but nevertheless still explain variance in the overall model. The second contrast tests if $c2B^(f)=0$ and asks whether the mean of the beta-spectra from regressors two and three is different from zero. Note that setting the values in this contrast to 0.5 ensures that the contrast of the regression parameter estimates can be interpreted as the mean of the two regression parameters involved. When turned into statistics (see below), contrasts $c1$ and $c2$ are equivalent to one-sample t-tests in classical statistical frameworks.

Finally, testing if $c3B^(f)=0$ tests whether the difference in parameter estimates in the beta-spectrum of regressor 2 minus regressor 3 is different from zero. This is equivalent to an independent-samples t-test between the conditions modelled by these regressors. Regressor 1 is set to zero in the second two contrasts and is not directly included in the comparison. However, it is still explaining variance in the model and may be indirectly affecting the outcome of the contrast between regressors 2 and 3.

These contrasts are useful combinations of parameter estimates but we need the associated standard error to complete a formal statistical test. The ratio of the contrast value (cope) and its standard error is a t-statistic that indicates the estimated magnitude of a cope relative to its standard error. To compute the standard errors and subsequent t-statistics for each contrast, we first need to compute the residuals of the model fit:

Note that $Ry(f)$ contains the actual set of residuals for a given dataset and model fit. This is distinct from $ey(f)$, which denotes a more general white noise process. These residuals are used to compute the variance in the estimate of the cope, also known as a varcope. Firstly, we compute the variance of the residuals:

And transform this to get the variance of the relevant part of the model for this contrast:

$varcope(f)$ now contains the square of the standard error for this contrast. This computation can be costly with large datasets as several matrix multiplications must be performed. However, only the diagonal of the resultant matrix is used for further analysis. Therefore, we speed-up this computation using Einstein summation in numpy (https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) to compute only the multiplications which appear in the final diagonal. More information on this and comparisons to alternative computation methods are described in Supplementary Section D. The spectrum of t-values corresponding to the contrast can then be computed as the ratio of the cope to its standard error:

This GLM t-spectrum quantifies the difference of each cope from zero in statistical terms, incorporating both the parameter estimates and their standard errors. Taken together, the GLM beta-spectrum $B(f)$, cope-spectrum $cope(f)$, and t-spectrum $t(f)$ provide an intuitive description of the frequency spectrum of the input data in terms of the specified regressors and contrasts.

### 2.7 Effect size computation with Cohen’s $F2$

Effect sizes represent the strength of a statistical relationship as a complement to hypothesis-based test statistics like t-tests. The effect size of a single variable within the context of a multivariate regression model can be computed with $Cohen'sF2$ (Cohen, 1988). The spectrum of effect sizes for a single regressor within a GLM-Spectrum model can be computed as (Selya et al., 2012).

in which $B$ denotes the regressor of interest and $A$ the set of all other regressors. $RAB2$ is then the proportion of variance explained by the full model, and $RA2$ is the variance explained by all regressors except $B$. This metric can be computed for every frequency, channel, and regressor within a model to establish a full spectrum of effect sizes.^{2}

Here, we compute effect sizes for each covariate by eliminating it from the full model and computing the value in equation 13. For categorical predictors, the effect size is computed by comparing the full model to a model in which separate categorical regressors are combined into a single constant regressor. As an example, this models the effect of modelling two conditions with their own mean term compared to modelling them both with the same mean.

### 2.8 Visualising effects with model projected spectra

The GLM beta-, cope-, and t-spectra assess parts of the overall time-varying spectrum in relation to the model. As these GLM-Spectra can relate to combinations of effects, their impact on the mean spectrum can be difficult to intuit. We propose computing the model-projected spectra to gain a more immediately intuitive visualisation of effects. This is a visualisation of how the spectrum changes for different values of the regressor of interest. For example, if the GLM-Spectrum of EEG data includes a covariate for pupil size, then its beta-spectrum will describe how the spectrum changes as pupil size expands and contracts. The model-projected spectrum could then be used to visualise the predicted spectrum at a specific pupil size.

The model-projected spectrum is typically calculated for a descriptive range of values in the original regressor. In this paper, we use the largest and smallest values from the regressor of interest. For example, to compute the projected spectrum for the largest and smallest value of a covariate regressor $Rv$ relative to a constant mean term, we use: model-projected spectra

The max and min model-projected spectrum then describes the range of variability in the spectrum that is described by the regressor. Note that this is only a visualisation method and that any apparent differences in these model projected spectra must be confirmed by statistical significance testing, such as non-parametric permutations (see Section 2.10).

### 2.9 Group models for GLM-spectrum

The GLM-Spectrum described thus far is used to describe continuous data recorded from a single session; we refer to this as the “first-level.” We now consider how we can carry out a “group-level” analysis to combine the results across the first-level GLM-Spectra from multiple sessions/subjects using a group-level (or second-level) GLM (Beckmann et al., 2003; Friston et al., 2002; Woolrich et al., 2004). In brief, we create a group-level dependent variable by concatenating the parameter estimates, copes, or varcopes from a set of first-level analyses and use another GLM to model how GLM-Spectra vary over sessions/subjects across the group. For example, here we fit a group-level beta-spectrum using the first-level cope-spectra for N subjects and a group-level design matrix:

where $copesubj1j(f)$ is the $jth$ first-level cope computed for subject $n$ at frequency bin $f$, that is, $copesubj1j(f)=$$cjB^subjn(f)$ where $cj$ is the $jth$ first-level contrast and $B^subjn$ is the first-level contrast regression parameter estimates for subject $n$. Note that $Xgroup$ is the (NxQ) group-level design matrix and the (Qx1) matrix $Bgroup(f)$ is the group-level regression parameters, where Q is the number of group-level regressors. As with the first-level GLM, the error $egroup(f)$ is assumed to be Normal and IID.

As with the first-level analysis, the group-level GLM is fitted using OLS with the MPPI; and is computed separately for each frequency, $f$ (and each channel or voxel—the indexing for which is not shown in the equations), in a mass-univariate manner. In addition, a separate group-level GLM is computed for each first-level cope of interest. As with the first-level GLM, contrasts can be used to ask a range of inference questions from the regression parameter estimates, $B^group(f)$. A resource showing examples of commonly used group-level design matrices and contrasts is available online (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM).

As shown in the equation above, the simplest group-level model carries forward the cope-spectra from a set of first-level analyses. This can be thought of as a fixed-effects group model in which each observation (first-level result) contributes equally to the group effect and the group-level GLM models the between-session/subject variability. This is the approach taken in this manuscript. Future work can extend it to incorporate the information about the first-level standard errors in the varcope-spectra. Both the cope and varcope information could be carried forward to the group-level and fitting a mixed-effects model such as the FLAME method in fMRI (Woolrich et al., 2004). In practice, this model is challenging to fit as no simple closed-form estimation is available. Another alternative would be to carry the first-level t-statistics to the group-level. Future work can explore a wide range of possibilities for multilevel and mixed-modelling for the GLM-Spectrum (Table 1).

Term . | Definition . |
---|---|

STFT | The short-time Fourier transform of a time series, containing the spectrum computed within sliding window time segments across the data. Also called a time-varying spectrum. |

Design Matrix | A matrix of regressors used to explain variability in observed data with a linear regression model. |

Regressor | A single column of a design matrix containing explanatory variables relating to each individual observation. |

Beta-estimates | A parameter estimate describing the linear relation between a regressor and the observed data. Also known as regression parameter estimates. |

Beta-Spectrum | A vector of parameter estimates for a single regressor across the range of a frequency spectrum. |

Contrast | A planned comparison between one or more parameter estimates. |

Cope | The result of a defined contrast between beta-estimates. |

cope-spectrum | A vector of cope estimates for a single contrast across the range of a frequency spectrum. |

Varcope | The square of the standard error of the cope estimates. |

varcope-spectrum | A vector of estimates for a single contrast across the range of a frequency spectrum. |

t-statistic | The ratio of the departure of the estimated value of a contrast from its hypothesised value to its standard error. |

t-spectrum | A vector of t-statistics for a single contrast across the range of a frequency spectrum. |

Model-projected spectrum | A visualisation of a spectrum as predicted by a fitted GLM set at a particular set of covariate values. |

First-Level GLM | A linear model for a single data recording that models variability across time segments in an STFT with a specified design- and contrast-matrix, resulting in a set of beta-, cope-, and t-spectra. |

Group-Level GLM | A linear model for a group dataset combining a set of first-level GLM-Spectra. A group-level design- and contrast-matrix models variability in first-level beta or t-spectra across datasets, resulting in a set of group-level beta-, cope-, and t-spectra. |

Term . | Definition . |
---|---|

STFT | The short-time Fourier transform of a time series, containing the spectrum computed within sliding window time segments across the data. Also called a time-varying spectrum. |

Design Matrix | A matrix of regressors used to explain variability in observed data with a linear regression model. |

Regressor | A single column of a design matrix containing explanatory variables relating to each individual observation. |

Beta-estimates | A parameter estimate describing the linear relation between a regressor and the observed data. Also known as regression parameter estimates. |

Beta-Spectrum | A vector of parameter estimates for a single regressor across the range of a frequency spectrum. |

Contrast | A planned comparison between one or more parameter estimates. |

Cope | The result of a defined contrast between beta-estimates. |

cope-spectrum | A vector of cope estimates for a single contrast across the range of a frequency spectrum. |

Varcope | The square of the standard error of the cope estimates. |

varcope-spectrum | A vector of estimates for a single contrast across the range of a frequency spectrum. |

t-statistic | The ratio of the departure of the estimated value of a contrast from its hypothesised value to its standard error. |

t-spectrum | A vector of t-statistics for a single contrast across the range of a frequency spectrum. |

Model-projected spectrum | A visualisation of a spectrum as predicted by a fitted GLM set at a particular set of covariate values. |

First-Level GLM | A linear model for a single data recording that models variability across time segments in an STFT with a specified design- and contrast-matrix, resulting in a set of beta-, cope-, and t-spectra. |

Group-Level GLM | A linear model for a group dataset combining a set of first-level GLM-Spectra. A group-level design- and contrast-matrix models variability in first-level beta or t-spectra across datasets, resulting in a set of group-level beta-, cope-, and t-spectra. |

### 2.10 Non-parametric permutations for group GLM-spectra

Null-hypothesis testing for a given contrast can be carried out with non-parametric permutations (Nichols & Holmes, 2001; Winkler et al., 2014). A null distribution of observed statistics is derived by recomputing the GLM after manipulating the design matrix in line with the null hypothesis. The observed group average is then compared to this null distribution and is “significant” if it exceeds a pre-set critical threshold, such as the 95th percentile of the null distribution. Here, we perform non-parametric permutation using “sign-flipping” for categorical regressors and using “row-shuffling” for parametrically varying regressors.

In all cases, we permute the columns of the design matrix that are directly related to the relevant contrast whilst the remaining regressors are fixed (Draper & Stoneman, 1966; Winkler et al., 2014). The Draper-Stoneman approach can produce erratic results for small sample sizes and may be out-performed by more generally robust methods (Winkler et al., 2014), such as O’Gorman (2005) or Freedman and Lane (1983).

Non-parametric permutation testing could, in principle, be carried out to assess the results of a first-level GLM-Spectrum. In contrast to a group-level analysis, the first-level permutations would only assess whether a particular effect observed within a dataset could be expected to generalise to a wider sample of possible data observations from the same source. Moreover, the time segments in a first-level GLM-Spectrum are likely to exhibit strong autocorrelation (see Section 2.4) which would need to be accounted for before any permutation testing could provide a valid result (Friston et al., 2000; Woolrich et al., 2001).

### 2.11 Software implementation and dependencies

The analyses in this paper were carried out in Python 3.9 with core dependencies of numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), and Matplotlib (Hunter, 2007). MNE python (Gramfort, 2013) was used for EEG/MEG data processing with OSL (https://github.com/OHBA-analysis/osl) batch processing tools (Quinn et al., 2022). Python-meegkit (https://github.com/nbara/python-meegkit) was used for the “zapline” line noise removal algorithm (de Cheveigné, 2020). The spectrum analyses further depend on the Spectrum Analysis in Linear Systems toolbox (Quinn & Hymers, 2020) and glmtools (https://pypi.org/project/glmtools/). All codes used to run analyses and generate the figures in this paper are available online (https://github.com/OHBA-analysis/Quinn2022_GLMSpectrum).

### 2.12 The LEMON dataset

### 2.12.1 Participants and ethics

Resting-state EEG recordings from 191 individuals of an open-source dataset were analysed. The details of the original study including participant recruitment, experimental proceedures, and ethical approval can be found in the original publication (Babayan et al., 2019). The study was carried out in accordance with the Declaration of Helsinki, and the study protocol was approved by the ethics committee at the medical faculty of the University of Leipzig (reference number 154/13-ff) (Babayan et al., 2019).

#### 2.12.2 EEG preprocessing

All data pre-processing was carried out using MNE-Python and OSL using the OSL batch pre-processing tools. The raw data for each subject comprise a resting-state EEG recording from a 62-channel (61 EEG and 1 EOG) using a BrainAmp MR plus amplifier. The channels were in a 10-10 layout and referenced to FCz. Sixteen minutes of data were recorded in one-minute blocks alternating between eyes-closed and eyes-open resting-state. Data were acquired with a bandpass filter between 0.015 Hz and 1 kHz at a sampling frequency of 2500 Hz. The remaining acquisition details are reported in Babayan et al. (2019).

The raw data were first converted from Brainvision files into MNE-Python Raw data objects. The continuous data were bandpass filtered between 0.25 Hz and 125 Hz using an order-5 Butterworth filter. Line noise was suppressed using a spatial filter following the “zapline” algorithm (de Cheveigné, 2020) as implemented in python-meegkit. Bad channels were automatically identified using the generalised-extreme studentized deviate (G-ESD; Rosner, 1983) routine to identify outliers in the distribution of variance per channel over time. The data were then resampled to 250 Hz to reduce space on-disk and ease subsequent computations. Independent Component Analysis (ICA) denoising was carried out using a 30 component FastICA decomposition (Hyvarinen, 1999) on the EEG channels. This decomposition explained an average of 99.2% of variance in the sensor data across datasets. Independent components that contained non-neuronal signals such as blinks were automatically identified by correlation with the simultaneous V-EOG channel. ICA components linked to saccades were identified by correlation with a surrogate H-EOG channel, that is, the difference between channels F7 and F8. Between 2 and 7 “artefactual” components were identified in each dataset, with an average of 2.66 across all datasets. The two ICs that correlated strongest with the V-EOG and H-EOG channels were separately retained for later use in the GLM design matrix.

The continuous sensor data were then reconstructed without the influence of the artefactual V-EOG and H-EOG components. Bad segments were identified by segmenting the ICA-cleaned data into arbitrary 2-second chunks (distinct from the STFT time segments) and using the G-ESD algorithm to identify outlier (bad) samples with high variance across channels. An average of 31 seconds of data (minimum 6 seconds and maximum 114 seconds) were marked as bad in this step. This procedure is biased towards low-frequency artefacts due to the 1/f shape of electrophysiological recordings. Therefore, to identify bad segments with high-frequency content, the same procedure was repeated on the temporal derivative of the ICA-cleaned data. An average of 27 seconds of data (minimum 2 seconds, maximum 109 seconds) were marked as bad when using the differential of the data.

To retain consistent dimensionality across the group, any bad channels were interpolated using a spherical spline interpolation (Perrin et al., 1989) as implemented in MNE-Python. Finally, the spatial gradient between each channel and its distance from the reference sensor (FCz), is attenuated by computing the surface Laplacian (or current source density) of the sensor data. The surface Laplacian data is reference free and has sharper spatial topographies than the raw EEG; though this is a complex computation that is dependent on several hyperparameters, and which may reduce sensitivity to deeper sources (Kayser & Tenke, 2015).

#### 2.12.3 First-level GLM-spectrum

A “first-level” analysis models the data within each individual participant’s EEG data. The STFT is computed for each dataset using a 2-second segment length, a 1-second overlap between segments, and a Hanning taper. The 2-second segment length at the sample rate of 250 Hz gives a resolution of around 2 frequency bins per unit Hertz in the resulting spectrum. The short-time magnitude spectrum is computed from the complex valued STFT and the frequency bins ranging between 0.1 Hz and 100 Hz taken forward as the dependent variable in the first-level GLM-Spectrum for that dataset. The GLM design matrix is specified with six regressors (Fig. 1). Two binary regressors model intercept terms for each of the eyes-open and eyes-closed time segments. The third regressor is a z-transformed covariate describing a linear trend over time. This regressor is included to model potential “time on task” effects over the course of the recording. Previous literature has reported that both alpha power and alpha frequency may change over time within a single EEG recording (Benwell et al., 2019). A non-zero mean regressors for bad segments is computed from the sum of the number of “bad” samples within each STFT time segment. Finally, two further non-zero mean regressors are computed from absolute value of the V-EOG and H-EOG independent component time-courses.

A range of contrasts are specified to quantify critical hypothesis tests (see contrast table in Fig. 1). The overall mean is modelled by a contrast summing the eyes-open and eyes-closed regressors together weighted by the proportion of ones in each regressor (Contrast 1, Fig. 1). A t-test between the spectrum in the eyes-open and eye-closed conditions is specified with a differential contrast (weighted [1, -1], in the direction of eyes-open minus eyes-closed; Contrast 5; Fig. 1). Finally, separate one-sample tests are specified for each covariate with contrasts containing a single 1 for a given regressor (Contrasts 3 to 8; Fig. 1). The model parameters were estimated using the MPPI, and no statistical assessment was carried out at the first-level.

#### 2.12.4 Structural MRI processing

Individual anatomical details were extracted from T-weighted structural MRI scans (Babayan et al., 2019). All images were processed using the FMRIB Software Library (Woolrich et al., 2009). Images were reoriented to standard Montreal Neurological Institute (MNI) space, cropped, and bias-field corrected. FMRIB’s Linear Registration Tool (FLIRT; Greve & Fischl, 2009; Jenkinson & Smith, 2001; Jenkinson et al. 2002) was used to register to standard space before brain extraction was performed using BET (Smith, 2002). Brain images were segmented into different tissue types (grey matter, white matter, and CSF) using FMRIB’s Automated Segmentation Tool (FAST; Zhang et al., 2001). The voxel count for each tissue type was extracted and normalised by the individual’s total brain volume (also computed by FAST) to create a percentage. The total brain volume and individual percentage of grey matter were carried forward as group-level covariates in the GLM-Spectrum.

#### 2.12.5 Group-level GLM-spectrum

We now carry out a “group-level” analysis to combine the first-level results and describe between-subject variability with another GLM. As described in Section 2.13.2, the cope-spectra of each first-level GLM were used as the dependent variable in the group-level GLM. The group-level design matrix contained categorical regressors coding the age group of each participant and covariates corresponding to variability in participant sex, head size, and relative grey matter volume (Fig. 2). We have modelled age as two distinct, categorical regressors as the LEMON dataset contains participants recruited from either a young (20-40 years old) or an old (60-80 years old) group. Age could equally be included as a single parametrically varying regressor in a dataset that recruited from across a broad age range. A contrast was defined to estimate the mean across all participants and a second to estimate the difference between young and old participants. Finally, one-sample t-tests were specified for each covariate to test whether the regression coefficient significantly differed from zero.

The group-level design matrix was used to model each of the first-level contrasts. The group model parameter estimates were computed using the MPPI. Statistical significance in the group-level t-spectra was assessed using cluster-based non-parametric permutations using sign-flipping or row-shuffle permutations. A cluster forming threshold of p = 0.001 was used (equivalent to t(205) = 3.34) across all tests. A spatial extent threshold of 1 was set to ensure that any cluster has at least 2 adjacent points exceeding the cluster forming threshold.

## 3 Results

### 3.1 First-level covariate spectra on a central EEG channel

Figure 2 summarises the first-level GLM-Spectrum analysis of a single channel (Pz) from an exemplar resting-state EEG recording. The pre-processed EEG time series (Fig. 3A) was split into 2 second time segments with a 50% overlap (Fig. 3B), modified by a Hann window (Fig. 3C) and transformed into the frequency domain using a Short-Time Fourier Transform (STFT; Fig. 3D). Each column of this STFT contains the time-course of the magnitude at each frequency and constitutes the dependent variable in a first-level GLM. The first-level GLM (see methods Section 2.13.2) models this variability over time in relation to the resting conditions, artefacts detected in the data, and the electrooculogram (EOG). The final first-level design matrix had six regressors (Fig. 3E) modelling the two resting-state conditions, a linear trend over time, and three potential dynamic confounds. The full design matrix and contrast specification for the entire run can be seen in Figure 1.

The first-level GLM was fitted separately for each frequency bin using a standard ordinary least squares routine. The average magnitude spectra for the eyes-open and eyes-closed rest periods are quantified by two cope-spectra (Fig. 3F) specified by first-level contrasts 3 and 4 (Fig. 1). Both cope-spectra showed a 1/f-type structure and a prominent alpha peak around 9 Hz. The eyes-open > eyes-closed contrast (the regressor was coded to have ones for eyes-open and minus ones for eyes-closed; specified in contrast 5 in Fig. 1) had negative values in the cope-spectrum, indicating that spectral magnitude was larger in the eyes-closed condition across a range of frequencies, peaking around the alpha range (Fig. 3G – top panel). The square of the standard error of the estimates is shown in the varcope-spectrum (Fig. 3G – top panel) and indicates where the estimate of the mean was least certain. This roughly followed the shape of the spectra in Figure 3F, showing a clear alpha peak and a weak 1/f trend. This is an example of a close relationship between cope and varcope estimates that can lead to large effects in the beta-spectrum being substantially less prominent in the t-spectrum. The final t-spectrum contained a full spectrum of t-values for the contrast between the two resting conditions (Fig. 3H). The large magnitude, negative t-values in alpha and surrounding frequencies qualitatively indicated a greater magnitude in the eyes-closed condition. In sum, this shows a full spectrum perspective on the “alpha reactivity” or “alpha blocking” effect (Adrian & Matthews, 1934) that is most commonly assessed within a-priori frequency bands (Babiloni et al., 2011; Wan et al., 2018).

### 3.2 First-level spectral analysis on whole head EEG

So far, GLM-Spectrum method has been applied to univariate data (i.e., single channel EEG data), but it can be readily extended to a full multichannel dataset. To model the GLM-Spectrum across channels, a separate GLM using the same design matrix was fitted to each channel and frequency bin. This provides a description of spectral effects over frequency and space. Accordingly, we extended the resting-state model to the 61-channel whole head EEG recording. The design matrix and contrast specification were the same as the single channel analysis.

The beta-spectrum computed from the condition regressors showed the familiar 1/f slope and prominent occipital alpha features in both resting-state conditions. The GLM analysis was identical to the single channel results (Fig. 3) but now can include spatial distributions alongside the frequency spectrum. Qualitatively, the eyes-open condition (Fig. 4A) had a smaller alpha peak than the eyes-closed condition (Fig. 4B). Both conditions had a similar topography in this subject. The t-spectrum of the contrast between eyes-open and eyes-closed rest showed a large, negative effect peaking around the alpha range (Fig. 4C) as seen in the single channel example (Fig. 3G). This difference had a spatial maximum in posterior-central regions, replicating the occipital-parietal location of the alpha reactivity effect widely reported in the literature (Babiloni et al., 2011; Wan et al., 2018).

The linear trend showed substantial t-values around the alpha range, though its model projected spectra suggest that this is a subtle effect (Fig. 4D). The bad segment regressor effect (Fig. 4E) peaks at relatively low (<4 Hz) and high (>25 Hz) frequency ranges. The size of this effect is visualised more intuitively in the model projected spectrum (Fig. 3F— bottom panel) which showed the substantial increase in both a low- and high-frequency range. Finally, the V-EOG (Fig. 4F) and H-EOG (Fig. 4G) covariates both showed large t-values in relatively low frequencies (around 1 Hz). The V-EOG had a large negative effect around the alpha range, suggesting that time segments with high V-EOG activity were associated with lower alpha magnitude. In contrast, the H-EOG showed an additional positive effect in high frequencies, suggesting that segments with high H-EOG activity are associated with larger high frequency spectral magnitude. In particular, the complex pattern of effects in the V-EOG t-spectrum manifests as segments with high V-EOG activity showing a decrease in alpha magnitude and an increase in alpha frequency.

### 3.3 Covariate effects are highly variable across datasets

Next, we build on the single subject exemplar result by exploring the effect of the covariate and confound regressors across 206 EEG recordings from the LEMON dataset. Firstly, we show a qualitative description of the model R-squared values and related effect sizes for each predictor variable. The group averages here are intended as illustrations and are not supported by inferential statistics. Firstly, the GLM-Spectrum was able to describe between 60% and 70% of the variance in the STFT, averaged across all participants and channels (Fig. 5A). This value was highly variable across participants, particularly above 50 Hz where the R-square values ranges between 20% and 80%. Secondly, Cohen’s $F2$ statistic was used to compute the marginal effect size of each predictor within the model (see Section 2.7 for details). Modelling the two resting conditions separately, as opposed to using a single average, results in a group average effect size which peaks around the alpha frequency, though the effect in individual datasets was highly variable (Fig. 5B). The effect size of the four covariates has a low group average but is, again, highly variable in individual datasets (Fig. 5C). This variability was particularly prominent at higher frequencies and indicates the adaptive effects of the covariate regressors. These predictors have strong associations with the STFT in some individuals but not others.

To further illustrate the effect of including the continuous covariate predictors, the frequency spectrum was computed for both the full model and a reduced model containing only the two condition terms (i.e., eyes-open and eyes-closed), excluding other covariates and confounds (Fig. 5D). The inclusion of covariate regressors has a different pattern of effect across frequencies for each dataset. The difference between the full and reduced model is relatively small for datasets i and v, indicating that the effect of the covariates was minimal for these recordings. In contrast, datasets ii and iv show moderate differences in low and high frequencies whilst iii shows a substantial difference between the full and reduced model.

### 3.4 Group-level design matrix and contrasts

The GLM-Spectrum framework can be extended to group analyses by carrying a set of first-level results to a second-level GLM (See methods Section 2.9). This group-level analysis models between-subject variability across independent first-level GLM-Spectra. These multilevel, hierarchical models are well established in neuroimaging (Beckmann et al., 2003; Friston, 2007; Friston et al., 2002; Woolrich et al., 2004), and the existing theory applies to the GLM-Spectrum.

The group-level GLM was fitted to the first-level cope-spectra (Fig. 6A) across all datasets separately for each channel, frequency bin, and first-level contrast (see methods Section 2.13.4). The group-level design matrix contained two condition regressors modelling the mean across subjects for younger and older participants separately and three z-transformed parametric covariates modelling between-subject variability in sex, total brain volume, and relative grey matter volume (Fig. 6B). Two group contrasts were defined alongside the main effects. One overall average modelled the sum of the young and old groups (Contrast 1; Fig. 6B) weighted by the number of participants in each group. A second contrast quantified the linear group difference (Contrast 2; Fig. 6B). Finally, a set of main effect contrasts were also defined for each regressor (Contrasts 3 to 7; Fig. 6B). The final fitted model contains a group-level beta-spectrum that describes the linear effect of a group regressor across separate datasets. The group-level GLM returns beta-spectra for the overall mean spectrum (Fig. 6C) as well as contrasts and main effects (Fig. 6D). At the group-level, the error bars now indicate the standard error of the fitted mean across participants (rather than across STFT time windows, as in the first-level).

### 3.5 Group effects of age and eyes-open versus eyes-closed rest

Next, we explored how the GLM-Spectrum varies across resting conditions and participant age. Therefore, two main-effect contrasts and their interaction were explored. Two-tailed, non-parametric, cluster (with clusters formed over frequencies and sensors) permutation tests were used to establish statistical significance for all group-level analyses.

We first computed the group average of the within-subject difference between eyes-open > eyes-closed rest. Specifically, we computed the *Mean* group-level contrast (Contrast 1; Fig. 6B) on the *eyes-open > eyes-closed* first-level contrast (Contrast 2; Fig. 1). Non-parametric cluster permutation testing indicated two significant clusters (Fig. 7A) that together cover the whole frequency range investigated here. The first cluster showed a negative effect with greater magnitude in the eyes-closed condition. This cluster spanned the lower frequency range (approximately 30 Hz and lower) across almost all channels, though the effect peaked around the alpha range in occipital channels. The posterior peak within this cluster matched the expected occipito-parietal source of the alpha reactivity effect (Wan et al., 2018). The second cluster showed a positive effect, indicating higher magnitude in the eyes-open condition. This cluster spanned the higher frequency range (10-100 Hz) across all channels with the largest effect in bilateral frontal regions. This cluster may partially reflect residual eye movements that have not been accounted for by ICA or the first-level artefact regression. The t-statistics in a low-frequency range in frontal sensors are strongly modulated by the inclusion or exclusion of the EOG-based confound regressors (see Supplementary Section G).

Further, we computed the difference in the time-averaged first-level spectra between the younger and older adults. This corresponds to computing the *Young > Old* group-level contrast (Contrast 2; Fig. 6B) on the *Overall Mean* first-level contrast’s cope-spectrum (Contrast 1; Fig. 1). Non-parametric permutations on this showed three significant clusters. The first cluster indicated a positive effect in which younger participants had larger power than older participants. The cluster covered low frequencies (<8 Hz) and all channels peaking in frontal and occipital channels (Fig. 7B). The direction of this effect is consistent with previously reported decreases in delta and theta power in older adults (Klimesch, 1999). The second cluster indicates greater beta-frequency magnitude for older adults. The cluster spans between 15 Hz and 30 Hz and peaks in bilateral central sensors. This finding replicates literature showing higher beta power in older adults (Xifra-Porxas et al., 2019). Finally, the third cluster indicates greater magnitude for older participants in a cluster between 35 Hz and 100 Hz that peaks in frontal sensors. In addition, the change in overall spectral shape qualitatively supports indications that older adults have a flatter 1/f slope in the EEG spectrum (Merkin et al., 2022; Voytek et al., 2015), though we did not explicitly quantify 1/f slope here.

No significant cluster for an age difference was identified in the alpha range, though individual t-statistics reach around 5. This null effect may relate to the choice of sensor normalisation during pre-processing (Klimesch, 1999). In addition, the choice of a 0.5 Hz to 100 Hz frequency range may be too wide to assess any relatively narrow band alpha changes in the presence of large broadband effects in higher frequencies.

Finally, we explored whether the within-subject difference in eyes-open and eyes-closed resting-state changed between the younger and older adults. Specifically, this corresponds to the *Young > Old* group-level contrast (Contrast 2; Fig. 6B) computed on the *eyes-open > eyes-closed* first-level contrast (Contrast 5; Fig. 1). Non-parametric permutation testing identified a single significant cluster with a negative t-statistic (Fig. 7C). This indicates the presence of an interaction effect in which older adults show a larger difference in the eyes-open > eyes-closed contrast. Interestingly, there is no indication of an interaction effect in the low to mid alpha range. The interaction can be qualitatively summarised by plotting the beta-spectrum separately for each of the underlying four mean levels (Fig. 7D).

### 3.6 Group average of first-level cope-spectra

We explored the group averages of each first-level covariate regressor. The first-level linear trend regressor is expected to be sensitive to slow drifts throughout the recording. The t-spectrum of the *Mean* group-level contrast (Contrast 1; Fig. 6B) computed on the *Linear Trend* first-level contrast (Contrast 5; Fig. 1) showed four significant clusters (Fig. 8A) with a mix of increases and decreases across space and frequency. Broadly these clusters showed a decrease in magnitude over time in both low frequencies (0.5-4 Hz) and high frequencies (20-100 Hz) peaking in occipital posterior channels. In contrast, an increase in magnitude over time was found in a low theta/alpha range (3-9 Hz) and high frequencies (15-100 Hz) peaking in frontal sensors. The presence of bad segments was associated with a single, strong positive effect indicating increased magnitude in the EEG during marked bad segments. The *Mean* group-level contrast (Contrast 1; Fig. 6B) computed on the *Bad Segments* first-level contrast (Contrast 6; 1) showed a strong positive effect across all frequencies and across all channels (Fig. 8B).

Finally, we computed the *Mean* group-level contrast (Contrast 1; Fig. 6B) computed on the *V-EOG* and *H-EOG* first-level contrasts separately (Contrast 7 and 8; Fig. 1). Three clusters indicated that increased variance in the V-EOG is associated with both increases and decreases in magnitude of the EEG (Fig. 8C). One positive effect impacted all sensors and all frequencies (though not always simultaneously) showing the wide-ranging impact of eye movements on the spectrum. Two clusters showed negative effects in the alpha range (around 9 Hz) and its harmonic (18 Hz) in posterior sensors. This effect indicates a decrease in alpha when eye movements are largest. Increased variance in H-EOG time series was associated with increased magnitude across nearly all sensors and frequencies, as summarised in a single, positive statistical cluster (Fig. 8D).

These effects demonstrate that the first-level covariates are associated with consistent group-level effects. However, the results must be interpreted in the context of the variability in first-level effect sizes (Fig. 5). Each first-level covariate effect was highly variable in the individual datasets, ranging from no effect in some participants to a covariate effect that exceeded the eyes-open > eyes-closed difference in others (Fig. 5B, C). As such, we might not expect to see one or more of these effects in a given single dataset, though the group effects are strong.

### 3.7 Group-level covariate effects

The results so far have combined and contrasted group averages of the first-level results. One possible group-level confound for the age contrast is that the LEMON dataset contains a different number of male and female participants who are not perfectly balanced across age groups. A separate group regressor indicating the reported sex of each participant was included to model between-subject variability relating to this factor, effectively partialling it out from the main age effect of interest in each group contrast. We visualise the overall effect of participant-reported sex on the first-level cope-spectra (averaged across eyes-open and eyes-closed resting-state); this corresponds to the *Sex* group-level contrast (Contrast 5; Fig. 6B) computed on the *Overall Mean* first-level contrast (Contrast 1; Fig. 1). A single significant cluster identified stronger spectral magnitude in female participants between 1 and 48 Hz, peaking around the alpha and beta frequency ranges (Fig. 9A). Increased power in females relative to males has been previously reported in the EEG literature (Aurlien et al., 2004; Zibrandtsen and Kjaer, 2021). Further work is required to distinguish whether this is a true neuronal difference or reflects simpler anatomical differences such as skull thickness.

Between-subject variability associated with two anatomical covariates was modelled at the group-level. The total brain volume of each participant and the proportion of grey matter were relative to total brain volume. As before, the inclusion of these regressors ensures that the reported group effects are not biased by these anatomical factors. We separately computed the *TotalBrainVol* and *GrayMatterVol* group-level contrasts (Contrasts 6 and 7; Fig. 6B) on the *Overall Mean* first-level contrast (Contrast 1; Fig. 1). Non-parametric permutation testing did not identify any significant effects for either overall brain volume or relative grey matter volume (Fig. 9B, C). Though these are null effects, the inclusion of these regressors in the group model enables a more refined interpretation of the other results. The inclusion of these confounds means that any variance they might explain can not be attributed to one of the other regressors. Specifically, this increases our confidence that the differences between younger and older adults are not caused by correlated variability in head size or relative grey matter volume.

## 4 Discussion

We have outlined the theory behind the GLM-Spectrum and provided a tutorial overview of its application. We illustrated a practical application for the use of the GLM-Spectrum, using an open EEG dataset to simultaneously quantify and contrast the spectrum of two alternating resting-state conditions whilst regressing out the effect of artefacts including bad segments and eye movements. Both artefact types were associated with a strong group effect but diverse effects at the first-level. Of particular interest is the alpha peak in the spectrum of regression coefficients of the V-EOG regressor. This is likely a true neuronal effect linked to blinking that is not removed by ICA but can be explicitly modelled by the GLM-Spectrum. Finally, we extended our analysis to the group-level and explored the spectral differences between older and younger adults, whilst accounting for the effects of sex, brain volume, and relative grey matter volume. Older adults have lower magnitude in the theta range (3-7 Hz) and higher magnitude in the beta and gamma ranges (>15 Hz). A range of within- and between-subject effects were explored and, crucially, we showed that the reported age effect is robust to differences in participant sex, head size, or relative grey matter volume.

### 4.1 A comprehensive framework for spectrum analysis

The GLM-Spectrum is a practical combination of two well-established methodologies that modernises the statistics underlying the time-averaged periodogram, a long-standing and standard spectral estimation method (Bartlett, 1948, 1950; Welch, 1967). Specifically, we utilise multilevel general linear modelling (Friston, 2007; Woolrich et al., 2004), non-parametric permutation testing (Nichols & Holmes, 2001; Winkler et al., 2014), contrast coding, and confound regression to extend the scope of classical time-averaged spectrum estimators.

This approach is generalisable to a huge range of analyses. In principle, the GLM-Spectrum could be used in place of Welch’s periodogram or other time-averaged spectrum estimate in any analysis pipeline. A very simple GLM-Spectrum analysis could be configured to be exactly equivalent to these standard approaches. In the simplest case, without first-level covariates, the GLM-Spectrum provides a formal framework for multivariate whole head group analysis of power spectra. Moreover, GLM-Spectrum allows for linear denoising of spectrum estimates in datasets where simultaneous recordings of potential artefact sources are available, or artefact time courses can be derived. In addition, covariate effects and contrasts can be readily defined to quickly compute spectra associated with specific external dynamics. For example, an early application of this method has used a GLM-Spectrum to compute power spectra associated with dynamic whole-brain functional networks in MEG (Gohil et al., 2022).

### 4.2 Covariate and confound regression for spectrum analysis

The GLM-Spectrum can characterise spectral changes associated with covariates and potential artefact sources. Standard ICA denoising removes artefacts that share the time-course of the artefact channel. In contrast, confound regression is exploratory across the spectrum. Denoising can be applied to any frequency band with dynamics associated with the segmented artefact time-course irrespective of the original spectrum. For example, the V-EOG blink artefact has a classic low-frequency response that can be attenuated by removing correlated independent components. However, eye blinks are also associated with relatively prolonged changes in alpha and beta power (Liu et al., 2020). In the context of this paper, we consider this to be an “indirect” artefact; it is spatially and spectrally separated from the artefact source and is unlikely to have arisen from volume conduction. The GLM-Spectrum can detect these differences and remove their effect from the overall mean. As such, it could not have been detected or removed by ICA denoising. In another context, this might form the contrast of interest, but in this case, we apply confound regression to minimise the effect of eye movements and blinks on the eyes-open > eyes-closed condition contrast.

### 4.3 Limitations of the GLM-spectrum model

As outlined in the main text, the parameters of a model are only valid if the underlying assumptions are met. The GLM has several relevant assumptions for the spectrum analysis presented here, particularly at the first-level. In detail, the GLM assumes that the residuals of the model are independently and identically distributed. The presence of any temporal autocorrelation in the residuals indicates that the parameter estimates must be interpreted with caution as this assumption has been violated. Future work can account for this shortcoming by building on similar work in fMRI.

The covariate and confound regressors in a GLM-Spectrum model dynamics over time are found to be in a highly simplified sense. This approach is appropriate to quantify relatively slow dynamics, on timescales of seconds, in the context of a spectrum estimator that already utilises sliding time segments for spectrum estimation. The sliding windows are tuned for spectral resolution. They have fixed and arbitrary length and may not accurately reflect the true timescale of dynamics in the covariate variables. As such, limited conclusions about underlying dynamics can be made from a GLM-Spectrum. We can only say that a dynamic relationship existed at the specific timescale selected for spectrum estimation. If precise temporal dynamics are of interest, a more advanced, window-free method such as the Hidden Markov Model (Quinn et al., 2018, 2019; Vidaurre et al., 2016, 2018) or Empirical Mode Decomposition (Huang et al., 1998) might be more appropriate.

### 4.4 Conclusion

The GLM-Spectrum builds on methodologies that are all well established in the field. The novelty of this work is to bring modern statistics and classical spectrum estimation together into a single framework and to thoroughly explore the theoretical, computational, and practical challenges of its use. The result is an approach for spectrum analysis across the whole head and frequency range with the flexibility to generalise to a huge variety of research and engineering questions.

## Data and Code Availability

The data analysed in this paper are resting-state EEG recordings from an open-source dataset (Babayan et al., 2019) (https://fcon_1000.projects.nitrc.org/indi/retro/MPI_LEMON.html) covered by the Open Data Commons Public Domain Dedication and License (PDDL) v1.0 licence (https://opendatacommons.org/licenses/pddl/1-0/). Raw EEG data were downloaded in BIDS format from the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) FTP server (https://ftp.gwdg.de/pub/misc/MPI-Leipzig_Mind-Brain-Body-LEMON/EEG_MPILMBB_LEMON/EEG_Raw_BIDS_ID/). All code used to run analyses and generate the figures in this paper are available online via github (https://github.com/OHBA-analysis/Quinn2022_GLMSpectrum).

## Author Contributions

A.J.Q.: Conceptualisation, Methodology, Software, Formal analysis, Data Curation, Project administration, Writing—Original Draft, Writing—Review & Editing, and Visualisation. L.Z.A.: Conceptualisation, Methodology, and Writing—Review & Editing. C.G.: Conceptualisation, Methodology, and Writing—Review & Editing. O.K.: Methodology, Writing—Review & Editing. J.P.: Formal analysis, Writing—Review & Editing. C.Z.: Formal analysis, Writing—Review & Editing. A.C.N.: Writing—Review & Editing, Supervision. M.W.W.: Conceptualisation, Methodology, Writing—Original Draft, Writing—Review & Editing, and Supervision.

## Funding & Acknowledgments

This project was supported by the Medical Research Council (RG94383/RG89702) and by the NIHR Oxford Health Biomedical Research Centre. The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust (203139/Z/16/Z). A.C.N. is supported by the Wellcome Trust (104571/Z/14/Z) and James S. McDonnell Foundation (220020448). M.W.W. is supported by the Wellcome Trust (106183/Z/14/Z; 215573/Z/19/Z). A.C.N. and M.W.W. are further supported by an EU European Training Network grant (euSSN; 860563). This research was funded in whole, or in part, by the Wellcome Trust. For the purpose of open access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The computations described in this paper were performed using the University of Birmingham’s BlueBEAR HPC service, which provides a High Performance Computing service to the University’s research community. See http://www.birmingham.ac.uk/bear for more details.

## Declaration of Competing Interest

The authors have no competing interests.

## Supplementary Material

Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00082

## References

^{1}

The term “beta” has different uses in the fields of statistics/linear-modelling and neuronal oscillations. In this work, we refer to a “beta-spectrum” in the statistical sense (a spectrum of linear model parameter estimates) rather than the neuronal sense (oscillatory activity within a 15Hz to 30Hz frequency range).

^{2}

Cohen’s original work suggests that values of $F2\u22650.02$, $F2\u22650.15$, and $F2\u22650.35$ represent small, medium, and large effects respectively (Cohen, 1988; Selya et al., 2012). These labels should be interpreted with caution and only used as informal guidelines.