Neurosurgical procedures that enable direct brain recordings in awake patients offer unique opportunities to explore the neurophysiology of human speech. The scarcity of these opportunities and the altruism of participating patients compel us to apply the highest rigor to signal analysis. Intracranial electroencephalography (iEEG) signals recorded during overt speech can contain a speech artifact that tracks the fundamental frequency (F0) of the participant’s voice, involving the same high-gamma frequencies that are modulated during speech production and perception. To address this artifact, we developed a spatial-filtering approach to identify and remove acoustic-induced contamination of the recorded signals. We found that traditional re-referencing schemes jeopardized signal quality, whereas our data-driven method denoised the recordings while preserving the underlying neural activity.

Human intracranial recordings, that is, in-vivo electrophysiological signals acquired during neurosurgical treatments such as those for focal epilepsy and deep brain stimulation (DBS), have enabled the study of neural responses with high temporal and spatial resolution, in both surface and deep structures of the human brain, during behavioral tasks (Tekriwal et al., 2018). The study of speech motor control especially benefits from awake intraoperative recordings, during which local field potentials (LFP) and single-unit activity of subcortical targets can be acquired simultaneously during speech production (Lipski et al., 2018; Ma et al., 2022).

Intracranial EEG (iEEG) provides much better spatial specificity and higher signal-to-noise-ratio (SNR) than surface EEG. However, iEEG can also suffer from eye movement, muscle, and even cardiac artifacts (Mercier et al., 2022). To develop invasive brain–computer interfaces (BCI) for speech prostheses, the neural activity in the high-gamma band (60 – 200 Hz) is typically used for speech decoders (Angrick et al., 2019; Anumanchipalli et al., 2019; Moses et al., 2021; Proix et al., 2022). Recently, we and others have shown that electrocorticography recordings may contain artifacts associated with the mechanical vibrations produced by the participant’s voice or sounds played by a loudspeaker (Bush et al., 2022; Roussel et al., 2020). Although not always present, the acoustic-induced artifact can appear in different setups, recording modalities (e.g., ECoG, µECoG, and Utah arrays), and experimental conditions, as shown by Roussel et al. who detected the artifact in data from three out of five centers analyzed (Roussel et al., 2020). We have also detected the artifact in local field potentials recorded from DBS leads or stereotactic mapping electrodes (Bush et al., 2022). Interestingly, the artifact has not yet been observed in stereo EEG data (sEEG). For overt speech experiments, this speech artifact shares spectral characteristics with the produced audio signal, being locked at the fundamental frequency (F0) of the participant’s voice (Bush et al., 2022). The overlap between typical human F0 (between 70 and 240 Hz) and high-gamma activity (60 to 250 Hz) imposes the need to account for this speech artifact to study the brain activity associated with speech production.

As shown previously, the acoustic artifact is introduced along the acquisition chain, where the mechanical vibrations of the acoustic signal are translated into voltage (Roussel et al., 2020). Passive electrical components can exhibit an electrical response when stressed physically, a phenomenon referred to as the microphonic effect (Nelson & Davidson, 2002). This effect can be exacerbated in speech tasks performed during stereotactic neurosurgery, in which the patient’s head is fixed to a stereotactic frame (Fig. 1a). The frame may act as a resonance system that amplifies speech-induced vibrations originating in the larynx and traveling through the head and skull. These vibrations, which look like a distorted version of the speech audio, can affect the electrodes and the acquisition chain. In affected electrodes, the time–frequency representation shows narrow-band components that resemble the audio spectrogram rather than true neural activity, as shown in Figure 1a. This narrow-band component tracks the fundamental frequency of the participant’s voice (Fig. 1b). Acoustic-induced vibration artifacts can be detected by measuring the coherence between the speech acoustic signal and the neural recordings in the high-gamma frequency band (Bush et al., 2022) (Fig. 1c). This coherence varies widely between and within patients, with no clear spatial pattern across recording channels (Bush et al., 2022), indicating that the speech artifact is a channel-specific type of noise; traditional re-referencing schemes may therefore jeopardize the quality of the neural recordings (Liu et al., 2015).

Fig. 1.

Speech-induced artifact model assumptions. (a) Schematic representation of how speech-induced mechanical vibrations can affect neural recordings during stereotactic frame surgery. The frame attached to the patient’s head acts as a resonance system with the skull, transmitting and distorting speech-induced vibrations that can affect the neural recordings’ acquisition chain. An artifactual electrode shows a suspicious narrow-band component that resembles the audio time–frequency information, as illustrated in the average spectrogram plots calculated across speech-production epochs. (b) The speech artifact tracks the fundamental frequency (F0) of the participant’s voice. (c) The coherence between the audio signal and the neural recordings across trials (⟨ϕ⟩) can assess the level of speech artifact contamination in each electrode. During speech production, many channels present artifact (lie outside the dashed circle). (d) The recordings, at the amplifier level, can be thought of as a linear mixing between the speech artifact and brain-related sources. (e) The proposed unmixing pipeline is based on two spatial-filtering methods. While the spatio-spectral decomposition (SSD) helps to enhance the signal-to-noise ratio in the frequency band of interest, the phase-coupling optimization seeks to find the sources with the highest coherence with respect to the audio. (f) By means of low-rank factorization, the signal is reconstructed after zeroing out the speech artifactual sources.


Considering the recorded brain activity as a superposition of different statistical sources (Sabbagh et al., 2020), the brain signals at the amplifier level can be thought of as the result of a linear mixing of true neurophysiological and non-neural sources, including speech-induced vibrations (Fig. 1d). Using spatial-filtering methods, these sources can be untangled and estimated from the multivariate (multichannel) electrode signals (Schaworonkow & Voytek, 2021). As such, traditional re-referencing schemes used in neurophysiology, such as bipolar, Laplacian, or common average reference (CAR), can be reframed as spatial-filtering approaches, in which the recorded brain signals are multiplied by a predefined matrix that references the recording of one electrode with respect to a neighborhood of channels (Cohen, 2022) (see Section 2.5.1).

Data-driven spatial filters offer a more flexible re-referencing scheme than traditional methods. They can be used for denoising, applying linear transformations to estimate the data subspace related to “noise” and discarding it from subsequent signal analyses (Cohen, 2017, 2022; Haufe et al., 2014). This approach has been used primarily for noninvasive electrophysiology (Cohen, 2017, 2022; Haufe et al., 2014; Wallstrom et al., 2004), and more recently for iEEG signal processing (Mercier et al., 2022; Michelmann et al., 2018; Schaworonkow & Voytek, 2021). A typical pipeline for artifact removal in noninvasive EEG uses principal component analysis (PCA) and independent component analysis (ICA) to identify artifact components (Jung et al., 2000; Wallstrom et al., 2004). PCA is commonly used as a dimensionality reduction tool to ease the subsequent ICA decomposition. Then, by means of low-rank factorization, backward and forward projections are made between the signal and the source space, identifying and discarding those components related to artifacts.

For speech-related EMG artifacts (originating from the movement of the articulators), ICA-based methods have been postulated as promising approaches for removing muscular speech-induced artifacts in surface EEG for event-related potential analyses, which do not investigate high-gamma activity (Porcaro et al., 2015; Vos et al., 2010). A key difference between muscular and vibration-induced speech artifacts is that, while both overlap with high-gamma activity, the narrow-band character of the vibration artifact may lead to poor ICA-based denoising performance, given ICA’s limitations in decomposing mixed narrow-band oscillations (Nikulin et al., 2011) (see also Section 3).

Here, we introduce phase-coupling decomposition (PCD), an algorithm for acoustic-induced artifact rejection. The algorithm performs data-driven spatial-filtering denoising based on low-rank factorization and is designed to separate acoustic-induced artifactual sources from neural sources via a phase-coupling decomposition. The spatiospectral decomposition (SSD) algorithm (Nikulin et al., 2011) is used first to enhance the signal-to-noise ratio around F0 and to perform dimensionality reduction, aiding the search for artifactual components (Fig. 1e). The phase-coupling optimization (PCO) method (Waterstraat et al., 2017) is then applied to identify sources phase locked to the acoustic signal (Fig. 1e). Thus, the coherence between the audio and the neural data is optimized, allowing retrieval of the sources related to acoustic-induced noise. As in the ICA-based artifact pipeline mentioned above, signal reconstruction is based on low-rank factorization, discarding the detected artifactual components (Fig. 1f).

First, we demonstrate how PCD cleans acoustic-induced artifacts from an affected recording (Fig. 2). Then, we test the denoising performance of the algorithm on simulated data, in which the artifact and the neural sources are artificially generated and/or mixed. The algorithm successfully recovers the artifactual source in the time, frequency, and phase domains, even when dealing with highly nonstationary audio signals and high SNR (Fig. 3). Importantly, we demonstrate the algorithm’s ability to denoise while preserving neural data and compare its performance with traditional spatial-filtering methods such as CAR and ICA (Fig. 4). Finally, we test the PCD denoising algorithm on real acoustically contaminated iEEG data, showing a significant reduction in the extent and number of artifact-affected electrodes while preserving the underlying speech-related neurophysiological response (Figs. 5–7).

Fig. 2.

Illustration of the phase-coupling decomposition pipeline. Space transformations from the recording space to the clean space in the temporal, power, and phase domains. (a) (Recording space) Top: iEEG recordings (black traces) were combined with the recorded audio (blue trace) to simulate contaminated recordings for illustration purposes. Middle: note the similarity of the audio and iEEG power spectra. Bottom: polar plot showing the phase-coupling value between the audio and iEEG channels (represented as dots). (b) (SSD space) SSD identifies components that maximize power around F0 (color coded in the spectral domain from dark red to light pink). (c) (PCO space) Only those SSD components with the strongest power around F0 are used to compute PCO. The MVL is optimized, and only those PCO components that show the highest coherence with the audio are identified as artifactual (light-blue trace). Here, the first eight components are shown. (d) (Denoised space) Via low-rank factorization, the artifactual component(s) are excluded from signal reconstruction. Note the clean iEEG traces, with no peak around F0 in the power spectra and phase-coupling values centered at zero.

Fig. 3.

PCD correctly retrieves nonstationary and weak artifactual sources. Representative performance of PCD on an artifactual source with a nonstationary signal profile and an artifact-to-physiological gamma ratio (AGR) = 2 dB. (Top panel) (a1–a3) Visualization in the time, frequency, and time–frequency domains of the ground-truth artifactual source for a given trial. (b1–b3) Same as above for the retrieved artifactual source for the same trial. (Bottom panel) PCD performance across trials. (c) One-dimensional statistical parametric mapping was used to evaluate statistical similarities at each sample point between the true and estimated source for each trial. No significant differences (t-statistic values < t*) were found at any time point. (d) Magnitude-squared coherence estimate (MSCE) between true and estimated artifact sources. Gray lines represent individual trials. The mean MSCE across trials is denoted by the dark red line. The mean F0 across trials is shown as the blue vertical line, while the violet band indicates the SAFB. The mean MSCE value in the SAFB was always above 0.97, regardless of the variable AGR across trials. (e) Phase-locking value (PLV) between the true and estimated artifact source across simulated trials at different AGRs. The inset in the corner aggregates the phase difference found across trials between the true and estimated artifact source.

Fig. 4.

PCD removes acoustic-induced speech artifact while preserving neural activity in simulated data, outperforming previous methods. Impact on signal quality of applying CAR, ICA, and PCD for a trial of the realistic simulation (low-amplitude broadband gamma modulation, AGR = −2 dB, and nonstationary artifact source). Temporal traces (a), power spectra (b), and phase-coupling plots (c) as in Figure 2. (a) Resulting temporal traces with time relative to speech onset (t = 0). (b) Power spectrum of each resulting iEEG signal (gray lines) and the recorded audio (blue line). The mean power spectrum across iEEG channels is shown by the thick black line. Violet shading indicates the SAFB. (c) The phase relationship between the audio and the resulting brain signals. Each dot represents a channel. (d) Biplots of the PC subspace described by the first three principal components (PC1, PC2, PC3). Red dots represent scores, while blue lines represent the loading directions. The shape of the red cloud illustrates the PCA embedding.

Fig. 5.

PCD removes the speech artifact while preserving underlying γ-activity in intracranial recordings. (a) Coherence between a channel and the recorded audio, measured by means of ITPC. Here a dot represents a channel, and the horizontal solid line indicates the ITPC significance threshold (3.08). (b) Percentage of electrodes that lie below the ITPC significance threshold, where every dot represents the data from a recording session. (c) Time–frequency plots for raw signals and the effect of applying each denoising method in an electrode with strong speech artifact (top row) and an electrode with physiological speech-related γ-activity (bottom row). Time is relative to speech onset. (d) Gamma profiles (z-scored) during stimulus onset of two electrodes located in the superior temporal gyrus, from two different patients. (e) Gamma profiles (z-scored) during speech onset of two electrodes located in the precentral gyrus, from two different patients. Different color traces are used for the raw data and the denoised data from each tested method.

Fig. 6.

Artifact retrieval comparison. Comparison in the temporal, frequency, and time–frequency domains of the recorded audio and the artifactual component estimated by CAR, ICA, and PCD. For a given participant, at a randomly selected trial, the component with the highest phase relationship with the audio is shown for ICA and PCD, while for CAR the average across trials is depicted.

Fig. 7.

Understanding the impact of each denoising method as a preprocessing step. (a) Percentage gain in electrodes below the ITPC significance threshold (Gain) versus artifact homogeneity and strength, for CAR, ICA, and PCD. Each dot represents the data from a recording session. Raw data with no detectable speech artifact are marked with a green star. Data from three subjects (A, B, and C) are highlighted: a homogeneous artifact case (Subject A), a strong artifact case (Subject B), and a no-artifact case (Subject C), represented as circles with blue, gray, and black contours, respectively. (b) Impact on speech decoding from neural data (detection of consonant–vowel syllables) in three cases: (i) clean electrode (CE) gain is higher for CAR and ICA than for PCD (Subject A), (ii) CE gain is higher with PCD than with the other two methods (Subject B), and (iii) data without speech artifact (Subject C). ACC stands for accuracy, that is, the number of correctly classified samples over the total number of samples. Every point represents the accuracy of the testing data in a 5-fold cross-validation scenario.


2.1 The phase-coupling decomposition algorithm

We consider the brain recordings as a linear combination of statistical sources, where at least one of those sources is related to the speech artifact, which can be detected by measuring the coherence between the audio signal and the neural recordings (Bush et al., 2022). The recorded brain signals are then treated as the output of a linear forward model, in which Nc statistical sources are projected onto Nc channels along the Ns data samples via a linear mixing process:

X = AS,    (1)

where X ∈ ℝ^(Nc×Ns) denotes the recorded brain signals, S ∈ ℝ^(Nc×Ns) is the source activity, and A ∈ ℝ^(Nc×Nc) denotes the mixing matrix. Each column a of the mixing matrix A is known as a spatial pattern and describes the projection of an individual source to the sensor space.

The idea of PCD is to find the artifactual sources that are phase coupled with the acoustic signal, denoted here as z ∈ ℝ^(Ns). In that respect, PCD seeks the artifactual sources by solving the inverse problem related to (1), so that:

Ŝ = Wᵀ X,    (2)

where ᵀ denotes the transpose and Ŝ ∈ ℝ^(Nc×Ns) is the matrix whose rows contain the estimated sources, including the artifactual ones.

The denoising pipeline can be thought of as a supervised two-step spatial-filtering procedure that performs data cleaning based on low-rank factorization (Haufe et al., 2014). As shown in Box 1, the speech artifact rejection comprises four main steps: (1) bandwidth estimation, (2) spatiospectral decomposition, (3) phase-coupling optimization, and (4) signal reconstruction. Each step is described in the following subsections.

Box 1.

Phase-Coupling Decomposition

Given the recorded brain activity X ∈ ℝ^(Nc×Ns), where Nc and Ns are the number of channels and samples, respectively, and an acoustic signal z ∈ ℝ^(Ns), this algorithm identifies narrow-band components of X that are phase coupled with z.

Step 1: Estimate the speech artifact bandwidth.

  1. Calculate the power spectrum of z.

  2. Around the fundamental frequency (F0) of the voice, find the peak in the spectrum (P_F0).

  3. Define the center Fc = P_F0 and the bandwidth ΔFc of the speech artifact frequency band (SAFB): Fc ± ΔFc.

Step 2: Find projections of X that maximize the signal-to-noise ratio (SNR) around Fc via the spatiospectral decomposition (SSD) (Nikulin et al., 2011).

  1. Define the SAFB (Fc ± ΔFc) as the signal band, and all frequencies outside the SAFB as the noise band. Find the projection matrix W_SSD by solving equation (3) and project the data onto the first k SSD components W̃_SSD = [w_1, …, w_k], as follows: X̃_SSD = W̃_SSDᵀ X.

Step 3: Find the artifactual sources via the phase-coupling optimization algorithm (PCO) (Waterstraat et al., 2017).

  1. In the SSD space, find the projection matrix W_PCO that maximizes the mean vector length (see equations (6) and (7)) between X̃_SSD and z.

  2. Compute the PCO components X_PCO = W_PCOᵀ X̃_SSD and identify the m acoustic-induced sources.

Step 4: Perform the denoising via low-rank factorization.

  1. Compute the PCD unmixing matrix W_PCD, which projects X to the artifact source space. This matrix is the composition of W_PCO and W_SSD (equation (8)).

  2. Compute the PCD mixing matrix via the pseudoinverse:

    A_PCD = (W_PCDᵀ)†

  3. Reconstruct the signal using the remaining (Nc − m) components: X_denoised = Ã_PCD W̃_PCDᵀ X.

2.1.1 Speech artifact bandwidth estimation

The speech artifact is a narrow-band noise component that occurs around the fundamental frequency (F0) of the participant’s voice (Bush et al., 2022). The recorded audio signal z is used as a proxy for estimating the speech artifact frequency band. We use Welch’s method (Welch, 1967) to calculate the power spectrum of the (detrended) audio signal. An estimate of F0 from the audio recordings is used as a starting point to find a peak in the gamma band (50–250 Hz). A Gaussian curve-fitting algorithm is then used to calculate the central frequency (Fc) and bandwidth (ΔFc) of the speech artifact frequency band (SAFB) by estimating the mean and full width at half maximum of the fitted curve. The SAFB is then defined as Fc ± ΔFc.
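For illustration, a minimal Python sketch of this step is shown below, assuming a one-dimensional audio trace `audio` sampled at `fs` Hz and a rough pitch estimate `f0_guess` (hypothetical names); weighting the peak search toward `f0_guess` is an illustrative choice, not a detail specified here.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import detrend, welch

def estimate_safb(audio, fs, f0_guess, search_band=(50.0, 250.0)):
    """Return (Fc, dFc): center and bandwidth of the speech artifact band."""
    freqs, psd = welch(detrend(audio), fs=fs, nperseg=2048)
    band = (freqs >= search_band[0]) & (freqs <= search_band[1])
    f, p = freqs[band], psd[band]
    # Start the peak search near the rough F0 estimate.
    weight = np.exp(-0.5 * ((f - f0_guess) / 30.0) ** 2)
    peak_f = f[np.argmax(p * weight)]
    # Fit a Gaussian around the peak; its FWHM gives the bandwidth.
    gauss = lambda x, a, mu, sd: a * np.exp(-0.5 * ((x - mu) / sd) ** 2)
    (a, mu, sd), _ = curve_fit(gauss, f, p, p0=[p.max(), peak_f, 5.0])
    dfc = 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(sd)  # full width at half max
    return mu, dfc
```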

2.1.2 Spatiospectral decomposition

The spatiospectral decomposition (SSD) algorithm is a spatial-filtering approach that maximizes the signal power at a given frequency peak while minimizing it at surrounding frequencies (Nikulin et al., 2011). The method assumes that the recorded brain activity is a superposition of narrow-band oscillations at the frequency peak of interest (signal) and at neighboring frequency bands (noise), that is, X = X_signal + X_noise. The signal and noise contributions to X can be separated with traditional temporal filters, such as zero-phase Butterworth filters (Nikulin et al., 2011).

The objective of SSD is to find a set of spatial filters (columns of an unmixing matrix W_SSD) such that the power of the signal is maximized while the power of the noise is minimized:

w_SSD = argmax_w (wᵀ Σ_signal w) / (wᵀ Σ_noise w),    (3)

where Σ_signal = X_signal X_signalᵀ and Σ_noise = X_noise X_noiseᵀ are the covariance matrices of the signal band and the noise band, respectively.

The number of components returned by SSD equals the number of channels in X, sorted according to the relative power spectrum in the frequency band of interest. Using the first k SSD components, dimensionality reduction is achieved by projecting the data onto the reduced SSD space as follows:

X̃_SSD = W̃_SSDᵀ X,    (4)

where W̃_SSD = [w_SSD1, …, w_SSDk].

In the context of vibration artifact denoising, the frequency band of interest is the SAFB. The data projected onto the first k SSD components, X̃_SSD, provide a good starting point for the second spatial-filtering technique in the PCD pipeline.
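At its core, Eq. (3) is a generalized eigenvalue problem. The sketch below is a minimal, simplified version assuming `x` is a (channels × samples) array; the flanking-band construction of the noise covariance only approximates the band definitions of Nikulin et al. (2011).

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, filtfilt

def ssd(x, fs, fc, dfc):
    """Spatio-spectral decomposition around the band fc +/- dfc (Eq. 3)."""
    def bandpass(data, lo, hi):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, data, axis=1)  # zero-phase filtering

    x_signal = bandpass(x, fc - dfc, fc + dfc)
    # Noise: flanking frequencies around (but excluding) the signal band.
    x_noise = bandpass(x, fc - 2 * dfc, fc + 2 * dfc) - x_signal
    cov_signal = x_signal @ x_signal.T / x.shape[1]
    cov_noise = x_noise @ x_noise.T / x.shape[1]
    # Generalized eigenvectors maximize w' cov_signal w / w' cov_noise w.
    eigvals, w = eigh(cov_signal, cov_noise)
    order = np.argsort(eigvals)[::-1]  # sort filters by SNR, descending
    return w[:, order], eigvals[order]
```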

2.1.3 Phase-coupling optimization

Phase-coupling optimization (PCO) is a supervised spatial-filtering algorithm developed by Waterstraat et al. that seeks to recover neural sources that are phase coupled to an external target variable (Waterstraat et al., 2017). The optimization criterion is based on the maximization of the mean vector length (MVL) (Canolty et al., 2006), a differentiable metric that takes values different from zero when phase coupling exists. Within this denoising pipeline, we take advantage of the PCO formulation and extend it to find the artifactual sources mixed with the neural activity. Phase coupling is then computed via the MVL between the SSD data projection X̃_SSD and the best guess of the artifact source. Since phase is defined for complex-valued signals, the analytic signal Y_SSD must first be obtained. The Hilbert transform H(·) is used to construct the imaginary part, such that

Y_SSD = X̃_SSD + i·H(X̃_SSD),    (5)

where i denotes the imaginary unit, defined so that i² = −1. The MVL between an analytic signal y and the instantaneous audio phase ϕ_z is then defined by

MVL = |(1/Ns) Σ_{t=1}^{Ns} y_t e^(−i ϕ_z(t))|.    (6)

Note that phase coupling is calculated across sample points, such that the MVL summarizes the phase relation across a given time window. Assuming that the best available guess of the artifact source is the recorded audio signal z, a linear filter w_PCO = w = [w_1, …, w_k]ᵀ is found by maximizing the mean vector length between the acoustic signal and the SSD components, as follows:

w_PCO = argmax_w |(1/Ns) Σ_{t=1}^{Ns} wᵀ y_t e^(−i ϕ_z(t))|,    (7)

where |·| denotes the amplitude of a complex number and y_t is the k-dimensional analytic SSD signal at the t-th sample point.

Although the function defined in (7) is differentiable, convexity with respect to w cannot be guaranteed. Thus, the best solution is selected out of a pool of candidate solutions found from several random initializations of w. Typically, between 10 and 15 independent runs of the algorithm are performed, and the solution with the largest MVL is selected. The complete set of spatial filters that defines the PCO demixing matrix W_PCO ∈ ℝ^(k×k) is found by solving (7) iteratively, projecting the previously found filters out of the solution space (Dähne et al., 2014; Waterstraat et al., 2017). Note that the data are first whitened by a matrix M. The column vectors of W_PCO are sorted in decreasing order of the MVL value found during the optimization. Thus, the first m components of the resulting data projection in the PCO space, X_PCO = W_PCOᵀ X̃_SSD, correspond to the speech artifact sources.
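A minimal sketch of one such optimization run is given below, assuming `y` holds the (k × samples) complex analytic SSD projection and `phi_z` the instantaneous audio phase (hypothetical names); the unit-norm constraint stands in for the normalization implied by the whitening step.

```python
import numpy as np
from scipy.optimize import minimize

def fit_pco_filter(y, phi_z, n_restarts=12, seed=0):
    """Maximize the MVL of Eq. (7) with random restarts (non-convex)."""
    rng = np.random.default_rng(seed)
    target = np.exp(-1j * phi_z)  # e^{-i phi_z(t)} for every sample

    def neg_mvl(w):
        w = w / np.linalg.norm(w)   # unit norm in whitened space
        proj = w @ y                # w' y_t, a complex time series
        return -np.abs(np.mean(proj * target))

    best_w, best_val = None, np.inf
    for _ in range(n_restarts):
        res = minimize(neg_mvl, rng.standard_normal(y.shape[0]),
                       method="L-BFGS-B")
        if res.fun < best_val:
            best_w, best_val = res.x / np.linalg.norm(res.x), res.fun
    return best_w, -best_val  # best filter and its MVL
```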

2.1.4 Signal reconstruction

The objective of the PCD pipeline is to denoise signals contaminated with the speech artifact; the artifact source estimation is just a proxy to facilitate data cleaning. Given that two spatial-filtering transformations are applied as a chained process, the PCD unmixing matrix W_PCD, which projects the raw data from the amplifier space to the artifact source space, is computed by combining the learned linear transformations W_PCO and W_SSD as follows:

W_PCD = M W_SSD blkdiag(W_PCO, I),    (8)

where blkdiag(·,·) denotes a block-diagonal matrix, I denotes the identity matrix of dimension (Nc − k) × (Nc − k), and M is the Nc × Nc whitening matrix applied to find the set of solution vectors to (7).

Once the unmixing matrix is computed, the mixing matrix that defines the forward model in (1) can be calculated via the pseudoinverse:

A_PCD = (W_PCDᵀ)†,    (9)

where † denotes the Moore–Penrose pseudoinverse. The set of equations that defines the linear forward and backward model is then given by

S_PCD = W_PCDᵀ X,  X = A_PCD S_PCD.    (10)

Now, by low-rank factorization, the denoised signal can be reconstructed by zeroing out, in the mixing and unmixing matrices, the first m components previously identified as artifactual. Signal reconstruction is then simply done by

X_denoised = Ã_PCD W̃_PCDᵀ X,    (11)

where Ã_PCD and W̃_PCD represent the mixing and unmixing matrices of rank Nc − m, respectively.
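In code, Eqs. (9) to (11) collapse into a single low-rank projection. A minimal sketch, assuming `w_pcd` is the full unmixing matrix with the m artifactual components sorted first:

```python
import numpy as np

def denoise(x, w_pcd, m):
    """Zero out the first m source components and reconstruct (Eq. 11)."""
    a_pcd = np.linalg.pinv(w_pcd.T)      # mixing matrix, Eq. (9)
    keep = np.arange(m, w_pcd.shape[1])  # drop artifactual components
    return a_pcd[:, keep] @ (w_pcd[:, keep].T @ x)
```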

Figure 2 depicts how contaminated recordings are denoised by applying PCD, in the temporal, spectral, and phase domains. For illustration purposes, we artificially contaminated iEEG recordings with an audio signal (Recording space, Fig. 2a), resulting in iEEG signals with a strong peak in spectral power around F0 (Fig. 2a center), together with a consistent phase relationship with the acoustic signal (Fig. 2a bottom). After SSD is applied, only a few components have enhanced power spectra around F0 (Fig. 2b center), serving as a good starting point for the next step. Applying PCO in the SSD space gives us components maximally phase coupled with the recorded audio (Fig. 2c bottom). The components identified as artifactual will be discarded for signal reconstruction, thus achieving a denoised signal (Fig. 2d).

2.2 Simulated neural data

Acoustic-induced vibration artifacts have also been observed during stimulus presentation (Roussel et al., 2020); simulating the physiological gamma modulation therefore ensures that gamma activity is linked solely to speech production. To benchmark our method’s denoising performance, we applied PCD to simulated neural data with an added simulated speech artifact, allowing direct comparison of the denoised data with well-defined ground-truth neural signals. Several simulations were conducted, as explained hereafter. Briefly, we simulated recurrent networks of leaky integrate-and-fire (LIF) neurons (N = 5000, 80% excitatory and 20% inhibitory) to generate 100 physiological broadband γ-source activities, defined as the summation of the absolute values of the excitatory and inhibitory currents entering the excitatory neuron population (Supplementary Fig. 1). The model captures key features of physiological LFPs, including (i) the entrainment of low-frequency fluctuations to the external input, (ii) the internal resonance of broadband γ-oscillations driven by the mean rate of the external input (Supplementary Fig. 2), and (iii) phase–amplitude coupling between low- and high-frequency oscillations (Supplementary Fig. 3). We defined a single simulated speech artifact source S_a(t), assumed to be identical to the produced audio signal denoted as z in the PCD pipeline (S_a(t) ≡ z(t)). After adjusting what we call the artifact-to-physiological gamma ratio (AGR), we linearly projected the sources to the simulated recordings by applying a random mixing matrix. Note that the AGR can be thought of as the inverse of the SNR. We tested the PCD pipeline in different simulation scenarios based on the expression of the external input ν_signal(t) driving the LIF neurons and the audio source S_a(t) (Supplementary Fig. 1). Details about the model and simulation settings can be found in Supplementary Table S1.

2.2.1 Simulation scenarios

Toy scenarios were created to stress the method under different predefined source conditions, as depicted in Supplementary Figure 1.

2.2.1.1 Toy examples

We used toy examples for an initial assessment of the PCD pipeline’s ability to remove the acoustic-induced artifact from in-silico neural signals (Supplementary Figs. 4–8). We used three scenarios for these toy examples, where ν_signal(t) was defined as the superposition of a sustained and a periodically modulated signal, as follows:

ν_signal(t) = A_s + A_p sin(2π f_p t + ϕ_p),    (12)

where A_s is the amplitude of the sustained input, and A_p, f_p, and ϕ_p are the amplitude, frequency, and phase of the periodic signal, respectively.

In Scenario 1 (sinusoidal audio scenario, SAS), we synthesized the contaminating audio source S_a as a sinusoidal function with noise, as follows:

S_a(t) = A_0 sin(2π F_0 t + ϕ_0) + η(t),    (13)

where A_0 is the amplitude, F_0 is the fundamental frequency, ϕ_0 is the phase, and η(t) is a white noise term.

In Scenario 2 (colored noise audio scenario, CAS), the audio source was synthesized by coloring the spectrum of the white noise η(t), as follows:

S_a(t) = σ (BP ∗ η)(t),    (14)

where BP(t) is the impulse response of a 25th-order Butterworth filter, σ is the standard deviation, and ∗ is the convolution operator.

In Scenario 3 (modulated colored noise audio scenario, MCAS), we mimicked the temporal profile of the audio in the Syllable Repetition Task (see Triplet repetition task data), applying an activation pattern M(t) to the audio source of Scenario 2, which reads:

S_a(t) = M(t) σ (BP ∗ η)(t),  with M(t) = Σ_i 1_[t_i, t_i + 0.5 s](t),    (15)

where t_i is the onset of the i-th syllable, which lasts 0.5 s. The parameters used to define the audio signal in the different toy examples can be found in Supplementary Table S2.
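For concreteness, the three toy audio sources can be synthesized in a few lines. The sketch below uses hypothetical parameter values (the values actually used are listed in Supplementary Table S2), and the filter order is illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs, dur = 1000, 3.0
t = np.arange(0.0, dur, 1.0 / fs)
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)

# Scenario 1 (SAS): noisy sinusoid at F0 (Eq. 13).
f0, a0, phi0 = 120.0, 1.0, 0.0
sas = a0 * np.sin(2 * np.pi * f0 * t + phi0) + 0.1 * noise

# Scenario 2 (CAS): white noise colored by a Butterworth band-pass (Eq. 14).
b, a = butter(12, [(f0 - 8) / (fs / 2), (f0 + 8) / (fs / 2)], btype="band")
cas = lfilter(b, a, noise)
cas /= cas.std()

# Scenario 3 (MCAS): CAS gated by a syllable-like on/off pattern (Eq. 15).
onsets = [0.5, 1.5, 2.5]                   # syllable onsets, in seconds
gate = np.zeros_like(t)
for ti in onsets:
    gate[(t >= ti) & (t < ti + 0.5)] = 1.0 # each syllable lasts 0.5 s
mcas = gate * cas
```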

2.2.1.2 Realistic scenario

In this scenario (recorded audio scenario, RAS), we simulated 60 trials of speech-locked transient γ-source activity by feeding Gaussian input signals to the network, as follows:

ν_signal(t) = A exp(−4 ln 2 (t − γ)² / FWHM²),    (16)

where A and γ set the intensity and temporal location of the modulation, and FWHM is the full width at half maximum, which controls the temporal focality of the modulation. Parameters were set to reproduce the physiological speech-locked γ-modulation patterns (see Supplementary Fig. 9) described in the literature (Chrabaszcz et al., 2019; Fischer et al., 2020; Moses et al., 2021). As audio sources, we selected audio recorded from Parkinson’s disease patients during the Syllable Repetition Task. Three different audio signals were selected according to their spectrogram profiles, for example, the nonstationarity of the pitch (Supplementary Fig. 10). Across trials, the same audio signal was mixed by changing the linear mixing matrix A in Eq. (1).

2.3 Real neural data

In total, 54 English-speaking patients undergoing awake DBS implantation surgery consented to participate in an intraoperative speech task at the University of Pittsburgh (IRB Protocol #PRO13110420). Patients were instructed to repeat consonant–vowel syllable triplets played through their earphones. Specifically, the basis set of phonemes included four consonants (/v/, /t/, /s/, /g/) and three cardinal vowels (/i/, /a/, /u/) with distinctive tongue positions and acoustic properties. Participants then completed 1 to 4 recording sessions of up to 120 syllable triplets each. Sessions differed in the neural recording modalities registered. Electrocorticography (ECoG) was always registered through 1 or 2 strips of 53 or 64 contacts targeted to cover the left ventral sensorimotor cortex, superior temporal gyrus, and inferior frontal regions. Microelectrode recordings (MER) and LFPs from a macro ring located 3 mm above the microelectrode tip were registered during subcortical target mapping for participants undergoing subthalamic nucleus (STN) or globus pallidus internus (GPi) DBS. LFPs registered from the DBS lead were recorded from the STN, GPi, and ventral intermediate nucleus (VIM). The produced audio signal was recorded by a directional microphone placed near the patient’s mouth. Data were time-aligned, preprocessed, low-pass filtered at 500 Hz, down-sampled to 1 kHz, and saved as a FieldTrip (Oostenveld et al., 2011) object for subsequent analyses. For more information about the dataset, we refer the reader to Bush et al. (2022).

2.4 PCD algorithm implementation

F0 may vary between trials of the same participant. As a result, the phase relationships between the artifact source and the neural recordings change across trials (Supplementary Fig. 12). To better track F0 changes, the PCD pipeline was applied on a trial-wise basis. Data were high-pass filtered above 2 Hz and notch filtered at 60 Hz and its first three harmonics. Given that the artifact was observed only during overt speech production, epochs around the produced speech onset were extracted for fitting the model. Under the hypothesis that the artifact is introduced in the acquisition chain, the different synchronized brain recording modalities (MER-LFP + ECoG or DBS-LFP + ECoG) were combined to form a single data matrix. That is, the signals from all available brain recording modalities were treated as a single sensor space.

For every trial, the model was fitted as follows. First, the audio signal was used to estimate the SAFB, and the noise and signal bands were defined accordingly (Box 1, step 1). Data preparation followed, which included applying the Hilbert transform to obtain the analytic representation of the signal, as well as data whitening. The audio signal was z-scored for normalization and used as the best guess of the artifact source. The SSD algorithm was applied to the real part of the signal. The number of SSD components to keep (k) was automatically selected based on the participation ratio (PR), defined as follows:

PR = (Σ_{i=1}^{Nc} λ_i)² / Σ_{i=1}^{Nc} λ_i²,    (17)

where λ_1 ≥ λ_2 ≥ … ≥ λ_Nc are the eigenvalues resulting from solving SSD, which account for the SNR contribution of each SSD component. The PR has been shown to be a reasonable measure of neural dimensionality in PCA (Gao et al., 2017). PCO was applied to data projected onto the SSD space, extracting one component at a time. Every component resulted in a mean vector length (MVL) value, such that MVL_1 ≥ MVL_2 ≥ … ≥ MVL_k. Once both spatial-filtering methods were learned, the unified unmixing and mixing matrices that describe the PCD pipeline were computed (Box 1, step 4).

The identification of the artifactual sources was also automated by finding the elbow in the trace of MVL values across components, that is, the point of maximum curvature separating the contribution of strong and weak components. The components showing the highest MVL values were identified as artifactual.
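Both automatic selection rules are straightforward to implement. A minimal sketch follows; the farthest-point-from-chord rule is one common way to locate the elbow and is our assumption, not a detail specified in the text.

```python
import numpy as np

def participation_ratio(eigvals):
    """Number of SSD components to keep, via the PR of Eq. (17)."""
    lam = np.asarray(eigvals, dtype=float)
    return int(round(lam.sum() ** 2 / (lam ** 2).sum()))

def elbow_index(mvl):
    """Elbow of a decreasing MVL trace: the point farthest from the
    chord joining the first and last values (max-curvature heuristic)."""
    mvl = np.asarray(mvl, dtype=float)
    chord = np.linspace(mvl[0], mvl[-1], mvl.size)
    return int(np.argmax(np.abs(mvl - chord)))

# e.g., the m = elbow_index(mvl_values) components with the highest MVL
# are flagged as artifactual.
```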

Considering that further analyses would be performed on the data, a wider epoch starting 6 seconds before and ending 6 seconds after the produced speech onset was used when applying the learned matrix transformation.

2.5 Traditional methods for denoising

The performance of PCD was compared against traditional denoising methods based on spatial filtering. Each of the baseline methods is presented in the following.

2.5.1 CAR as a spatial-filtering algorithm

Common average reference (CAR) is a spatial-filtering method that subtracts from every electrode the common noise, calculated as the average of all recordings. The data re-referenced via CAR are calculated as follows:

X_CAR = (I − (1/Nc) J) X,    (18)

where Nc is the number of electrodes, I is the Nc × Nc identity matrix, and J = 11ᵀ is the Nc × Nc matrix of ones.

With this formulation, it is easy to see that CAR can be thought of as a spatial-filtering method which implies a linear transformation of the data.

Given that CAR simply takes the average across channels, the data structure is not critical here. The matrix transformation was applied to the continuous data separately for each type of channel recording.
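For reference, Eq. (18) amounts to a one-line spatial filter:

```python
import numpy as np

def car(x):
    """Common average reference for x of shape (channels, samples)."""
    nc = x.shape[0]
    return (np.eye(nc) - np.ones((nc, nc)) / nc) @ x  # (I - J/Nc) X
```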

2.5.2 The PCA + ICA pipeline

Independent component analysis (ICA) assumes that the sources linearly mixed in (1) are independent. Such an assumption holds for many artifacts that appear in brain recordings, such as electrocardiographic or electromyographic activity. To ensure that independent sources are extracted from the data, the non-Gaussianity of WᵀX is maximized. Here, ICA is implemented using Picard (Ablin et al., 2018), a fast algorithm that solves the maximum-likelihood formulation of ICA. This implementation was chosen because it is known to yield robust component estimates when the sources are not completely independent.

Basic data preprocessing was done before applying ICA. A high-pass filter above 2 Hz was applied to remove low-frequency drifts, which can negatively affect the quality of the ICA fit (Winkler et al., 2015). The data were then z-scored, and PCA was applied both to feed ICA with whitened data and to reduce the dimensionality of the problem. The number of PCA components was automatically selected by setting the explained variance to 0.99. As in the PCD pipeline, model fitting was done using the windowed data within the produced audio epoch, combining, when available, the different brain recording modalities. For the sake of comparison with PCD, artifactual source identification was based on the circular mean of the phase difference: the components showing the highest phase-locking values were selected as artifactual. Denoising via low-rank factorization was applied to the wider epochs, as done for PCD.
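A minimal sketch of this baseline is given below, assuming the python-picard package and scikit-learn (function and variable names are our own); scoring components by their phase locking to the audio mirrors the criterion used for PCD.

```python
import numpy as np
from picard import picard
from scipy.signal import hilbert
from sklearn.decomposition import PCA

def ica_denoise(x, audio_phase, n_remove=1):
    """x: (channels, samples); audio_phase: instantaneous audio phase."""
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    pca = PCA(n_components=0.99, whiten=True)  # keep 99% of variance
    z = pca.fit_transform(x.T).T               # (components, samples)
    k_mat, w_mat, sources = picard(z)          # maximum-likelihood ICA
    # Score every component by its phase locking with the audio.
    phases = np.angle(hilbert(sources, axis=1))
    plv = np.abs(np.mean(np.exp(1j * (phases - audio_phase)), axis=1))
    bad = np.argsort(plv)[::-1][:n_remove]     # most audio-locked
    sources[bad] = 0.0                         # zero out artifact sources
    z_clean = np.linalg.pinv(w_mat @ k_mat) @ sources
    return pca.inverse_transform(z_clean.T).T  # back to channel space
```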

2.6 Metrics for performance assessment

2.6.1 Performance assessment in toy examples

Given that in the toy examples we simulated only one artifactual source, we expected PCO to find only one component with high coherence (i.e., high MVL) with the audio in the PCO space.

To assess the performance of the denoising pipeline, we calculated time-, frequency-, and phase-domain metrics that estimate the agreement between the ground-truth X_gt and the cleaned data X, as well as between the true z(t) and estimated z̃(t) artifactual sources.

For each channel, we compared the similarity in the time domain of the neural data (X vs. X_gt) as follows:

χ = (1/T) Σ_{t=1}^{T} (x(t) − x_gt(t))² / σ²(x_gt),    (19)

where T is the duration of the simulation and σ²(·) is the variance. χ values were converted to a logarithmic scale for visualization purposes (Supplementary Figs. 5, 7).

To assess the fidelity of the estimated speech-induced artifact (z vs. z̃), we used the magnitude-squared coherence estimate (MSCE) and the consistency of the phase difference based on the PLV. The MSCE returns values between 0 and 1, indicating how similar two signals are at each frequency:

MSCE(f) = |P_zz̃(f)|² / (P_zz(f) P_z̃z̃(f)),    (20)

where P stands for the (cross-)power spectral density. The PLV is a metric between 0 and 1 that quantifies the phase agreement between two signals:

PLV = |(1/Ns) Σ_{t=1}^{Ns} e^(i(ϕ_z(t) − ϕ_z̃(t)))|,    (21)

where ϕ denotes the instantaneous phase.

In addition, in the RAS examples, we assessed the similarity of the true and estimated audio source by computing one-dimensional statistical parametric mapping (SPM1d, https://spm1d.org/), which assesses statistical differences at the sample level. Finally, to evaluate the goodness of the method’s source recovery at a global level, the mean square error (MSE) was also computed. To obtain a bounded MSE value, each signal was first divided by its maximum value before computing the error. As such, the closer the normalized MSE is to zero, the more similar the two signals are.
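The two artifact-fidelity metrics map directly onto standard signal-processing routines; a minimal sketch, assuming `z_true` and `z_est` hold the true and estimated sources:

```python
import numpy as np
from scipy.signal import coherence, hilbert

def msce(z_true, z_est, fs):
    """Magnitude-squared coherence estimate per frequency (Eq. 20)."""
    return coherence(z_true, z_est, fs=fs, nperseg=1024)  # (freqs, MSCE)

def plv(z_true, z_est):
    """Phase-locking value between the two signals (Eq. 21)."""
    dphi = np.angle(hilbert(z_true)) - np.angle(hilbert(z_est))
    return np.abs(np.mean(np.exp(1j * dphi)))
```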

2.6.2 Neural preservation assessment

Principal component analysis (PCA) is a dimensionality reduction technique that identifies an ordered set of orthogonal directions capturing the greatest variance in the data. It is widely accepted and used in the neuroscience community for analyzing neural population activity (Cunningham & Yu, 2014). The low-dimensional space identified by PCA captures variance of all types, including noise. Such a data representation can be thought of as a Cartesian coordinate basis describing the subspaces in which the data lie (Chung & Abbott, 2021). Thus, the PCA low-dimensional space was used to assess neural preservation after applying the denoising methods: if neural activity is preserved, the subspaces found after denoising should describe the same geometry as those of the ground-truth data. For every denoising pipeline, as well as for the ground-truth and the noisy data, PCA was fitted independently, and the learned loadings and scores were plotted for the first three principal components (PCs) of each decomposition.

To quantify the preservation of the physiological signal, we measured the degree of similarity between the PCA loadings (L ∈ ℝ^(Nc×Nc)) of the ground-truth data and of the data resulting from each denoising pipeline, using the cosine similarity (CS) (Koay et al., 2022). CS measures the extent to which two vectors point in the same direction and was thus used to quantify the degree of neural preservation after applying a denoising pipeline. Given that PCA loading signs are arbitrary, a CS bounded between 0 and 1 is obtained by taking the absolute value of the cosine similarity, as follows:

CS_i = |L_gt,iᵀ L_i| / (‖L_gt,i‖ ‖L_i‖),    (22)

where i denotes the index of a given loading vector, L_gt indicates the PCA loading matrix of the ground-truth data X_gt, L stands for the PCA loading matrix of a given denoised dataset X_denoised, and ‖·‖ and |·| denote the ℓ2 norm and the absolute value operators, respectively. The closer CS is to 1, the more similar the two vectors are. Thus, a good denoising pipeline, from the neural-preservation point of view, is one whose PCA loading vectors point in the same directions as those of the data containing only true gamma modulation.
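A minimal sketch of Eq. (22), assuming `x_gt` and `x_denoised` are (samples × channels) arrays (note that scikit-learn stores loadings as the unit-norm rows of `components_`):

```python
import numpy as np
from sklearn.decomposition import PCA

def loading_cosine_similarity(x_gt, x_denoised, n_pcs=3):
    """Absolute cosine similarity between PCA loading vectors (Eq. 22)."""
    l_gt = PCA(n_components=n_pcs).fit(x_gt).components_
    l_dn = PCA(n_components=n_pcs).fit(x_denoised).components_
    num = np.abs(np.sum(l_gt * l_dn, axis=1))  # |L_gt,i . L_i|
    den = np.linalg.norm(l_gt, axis=1) * np.linalg.norm(l_dn, axis=1)
    return num / den          # one CS value in [0, 1] per component
```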

2.6.3 Intertrial coherence for speech artifact quantification

The speech artifact level at each electrode was computed based on the intertrial phase consistency (ITPC) (Cohen, 2014), following the framework proposed and used in Bush et al. (2022). That is, the audio and neural signals were band-pass filtered between 70 and 240 Hz, that is, within the plausible SAFB range. Then, considering the complex representation y = x + i·H(x) of the neural data for a given channel and the audio signal z at a given trial e, the phase difference between these two quantities can be measured by

ϕ_e = arg( Σ_{t=1}^{Ns} e^(i(ϕ_y(t) − ϕ_z(t))) ),    (23)

At the end of this procedure, the phases across the Nt trials are arranged in a vector ϕ = [ϕ_1, …, ϕ_Nt]. If there is intertrial phase consistency, the mean value of ϕ across trials (⟨ϕ⟩) will be different from 0, and it can be quantified as follows:

ITPC = |⟨ϕ⟩| / (std(ϕ)/√Nt),    (24)

where std(·) stands for the standard deviation. ITPC values equal to or above 3.08 indicate that the presence of the speech artifact on a given electrode is significant, and thus the electrode must be considered contaminated (Bush et al., 2022). Here, ITPC was used to assess the level of audio contamination of an electrode before and after a denoising pipeline was applied.
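A minimal sketch of the ITPC score under the reconstruction above, assuming `neural` and `audio` are (trials × samples) arrays already band-pass filtered to 70–240 Hz:

```python
import numpy as np
from scipy.signal import hilbert

def itpc(neural, audio):
    """Intertrial phase consistency between one channel and the audio."""
    dphi = (np.angle(hilbert(neural, axis=1))
            - np.angle(hilbert(audio, axis=1)))
    phi = np.angle(np.mean(np.exp(1j * dphi), axis=1))  # Eq. (23), per trial
    n_trials = phi.size
    return np.abs(phi.mean()) / (phi.std(ddof=1) / np.sqrt(n_trials))  # Eq. (24)

# Channels with itpc(...) >= 3.08 are flagged as contaminated.
```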

2.6.4 Artifact presence quantification: definition of homogeneity, strength, and clean electrode gain

To understand denoising performance with respect to the characteristics of the artifact presence across electrodes, we defined indices quantifying the artifact homogeneity and strength, as follows.

Artifact homogeneity quantifies the consistency of the artifact presence across electrodes. Let ϑ = [ITPC_1, …, ITPC_Nc] denote the vector of ITPC values found for each electrode. Considering that n-dimensional unit vectors have variance between 0 and 1/n, and following the idea proposed by Umakantha et al. (2021), we define the artifact homogeneity as follows:

homogeneity = Nc · σ²(ϑ / ‖ϑ‖),    (25)

that is, a value between 0 and 1 for homogeneous and non-homogeneous artifact presence, respectively.

To summarize the presence of the artifact, we defined the artifact strength as the mean ITPC value across electrodes:

strength = (1/Nc) Σ_{i=1}^{Nc} ITPC_i.    (26)

In addition, we computed the clean electrode gain as the relative change in the percentage of clean electrodes after (%CE_denoised) versus before (%CE_raw) applying a denoising method, that is,

Gain = (%CE_denoised − %CE_raw) / %CE_raw.    (27)
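All three indices reduce to a few lines; a minimal sketch, assuming `itpc_values` is the vector of per-electrode ITPC scores:

```python
import numpy as np

def artifact_homogeneity(itpc_values):
    """Eq. (25): 0 for homogeneous, up to 1 for non-homogeneous presence."""
    v = np.asarray(itpc_values, dtype=float)
    u = v / np.linalg.norm(v)  # unit vector: variance lies in [0, 1/n]
    return v.size * u.var()

def artifact_strength(itpc_values):
    """Eq. (26): mean ITPC across electrodes."""
    return float(np.mean(itpc_values))

def clean_electrode_gain(pct_clean_raw, pct_clean_denoised):
    """Eq. (27): relative gain in clean electrodes after denoising."""
    return (pct_clean_denoised - pct_clean_raw) / pct_clean_raw
```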

2.7 Deep learning model for consonant identification

In the case of BCIs for speech prostheses, the denoised data will most probably be used to feed a brain decoding algorithm, so it is important to evaluate the impact of each denoising method as a preprocessing step in a speech decoding pipeline. Here, a densely connected convolutional neural network (DenseNet) (Huang et al., 2017) was trained to classify consonants from neural signals (ECoG). For each syllable, original/denoised ECoG signals were spectrally filtered into 7 frequency bands ranging from 0 to 250 Hz. These syllable-level neural data were used as the training set. We then extracted perceptual linear prediction (PLP) (Hermansky, 1990) features from the corresponding audio recordings. Both the PLP features and the consonant identities were used as training labels. Our DenseNet model was designed to first map neural signals into PLP spectra and then predict the consonant class from the PLP space. For that purpose, the mean-squared-error loss of the PLP feature prediction and the cross-entropy loss of the consonant classification were jointly optimized during model training. We used 5-fold cross-validation to measure model performance, withholding a different 20% partition of the training data as the test set each time. Separate models were trained for each subject and each data type, resulting in 12 different models (3 subjects, 4 data types: Raw, CAR, ICA, PCD). During testing, the reserved ECoG data were fed into the trained model, and accuracy was measured based on the consonant predictions.

3.1 Method benchmarking on in-silico data

3.1.1 Algorithm stress test on simulated data: parametric sweep and performance evaluation

We applied the PCD pipeline to simulated data using the speech artifact S_a(t) as the audio signal (S_a(t) ≡ z(t)). Given that we simulated only one artifactual source, we expected PCO to find only one component with high coherence (i.e., high MVL) with the audio in the PCO space. The quality of the denoised data was compared against the ground truth by means of Eq. (19), while the similarity of the retrieved artifactual source to the original one was computed using Eqs. (20) and (21).

To stress the PCD pipeline in the different scenarios, we evaluated performance during a sweep of key parameters in the toy SAS and CAS examples. For the SAS scenario, AGR ∈ [−100, 30] dB, F0 ∈ [70, 180] Hz, and Nc ∈ [3, 100] were varied; in the CAS scenario, we swept ΔF ∈ [2, 16] Hz, the filter order ∈ [3, 27], and the simulation duration ∈ [0.5, 3.5] s.

Results showed that for pure sinusoidal artifacts, PCD removed the artifact perfectly, regardless of the AGR, fundamental frequency, and number of channels (Supplementary Figs. 4, 5). For the colored noise artifact, simulations suggested a small decrease in performance with broadband artifacts, consistent with known limitations of SSD (Supplementary Figs. 6, 7). Finally, PCD yielded robust performance when tested with modulated colored noise artifact simulations (Supplementary Fig. 8).

Similarly, in the RAS scenario, PCD reliably recovered the artifactual source, as can be observed in the time, frequency, and time–frequency profiles of both the true (Fig. 3a) and estimated (Fig. 3b) artifact sources. Across trials, no differences were found at any sample point between the true and estimated artifact sources, as assessed by SPM1d (Fig. 3c). The median normalized MSE across trials was 1.5 × 10⁻³. The estimated artifactual source had high coherence with the true artifactual source at frequencies around F0 (mean MSCE > 0.97, Fig. 3d), independently of differences in the AGR (Supplementary Fig. 11). Finally, the estimated source was either in an in-phase or antiphase relationship with the true source (Fig. 3e). Similar results were found for the other realistic simulations (Supplementary Fig. 11), showing that the artifactual source could be retrieved independently of the pitch modulation (F0 nonstationarity) and the AGR value. Nevertheless, as might be expected, stronger artifacts (high AGR) were more accurately extracted by the method, yielding a higher PLV between the extracted component and the simulated artifact source (Supplementary Fig. 11).

3.2 PCD removes the speech artifact while preserving simulated γ activity

Next, we compared the performance of PCD with CAR and ICA using simulated data under the realistic scenario (RAS). For a fair comparison, all three methods were implemented on a trial-wise basis and components identified by ICA were scored according to their phase-coupling value against the recorded audio. We evaluated performance in terms of the capacity of each method to retrieve the temporal, frequency, and phase information of the simulated ground-truth neural data (Fig. 4a–c). Figure 4 shows the simulated neural activity without the artifact source (Artifact free), when linearly combined with the artifactual source (Contaminated), and the resulting denoised signals from each method (CAR, ICA, PCD). PCD outperformed the other methods in terms of its ability to clean the data and retrieve the simulated brain signals, as can be observed in the time (Fig. 4a), frequency (Fig. 4b), and phase (Fig. 4c) domains. In this simulation, CAR produced traces very similar to the contaminated signal, while ICA attenuated the artifact to some extent. Interestingly, PCD completely removed the narrow-band component induced by the speech artifact while preserving gamma modulation observed in the simulated brain signals.

To assess the preservation of physiological brain signals after denoising, we compared the neural embeddings defined in the subspace spanned by the first three PCA components (PC subspace). Note how the PC embedding is distorted in the contaminated signals, as compared with the artifact-free signals (Fig. 4d). CAR resulted in an embedding indistinguishable from that of the contaminated signals (Fig. 4d), consistent with the time, frequency, and phase analyses. Interestingly, while ICA was able to attenuate the speech artifact, the resulting PC embedding differed from that of the artifact-free signals, indicating that ICA distorted the underlying physiological sources. Here, PCD was the only algorithm that completely removed the artifactual source, as assessed by the signals' power spectrum (Fig. 4b) and phase-coupling (Fig. 4c) plots, while simultaneously preserving the underlying physiological activity, as evidenced by a PC embedding indistinguishable from that of the artifact-free signals (Fig. 4d). To quantitatively assess the similarity to the ground-truth PC embedding, and thus provide a proxy for the preservation of the underlying neural signal after denoising, we used the cosine similarity (CS) of the first three components (Eq. (22)). The resulting mean values were 0.79, 0.74, and 0.99 for CAR, ICA, and PCD, respectively, while for the contaminated data this value was 0.79. The lower CS value for ICA than for the contaminated data can be explained by ICA changing the embedding even more drastically than the artifact itself.
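The sketch below shows one plausible reading of this metric (the exact definition is given by Eq. (22)): project ground-truth and denoised trials onto their first three PCs and average the cosine similarity of corresponding embedding points. Data shapes and names are assumptions.

```matlab
% Sketch: cosine similarity between 3-D PC embeddings (cf. Eq. (22)).
rng(1)
X_gt = randn(200, 50);                 % stand-in: 200 trials x 50 features
X_dn = X_gt + 0.05*randn(200, 50);     % stand-in denoised data

[~, S_gt] = pca(X_gt, 'NumComponents', 3);   % ground-truth PC scores
[~, S_dn] = pca(X_dn, 'NumComponents', 3);   % denoised PC scores

% Mean cosine similarity across corresponding embedding points
cs = mean(sum(S_gt .* S_dn, 2) ./ (vecnorm(S_gt,2,2) .* vecnorm(S_dn,2,2)));
```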

3.3 PCD denoising performance on acoustically contaminated iEEG data

We applied the PCD pipeline to intracranial recordings of 54 patients undergoing DBS implantation surgery who participated in an intraoperative speech task (see Section 2.3). As we previously described, around 40% of the channels in this dataset show a speech-induced vibration artifact (Bush et al., 2022).

Considering that the source of the speech artifact is the same across different types of electrodes, the denoising pipeline was applied to all available iEEG recordings together (LFP from the ECoG, the DBS lead, and the ring and tip contacts of the microelectrode). Given that phase relationships between the recorded audio and the neural recordings are not consistent across trials (Supplementary Fig. 12), data cleaning was performed in a trial-wise manner. The resulting denoised signals were compared with those obtained after applying CAR and ICA, as in Figure 4. The number of components to remove was automatically selected based on elbow detection over the MVL values, as sketched below.
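A common way to implement such an elbow rule, sketched here under the assumption of a distance-to-chord criterion (the study's exact criterion is described in the Methods), is to pick the sorted MVL value farthest from the line joining the first and last values.

```matlab
% Sketch: elbow detection on sorted MVL values (assumed criterion).
mvl = [0.92 0.88 0.41 0.18 0.12 0.10 0.09 0.08];   % stand-in component scores
x = (1:numel(mvl))'; y = mvl(:);
p1 = [x(1) y(1)]; p2 = [x(end) y(end)];
% Perpendicular distance of each point to the chord from first to last point
d = abs((p2(1)-p1(1))*(p1(2)-y) - (p1(1)-x)*(p2(2)-p1(2))) / norm(p2 - p1);
[~, elbow] = max(d);            % components 1..elbow flagged as artifactual
```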

To assess denoising performance, we used the distribution of ITPC with the audio (Eq. (24); Fig. 5a) and the percentage of electrodes without significant ITPC (Fig. 5b) before vs. after applying each denoising framework (Raw vs. CAR, ICA, PCD). Results show that CAR exacerbated the artifact, shifting the ITPC distribution toward higher values (Fig. 5a), whereas ICA and PCD reduced ITPC values and increased the percentage of electrodes without significant coherence with the audio (Fig. 5a–b). However, as shown next, the low coherence values for ICA were due to an aggressive removal of all high-frequency components (artifactual and physiological).
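As a rough illustration of the ITPC measure (the study's definition is given by Eq. (24) and may differ), the sketch below band-passes audio and one channel around F0, takes the circular-mean phase difference per trial, and measures its consistency across trials; all parameters and data are stand-in assumptions.

```matlab
% Sketch: inter-trial phase coherence (ITPC) between audio and one channel.
fs = 1000; nTrials = 50; t = (0:1/fs:1)'; f0 = 110;     % assumed values
[b, a] = butter(3, (f0 + [-4 4])/(fs/2), 'bandpass');   % narrow band at F0
phi = zeros(nTrials, 1);
for tr = 1:nTrials
    audio = sin(2*pi*f0*t + 2*pi*rand);                        % stand-in audio
    chan  = 0.3*sin(2*pi*f0*t + 2*pi*rand) + randn(size(t));   % stand-in channel
    dphi  = angle(hilbert(filtfilt(b,a,chan))) - angle(hilbert(filtfilt(b,a,audio)));
    phi(tr) = angle(mean(exp(1i*dphi)));    % circular-mean phase difference per trial
end
ITPC = abs(mean(exp(1i*phi)));              % resultant length across trials
```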

To illustrate each method's performance, time–frequency plots are shown for two electrodes from the same participant and trial. Electrode ecog149 has a strong artifactual narrow-band component after speech onset (Fig. 5c, top panel), while ecog159 has a characteristic physiological gamma modulation around the time of speech onset (Fig. 5c, bottom panel). Note that the speech artifact is either exacerbated or artificially introduced after applying CAR, due to the presence of the artifact on other electrodes (Fig. 5c – CAR). Interestingly, ICA abolished the physiological gamma modulation observed in ecog159, whereas PCD preserved this activity while simultaneously removing the narrow-band component (Fig. 5c). A similar plot, but for electrodes with a strong artifact component also at the first harmonic of F0, is presented in Supplementary Figure 13, showing that the method can also reduce the artifact at the harmonics of F0.

It is of particular interest to analyze the high-gamma profiles before and after applying each denoising method. To this end, we extracted high-gamma power time-locked to the auditory stimulus onset and to the speech onset, for electrodes located in auditory cortex (superior temporal gyrus, Fig. 5d) and motor cortex (precentral gyrus, Fig. 5e), respectively. The gamma response in the auditory cortex during auditory stimulus presentation was unaffected by the artifact; however, it was greatly attenuated by ICA (Fig. 5d). Similarly, ICA also abolished the gamma response during speech production for the electrodes over motor cortex (Fig. 5e).

To further explore the differences in performance across methods, we studied the artifactual sources retrieved by CAR, ICA, and PCD. For ICA and PCD, the first component (i.e., the one with the maximum phase-locking value with the audio) is shown. In addition, the number of components removed by these two methods was analyzed. Although in most cases between 1 and 4 components were removed by both ICA and PCD (Supplementary Fig. 14), the artifactual sources retrieved by ICA do not resemble a speech-induced artifact (Fig. 6). As expected, the CAR component, calculated as the mean across channels, does not resemble the vibration artifact either.

Additionally, to explore whether the variability in denoising performance can be explained by characteristics of the artifact across electrodes, we evaluated the relationship between the relative gain of electrodes achieved by each denoising method (Eq. (27)), artifact strength (Eq. (26)), and artifact homogeneity (Eq. (25)). Results are shown in Figure 7a. Note that recordings with negative gain (i.e., an increase in the percentage of affected electrodes) are shown in red. CAR has good denoising performance only for highly homogeneous artifacts of mild-to-moderate strength (Pearson correlation r(gain, homogeneity) = 0.59, p < 0.0001; r(gain, strength) = 0.23, p = 0.006) and can outperform the other methods under these conditions (e.g., Subject A, blue-outlined circles in Fig. 7a). Note that CAR can introduce the artifact into clean data (e.g., Subject C, black-outlined circles in Fig. 7a). For ICA, stronger artifacts with mild-to-moderate homogeneity resulted in a higher gain of clean electrodes, as assessed by ITPC (r(gain, strength) = 0.57, p < 0.0001; r(gain, homogeneity) = 0.24, p = 0.003). Interestingly, PCD gain also increases with higher artifact strength but is not significantly correlated with artifact homogeneity (r(gain, strength) = 0.50, p < 0.0001; r(gain, homogeneity) = 0.12, p = 0.14), as illustrated by Subject B (gray-outlined circles in Fig. 7a).

Finally, we assessed the effect of applying each denoising method as a preprocessing step on the decoding performance of a densely connected convolutional neural network (DenseNet) (Huang et al., 2017) for consonant decoding. We tested the decoding method in three different cases: (i) CAR and ICA gain is greater than PCD gain (Subject A), (ii) PCD outperforms CAR and ICA (Subject B), and (iii) data have no artifact (Subject C). Figure 7b shows that consonant classification accuracy is similar or better when PCD is applied, as compared with classification on raw data. Conversely, ICA always decreases classification accuracy, despite increasing the number of electrodes below the significance threshold for ITPC. Interestingly, for Subject A, although the number of contaminated electrodes increased after CAR, the consonant classification accuracy improved, suggesting that decoding capacity in this subject might be partially driven by the artifact (Fig. 7b - Subject A).

4 Discussion

In the last decade, the use of iEEG for clinical and research purposes has increased all over the world (Mercier et al., 2022). New developments and technologies in electrode design have opened the door to studying brain activity with higher spatial and temporal resolution. The study of human speech processing and speech synthesis has benefited tremendously from data recorded within the brain, as shown by the latest findings in speech decoding (Moses et al., 2021). Nevertheless, it has recently been discovered that acoustic-induced vibrations can affect the quality of iEEG signals recorded during speech perception and/or production tasks. This recently described acoustic-induced artifact (Bush et al., 2022; Roussel et al., 2020) overlaps in time and frequency with gamma-band activity and can be assessed by the phase relationship with the played or spoken audio (Bush et al., 2022).

Given that the presence of such an artifact can compromise the reliability of speech-related analyses, including those used in BCI development, in the present work we have presented a data-driven spatial-filtering approach for the safe removal of speech-induced vibration artifacts from iEEG recordings.

To raise awareness in the community, we evaluated the performance of two traditional and commonly used spatial-filtering methods for artifact removal. Through our experiments, we demonstrated that traditional spatial-filtering approaches are not appropriate for denoising the speech artifact (Figs. 4, 5). Specifically, we showed that CAR exacerbates the presence of the speech artifact when it is heterogeneous across recording channels, "subtracting in" the artifact to otherwise unaffected channels (Fig. 5 - CAR). As expected, CAR only performed well when the vibration artifact had a homogeneous representation across electrodes (Fig. 7a - CAR). Nevertheless, the vibration artifact is a channel-specific type of noise, and as such, while CAR can be part of traditional preprocessing pipelines, it should not be used without proper data analysis and evaluation of the artifact. In the case of ICA, while the method reduces the coherence of neural signals with the audio (Fig. 5a - ICA), this comes at the cost of a strong degradation of physiological γ-band modulations (Fig. 5d, e - ICA), which ultimately results in a reduction of speech-decoding performance from neural data (Fig. 7b - ICA). Interestingly, when analyzing simulated data, the cosine similarity of the PCA embeddings to the ground-truth data was lower after applying ICA than for the contaminated data. This demonstrates that ICA can degrade the underlying neural data more aggressively than the artifact itself. Although artifactual ICA components were selected based on their phase relationship with the audio, this strong data degradation could indicate that ICA fails to identify and differentiate true gamma modulation from the speech artifact. The frequency-band overlap between these two signals compromises the strong independence assumption required for reliable ICA performance.

In recent years, data-driven spatial-filtering methods have been introduced as re-referencing schemes (Cohen, 2017; de Cheveigné, 2020; Schaworonkow & Voytek, 2021). Such is the case for SSD, an effective method for increasing the SNR of narrow-band components, in which not only the central frequency of the band of interest is enhanced but also its harmonics (Schaworonkow & Voytek, 2021): a characteristic that is particularly suitable for denoising speech-induced artifacts. Given that speech contaminations can be assessed by means of coherence with the audio channel (ITPC) (Bush et al., 2022), PCO (another data-driven spatial-filtering method) is ideally suited to decompose the acoustic artifact source from brain recordings. By combining these methods, we developed PCD, a novel data-driven algorithm for denoising brain recordings contaminated with acoustic-induced artifacts (Figs. 1, 2). Here, we show that PCD can retrieve the speech artifact while preserving physiological γ-band modulations at overlapping frequencies (Figs. 3, 4). Through extensive simulations, we show that PCD works for different numbers of channels (from 3 to 100), different artifact-to-neural-source amplitude ratios (from -100 to 30 dB), and across different durations of simulated artifact (from 0.5 to 3.5 s), although it is sensitive to the SSD filter parameters, such as the bandwidth around F0 (ΔF) and the filter order (Supplementary Figs. 5, 7). Experiments in realistic scenarios showed that PCD can retrieve the artifactual source regardless of the pitch modulation of the audio signal (Fig. 3 and Supplementary Fig. 11), demonstrating robustness with respect to F0 changes. Numerical experiments at different levels of contamination, simulated by changing the AGR value, showed that artifact estimation was always successful, with only a small drop in the phase-locking value at lower artifact levels for some audio types (Supplementary Fig. 11 - audio types 2 and 3). The relationship between artifact strength and PCD denoising capability was also evident in the real-data experiments (Fig. 7). These results show that even in cases in which the artifact presence might be low, PCD can help in data cleaning. Nevertheless, the stronger the artifact, the better the performance of the method (Fig. 7 - PCD).
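Conceptually, a PCD trial reduces to the four steps sketched below; the wrapper names are hypothetical and only stand in for the SSD and PCO implementations cited in the software section, not an actual API.

```matlab
% Conceptual sketch of one PCD trial (hypothetical helper names).
% X: time x channels iEEG; audio: recorded speech; f0: per-trial pitch estimate.
[W_ssd, X_ssd] = run_ssd(X, f0, fs);          % 1) SSD: enhance the band around F0
[W_pco, mvl]   = run_pco(X_ssd, audio, fs);   % 2) PCO: rank components by phase coupling
k    = select_elbow(mvl);                     % 3) number of artifactual components
keep = (k+1):size(W_pco, 2);                  % 4) retain non-artifactual components
X_dn = reconstruct(X, W_ssd, W_pco, keep);    %    and back-project to channel space
```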

In addition, experiments on real data showed that PCD can diminish the number of artifactual electrodes without distorting the underlying neural response (Figs. 5, 7). High-gamma profiles show that PCD (but not ICA) preserves physiological gamma responses (Fig. 5d, e), indicating that PCD is a more reliable denoising method for these speech artifacts than other traditional pipelines. This finding was also replicated when inspecting the detected artifactual source (Fig. 6) and when testing each denoising method as a preprocessing step of a deep-learning network for consonant classification (Fig. 7b). It is worth noting that in the three evaluated cases, regardless of the electrode-gain performance, ICA always resulted in the lowest consonant decoding accuracy, suggesting that relevant neural activity encoding speech was also removed during ICA denoising.

PCD was designed around the characteristics of the speech-induced vibration artifact. As such, the algorithm first applies SSD to enhance the AGR in a frequency band centered at F0 (SAFB). Because we do so in a trial-wise manner, changes in F0 across trials are tracked. In addition, for participants with a low F0, for whom the first harmonic of F0 may also lie in the high-gamma range, the use of SSD also ensures an enhancement of the spectral peak at the first harmonic of F0 (Schaworonkow & Voytek, 2021). Although not perfectly, the harmonics of F0 can also be reduced by means of PCD (see Supplementary Fig. 13). Retaining the components for which the AGR in the SAFB is maximized reduces the dimensionality of the problem enough to proceed with the identification of the components phase-locked with the audio. Given that PCO is a nonconvex optimization problem, the lower the dimensionality of the search space, the faster the method converges. As in the traditional PCA+ICA pipeline, the use of SSD before PCO prepares the data space to maximize the chances of success in the artifact identification.
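For completeness, the sketch below shows one way to obtain the per-trial F0 that centers the SAFB, using the Audio Toolbox pitch function on a stand-in voiced segment; the estimator and the ±4 Hz half-width are assumptions, not necessarily those of our pipeline.

```matlab
% Sketch: per-trial F0 estimate for centring the SSD analysis band (SAFB).
fs = 16000;                                    % audio sampling rate (assumption)
t  = (0:1/fs:1)';
audio = sin(2*pi*115*t) + 0.1*randn(size(t));  % stand-in voiced segment, F0 ~ 115 Hz
f0_track = pitch(audio, fs, 'Range', [70 240]);  % frame-wise F0 (Audio Toolbox)
f0_trial = median(f0_track);                   % single F0 value for this trial
safb = f0_trial + [-4 4];                      % assumed band passed to SSD
```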

Nevertheless, the PCD method has several underlying assumptions. First, it assumes the speech artifact is a narrow-band component around the fundamental frequency of the participant's voice, which must be estimated for each participant from audio recordings. Second, it assumes the artifact source is common across electrode modalities, thus allowing all recording modalities to be combined, which maximizes the chance of extracting the artifact source (Cohen, 2017). While this might be counterintuitive, the artifact is likely due to mechanical vibrations of cables and connectors along the recording chain (Roussel et al., 2020), which can affect different recording modalities in the same way. Third, we assume that the recorded audio is a good estimate of the true artifactual source (i.e., there is no spectral distortion or delay between audio and artifact). While this is a strong assumption, there is currently no better proxy for the artifactual source than the recorded audio. Violations of this assumption may explain the difference between performance on simulated (Figs. 3, 4) and real (Fig. 5) data.

The current implementation of PCD has several limitations. (i) Performance declines for broadband artifactual sources (Supplementary Fig. 7), a limitation inherited from SSD, which can enhance the power only in narrow frequency bands (Nikulin et al., 2011; Schaworonkow & Voytek, 2021). This limitation could also partially explain the differences in performance between simulated and real data. (ii) The method does not account for systematic distortions between the recorded audio and the speech artifact. Modeling such distortions might be a promising approach to further improve the method's performance in future studies. (iii) PCD is computationally expensive given that it involves solving a nonconvex optimization problem, thus requiring several runs until it converges to the best solution (Waterstraat et al., 2017). As such, PCD for online BCI applications would require further ad-hoc implementations to speed up the optimization process. (iv) The method must be applied in a trial-wise manner given that the artifact's phase relationship with the audio strongly varies across trials (Supplementary Fig. 12). For this reason, the method was fitted to individual speech production epochs to optimally estimate the artifact and then a wider window was applied to avoid introducing discontinuities that could affect subsequent analyses. Applying PCD per trial has the additional advantage of reducing memory requirements and computational cost.

Despite the orders-of-magnitude greater resolution of intracranial recordings compared with scalp EEG signals, the potential for induced speech artifacts remains a significant concern (Bush et al., 2022; Roussel et al., 2020). Critical to BCI development using intracranial recordings from overt speech production is the recognition that these artifacts can hamper the training of robust machine-learning models, which may be biased and fail when used for speech-impaired patients in real-life applications. Although the origins of the acoustic-induced artifact are still unknown, care must be taken when recording neural data to diminish the potential sources of vibration artifacts. Evaluating different reference schemes may help in some cases (Mercier et al., 2022), but ultimately the presence of acoustic-induced artifacts should always be assessed. If contamination is found, signal preprocessing methods should verify not only that artifactual components are removed or attenuated, but also that neural signals are retained. The PCD method may facilitate the building of BCI decoding models upon reliable neural data by avoiding artifactually driven performance. Additionally, PCD was designed specifically to mitigate the loss of signal quality we observed when using standard preprocessing methodologies for speech artifact denoising. Using simulated data, we showed that when PCD's assumptions are satisfied, the method can completely remove the speech artifact while preserving speech-related γ-band modulations. For real data, for which the underlying assumptions might not be strictly satisfied, PCD still achieves significant reductions in the strength and extent of the speech artifact. The consonant decoder results obtained with PCD-denoised signals illustrate the practicability of this method for potential improvements in the reliability of BCI frameworks. Moreover, although this study focused on denoising speech artifacts in intracranial recordings collected intraoperatively, PCD can in principle be applied to remove any narrow-band artifact from electrophysiological data, including sEEG and EEG, provided an estimate of the artifactual source is available. Further research is needed to assess the performance of PCD in different settings, such as noninvasive EEG, and for different types of narrow-band artifacts. When the artifact is known to be spread across different electrodes or electrode modalities, we recommend pooling all channels into a common data matrix from which the artifactual components should be extracted.

Data simulation and numerical experiments were conducted in MATLAB 2020b. We used custom scripts (BML toolbox; https://github.com/Brain-Modulation-Lab/bml) based on the FieldTrip library (https://www.fieldtriptoolbox.org). We used openly available implementations of SSD (https://github.com/svendaehne/matlab_SSD) and PCO (https://github.com/neurophysics/PCO_matlab). For measuring phase-locking values and phase differences, the CircStat toolbox (https://www.jstatsoft.org/article/view/v031i10) was used. In addition, RStudio was used to compute the ITPC. For computing the simulations, we used the publicly available C-optimized implementation (http://senselab.med.yale.edu/ModelDB/ShowModel.asp?model=152539). We computed the MSCE and the consistency of the phase difference using the built-in MATLAB mscohere function and the circ_r function of the CircStat toolbox, respectively. We used the RainCloud library (https://github.com/RainCloudPlots/RainCloudPlots#read-the-preprint) to compare distributions of data.

The data that support the findings of this study are available upon reasonable request. A formal data sharing agreement is required.

The code for the PCD algorithm is available online at https://github.com/Brain-Modulation-Lab/PCD.

V.P., M.V., and A.B. conceived the computational experiments, and R.M.R. conceived the experimental task design. V.P. conceived and coded the initial algorithm. V.P. and M.V. designed the final algorithm and pipeline. A.B. performed the data curation. M.V. designed and coded the simulated-data framework. V.P. and M.V. tested the algorithm against real and simulated data. S.L. and Q.R. performed the deep-learning experiments. V.P., M.V., S.L., Q.R., and A.B. contributed tools to curate and analyze data. A.B., N.E.C., and R.M.R. provided insight into the analysis of results. The original draft was written by V.P. and M.V. and was revised and edited by all authors. R.M.R. supervised the project.

The authors declare no competing interests.

We would like to thank the research participants for their generous contribution of time and effort in the operating room, and the additional experimenters who acquired and organized the data. This work was funded by the National Institutes of Health (BRAIN Initiative) through grants U01NS098969, U01NS117836, and R01NS110424 to R.M.R.

Supplementary material for this article is available with the online version here: https://doi.org/10.1162/imag_a_00301.

References

Ablin, P., Cardoso, J.-F., & Gramfort, A. (2018). Faster independent component analysis by preconditioning with Hessian approximations. IEEE Transactions on Signal Processing, 66(15), 4040–4049. https://doi.org/10.1109/TSP.2018.2844203
Angrick, M., Herff, C., Mugler, E., Tate, M. C., Slutzky, M. W., Krusienski, D. J., & Schultz, T. (2019). Speech synthesis from ECoG using densely connected 3D convolutional neural networks. Journal of Neural Engineering, 16(3), 036019. https://doi.org/10.1088/1741-2552/ab0c59
Anumanchipalli, G. K., Chartier, J., & Chang, E. F. (2019). Speech synthesis from neural decoding of spoken sentences. Nature, 568(7753), 493–498. https://doi.org/10.1038/s41586-019-1119-1
Bush, A., Chrabaszcz, A., Peterson, V., Saravanan, V., Dastolfo-Hromack, C., Lipski, W. J., & Richardson, R. M. (2022). Differentiation of speech-induced artifacts from physiological high gamma activity in intracranial recordings. NeuroImage, 250, 118962. https://doi.org/10.1016/j.neuroimage.2022.118962
Canolty, R. T., Edwards, E., Dalal, S. S., Soltani, M., Nagarajan, S. S., Kirsch, H. E., Berger, M. S., Barbaro, N. M., & Knight, R. T. (2006). High gamma power is phase-locked to theta oscillations in human neocortex. Science, 313(5793), 1626–1628. https://doi.org/10.1126/science.1128115
Chrabaszcz, A., Neumann, W.-J., Stretcu, O., Lipski, W. J., Bush, A., Dastolfo-Hromack, C. A., Wang, D., Crammond, D. J., Shaiman, S., Dickey, M. W., Holt, L. L., Turner, R. S., Fiez, J. A., & Richardson, R. M. (2019). Subthalamic nucleus and sensorimotor cortex activity during speech production. The Journal of Neuroscience, 39(14), 2698–2708. https://doi.org/10.1523/JNEUROSCI.2842-18.2019
Chung, S., & Abbott, L. F. (2021). Neural population geometry: An approach for understanding biological and artificial neural networks. Current Opinion in Neurobiology, 70, 137–144. https://doi.org/10.1016/j.conb.2021.10.010
Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press. https://doi.org/10.7551/mitpress/9609.001.0001
Cohen, M. X. (2017). Comparison of linear spatial filters for identifying oscillatory activity in multichannel data. Journal of Neuroscience Methods, 278, 1–12. https://doi.org/10.1016/j.jneumeth.2016.12.016
Cohen, M. X. (2022). A tutorial on generalized eigendecomposition for denoising, contrast enhancement, and dimension reduction in multichannel electrophysiology. NeuroImage, 247, 118809. https://doi.org/10.1016/j.neuroimage.2021.118809
Cunningham, J. P., & Yu, B. M. (2014). Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11), 1500–1509. https://doi.org/10.1038/nn.3776
Dähne, S., Nikulin, V. V., Ramírez, D., Schreier, P. J., Müller, K.-R., & Haufe, S. (2014). Finding brain oscillations with power dependencies in neuroimaging data. NeuroImage, 96, 334–348. https://doi.org/10.1016/j.neuroimage.2014.03.075
de Cheveigné, A. (2020). ZapLine: A simple and effective method to remove power line artifacts. NeuroImage, 207, 116356. https://doi.org/10.1016/j.neuroimage.2019.116356
De Vos, M., Riès, S., Vanderperren, K., Vanrumste, B., Alario, F.-X., Van Huffel, S., & Burle, B. (2010). Removal of muscle artifacts from EEG recordings of spoken language production. Neuroinformatics, 8(2), 135–150. https://doi.org/10.1007/s12021-010-9071-0
Fischer, P., Lipski, W. J., Neumann, W.-J., Turner, R. S., Fries, P., Brown, P., & Richardson, R. M. (2020). Movement-related coupling of human subthalamic nucleus spikes to cortical gamma. eLife, 9, e51956. https://doi.org/10.7554/eLife.51956
Gao, P., Trautmann, E., Yu, B., Santhanam, G., Ryu, S., Shenoy, K., & Ganguli, S. (2017). A theory of multineuronal dimensionality, dynamics and measurement. bioRxiv. https://doi.org/10.1101/214262
Haufe, S., Dähne, S., & Nikulin, V. V. (2014). Dimensionality reduction for the analysis of brain oscillations. NeuroImage, 101, 583–597. https://doi.org/10.1016/j.neuroimage.2014.06.073
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738–1752. https://doi.org/10.1121/1.399423
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261–2269. https://doi.org/10.1109/CVPR.2017.243
Jung, T. P., Makeig, S., Humphries, C., Lee, T. W., McKeown, M. J., Iragui, V., & Sejnowski, T. J. (2000). Removing electroencephalographic artifacts by blind source separation. Psychophysiology, 37(2), 163–178. https://doi.org/10.1111/1469-8986.3720163
Koay, S. A., Charles, A. S., Thiberge, S. Y., Brody, C. D., & Tank, D. W. (2022). Sequential and efficient neural-population coding of complex task information. Neuron, 110(2), 328–349.e11. https://doi.org/10.1016/j.neuron.2021.10.020
Lipski, W. J., Alhourani, A., Pirnia, T., Jones, P. W., Dastolfo-Hromack, C., Helou, L. B., Crammond, D. J., Shaiman, S., Dickey, M. W., Holt, L. L., Turner, R. S., Fiez, J. A., & Richardson, R. M. (2018). Subthalamic nucleus neurons differentially encode early and late aspects of speech production. The Journal of Neuroscience, 38(24), 5620–5631. https://doi.org/10.1523/JNEUROSCI.3480-17.2018
Liu, Y., Coon, W. G., de Pesters, A., Brunner, P., & Schalk, G. (2015). The effects of spatial filtering and artifacts on electrocorticographic signals. Journal of Neural Engineering, 12(5), 056008. https://doi.org/10.1088/1741-2560/12/5/056008
Ma, J., Yang, B., Qiu, W., Li, Y., Gao, S., & Xia, X. (2022). A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface. Scientific Data, 9(1), 531. https://doi.org/10.1038/s41597-022-01647-1
Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438. https://doi.org/10.1016/j.neuroimage.2022.119438
Michelmann, S., Treder, M. S., Griffiths, B., Kerrén, C., Roux, F., Wimber, M., Rollings, D., Sawlani, V., Chelvarajah, R., Gollwitzer, S., Kreiselmeyer, G., Hamer, H., Bowman, H., Staresina, B., & Hanslmayr, S. (2018). Data-driven re-referencing of intracranial EEG based on independent component analysis (ICA). Journal of Neuroscience Methods, 307, 125–137. https://doi.org/10.1016/j.jneumeth.2018.06.021
Moses, D. A., Metzger, S. L., Liu, J. R., Anumanchipalli, G. K., Makin, J. G., Sun, P. F., Chartier, J., Dougherty, M. E., Liu, P. M., Abrams, G. M., Tu-Chan, A., Ganguly, K., & Chang, E. F. (2021). Neuroprosthesis for decoding speech in a paralyzed person with anarthria. New England Journal of Medicine, 385(3), 217–227. https://doi.org/10.1056/NEJMoa2027540
Nelson, R., & Davidson, L. (2002). Electrical noise generated from the microphonic effect in capacitors. IEEE International Symposium on Electromagnetic Compatibility, 2, 855–860. https://doi.org/10.1109/ISEMC.2002.1032708
Nikulin, V. V., Nolte, G., & Curio, G. (2011). A novel method for reliable and fast extraction of neuronal EEG/MEG oscillations on the basis of spatio-spectral decomposition. NeuroImage, 55(4), 1528–1535. https://doi.org/10.1016/j.neuroimage.2011.01.057
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, 1–9. https://doi.org/10.1155/2011/156869
Porcaro, C., Medaglia, M. T., & Krott, A. (2015). Removing speech artifacts from electroencephalographic recordings during overt picture naming. NeuroImage, 105, 171–180. https://doi.org/10.1016/j.neuroimage.2014.10.049
Proix, T., Delgado Saa, J., Christen, A., Martin, S., Pasley, B. N., Knight, R. T., Tian, X., Poeppel, D., Doyle, W. K., Devinsky, O., Arnal, L. H., Mégevand, P., & Giraud, A.-L. (2022). Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nature Communications, 13(1), 48. https://doi.org/10.1038/s41467-021-27725-3
Roussel, P., Le Godais, G., Bocquelet, F., Palma, M., Hongjie, J., Zhang, S., Giraud, A.-L., Mégevand, P., Miller, K., Gehrig, J., Kell, C., Kahane, P., Chabardés, S., & Yvert, B. (2020). Observation and assessment of acoustic contamination of electrophysiological brain signals during speech production and sound perception. Journal of Neural Engineering, 17(5), 056028. https://doi.org/10.1088/1741-2552/abb25e
Sabbagh, D., Ablin, P., Varoquaux, G., Gramfort, A., & Engemann, D. A. (2020). Predictive regression modeling with MEG/EEG: From source power to signals and cognitive states. NeuroImage, 222, 116893. https://doi.org/10.1016/j.neuroimage.2020.116893
Schaworonkow, N., & Voytek, B. (2021). Enhancing oscillations in intracranial electrophysiological recordings with data-driven spatial filters. PLoS Computational Biology, 17(8), e1009298. https://doi.org/10.1371/journal.pcbi.1009298
Tekriwal, A., Felsen, G., & Thompson, J. A. (2018). Modular auditory decision-making behavioral task designed for intraoperative use in humans. Journal of Neuroscience Methods, 304, 162–167. https://doi.org/10.1016/j.jneumeth.2018.05.004
Umakantha, A., Morina, R., Cowley, B. R., Snyder, A. C., Smith, M. A., & Yu, B. M. (2021). Bridging neuronal correlations and dimensionality reduction. Neuron, 109(17), 2740–2754.e12. https://doi.org/10.1016/j.neuron.2021.06.028
Wallstrom, G. L., Kass, R. E., Miller, A., Cohn, J. F., & Fox, N. A. (2004). Automatic correction of ocular artifacts in the EEG: A comparison of regression-based and component-based methods. International Journal of Psychophysiology, 53(2), 105–119. https://doi.org/10.1016/j.ijpsycho.2004.03.007
Waterstraat, G., Curio, G., & Nikulin, V. V. (2017). On optimal spatial filtering for the detection of phase coupling in multivariate neural recordings. NeuroImage, 157, 331–340. https://doi.org/10.1016/j.neuroimage.2017.06.025
Welch, P. (1967). The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2), 70–73. https://doi.org/10.1109/TAU.1967.1161901
Winkler, I., Debener, S., Muller, K.-R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 4101–4105. https://doi.org/10.1109/EMBC.2015.7319296

Author notes

* Equal contribution

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.
