FiNN: A toolbox for neurophysiological network analysis

Abstract Recently, neuroscience has seen a shift from localist approaches to network-wide investigations of brain function. Neurophysiological signals across different spatial and temporal scales provide insight into neural communication. However, additional methodological considerations arise when investigating network-wide brain dynamics rather than local effects. Specifically, larger amounts of data, investigated across a higher dimensional space, are necessary. Here, we present FiNN (Find Neurophysiological Networks), a novel toolbox for the analysis of neurophysiological data with a focus on functional and effective connectivity. FiNN provides a wide range of data processing methods and statistical and visualization tools to facilitate inspection of connectivity estimates and the resulting metrics of brain dynamics. The Python toolbox and its documentation are freely available as Supporting Information. We evaluated FiNN against a number of established frameworks on both a conceptual and an implementation level. We found FiNN to require much less processing time and memory than other toolboxes. In addition, FiNN adheres to a design philosophy of easy access and modifiability, while providing efficient data processing implementations. Since the investigation of network-level neural dynamics is experiencing increasing interest, we place FiNN at the disposal of the neuroscientific community as open-source software.


INTRODUCTION
Analyzing dependence between neurophysiological signals, and the definition of large-scale networks, has become an important field of research that greatly enhances our comprehension of communication between distinct neural structures (Bressler & Menon, 2010;Siegel et al., 2012). Neural connectivity in particular is commonly quantified by estimating the degree to which neural oscillations within the same frequency band or across different frequency bands relate to each other (Fries, 2005). These two types of communication modes are known as same-frequency coupling (sfc) and cross-frequency coupling (cfc), respectively (Friston, 2011;Hyafil et al., 2015). Neural communication on a network level can be quantified on the basis of neurophysiological data from a wide variety of data sources including electroencephalography (EEG), magnetoencephalography (MEG), and local field potentials (LFPs) (Engel et al., 2013;Ganzetti & Mantini, 2013). An estimation of connectivity across regions and/or frequencies is generally more computationally intensive than the local synchronization of neural activity within a smaller spatial region (He et al., 2019). While the number of power estimates are linearly related to the number of sensors, the number of potential connectivity candidates increases in a quadratic order of magnitude. Furthermore, in many applications, the amount of neurophysiological data is rising rapidly, partly due to increasingly dense sensor setups and high sampling rates (Brinkmann et al., 2009;Sahasrabuddhe et al., 2020;Song et al., 2015), as well as to more demanding analysis techniques, such as machine learning. A greater number of samples is therefore required (Glaser et al., 2019;Kus et al., 2004).
In recent years, a number of toolboxes include functions for estimating neural connectivity. However, the majority are either heavyweight frameworks that deeply encapsulate data, making modifications or the integration into an existing pipeline difficult and increasing the time required for memory processing; or frameworks with broad usability, but limited functionality for neuroscientific data analysis and interpretation (see Results). Here, we introduce FiNN (Find Neurophysiological Networks), a novel framework written in Python that provides tools to analyze neurophysiological data in a bid to quantify network-wide neural communication within a lightweight and computationally efficient framework. FiNN includes several functions for cleaning and processing neurophysiological data in the context of connectivity. It also includes visualization routines and statistical methods, both of which are useful tools for the analysis of connectivity in large, high-dimensional neurophysiological datasets.
The goal of FiNN is to provide an open-source software toolbox offering easy to use and computationally efficient methods for both users and developers. From a user perspective, it is important that the toolbox is accessible and manageable. We therefore designed the functions in FiNN such that they can be readily used without a deep knowledge of the underlying functionality. In addition to an elaborate documentation on the individual functions, we included detailed information on the internal processing functions to promote modifiability. Furthermore, to facilitate the analysis of large datasets across high-dimensional spaces, memory and CPU consumption were strictly monitored and reduced, thereby achieving a high level of scalability.
Materials and Methods describes the functionality of the toolbox, while Results discusses FiNN in relation to the established toolboxes generally used to analyze neurophysiological data. This is followed by an illustration of its performance in comparison to a selection of established toolboxes in the Discussion and Outlook sections.

Toolbox Documentation and Installation
FiNN (v1.0) is freely available to the research community as open-source code on GitHub at https://github.com/neurophysiological-analysis/FiNN (Scherer, 2022a). It can be downloaded as a zip file containing the last release, or by cloning the git repository. Documentation is available at https://neurophysiological-analysis.github.io/FiNN (Scherer, 2022b) and includes a detailed description of the functions implemented. Furthermore, FiNN contains a demo folder with several trial scripts. The scripts are intended to provide an introductory demonstration of the functions implemented, but can also be used to gain a deeper methodological understanding of the functions. An exemplary workflow utilizing FiNN for the analysis of EEG data is provided in Table 1. A real-world application of FiNN can be observed in Scherer et al. (2022a) where FiNN was used to quantify phase-amplitude coupling between neuronal FiNN: Find Neurophysiological Networks is the framework presented in this paper. It offers a wide range of functions for the analysis of electrophysiological data.
GitHub: An online repository for programming code. sfc: Coupling between signals with the same frequency. cfc: Coupling between signals with different frequencies.
EEG: Head-attached sensors used to record information about cortical activation patterns.

MEG:
Electromagnetism based sensors used to record information about cortical activation patterns.

LFP:
Low-frequency (<300 Hz) electrophysiological activity recorded from within the brain.
spiking and LFP activity. In the scope of this work, FiNN was used for data processing and statistical analysis.

Organization of the FiNN Toolbox
FiNN (v1.0) consists of nine modules: basic processing, artifact rejection, filters, crossfrequency coupling, same-frequency coupling, statistics, visualization, file IO, and miscellaneous. A brief description of each module is provided below. Further details can be found in the online documentation (https://neurophysiological-analysis.github.io/ FiNN; Scherer, 2022b). Recommendations are also provided as to how to apply these for analysis, where applicable.
Basic processing package. For initial processing, the basic processing package offers the common average re-referencing (CAR) and downsampling functions. This function subtracts from the data the part that is shared by all channels, since it is presumably the result of activity at the reference electrode, and hence equal in all channels. This procedure is recommended only when a large number of EEG channels (≥64) is available and these are equally distributed across the head (Nunez, 2010). The downsampling function downsamples a signal from its original sampling frequency to a lower, configurable target sampling frequency. Prior to the application of the downsampling function, it is important for the signal to be low-pass filtered below half the target frequency, as aliasing artifacts may otherwise be introduced into the  (Shannon, 1948(Shannon, , 1949. Furthermore, we recommend that the target sampling frequency be set as low as possible while maintaining a sufficient level of temporal accuracy to accelerate data evaluation further down the line. Artifact rejection package. The artifact rejection package contains two functions to detect artifacts and one function to clean the data. Bad channel identification identifies individual channels in which the power in a predefined frequency range deviates by more than three standard deviations (default value, configurable) from the mean power of all given channels as faulty channels, since spectral power is a useful feature for separating artifacts from EEG (Islam et al., 2016). In an optional second step, the function provides a custom-built dialog window that shows the mean power of each channel ( Figure 1A). Channels marked as faulty are automatically highlighted, and the user can visually confirm the selection and make changes accordingly. The outputs of the bad channel identification function are a list of nonfaulty channels and a list of faulty channels. Furthermore, it is highly recommended that frequency bands that are part of the subsequent analysis for artifact detection be avoided. This advice should be followed to ensure that the results are not biased, as side effects may arise if the power in the frequency band of interest is used in the artifact rejection.
Once the faulty channels have been identified, an optional follow-up step is to restore them using the channel restoration function. This restores faulty channels by averaging the signals from their respective neighbors ( Figure 1B and 1C). A default adjacency matrix is provided to the channel restoration function. In the event of one or more neighboring channels of a faulty channel being faulty themselves, the channels are iteratively restored, commencing at the channel with the most nonfaulty neighbors. Once a channel has been restored during an iteration, it is considered a nonfaulty channel during the next iteration of the reconstruction process. The outlier removal function removes any sample within a data set with a z-score higher than two (default value, configurable). This process is repeated iteratively until only those samples with a z-score of less than two remain or until a minimum sample threshold is reached. This function should be applied only to data from a unique, single state (e.g., within the same subject and same condition). It assumes that provided data is from a single process with a Gaussian distribution. Any data segments which fail this assertion are iteratively removed from the provided data. In the event that this assumption cannot be met, the approach presented here may not apply. The assumption of Gaussianity may be evaluated by either visual inspection or tests for normality.
Filters package. The main function in this module is the implemented finite impulse response (FIR) filter. The filter is implemented via an overlap add approach (Rabiner & Gold, 1975) to speed up the processing procedure, especially for longer signals. Furthermore, the filter implementation provides a rapid and precise operation mode, which initially converts the input data into either 32 bits floats (fast) or 64 bits floats (precise), and subsequently performs all operations with the required precision. The FIR filter can be configured as either a low-pass, highpass, band-pass, or band-stop filter. Furthermore, custom filters can be easily designed and accessed using the functions listed in the main FIR function. Additionally, a wraparound scipy's Butterworth filter is also available.
Cross-frequency coupling package. The cfc package currently implements the following phaseamplitude coupling (PAC) metrics: direct modulation index , modulation index (Tort et al., 2008), phase-locking value (Mormann et al., 2005), and mean vector length (Canolty et al., 2006). A description of the metrics can be found in Table 2. The input signals should already be filtered into the frequency bands of interest, for example, with the FIR filter implemented in the filters package (see Filters package section above).
Same-frequency coupling package. The sfc package currently implements the following metrics: directionalized absolute coherency (Scherer et al., 2022c), magnitude squared coherence (Carter et al., 1973), imaginary coherence (Nolte et al., 2004), weighted phase lag index (Vinck et al., 2011), and phase slope index (Nolte et al., 2008). A description of the metrics can be found in Table 2. It also includes a function for calculating the complex coherency, which can be interpreted as a measure of consistency between two signals with a constant phase shift at a specific frequency (Shaw, 1984). Complex coherency is implemented as an additional function since it is a common precursor of the magnitude squared coherence, imaginary coherence, and others. The metrics implemented in the sfc module accept input data from the time domain, the frequency domain, and the complex coherency domain. Apart from the most commonly used time domain signals, the additional input data formats (frequency domain and complex coherence domain) were added to support modifiability of the implemented functionality, and to allow more efficient processing of multiple connectivity metrics in parallel when the data is already available in the required format.
Statistics package. The statistics package contains the generalized linear mixed models function, which allows to employ generalized linear mixed models in the statistical evaluation of an investigation. The rpy2 package is used to wrap around the lme4 package (Bates et al., 2015) in R (R Core Team, 2021). The implementation presented in FiNN provides a complete and easily interpretable output which comprises both the significance values, indicating how reliable an effect is, and the coefficients, allowing for an estimation of how impactful an observed effect is.

PAC:
Coupling between the phase of a low frequency signal and the amplitude of a high frequency signal.
Visualization package. FiNN offers the topoplot function, with the additional functionality of visualizing various levels of significance. Depending on their statistical significance, individual channels may be marked both before and/or after multiple comparison correction (Figure 2). The topoplot function is built on top of Matplotlib (Hunter, 2007), which is a data visualization library in Python. Since this function is solely responsible for visualization, any feature calculation and/or statistical evaluation has to be performed independently.   The direct modulation index metric is a variant of the modulation index from Tort et al. (2008) and evaluates PAC between two signals. Instead of quantifying PAC using entropy, a sinusoidal slope is fitted to the phase-amplitude histogram. This metric is statically bounded to the interval [0, 1] which allows for the interpretation of absolute PAC changes.
Modulation index (Tort et al., 2008) The modulation index quantifies PAC as the Shannon entropy (Shannon, 1948) of the phaseamplitude histogram between two signals. This metric is not bound to a static interval, and therefore allows only for interpretation of the resulting values in relative changes.
Phase-locking value (Mormann et al., 2005) The phase-locking value evaluates the average phase lag between the phase of the instantaneous amplitude of a first signal and the phase of a second signal. This metric is not bound to a static interval, and therefore allows only for interpretation of the resulting values in relative changes.
Mean vector length (Canolty et al., 2006) The mean vector length metric evaluates whether the amplitude of one signal peaks at a specific phase in a second signal.

Same-frequency coupling
Directional absolute coherency (Scherer et al., 2022c) The directional absolute coherency utilizes the complex coherency function to calculate the magnitude squared coherence (Carter et al., 1973), phase slope index (Nolte et al., 2008), and the imaginary coherence (Nolte et al., 2004). These three metrics are combined to create a single reliable measure of coherence drawing from their individual strengths without incorporating the individual weaknesses. This connectivity metric is directional, can detect volume conduction, and is statically bound to [−1, 1].
Magnitude squared coherence (Carter et al., 1973) Using the complex coherency function, magnitude squared coherence is calculated (Carter et al., 1973) between a source signal and a target signal. To derive a rational number from the complex output of the complex coherence, the squared absolute of the complex valued complex coherence is calculated. This connectivity metric is directionless, cannot detect volume conduction, and is statically bounded to [0, 1].
Imaginary coherence (Nolte et al., 2004) The imaginary coherence is calculated using the complex coherency function by taking the imaginary component of the complex coherence. This connectivity metric is directionless, may detect volume conduction, and its output is not statically bound. The latter renders this metric susceptible to bias if the sensors have been moved (e.g., on different measurement days).
Weighted phase lag index (Vinck et al., 2011) The weighted phase lag index is an extension of the phase lag index (Stam et al., 2007). This function is directionless, can detect volume conduction (approaches zero in vicinity of 0°/180°p hase shift), and is statically bound to [−1, 1].
Phase slope index (Nolte et al., 2008) The phase slope index is calculated using the complex coherency function and is normalized by its standard deviation. Due to the additional need for normalization, more data is required than for the other connectivity estimation methods. This metric is directional, can detect volume conduction (approaches zero in vicinity), and is bound to [−1, 1].
File IO package. When investigating network-wide brain dynamics, larger amounts of data have to be accommodated. FiNN offers the data manager module to handle large file sizes. In Python, pickle (.pkl) is a well-known tool for saving arbitrary variables to disk. However, when loading or writing a file, it has the well-known shortcoming of consuming multiple times more memory than the size of the file itself. This causes memory spikes which may in turn lead to unstable behavior and crashes if not handled carefully. The data manager function circumvents this issue by processing a large file into several smaller ones. This increases the stability of the overall processing pipeline and reduces the likelihood of requiring user intervention, particularly if the pipeline is partially or fully automated.
Miscellaneous package. Due to the large size of neuroscientific datasets, it is beneficial to separate the processing into subprocesses and perform operations in parallel. When using the Python native subprocess pool, however, all subprocesses may receive or send their data at the same time. This behavior is similar to issues of pickle-based file reading and writing (see previous section). The FiNN Toolbox therefore offers the timed pool function, a custom-built subprocess pool that limits the sending or receiving of data to one subprocess at a time. This implementation substantially decreases the risk of memory spikes for a negligible increase in run time. Additionally, the timed pool function has two advantages over the Python native subprocess pool. First, there is an option to delete the original input data immediately after its function is executed, thus releasing additional memory. Second, there is an option to add a configurable lifetime duration to each subprocess. When a subprocess does not return a result within the allotted lifetime duration, it will be restarted. This behavior is particularly suitable in the event that one or more of the subprocesses do not terminate within a reasonable time, for example, when running a randomly initialized optimization loop.

Code Testing and Validation
System tests have been successfully applied to all the functions implemented in FiNN. Functional tests were performed for mathematically more complex components such as the Figure 2. Topography plot of artificial demo data. This figure shows an example of a significantly increased activity above the left motor cortex and significantly decreased activity above the right motor cortex. At the center, the effects were modeled to be significant after multiple comparison correction. This is marked at the electrodes position by using full white dots. The outlying areas of these effects were simulated to be also significant, but only prior to multiple comparison correction, this is indicated by half-filled dots at the electrodes location. Finally, nonsignificant areas are indicated by black dots.
FIR filter and the complex coherence calculation using scipy as a reference implementation. Automated unit tests were implemented in the framework to verify that its behavior is as designated.

Performance Evaluation
We evaluated the performance of the FIR filter against the implementation from MNE (a Python-based framework for the analysis of electrophysiological data, fully defined in the Comparison to Other Frameworks section below) in terms of speed, and the performance of the subprocess pool against both the implementation from joblib used in MNE and the default subprocess pool implemented in the multiprocessing package in Python, in terms of RAM consumption. These two functions were selected as they necessitate efficient implementations (both RAM and CPU time-wise) if evaluation is to be rapid. Biological data recorded from the human subthalamic nucleus was used for the evaluation process (Milosevic et al., 2020). During the three subprocesses, signals of different lengths, varying between 30 seconds and 24 hours (artificially expanded), were filtered using both FIR implementations. The data was appended via repetition as required. All parameters were configured equally to achieve a maximum degree of comparability between the two implementations. The scripts of the above evaluations and comparisons are provided in the toolbox (https://github.com /neurophysiological-analysis/FiNN; Scherer, 2022a).

Comparison to Other Frameworks
A large number of open-source toolboxes have been developed to support the processing and analysis of (neurophysiological) signals. Here, we compare FiNN to a selection of existing toolboxes in terms of scope and computational performance (i.e., processing time and memory consumption). These other toolboxes were selected either on account of their reputation in scientific data analysis (e.g., scipy) or as a result of keyword-based searches (e.g., "Python toolbox EEG"). We selected the search engine startpage.com, as it delivers Google results while anonymizing search requests, and is therefore not subject to a user-specific filter bubble (Salehi et al., 2018). In the following paragraphs, the selection of frameworks with which the FiNN framework is compared is described in more detail.
Two Python-based frameworks that are widely used in general data processing and analysis are scipy (scientific python)  and scikit-learn (Pedregosa et al., 2011). Scipy and scikit-learn are focused on fundamental algorithms (e.g., optimization techniques, basics in digital signal processing, and linear algebra) and machine learning/data analysis, respectively. Currently, scikit-learn provides functionality such as the principal component analysis (Hotelling, 1933;Pearson, 1901) and the independent component analysis (Makeig et al., 1995), which are commonly used for the purpose of artifact identification and removal (Xue et al., 2006).
A major advantage of frameworks such as scipy and scikit-learn is that they are usually heavily modified toward a small memory footprint and limited CPU consumption. Scikit-learn in particular has been optimized for speed, which becomes increasingly important as the complexity of the applied machine learning algorithm increases. Relative to the other frameworks discussed in this paper, scipy may be categorized as a low-level framework. Generally, scipy and scikit-learn offer a great range of functions which can be-and often have to be-highly customized to a problem at hand. This in turn requires in-depth knowledge of the provided methodology in the respective areas. Furthermore, while scipy and scikit-learn excel with regards to the provided functionality (fundamental algorithms and machine learning/data analysis), on their own, these frameworks do not fully cover all needs of electrophysiological data analysis. For example, scipy/scikit-learn lacks some data analysis functions elementary in many experimental, electrophysiology-based neuroscientific analyses, such as connectivity metrics, and proper visualization functions, such as topoplots. Having a different focus, FiNN uses the comparatively low-level functionality provided by scipy (and other toolkits) to provide this functionality for EEG/MEG data analysis.
Another broad Python-based framework specifically developed for the analysis and visualization of neurophysiological data, is MNE (Gramfort et al., 2013). MNE has its roots in the estimation of source-space signals from signal-space signals via EEG or MEG recordings. Unlike scipy and scikit-learn, MNE is a very high level framework that offers a wide range of functions to analyze neurophysiological data. These methods are often configured in predefined ways, making any deviations from the intended application case rather difficult. MNE focuses strongly on mathematical precision. This, in turn, mandates a high memory consumption and slow data processing speed. MNE follows a fundamentally different design philosophy to FiNN. While FiNN aims to provide methods which can be easily integrated into any workflow, MNE is almost exclusively used if the proposed MNE-specific pipelines are implemented for data processing. This results in a lower degree of customizability when using MNE rather than FiNN. Another difference is the focus of optimization. MNE focuses on optimal mathematical accuracy, whereas FiNN allows the user to define the trade-off between high mathematical accuracy, high processing speed, and efficient resource usage, for example, in applications where online analysis is necessary.
NeuroKit2 (Makowski, 2016) is another Python-based framework for the analysis of neuroscientific and neurophysiological data. The framework was designed to work with electrocardiography (ECG), electromyography (EMG), and EEG data. It provides an adequate range of functionality with a different focus to FiNN. While NeuroKit2 aims to be a generalist for any kind of neurophysiological signals, FiNN focuses on EEG/EMG/MEG and LFP signals. This difference in focus can be readily observed in the functionality provided. For instance, NeuroKit2 offers methods to analyze heart rate variability via ECG, or the autocorrelation of EEG signals, while FiNN offers metrics for connectivity between EEG channels, or functionality to visualize a topoplot. Neither connectivity nor topoplot functionality are offered in NeuroKit2.

Performance Comparison Between FiNN and MNE
The FIR filter and the multiprocessing pool of FiNN were compared to the same functions when implemented in MNE. The processing times of the FIR filter as implemented in FiNN and MNE are shown in Figure 3. Both configurations in FiNN were substantially faster than their counterparts in MNE, provided that continuous datasets were of 1-hour length or less. When the fast implementation was chosen in FiNN, this implementation of the FIR filter was always executed more quickly than its MNE counterpart. While the difference was up to 670% more rapid for small datasets, the implementation of FiNN remained approximately ECG: Chest attached sensors used to record information on heart activity patterns.

EMG:
Muscle attached sensors used to record information on local activation patterns.
As shown in Figure 4, the multiprocessing pool implemented in FiNN requires less RAM compared to the default multiprocessing package included in the native Python multiprocessing package and MNE with the multiprocessing package backend.

DISCUSSION
Neuronal information processing takes place on a local level and on a network level (Reimers, 2011;Thorpe, 1989;Zhang et al., 2016). Therefore, to understand how information is processed, it is also crucial to investigate the nonlocal components of information processing. To this end, we present FiNN, a Python-based framework used to find neurophysiological networks in neurophysiological data and subsequently analyze them. FiNN implements both established and novel metrics for the evaluation of the same frequency coupling (Fries, 2005) and cross-frequency coupling (Canolty & Knight, 2010) analyses used to investigate network-level information flows. In particular, FiNN offers implementations for the newly proposed connectivity metrics directional absolute coherence (Scherer et al., 2022c) and direct modulation index .
The amount of data collection in experimental neuroscientific applications is steadily rising with increasingly powerful frequency amplifiers and an ever-growing number of simultaneously recorded channels (Song et al., 2015). Concurrently, the number of features extracted from this raw data also increases (Vaid et al., 2015). For these reasons, efficient data processing is essential. One major benefit of FiNN is its strong optimization toward processing speed and minimal memory consumption. This is reflected not only in the individual functions, which have been optimized to perform as little recalculation as possible, but also in the modules provided. Exemplary modules for this design philosophy are the custom subprocess pool for parallel processing or the data manager for I/O operations, both of which are included in this framework.
Despite the fact that FiNN includes many elements to assist potential users in the analysis of neurophysiological data, modifiability was still a major concern during development. Although the functionality available offers many parameters to calibrate it to a specific application case, edge cases are difficult to identify. The high level of modifiability provided by FiNN should be most helpful in these cases. Since all functionality is implemented in open-source languages, the programming code can be easily amended to cover specific cases encountered in a user's data analysis. Furthermore, not only the uppermost, but any layer of functionality was fully documented to increase support for potential customization efforts.
Although the main focus of FiNN is on the analysis of network-level communication (both same-frequency and cross-frequency) from neurophysiological data, it also provides a full evaluation pipeline for the investigation of local and network-level information processing from EEG, MEG, EMG, and/or LFP-based data. The pipeline offered by FiNN includes modules for data preprocessing such as semi-/fully-automated outlier detection, postprocessing and visualization, and structural functionality to ease parallel processing. Although FiNN was originally developed with EEG and EMG data in mind, the implemented methods are well suited to analyze any kind of neurophysiological signals (e.g., LFP and MEG).

OUTLOOK
Initially, FiNN was developed for in-house evaluation of experimental paradigms generating neurophysiological data. As the number of paradigms and their subsequent analyses increased, so too did the functionality of FiNN. Meanwhile, FiNN has been developed to the extent where it not only supports in-house evaluation of neuroscientific investigation, but also external ones. We are therefore pleased to share FiNN as an open-source software with the neuroscientific community.

ACKNOWLEDGMENTS
We acknowledge support by the Open Access Publishing Fund of the University of Tübingen.