Electroencephalogram (EEG) is a common tool used to understand brain activities. The data are typically obtained by placing electrodes at the surface of the scalp and recording the oscillations of currents passing through the electrodes. These oscillations can sometimes lead to various interpretations, depending on, for example, the subject's health condition, the experiment carried out, the sensitivity of the tools used, or human manipulations. The data obtained over time can be considered a time series. There is evidence in the literature that epilepsy EEG data may be chaotic. Either way, the Embedding Theory in dynamical systems suggests that time series from a complex system could be used to reconstruct its phase space under proper conditions. In this letter, we propose an analysis of epilepsy EEG time series data based on a novel approach dubbed complex geometric structurization. Complex geometric structurization stems from the construction of strange attractors using Embedding Theory from dynamical systems. The complex geometric structures are themselves obtained using a geometry tool, the -shapes from shape analysis. Initial analyses show a proof of concept in that these complex structures capture the expected changes brain in lobes under consideration. Further, a deeper analysis suggests that these complex structures can be used as biomarkers for seizure changes.
A common way to understand brain functions and brain-related diseases is to place electrodes on a subject's head and record the electrical activities that result. These activities, when represented, are often oscillations that change in shape, frequency, and range. They can be analyzed to see if meaningful information can be extracted that gives a clue about the brain state of the subject at a certain time or in a certain region of the brain. The accuracy of the recording highly depends on the instrument used, the region of interest (ROI), the experimenter's experience, the time of the day, the subject's discipline during the recording and many other factors. This is to say that uncertainty in the accuracy of the information recorded is always present. Even when the information is being recorded on the same subject, on the same day, on the same ROI but at different intervals, changes are bound to occur. Electroencephalograms (EEG), a term coined by Berger (1929), are electrical activities recorded on humans or animals that display prominent oscillatory behavior subject to important changes during various behavioral states. These changes show a high degree of nonlinearity in the signals that may be important. Indeed, in the field of biomedical signal processing (analysis of heart rate variability, electrocardiogram, hand tremor, EEG), the presence or absence of nonlinearity often conveys information about the health condition of a subject. In particular, EEG signals are often examined using nonlinearity analysis techniques or by comparing signals that are recorded during different physiological brain states (e.g., epileptic seizure). The differences observed during these analyses can be due to either genuine differences in dynamical properties of the brain or differences in recording parameters. The EEG is often analyzed as times series, and there are many methods for analysis of times series in the literature. The methods can be grouped into two categories: univariate measures and multivariate measures. (For a good review on the topic, see Carney, Myers, & Geyer, 2011.)
Among the methods that have been touted as more efficient at providing insight into the real dynamics of EEG is the famous Embedding Theorem (Takens, 1981). This theorem has been instrumental in understanding how to reconstruct the true dynamics of systems based on the times series measured on these systems. Essentially, this reconstruction theory, in lay terms, suggests that a times series measured over a sufficiently long period of time contains enough information to reconstruct the phase space in which the associated system normally evolves. This shows that there are other intricate subunits that influence the changes observed in the measured variables that are represented in the time series. This theorem was used, for example, by Grassberger and Procaccia (1983) to propose a measure called correlation dimension, which was in turn instrumental to Lehnertz and Elger (1998) in predicting epilepsy seizures. Despite all the methods proposed predicting, there is no agreement on which method constitutes the best tool at extracting the most meaningful information that could be useful for the physician for the prediction of seizure-like diseases. Moreover, with the increasing use of the concepts of chaos and complexity in health sciences, it is becoming more difficult to distinguish their adequate application. For example, there has been evidence of chaos in EEG data (Destexhe & Babloyantz, 1986; Destexhe, 1992; Destexhe, Sepulchre, & Babloyantz, 1998), and since chaotic systems are inherently complicated, they may look complex. Complex systems may also look chaotic. Distinguishing these two notions is important in applications, especially in health sciences. Henceforth, we adopt a terminology along the lines of Rickles, Hawe, and Shiell (2007) for understanding complexity and chaos.
In this letter, we propose a new method for analyzing EEG times series data, which we call complex geometric structurization (CGS). The complex nature of the method stems from the fact that we have multiple subunits interacting together, resulting in a rich collective behavior feeding back into the behavior of individual parts. The method is inspired by the Embedding Theorem (Takens, 1981) for the construction of a geometric structure whose volume can be evaluated from shape analysis technique. The volume of this geometric structure behaves as a key statistic akin to a biomarker for the phenomenon or ROI of interest. Using data-driven approaches to study brain pathologies is now an active field of study due to improvement in life expectancy across the world with its cohorts of problems such as brain disorders (Zheng, Fushing, & Ge, 2019; David, Machado, Ińacio, & Valentin, 2020). Moreover, the push to use data- and methodological-driven approaches to brain pathologies is also evidenced by the numerous grants offered by the National Institutes of Health and private foundations such as the Michael J. Fox Foundation for Parkinson Disease and the Bill and Melinda Gates Foundation, just to name some.
The remainder of the letter is organized as follows. In section 2, we briefly review how to use the Embedding Theorem to construct strange attractors; in section 3, we review important notions of statistical morphometry; in section 4, we introduce the complex geometric structurization method; in section 5, we discuss some applications of the CGS method on real data; and in section 6, we discuss the pros and cons of the proposed method in different contexts.
2 Understanding Embedding Theory
2.1.1 Dynamical Systems
2.1.2 Real Data
There have been many applications of Takens's Embedding Theory with real data. For example, in Fisher, Talathi, Cadotte, and Carney (2009) and Carney et al. (2011), an attractor is constructed from a publicly available (http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html) epilepsy data set, which we call EDATA for simplicity. The data consist of five sets A, B, C, D, and E. Each contains 100 single-channel EEG segments of 23.6 seconds, each selected after visual inspection for artifacts (such as acoustic and electrical shielding, separation of earth ground for laboratory, interconnectivity of devices on the same phase and ground centrally and locally) and has passed a weak stationarity criterion. Sets A and B were obtained from surface EEG recordings of five healthy subjects with eyes open and closed, respectively. Data were obtained in seizure-free intervals from five patients in the epileptogenic zone for set D and from the hippocampal formation of the opposite hemisphere of the brain for set C. Set E contains seizure activity, selected from all recording sites exhibiting ictal activity. Sets A and B have been recorded extracranially, whereas sets C, D, and E have been recorded intracranially.
3 Statistical Morphometry
Given a set of points in a two- or three-dimensional space, statistical morphology (or morphometrics) amounts to finding an appropriate geometric characterization of the variability of the shape and size of the set of points. This characterization includes volumes and surface area, surface roughness, deviation from convexity, porosity, and permeability. We note that in general, the set of points may not be convex in the strictest sense, so there is a need for a method that relaxes the convexity restriction. The notions of -convexity and -shape represent alternatives that relax the strict convexity assumption. These new -shaped objects are even more flexible than -convex objects in that now controls the spatial scale of the estimator. Note that is a unitless quantity. In fact, as decreases, the -shape shrinks and more space appears among the sample points, whereas as increases, the -shape object converges to the convex hull of the sample. In other words, in -shaped objects, controls the amount of porosity between the sample points. These objects have been used in various fields for the characterization of biological systems (Lafarge, Pateiro-Lopez, Possolo, & Dunkers, 2014; Gardiner, Behnsen, & Brassey, 2018). We mention the pivotal work of Edelsbrunner and Mücke (1994) where the main algorithm for the construction of -shaped objects can be found.
In this example, we illustrate the -shape construction in two and three dimensions, based on a random sample of data taken from the original object .
4 Complex Geometric Structurization
In view of the apparent shape that can be observed from the reconstructed phase space, the question of how to compare these complex structures arises. In other words, this question is related to the question of how to compare strange attractors. Among the methods proposed, we can mention the work of Grassberger and Procaccia (1983) and its many variants. The method we propose stems from the observation that there seems to be a complex geometric structure whose shape changes from healthy patients to seizure-prone ones, so we will rely on the notion of -shape to construct the complex structure related to each situation. We use an algorithm to construct our complex geometric structure (CGS). The motivation comes from the fact that given a time series, if it has been observed long enough, it carries the signature of the original phase-space diagram in which the true system it originated from evolves. If we have many such time series from the same system, we should be able to capture enough information about the true phase-space diagram. The large number of these time series should be enough to eventually eliminate noise or undesired artifacts from the reconstruction. We then expect each reconstructed phase diagram to be topologically equivalent to any other, therefore forming a structure compact enough to be the revolving center of all other reconstructed phase-space diagrams. In general, the dimension of the space in which the true system evolves is greater than three, making it impossible to visualize with the naked eye. Our empirical approach is to consider only a cross-section of the reconstructed phase space in dimension 3 in this case. The reason for this choice is two-fold: we can actually visualize a part of the true phase-space diagram, and we can use the existing method to evaluate the volume of the structure in lower dimensions. This is to say that the volume is preserved in reconstruction. In doing this, we are trying to find a measurable identifier or markup for this group of time series that will vary from group to group and from individual to individual.
4.1 Explanation of the Method
Given time series, we use the Takens reconstruction technique to obtain the embedding dimension (using ACF) and time delay (or sampling interval) for (using the method of false nearest neighbors). Let and . If , then for each time series, we obtain the complex structure CGS. Let . We use the -shape technique to obtain the volume of the in 3D. This step is crucial since we choose to represent the complex structure using only the first three delay coordinates , and even if the actual space has dimension . The motivation for this selection is that the volume of the CGS based on any combination of three coordinates would not be significantly different from any other volume obtained from any other combination of three coordinates. The reasoning behind the choice of is that the volume of the complex structure is bounded by the volume of its convex hull as increases, so we select the smallest alpha that maximizes the volume (see section 4.3.3 for an illustration). This leads to the following algorithm:
Use all the times series collected on patients in different groups.
For each time series, reconstruct a 3D “strange attractor” from the first three delay coordinates.
Use the -shape technique to construct a CGS related to all strange attractors.
Define a measure related to each CGS that that can be statistically analyzed. This could be the volume, the surface area, or the center, for example.
Repeat this procedure for all replicates (if there are any) and thus obtain new data.
For comparison: Use a statistical test to see if there is a significant difference between measures of different groups. This could be a parametric or a nonparametric test.
For Prediction: Use the data obtained for training in machine learning (see Kwessi & Edwards, 2020, preferred to standard generalized linear model models because of the absence of a specific model) and test them on potential new data.
The CGS terminology stems from the fact the structures obtained after reconstruction do not have classical geometric shapes. Rather, their shapes are complex in nature.
We propose to use all time series available for a particular region of the brain, at a specific instant—for example, before seizure, during seizure, or after seizure. It is common to use one time series (see Fisher et al., 2009). We note that repeated measures on the same subject do not yield the same values, and thus in the reconstruction, there could be individual times series whose excursions in the phase space are wider than others, yielding a bigger geometric structure in size and volume. In this case, individual times series could be used, and the subsequent large volumes could be trimmed if necessary. This is particularly important if one is interested, for example, in obtaining a richer data set to analyze.
The embedding dimension often obtained is greater than 3, so in the absence of a visualization mechanism for data in spaces of dimensions higher than 3, we propose to focus on only the first three delay coordinates.
Using the R-package Alphashape3d, measures such as volumes or surface areas can be obtained for the -shape object under consideration.
As we saw in the example above, the choice of is critical in the -shape construction. We propose to select the minimum value of for which the volume is maximized.
Finally, we note that in practice, the experimenter can used data from chaotic or complex systems. One can check for chaos in data by calculating Lyapunov exponents. If the data are sensitive to initial conditions and chaotic, the attractor would be referred to as a “strange attractor” (see Celso, Ott, & Yorke, 1987). Even if the data are sensitive to initial conditions but nonchaotic, we would still keep the same “strange attractor” denomination because in this case (the Lyapunov exponents are nonpositive), the attractor obtained is still strange (see Celso, Ott, Pelikan, & Yorke, 1984; Paladin & Vulpiani, 1987). It also noteworthy to observe that nonchaotic systems that are sensitive to initial conditions are sometimes referred to in the literature as complex systems (see the comparative review between chaotic and complex systems by Rickles et al., 2007).
4.3 Technical Considerations
In this section, we examine some technical considerations to keep in mind when implementing this method.
4.3.1 Choice of
4.3.2 Choice of the Embedding Dimension
We note that the embedding dimension is closely related to the time lag. We are not going to discuss which should be estimated first. However, we note that there are two methods for estimating the time lag and they do not always yield the same embedding dimension. We propose to select the embedding dimension as the smallest value of the two embedding dimensions when they are different.
4.3.3 Choice of the Delay-Coordinates
The variation of the volume as a function of the combination of delay coordinates appears not to be significant. Even for seizure data like subset E, the variation in volume appears to be mild, which is our impetus for conjecturing that a similar observation could be made for other data sets. Obviously we cannot predict what will happen for all data sets; ultimately that endeavor would require a mathematical proof that may go beyond the scope of this letter.
5.1 Analysis of EDATA
Figure 13 is further evidence of what already observable in the raw data in Figures 3 and 4. Sets A to D have comparable amplitude at the time-series level confirmed by the fact that the volumes of CGS are in similar range. The amplitude at the time series level of set E is much higher than that of sets A to D, which is also confirmed by a larger volume at CGS level. This is further evidence that during seizure, these volumes increase substantially when compared to seizure-free intervals. More important, we can extract meaningful statistics from these data as suggested in Table 1.
|.||Min .||.||Median .||.||Max .||Mean .||SD .|
|.||Min .||.||Median .||.||Max .||Mean .||SD .|
Note: The values are of order .
5.2 Analysis of Auditory and Visual Cortex of the Brain under Auditory and Visual Tasks, and Rest
5.2.1 Macrolevel Analysis
In a comparison of the first and second rows, the volumes seem to differ by cortex. Comparing the first, second, and third columns, the differences in volumes of tasks are, respectively, 0.449, 0.5538, and 3.296. These numbers show similarity between the auditory and visual tasks, but they both differ from rest. The increase of volume during rest may not be counterintuitive. In fact, a vast literature links resting brain activities to underlying high cognitive processes such as moral reasoning, self-consciousness, remembering past experiences, or planning for the future (Bruckner, Andrews-Hanna, & Schacter, 2008).
5.2.2 Microlevel Analysis
We note there are 20 patients and three tasks per cortex for a total of 120 observations. When grouping by cortex, we labeled the observations 1 through 60 per cortex. When grouping by task, we labeled the observations 1 through 40, and 1 through 20 when grouping by task within cortex. The numbers shown in the box plots in Figure 17 represent the labels for the corresponding outlying observations. Plots A not suggest a difference between the auditory and visual cortexes, since the notches overlap for the most part. It shows that there are two common outliers (37 and 57) in both cortexes. This means that there are two patients whose brain activities are more pronounced than others in both cortexes, causing wide excursions in the phase space and therefore large volumes of their CGS. Plots B and C suggest a similarity between the tasks since all the notches do overlap. Plots D suggest that the difference observed in box plot A is due to the difference between the auditory tasks in both cortexes. Plots B basically confirm the observations in plots A that all the tasks are similar. A more in-depth analysis is needed to make more meaningful conclusions.
In Tables 2 and 3, we report the intrinsic discrepancy, that is, the minimum between and , where is defined as the Kullback-Leibler (KL) directed divergences between the probability distributions F and G (see section 8.4). So the smaller the intrinsic discrepancy, the bigger the difference between and .
|.||Auditory Cortex .||Visual Cortex .||-Value .|
|.||Auditory Cortex .||Visual Cortex .||-Value .|
|.||Auditory Task .||Visual Task .||Rest .||-Value .|
|.||Auditory Task .||Visual Task .||Rest .||-Value .|
To make a meaningful use of Tables 2 and 3, a reference point is needed. However, the densities above allow us to hypothesize that the volumes are not normally distributed. A nonparametric test such as a Wilcoxon-Mann-Whitney would be necessary to compare, for example, the volumes of the two cortexes, controlling for the tasks performed. Such a test yields a -value of 0.021 small enough to suggest a statistically significant difference between the CGS volumes for auditory and visual cortexes. It also suggests that the intrinsic discrepancy 0.160 obtained above is a sign of a statistical significant difference between the two cortexes. Likewise, the nonparametric Kruskall-Wallis test is used to compare the tasks, and it yields a -value of 0.451, which is large, suggesting that the volumes of the CGS are not significantly different. The -values for the pairwise Wilcoxon rank sum test (in parentheses) with Bonferroni correction are also large. This could be further interpreted as the brain activities of individuals during the auditory task are not significantly different from their brain activities during the visual task. Now, whether these differences or lack thereof are corroborated at the biological level remains to be proved.
5.2.3 Comparison with a Cross-Correlation Function
We offer some items for discussion as a result of our findings:
The method we are proposing has the potential to discriminate brain activities in different parts of the brain. Further, changes in volume could be an indication of changes of activity level in the part of the brain under study. This predictive potential could be used to predict or monitor potential brain disorder.
This method has the potential to be extended to experiments in which EEG data are suitable, such as epilepsy, Alzheimer's, attention deficit disorder, learning disabilities, anxiety disorders, fetal alcohol syndrome, autism, chronic pain, insomnia, and dyslexia, for example (Kumar & Bhuvaneswari, 2012).
Preliminary analysis of local field potential (LFP) data suggests that this method can be extended beyond EEG. In fact LFP differs from EEG in that the electrodes are inserted in the brain tissue rather than at the surface of the scalp. However, we did not obtain the authorization to release the results on the LFP data at this time.
There have been studies suggesting that group averaging of neuroimaging data is not clinically relevant. One possible explanation for this issue is the lack of focus on individuals. The method we propose can remedy that by providing a measure that is individual focused.
We have not addressed the weak stationarity of the EEG data used here and how much stationarity, or lack thereof, plays a role in the method we are proposing. The data set EDATA was checked for weak stationarity, but the data collected for the Brain Core Initiative were too few for stationarity. This overall is an issue in multiple studies where samples tend to be small.
One of the drawbacks of the “distance” above is that its definition may change, and therefore this could affect the interpretation of results. In fact, given two times series and , the distance could also be defined as or by with , where is the time-shifted version of , is the time lag separating the two times series and , and are the respective means of the times series and . (See Arbabshirani, Havlicek, Kiehl, Pearlson, & Calhoun, 2012, for more information.)
In this letter, we have proposed a method called complex geometric structurization to help analyze EEG signals. The method is based on the Embedding Theory of dynamical systems and shape analysis. The method works well on epilepsy data. The method has the ability to discriminate individual EEG and also discriminates groups of EEG signals. More important, the method performs better when compared to cross-correlation function. It is important to note that the results and the method, though empirical, offer an important proof of concept relating time series and the volume of their reconstructed complex in three dimensions that need to be explored further by validating them with more solid mathematical concepts. If successful, this method could be an important addition to the literature of EEG signal analysis and can be used to explore other brain pathologies such as sleep disorder, schizophrenia, or Alzheimer's. Its discrimination potential also can be used to improve our understanding of functional brain connectivity in general.
A.1 -convex and -shape
The quantity in equation A.1 is called the -convex hull of . Now suppose we have a sample obtained from an object in ; then an estimator of is if is assumed -convex. Estimators of -convex objects are constructed from a 2D arc of circles in and 3D spherical caps in.
Let be -convex. An -shaped estimator of is an estimator of its -convex hull obtained by approximating an arc of circles with polygonal curves in 2D and spherical caps with polyhedral surfaces in 3D.
A.2 Estimating the Time Delay and Embedding Dimension
A.2.1 Estimating the Time Delay
A.3 Estimating the Embedding Dimension
The most popular method for estimating the embedding dimension is the so-called false nearest neighbors technique (Kennel, Brown, & Abarbanel, 1992).
A.4 Kullback-Leibler Divergence Function
A.5 Cross-Correlation Function
Kristina Visscher in the Department of Neurobiology at the University of Alabama at Birmingham generously allowed us to use the data collected in her laboratory. E.A.K. acknowledges the administrative and technical support of Corine M. Kwessi.