Abstract

Unsupervised analysis of the dynamics (nonstationarity) of functional brain connectivity during rest has recently received a lot of attention in the neuroimaging and neuroengineering communities. Most studies have used functional magnetic resonance imaging, but electroencephalography (EEG) and magnetoencephalography (MEG) also hold great promise for analyzing nonstationary functional connectivity with high temporal resolution. Previous EEG/MEG analyses divided the problem into two consecutive stages: the separation of neural sources and then the connectivity analysis of the separated sources. Such nonoptimal division into two stages may bias the result because of the different prior assumptions made about the data in the two stages. We propose a unified method for separating EEG/MEG sources and learning their functional connectivity (coactivation) patterns. We combine blind source separation (BSS) with unsupervised clustering of the activity levels of the sources in a single probabilistic model. BSS is performed on the Hilbert transforms of band-limited EEG/MEG signals, and coactivation patterns are learned by a mixture model of source envelopes. Simulation studies show that the unified approach often outperforms conventional two-stage methods, further indicating the benefit of using Hilbert transforms to deal with oscillatory sources. Experiments on resting-state EEG data, acquired in conjunction with a cued motor imagery or nonimagery task, also show that the states (clusters) obtained by the proposed method often correlate better with physiologically meaningful quantities than those obtained by a two-stage method.

1  Introduction

Unsupervised machine learning techniques play a fundamental role in the analysis of spontaneous (resting-state) neuroimaging signals by exploring the intrinsic statistical structures of such signals without relying on extrinsic covariates about tasks or stimulation protocols. The structures or features obtained can then be examined based on neurophysiological knowledge, often using the features in a group comparison or, possibly, by finding similar structures in other signals already associated with tasks or stimuli.

In recent years, there has been growing interest in exploring the patterns and dynamics of resting-state functional brain connectivity (Friston, 1994) based on unsupervised signal analyses. To find patterns in nonstationary functional connectivity, most studies have relied on such standard techniques as independent component analysis (ICA) (Brookes et al., 2011; Smith et al., 2012), principal component analysis (PCA) (Leonardi et al., 2013), and K-means clustering (Liu, Chang, & Duyn, 2013; Allen et al., 2014). New unsupervised analysis methods have also been developed (e.g., Haufe, Tomioka, Nolte, Müller, & Kawanabe, 2010; Zhang & Hyvärinen, 2010; Hyvärinen, Zhang, Shimizu, & Hoyer, 2010; Hirayama & Hyvärinen, 2012; Ramkumar, Parkkonen, Hari, & Hyvärinen, 2012; Ramkumar, Parkkonen, & Hyvärinen, 2014; Dähne et al., 2014) to incorporate the specific nature of neuroimaging signals.

Such methods for analyzing functional brain connectivity, based primarily on signals’ own statistics, are potentially very useful not only for neuroscientific investigations but also for neuroengineering applications. For example, a key challenge in an emerging new direction in brain-computer interface (BCI) research is to covertly acquire users’ unobserved states during everyday life behaviors (Zander & Kothe, 2011; Lance, Kerick, Ries, Oie, & McDowell, 2012). Since brain activity cannot be very well controlled in everyday life situations, no reliable class labels are available for discriminating the unobservable states of users. Such difficult data could be tackled if unsupervised analysis discovered connectivity patterns reflecting the user’s cognitive states. Unsupervised connectivity analysis might also shed light on the neurophysiological basis of the BCI paradigms commonly used so far, such as those based on motor imagery, that is, imagining body movements (Grosse-Wentrup, 2009, 2011).

Motivated by such potential applications, in this article we focus on developing an unsupervised analysis method for finding connectivity-related signal features from electroencephalography (EEG); our method may also be readily applied to magnetoencephalography (MEG) because of its fundamental similarity. EEG/MEG’s high temporal resolution is particularly useful for analyzing nonstationary connectivity, as compared to functional magnetic resonance imaging (fMRI), which is used in most connectivity studies.

The analysis of functional connectivity patterns in EEG/MEG, however, is not straightforward because the neural sources are mixed by volume conduction (or field spread) into sensor signals. Two-stage analysis has been conventionally performed by first separating the neural sources from the given sensor signals and analyzing the connectivity patterns based on those separated sources. In neuroimaging literature, electromagnetic inverse problems are often solved to separate (estimate) the activity of dipolar sources on the cortical grid, with an additional effort of physical forward modeling. In exploratory signal analysis related to BCI, blind source separation (BSS) methods (including ICA) are especially useful, since they greatly simplify the interpretation of the results by decomposing the data into components, and the inverse problem can be solved to localize the obtained components afterward (Hironaga & Ioannides, 2007; Doesburg & Ward, 2009). Note that these components are actually called “sources” in BSS literature, and a component can be an integration of multiple correlated dipolar sources.

The problem is that conventional two-stage analysis (i.e., first source separation and then connectivity analysis) is “neither a principled nor an optimal solution to the overall problem” (Makeig et al., 2012). Source separation methods rely on specific prior assumptions about the sources, which are not necessarily consistent with what the connectivity analysis assumes about them; such inconsistency between prior assumptions might bias the results. A more desirable unified treatment would be obtained by extending the conventional generative (forward) model used for source separation by including prior assumptions about the sources that are consistent with the connectivity analysis. Typically, connectivity analysis can be formulated as learning of a specific parametric model of the sources, and both layers of the model may be learned simultaneously, unified by the principle of statistical parameter estimation.

Here, we present a unified method for BSS and analyzing functional connectivity patterns in EEG/MEG sources, based on a novel two-layer extension of the conventional BSS/ICA generative model. In line with previous resting-state MEG studies (de Pasquale et al., 2010; Brookes et al., 2011; Ramkumar et al., 2014), we are particularly interested in finding coherent (frequently occurring) patterns of activity levels (coactivations) of oscillatory sources in any frequency band of interest. The connectivities are based on envelopes and ignore phase information, but our model is rather different from power-to-power coherences.

To properly model the source envelopes, our model uses a complex-valued formulation of BSS based on the Hilbert transform, a key departure from related extensions of ICA based on modeling real-valued data (Hyvärinen & Hoyer, 2000; Hyvärinen, Hoyer, & Inki, 2001; Valpola, Harva, & Karhunen, 2004; Karklin & Lewicki, 2005; Kawanabe & Müller, 2005; Osindero, Welling, & Hinton, 2006; Köster & Hyvärinen, 2010; Haufe et al., 2010; Hirayama & Hyvärinen, 2012). Another important novelty here is using a finite mixture model of sources, in which they are assumed to exhibit different coactivation patterns corresponding to a finite number of unobserved “brain states.” This corresponds to performing unsupervised clustering on the coactivations of the sources, inspired by the use of K-means clustering in previous resting-state fMRI studies (Liu et al., 2013; Allen et al., 2014). Due to the simplicity of the mixture model, our two-layer model is tractable, unlike previous two-layer models that are often difficult to learn. Our entire model can be readily estimated (optimized) by the maximum likelihood method without resorting to any approximations.

The rest of this article is organized as follows. First, in section 2, we present our proposed method based on a novel two-layer extension of the generative BSS/ICA model, the latent coactivity mixture model (LCMM). Then we provide simulation studies in section 3 and real EEG data analysis in section 4 to validate the unified approach. Finally, we discuss the results and open issues in section 5. Preliminary results were presented at the International Conference of the IEEE Engineering in Medicine and Biology Society (Hirayama, Ogawa, & Hyvärinen, 2014).

2  Latent Coactivity Mixture Model

2.1  Background: Blind Source Separation

Before introducing our new model, we discuss the generative (forward) model conventionally used for the blind separation of EEG/MEG sources. Let x(t) be a d-dimensional EEG/MEG signal, sampled at discrete time points indexed by t = 1, 2, …, N, which is assumed to have already been (bandpass) filtered so that each channel is limited to a certain frequency band of interest. Sensor signal vector x(t) is then assumed to follow a linear generative model given by
$$x(t) = A\,s(t), \qquad (2.1)$$
where s(t) = (s_1(t), …, s_d(t))^T denotes the vector of the source signals and A is called the mixing matrix, which is assumed to be nonsingular so that demixing matrix W = A^{-1} exists. Both A and s(t) are unknown and estimated from data in the BSS setting.

Note that in equation 2.1, the number of sources is assumed to equal the (effective) dimensionality d of the sensor signal, as a fundamental setting in a standard ICA; this greatly simplifies the mathematical treatment. In practice, d can be selected to be smaller than the original number of sensor channels, typically by discarding the ineffective dimensions (those with very small variances) using PCA. It should also be noted that each source (or component) does not necessarily correspond to any single electrical dipole; instead, each may describe the total effect of multiple correlated dipolar activities.
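As a rough illustration of this PCA-based dimensionality reduction (variable names and sizes here are hypothetical, with synthetic data standing in for sensor signals), one can keep the top-d eigenvectors of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))   # N samples x 32 sensor channels (synthetic)
X -= X.mean(axis=0)                   # center each channel

# Eigendecomposition of the sample covariance matrix.
cov = X.T @ X / len(X)
evals, evecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

d = 10                                # chosen effective dimensionality
V = evecs[:, -d:]                     # top-d principal directions
X_reduced = X @ V                     # data projected onto d dimensions
```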

To solve the BSS problem, we need to make further assumptions about the statistical properties of the sources based on prior knowledge. Independent component analysis (ICA) typically assumes that the d sources are nongaussian and mutually independent (Hyvärinen, Karhunen, & Oja, 2001), which theoretically guarantees the identifiability of both A and the s_j(t)'s, up to the scaling and permutation of the sources. However, the independence assumption might lead to a solution that is weakly functionally connected (i.e., weakly statistically dependent) even when the true sources are strongly dependent, which is not consistent with the goal of connectivity analysis. This motivated us to develop an appropriate model of functionally connected (dependent) sources.

2.2  Definition of Latent Coactivity Mixture Model

Our main interest here is modeling the envelopes (i.e., amplitudes) of narrowband source signals s_j(t) (Onton & Makeig, 2009; Zhang & Hyvärinen, 2010; Brookes et al., 2011), for which the “real-valued” BSS model of equation 2.1 is not convenient. Envelopes can be modeled more easily with complex analytic signals y_j(t) (Schreier & Scharf, 2010), defined as
$$y_j(t) = s_j(t) + i\,\hat{s}_j(t), \qquad (2.2)$$
where \hat{s}_j(t) denotes the Hilbert transform of signal s_j(t) and i is the imaginary unit. The envelope of s_j(t) is given by the modulus |y_j(t)|. This simple algebraic dependence of the envelope on y_j(t) greatly simplifies the developments below.
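For instance, the analytic signal and its envelope can be computed with `scipy.signal.hilbert` (a sketch with an assumed sampling rate and a synthetic amplitude-modulated 10 Hz oscillation):

```python
import numpy as np
from scipy.signal import hilbert

fs = 200.0                                    # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1.0 / fs)
am = 1.0 + 0.5 * np.sin(2 * np.pi * 1.0 * t)  # slow amplitude modulation
s = am * np.sin(2 * np.pi * 10.0 * t)         # narrowband 10 Hz source

y = hilbert(s)                                # analytic signal: s + i * Hilbert(s)
envelope = np.abs(y)                          # modulus of y recovers the envelope
# Here the envelope matches the modulation (the signal is periodic over the window).
```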
We thus formulate a “complex-valued” BSS with a similar transformation of sensor signal vector x(t),
$$z(t) = A\,y(t), \qquad (2.3)$$
where complex sensor signal z(t) can be directly computed from the original one by z(t) = x(t) + i\,\hat{x}(t). Here, sources are further assumed to be centered without loss of generality by always subtracting the (sample) mean from x(t). Both mixing matrix A and complex-valued source signal y(t) = (y_1(t), …, y_d(t))^T are again unobserved and estimated from the data.

Note that mixing matrix A in equation 2.3 is identical to the original one in equation 2.1 because of the Hilbert transform’s linearity. This ensures that A’s columns can be interpreted directly as defining spatial topographies in the original sensor space. For simplicity, we constrain A to be real valued, while a complex-valued A could also be straightforwardly used, which might be useful to deal with sources synchronized in different phases (Hyvärinen, Ramkumar, Parkkonen, & Hari, 2010).
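The identity of the mixing matrix across equations 2.1 and 2.3 can be checked numerically: Hilbert-transforming the real mixtures gives the same result as mixing the analytic sources (a small synthetic check; sizes are arbitrary):

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
n, d = 1024, 3
S = rng.standard_normal((d, n))      # d real-valued sources (rows)
A = rng.standard_normal((d, d))      # real mixing matrix

X = A @ S                            # real-valued mixtures, as in eq. 2.1
Z_direct = hilbert(X, axis=1)        # analytic transform of the mixtures
Z_mixed = A @ hilbert(S, axis=1)     # mixing the analytic sources, as in eq. 2.3
```

Because the Hilbert transform is linear, the two computations agree up to floating-point error, so the same real A appears in both the real-valued and complex-valued models.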

We next define a coactivation (connectivity) structure between sources y_j(t). The fundamental assumption here is that a system generating the data can be in a finite number of different states, corresponding to different patterns of source amplitudes. Given the state at time point t, sources are generated based on a multivariate Student-t distribution (specified below), which implements the average source amplitudes specific to that state and generates random phases.

Thus, our model of connectivities is not based on explicitly measuring some form of correlations between the sources or their envelopes, as is typically done in electrophysiology. Instead, we characterize the interactions of the sources by dividing their joint activity into a number of typical patterns of envelopes that intuitively express the idea that certain sources tend to be coactivated. Such coactivation does imply correlations of envelopes (i.e., power-to-power coherence) but provides a more detailed analysis of the coactivation than merely computing correlations.

The proposed latent coactivity mixture model (LCMM) is thus summarized as a two-layer generative model of complex sensor signal vector z(t) as follows:

  1. At each time point t, the system generating the data takes one of a finite number of different states (clusters) indexed by k = 1, …, K, according to a multinomial probability distribution with cluster probabilities π_1, …, π_K, where π_k ≥ 0 and Σ_k π_k = 1.

  2. Given that the system belongs to the kth state at time t, source vector y(t) is specifically generated by a complex multivariate Student-t distribution (Schreier & Scharf, 2010) with circular (see below) and mutually uncorrelated sources; they have state-conditional variances or expected powers (squared amplitudes), given by
    $$\mathrm{E}_k\!\left[|y_j(t)|^2\right] = \frac{\nu}{\nu-1}\,b_{jk}, \qquad j = 1, \ldots, d, \qquad (2.4)$$
    where E_k denotes the conditional expectation given the kth state, ν is an integer called the degrees of freedom, and nonnegative vector b_k = (b_{1k}, …, b_{dk})^T, called the coactivation pattern, specifies the expected levels of the source envelopes.
  3. Complex sensor signal z(t) is given as a linear instantaneous mixture of y(t) with unknown mixing matrix A, as in equation 2.3, common to each state k.
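The three generative steps above can be sketched as follows, using the gaussian scale-mixture representation of the complex Student-t distribution (all sizes and parameter values below are illustrative, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, K, nu = 500, 4, 3, 1                  # illustrative sizes; nu = degrees of freedom

pi = np.array([0.5, 0.3, 0.2])              # state probabilities
B = rng.uniform(0.1, 2.0, size=(d, K))      # coactivation patterns b_k (columns)
A = rng.standard_normal((d, d))             # real-valued mixing matrix

states = rng.choice(K, size=N, p=pi)        # step 1: latent state at each t
# Step 2: complex Student-t sources via a gaussian scale mixture:
# y = sqrt(nu / g) * n, with g ~ Gamma(nu) and n circular complex gaussian.
g = rng.gamma(nu, size=N)
scale = np.sqrt(B[:, states] / 2.0)         # per-source std of real/imag parts
n = scale * (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N)))
Y = np.sqrt(nu / g) * n                     # circular: uniform phase, heavy tails
Z = A @ Y                                   # step 3: complex sensor signals
```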

The N Hilbert-transformed sensor signal vectors, z(1), …, z(N), are simply assumed to be independent and identically distributed (i.i.d.), as is commonly done in many ICA methods. Such a simplification is mainly made for purposes of mathematical and computational tractability, but it could be relaxed by modeling the autocorrelation structures of the sources (in future work).

The circularity of the sources means that the phase of each source is distributed uniformly and independently of its amplitude. Our method focuses on modeling (analyzing) an amplitude-to-amplitude type of connectivity (coactivation), ignoring phase-to-phase or phase-to-amplitude types of connectivity. In the circular case, the probability density function (pdf) of the complex multivariate Student-t distribution (Schreier & Scharf, 2010) with scatter matrix Σ and degrees of freedom ν is given by
$$p_t(y; \Sigma, \nu) = \frac{\Gamma(\nu+d)}{\Gamma(\nu)\,(\nu\pi)^d \det\Sigma}\left(1 + \frac{y^{\mathsf{H}}\Sigma^{-1}y}{\nu}\right)^{-(\nu+d)}, \qquad (2.5)$$
where ^H denotes the Hermitian transpose and Γ denotes the gamma function. Source vector y(t) in LCMM has a state-conditional pdf given by p(y(t) | k) = p_t(y(t); diag(b_k), ν). Figures 1a and 1b illustrate the conditional pdf for a bivariate case.
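A minimal implementation of the log-density of equation 2.5 (restricted to a diagonal scatter matrix, as used in LCMM) can be sanity-checked by numerically integrating the d = 1 density over the complex plane; this is a sketch for illustration only:

```python
import numpy as np
from scipy.special import gammaln

def complex_t_logpdf(y, Sigma_diag, nu):
    """Log-pdf of the circular complex multivariate Student-t of eq. 2.5,
    restricted to a diagonal scatter matrix. y has shape (d, #points)."""
    d = len(Sigma_diag)
    q = np.sum(np.abs(y) ** 2 / Sigma_diag[:, None], axis=0)  # y^H Sigma^{-1} y
    logc = (gammaln(nu + d) - gammaln(nu)
            - d * np.log(nu * np.pi) - np.sum(np.log(Sigma_diag)))
    return logc - (nu + d) * np.log1p(q / nu)

# Sanity check: for d = 1, nu = 1, the density should integrate to ~1
# over a sufficiently large grid on the complex plane.
h = 0.05
grid = np.arange(-20, 20, h)
re, im = np.meshgrid(grid, grid)
y = (re + 1j * im).ravel()[None, :]               # shape (d=1, #points)
total = np.sum(np.exp(complex_t_logpdf(y, np.array([1.0]), 1))) * h * h
```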
Figure 1:

Illustration of complex multivariate Student-t distribution for two circular complex sources. (a, b) Bivariate Student-t pdf with and ; (a) illustrates pairwise marginal density on real and imaginary parts of single source , where spherical equiprobability contours imply circularity; pairwise marginals for the four combinations of real or imaginary parts between and have the same form illustrated in (b) (black solid lines), where two other examples are also shown (gray solid lines) with and . (c) Mixture density of the three state-conditioned pdfs in (b), with the state marginalized out with .


The particular choice of the Student-t model is mainly motivated by its robustness to outliers in the estimation of (co)variances (i.e., coactivation patterns), as has been thoroughly studied in the literature (Ollila & Koivunen, 2003; Mahot, Pascal, Forster, & Ovarlez, 2013). Scatter matrix Σ is proportional to the covariance matrix if it exists, that is, for ν > 1, and the maximum likelihood estimate (MLE) of Σ is often used as a robust alternative to the sample covariance matrix even with small ν. The diagonal scatter matrix diag(b_k) in LCMM implies equation 2.4, and the estimate of b_k serves as a robust estimator of the state-conditional variances. Note that even if the variance does not exist when ν = 1, Σ can have a finite MLE, giving a more general scale parameter estimate. Since EEG/MEG signals typically contain a large amount of noise and artifacts from outside the brain, robustness is a desirable property in practice. Our model, however, includes no explicit noise term in the generative model of equation 2.1, mainly for computational simplicity, as in standard ICA methods.

Graphical representations of the dependency structure in LCMM are given in Figure 2. The sources in LCMM are not independent of each other (see Figure 2a; see also Figure 1c); that is, the joint density of y(t) does not factorize over the sources, in contrast to standard complex-valued BSS/ICA models. Unlike the related generalizations of ICA modeling energy correlations, the sources are even conditionally dependent (see Figure 2b), given the higher-order latent variables (here, state k). This is due to our choice of the Student-t model instead of a gaussian model. In the limit ν → ∞, the pdf reduces to a complex gaussian, and the sources become conditionally independent (see Figure 2c), but they remain dependent over the whole data set (with state k marginalized out).

Figure 2:

Graphical representations of dependency structure in LCMM.


2.3  Parameter Estimation

According to LCMM’s generative model, we obtain the pdf of complex sensor signal z(t) by the transformation of random variables from y(t) to z(t) = A y(t):
$$p(z(t); W, B, \pi) = |\det W|^2 \sum_{k=1}^{K} \pi_k\, p_t\!\left(W z(t); \mathrm{diag}(b_k), \nu\right), \qquad (2.6)$$
where we explicitly indicate the model parameters on the left-hand side (after the semicolon), W = A^{-1}, and B = (b_1, …, b_K) and π = (π_1, …, π_K) collect the coactivation patterns and the state probabilities, respectively. Note that in equation 2.6, the determinant is squared because the same transformation is required for both real and imaginary parts. The pdf can also be expressed as
$$p(z(t); A, B, \pi) = \sum_{k=1}^{K} \pi_k\, p_t\!\left(z(t); A\,\mathrm{diag}(b_k)\,A^{\mathsf{T}}, \nu\right). \qquad (2.7)$$
The model is thus a constrained form of a mixture of multivariate Student-t distributions, in which the state-dependent scatter matrices are tied via common parameter A. It can be readily seen from equation 2.7 that the probability density does not change if A (or W) and b_k are simultaneously replaced by AΛ (or Λ^{-1}W) and Λ^{-2}b_k, respectively, where Λ is any nonsingular diagonal matrix. This is the well-known scaling ambiguity inherent in BSS/ICA. We fix the scale by setting every column of A to have unit Euclidean norm.
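This normalization can be verified numerically: rescaling A’s columns while compensating in b_k leaves the state-conditional scatter A diag(b_k) A^T of equation 2.7 unchanged (a small synthetic check):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
b = rng.uniform(0.5, 2.0, size=d)       # one coactivation pattern b_k

norms = np.linalg.norm(A, axis=0)       # Euclidean norm of each column
A_unit = A / norms                      # unit-norm columns fix the scale
b_comp = b * norms ** 2                 # compensate in the pattern (Lambda^2 b)

scatter_before = A @ np.diag(b) @ A.T
scatter_after = A_unit @ np.diag(b_comp) @ A_unit.T
```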

The parameters of interest in LCMM can be easily estimated by the maximum likelihood method: maximizing the likelihood of z(1), …, z(N) with respect to the model parameters when the latent variables are marginalized out. Importantly, this does not require any approximation, in contrast to other hierarchical BSS models. For simplicity, we fix degrees of freedom ν to a constant (we set ν = 1 in sections 3 and 4 below; this choice leads to an infinite variance and strong robustness) and learn the other parameters, since ν typically has only a small effect on the final solution (at least if set relatively small for ensuring robustness). The detailed form of the objective function and its derivatives are given in the appendix.

We propose to use a quasi-Newton method to efficiently optimize the likelihood using the reparameterization (Salakhutdinov, Roweis, & Ghahramani, 2003) given by
$$\pi_k = \frac{\exp(\gamma_k)}{\sum_{l=1}^{K}\exp(\gamma_l)}, \qquad b_{jk} = \exp(\beta_{jk}), \qquad (2.8)$$
so that the parameters automatically satisfy constraints π_k ≥ 0, Σ_k π_k = 1, and b_jk ≥ 0. We do not constrain the scaling of W or b_k during the optimization but rescale them after obtaining the final solution so that every column of A has a unit norm. Any standard optimization software can be used for solving the unconstrained optimization on new parameter set (W, γ, β). We found that this quasi-Newton method is more efficient than the well-known expectation-maximization method (simulations not shown).
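As a minimal sketch of this estimation procedure (toy data, tiny dimensions, numerical gradients; not the authors’ actual implementation), the reparameterized negative log-likelihood of equation 2.6 can be handed to a quasi-Newton optimizer such as L-BFGS:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, logsumexp

rng = np.random.default_rng(4)
N, d, K, nu = 300, 2, 2, 1
Z = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))  # toy "z(t)"

def unpack(theta):
    """Map the unconstrained parameters (W, beta, gamma) to (W, B, log pi)."""
    W = theta[:d * d].reshape(d, d)
    beta = theta[d * d:d * d + d * K].reshape(d, K)
    gamma = theta[-K:]
    return W, np.exp(beta), gamma - logsumexp(gamma)  # eq. 2.8 reparameterization

def neg_loglik(theta):
    """Negative log-likelihood under the complex Student-t mixture of eq. 2.6."""
    W, B, logpi = unpack(theta)
    Y = W @ Z                                                     # candidate sources
    q = (np.abs(Y[:, :, None]) ** 2 / B[:, None, :]).sum(axis=0)  # (N, K)
    logt = (gammaln(nu + d) - gammaln(nu) - d * np.log(nu * np.pi)
            - np.log(B).sum(axis=0)[None, :]
            - (nu + d) * np.log1p(q / nu))                        # per-state log-pdfs
    logdet = np.linalg.slogdet(W)[1]                              # log |det W|
    return -(2 * N * logdet + logsumexp(logpi[None, :] + logt, axis=1).sum())

theta0 = np.concatenate([np.eye(d).ravel(), np.zeros(d * K + K)])
res = minimize(neg_loglik, theta0, method="L-BFGS-B")
```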
After estimating the model parameters, the (real-valued) sources are separated by s(t) = W x(t), or equivalently by taking the real part of W z(t). This is possible because the demixing matrix is the same for both the original and the Hilbert-transformed data. The states are inferred by computing their posterior probability:
$$P(k \mid z(t)) = \frac{\pi_k\, p_t\!\left(W z(t); \mathrm{diag}(b_k), \nu\right)}{\sum_{l=1}^{K} \pi_l\, p_t\!\left(W z(t); \mathrm{diag}(b_l), \nu\right)}. \qquad (2.9)$$
The maximum a posteriori (MAP) estimate of the state is given by taking the state that maximizes this posterior, which gives the final result of the model-based clustering of the source coactivations.
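The posterior of equation 2.9 and the MAP clustering can be computed stably in the log domain (the per-state log-densities below are synthetic placeholders standing in for log p_t(W z(t); diag(b_k), ν)):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(5)
N, K = 6, 3
log_pt = rng.normal(size=(N, K))                  # synthetic per-state log-densities
log_pi = np.log(np.array([0.5, 0.3, 0.2]))        # state probabilities

log_post = log_pi + log_pt
log_post -= logsumexp(log_post, axis=1, keepdims=True)  # normalize per time point
posterior = np.exp(log_post)                      # eq. 2.9 (determinant term cancels)

map_state = posterior.argmax(axis=1)              # MAP clustering of each time point
```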

2.4  Choosing the Number of States by BIC

Another important issue in learning mixture models is the choice of the number K of states or clusters. We use the Bayesian information criterion (BIC) to select the best K, minimizing
$$\mathrm{BIC}(K) = -2\ln\hat{L} + M\ln N, \qquad (2.10)$$
where \hat{L} denotes the maximum of the likelihood obtained numerically, as explained above, and the number of free parameters M in LCMM is given by M = d^2 + Kd + K − 1 (counting W, B, and π, respectively). The use of BIC for model order selection in mixture models has been extensively studied in statistics. There are theoretical results regarding statistical consistency (Keribin, 2000), and BIC often exhibits state-of-the-art performance, as shown empirically (Steele & Raftery, 2010).
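The selection rule can be sketched as follows, assuming the parameter count M = d² + Kd + K − 1 stated above (the log-likelihood values in the example are purely illustrative):

```python
import numpy as np

def lcmm_bic(max_loglik, N, d, K):
    """BIC of eq. 2.10, assuming M = d*d (demixing) + K*d (patterns)
    + K - 1 (state probabilities) free parameters."""
    M = d * d + K * d + K - 1
    return -2.0 * max_loglik + M * np.log(N)

# Pick the K with the smallest BIC among candidate fits
# (illustrative maximized log-likelihoods, not real results).
N, d = 1000, 10
candidates = {3: -5120.0, 5: -5010.0, 8: -4995.0}
best_K = min(candidates, key=lambda K: lcmm_bic(candidates[K], N, d, K))
```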

2.5  Relation to Previous Two-Layer Extensions of BSS/ICA

Many previous attempts have been made to extend BSS/ICA based on equation 2.1, particularly to deal with the residual dependency structures between power or magnitudes of the sources often observed in ICA results (Hyvärinen & Hoyer, 2000; Hyvärinen, Hoyer et al. 2001; Valpola et al., 2004; Karklin & Lewicki, 2005; Kawanabe & Müller, 2005; Zhang & Hyvärinen, 2010; Hirayama & Hyvärinen, 2012); these works were not necessarily concerned with EEG/MEG.

Initial developments made fixed prior assumptions about the dependencies of the sources without estimating any parameters in the source model (Hyvärinen & Hoyer, 2000). However, we are concerned with models in which the parameters in the second layer (i.e., connectivity patterns) are estimated as well (Karklin & Lewicki, 2005; Osindero et al., 2006; Köster & Hyvärinen, 2010). Typically these models are based on a (generalized) linear model of the squared sources s_j(t)^2, as already proposed by Hyvärinen, Hoyer et al. (2001):
$$\sigma_j(t)^2 = f\!\left(\sum_{k} c_{jk}\, u_k(t)\right), \qquad (2.11)$$
where C = (c_{jk}) and u(t) are the second-layer mixing matrix and source vector and f is a strictly monotonic function that is applied element-wise. Given variance σ_j(t)^2, each source s_j(t) is then usually assumed to follow a certain probability distribution: either gaussian (Hyvärinen, Hoyer et al., 2001; Valpola et al., 2004) or nongaussian (Karklin & Lewicki, 2005). The real-valued linear mixing (see equation 2.1) finally produces observed signals x(t).

The following are the main problems with these previous models for EEG/MEG connectivity analysis: (1) they do not properly model the envelopes of the oscillatory sources, and (2) the likelihood is often intractable without resorting to approximations, because the continuous latent variable needs to be integrated out. A recent study (Cadieu & Olshausen, 2012) on the statistical modeling of natural movies actually addressed the first issue without resolving the second one.

In fact, a close connection between LCMM and equation 2.11 is implied by the well-known fact that the Student-t distribution belongs to the gaussian scale-mixture family (see Bishop, 2006, for the real case). We can equivalently reformulate our model by assuming that y(t) is complex (circular) gaussian conditionally on state k and introducing a scaling variable ξ(t) that follows an inverse gamma distribution. The conditional variance can then be written
$$\mathrm{E}\!\left[\,|y_j(t)|^2 \mid k, \xi(t)\,\right] = \xi(t)\, b_{jk}, \qquad (2.12)$$
where |y_j(t)|^2 denotes the squared envelopes.
where denotes the squared envelopes. Now consider a simplified complex-valued counterpart of the previously used equation 2.11 given by
formula
2.13
The LCMM in equation 2.12 has essentially the same form if we can constrain it so that only a single variable uk takes a nonzero value at a time.

Hence, although closely related, LCMM has notable differences from previous energy-correlation models. First, it models the (squared) envelopes |y_j(t)|^2 instead of the (squared) magnitudes s_j(t)^2, thus properly dealing with oscillatory sources (together with the Hilbert transform). Second, the model is tractable and fast to learn because it has only one discrete latent variable instead of multiple continuous ones.

2.6  Real-Valued Variant of LCMM

To separately evaluate the effect of using the complex-valued formulation instead of a real-valued one, we also examine a real-valued counterpart of LCMM in our simulation study below. The model can also be seen as a simplification of a previous two-layer BSS/ICA model such that only a single variable u_k in equation 2.11 takes a nonzero value at a time, with nonlinearity f set to the identity function, implying that the conditional variance of s_j(t) given state k is proportional to b_{jk}. More specifically, the real-valued LCMM is given as a constrained form of a mixture of multivariate Student-t distributions, and its pdf is given by
$$p(x(t); W, B, \pi) = |\det W| \sum_{k=1}^{K} \pi_k\, p_t^{\mathbb{R}}\!\left(W x(t); \mathrm{diag}(b_k), \nu\right), \qquad (2.14)$$
where the Student-t pdf for real vector s is generally given by
$$p_t^{\mathbb{R}}(s; \Sigma, \nu) = \frac{\Gamma((\nu+d)/2)}{\Gamma(\nu/2)\,(\nu\pi)^{d/2}(\det\Sigma)^{1/2}}\left(1 + \frac{s^{\mathsf{T}}\Sigma^{-1}s}{\nu}\right)^{-(\nu+d)/2}. \qquad (2.15)$$
Again, the pdf can be expressed using A as
$$p(x(t); A, B, \pi) = \sum_{k=1}^{K} \pi_k\, p_t^{\mathbb{R}}\!\left(x(t); A\,\mathrm{diag}(b_k)\,A^{\mathsf{T}}, \nu\right), \qquad (2.16)$$
and the same quasi-Newton optimization is used for estimating (W, γ, β) (again with fixed degrees of freedom in sections 3 and 4). Finally, the state posterior is also given by
$$P(k \mid x(t)) = \frac{\pi_k\, p_t^{\mathbb{R}}\!\left(W x(t); \mathrm{diag}(b_k), \nu\right)}{\sum_{l=1}^{K} \pi_l\, p_t^{\mathbb{R}}\!\left(W x(t); \mathrm{diag}(b_l), \nu\right)}. \qquad (2.17)$$

3  Simulations on Artificial Data

We next quantitatively compare the proposed method with existing approaches for the analysis of EEG/MEG data. We start with simulated EEG/MEG data so that the ground truth is known and can be systematically controlled; we provide a real EEG analysis in section 4. The goal of the simulation study is to validate the two key ideas of the proposed method: the complex-valued formulation and the unified estimation principle of the two stages of analysis.

3.1  Methods

The simulated EEG/MEG signals were created as follows. First, we applied a bandpass filter to 10 temporally white gaussian signals to simulate alpha-range activities. Then these oscillatory signals were jointly amplitude-modulated block-wise in every 2-second window to show the state-dependent coactivation patterns. For this purpose, we first created vectors b_k whose entries were independently sampled from a standard gaussian distribution but set to zero if negative; this was repeated until at least two entries were nonzero. Then the jth oscillatory signal was multiplied by b_jk, with state k randomly chosen for each block with uniform probability. The sources with nonzero b_jk's were thus actually coactivated, while very small activity levels were avoided with this procedure.

So that the amplitudes of these coactivated sources have nonzero (positive) correlations within each state, they were further modulated globally by a noisy sinusoidal signal, generated by sampling from a gamma distribution whose mean followed a sinusoid with a randomly selected phase. Then gaussian white noise was added, with the noise variance set to achieve a given signal-to-noise ratio (SNR), defined as the ratio of the signal variance to the noise variance. Figure 3 illustrates an example of the 10 sources before and after adding the noise. Finally, the sources were linearly mixed into the same number (i.e., d = 10) of sensor signals with square mixing matrix A generated randomly from the standard gaussian distribution.
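The core of this simulation (narrowband carriers, block-wise coactivation patterns, and random mixing) can be sketched as follows; the sampling rate, filter band and order, block length, and block count are assumed values, since the original numbers are not recoverable here:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(6)
fs, d, K, block = 100, 10, 5, 200            # assumed rate and 2 s block length
n_blocks = 50
N = n_blocks * block

# Narrowband alpha-range carriers from temporally white gaussian noise
# (8-12 Hz band assumed).
bA, aA = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")
carriers = filtfilt(bA, aA, rng.standard_normal((d, N)), axis=1)

# Coactivation patterns: rectified gaussian entries, at least two active sources.
Bpat = np.zeros((d, K))
for k in range(K):
    while (Bpat[:, k] > 0).sum() < 2:
        Bpat[:, k] = np.maximum(rng.standard_normal(d), 0.0)

states = rng.integers(K, size=n_blocks)      # uniform state per block
gains = np.repeat(Bpat[:, states], block, axis=1)
S = gains * carriers                         # block-wise coactivated sources

A = rng.standard_normal((d, d))              # random square mixing matrix
X = A @ S                                    # simulated sensor signals
```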

Figure 3:

Simulated data. Example of simulated sources and states. (a) Ten simulated sources before (black) and after (gray) adding gaussian noise with a signal-to-noise ratio (SNR) of 0 dB. Clean sources (i.e., before adding noise) exhibit one of five different coactivation patterns in each block (separated by vertical dotted lines) with their amplitudes comodulated within each block (see the text for details). The top horizontal bars and numbers indicate the five states, corresponding to five coactivation patterns. (b) Five coactivation patterns. The kth panel shows the values of b_jk for the 10 sources.


The clustering and source separation performances on these simulated data were compared among the following methods: LCMM, LCMM (real), ICA+MixT, and ICA+Kmeans. The first, LCMM, is the proposed approach, based on the joint maximum likelihood estimation of the two layers of the complex-valued LCMM. The second, LCMM (real), denotes the real-valued counterpart of LCMM (see section 2.6), which can be seen as lying between our proposed method and the previous two-layer BSS/ICA models. The latter two, ICA+MixT and ICA+Kmeans, perform two-stage analysis. Both first used the complex-valued FastICA (Bingham & Hyvärinen, 2000) (with real-valued A) for separating the complex sources; then ICA+MixT directly learned the mixture of the Student-t model (as in LCMM) on the separated sources, while ICA+Kmeans performed standard k-means clustering on the log-amplitudes, where the mean log-amplitude over the channels was subtracted at every t to compensate for the global modulation.

We used the adjusted mutual information (AMI) (Vinh, Epps, & Bailey, 2010) and the Amari index (Amari, Cichocki, & Yang, 1996) as specific performance measures for clustering and source separation, respectively. AMI corrects the normalized mutual information (NMI) between the true clustering k and the estimated clustering k̂ for chance agreements: AMI(k, k̂) = (I(k; k̂) − E[I(k; k̂)]) / (max{H(k), H(k̂)} − E[I(k; k̂)]), where E[·] denotes the expectation under random permutations of the cluster labels with the numbers of clusters and cluster members unchanged (I and H denote the sample mutual information and marginal entropy). AMI is thus expected to be zero under such random permutations and is upper-bounded by one; the upper bound is achieved only when the two clusterings are perfectly matched. The Amari index, a standard performance measure for linear BSS problems, is defined by
$$\mathrm{AI} = \frac{1}{2d}\sum_{i=1}^{d}\left(\frac{\sum_{j=1}^{d}|c_{ij}|}{\max_{j}|c_{ij}|} - 1\right) + \frac{1}{2d}\sum_{j=1}^{d}\left(\frac{\sum_{i=1}^{d}|c_{ij}|}{\max_{i}|c_{ij}|} - 1\right), \qquad (3.1)$$
where c_{ij} denotes the (i, j)th element of matrix C = Â^{-1}A with estimated and true mixing matrices Â and A, respectively. This index is nonnegative and equals zero if and only if the true mixing matrix is recovered up to the permutation and scaling of the columns.
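A compact implementation of the Amari index of equation 3.1, checked on a matrix recovered exactly up to permutation and scaling (synthetic matrices):

```python
import numpy as np

def amari_index(A_est, A_true):
    """Amari index of eq. 3.1 for C = inv(A_est) @ A_true; zero iff the
    true mixing matrix is recovered up to permutation and column scaling."""
    C = np.abs(np.linalg.inv(A_est) @ A_true)
    d = C.shape[0]
    rows = (C.sum(axis=1) / C.max(axis=1) - 1).sum()
    cols = (C.sum(axis=0) / C.max(axis=0) - 1).sum()
    return (rows + cols) / (2 * d)

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
P = np.eye(5)[rng.permutation(5)]        # permutation of columns
D = np.diag(rng.uniform(0.5, 2.0, 5))    # column rescaling
perfect = amari_index(A @ P @ D, A)      # should be ~0
```

For the AMI, if scikit-learn is available, its `sklearn.metrics.adjusted_mutual_info_score` provides a standard implementation of the Vinh et al. (2010) measure.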

3.2  Results

Figure 4 quantitatively compares the performances of clustering (panels on the left) and source separation (panels on the right) by the above four methods with different numbers of clusters K estimated by the model, while the number of true states is always 5. Each box plot displays the results of repeated runs at each of the different sample sizes N. The SNR of the sources was set to a fixed value in this figure. In the panels on the right, “ICA” corresponds to both ICA+MixT and ICA+Kmeans.

Figure 4:

Simulated data (with SNR = dB). Performances in clustering (left) and source separation (right) with different settings of number of clusters estimated; (a) , (b) , and (c) , evaluated respectively by adjusted mutual information (AMI) (Vinh et al., 2010) and Amari index (Amari et al., 1996). The actual number of clusters was always five. AMI is scaled between 0 (completely random) and 1 (perfect clustering); the Amari index becomes zero if estimated recovers the true mixing matrix up to the permutation and scaling of the columns. Thus, on the left, high values are good; on the right, low values are good. Each panel displays box plots at different sample sizes N. Each box plot indicates median, interquartile range, and entire data range of runs, excluding outliers indicated by x. See the text for legends of methods.

As clearly seen in the three left panels, LCMM achieved the highest median AMI in every condition, and LCMM (real) consistently showed a lower AMI. In contrast, in the right-hand panels, these two methods showed very similar Amari indices. These results imply that the complex-valued formulation of LCMM is particularly beneficial for obtaining better clustering without degrading the source separation. The two-stage methods, ICA+MixT and ICA+Kmeans, also exhibited lower (i.e., worse) AMI values than LCMM (but not necessarily than LCMM (real)), and they exhibited higher (i.e., worse) Amari indices. The two-stage methods were thus less accurate in both clustering and source separation than the proposed (complex-valued) LCMM.

Although we obtained the best performance when K equaled the true number of clusters, the relative performance among the four methods was qualitatively similar for different values of K. That is, the proposed method outperformed the other methods even when the number of clusters K was misspecified.

To further examine how the results change with different noise levels, we also conducted simulations with different SNRs for generating the source signals. The number of clusters K in the model was simply set to the true value of five. Figure 5 shows the results in the same format as Figure 4. The relative performance of the four methods was qualitatively similar to Figure 4 in every SNR setting. This shows that LCMM improved clustering over the other methods without degrading the source separation, even when the SNR was relatively low.

Figure 5:

Simulated data (with (true)): clustering (left) and source separation (right) performance with different noise levels. See the caption of Figure 4 for details.

Finally, we demonstrated the use of BIC for selecting the number of clusters. Here, we computed the BIC over a range of candidate K and chose the K that minimized the value. Note that in this simulation setting, LCMM does not completely match the true data-generating model because of the additional gaussian noise in the sources. The number of clusters selected thus often exceeded five, as shown in Figure 6, while higher SNRs resulted in values closer to five. Since EEG/MEG usually has a low SNR in practice, these results indicate that BIC will likely overestimate the number of clusters. However, the spatial topographies also learned by LCMM can be used to identify and discard irrelevant clusters that contain only noise or artifacts instead of physiologically meaningful patterns.
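The BIC-based selection can be illustrated as follows, with a Gaussian mixture standing in for LCMM; the candidate range of K and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with two well-separated clusters.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3, 1, (200, 2)),
                    rng.normal(3, 1, (200, 2))])

# Fit the model for each candidate K and keep the BIC;
# the selected K minimizes the criterion.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
```

BIC trades off fit against the number of parameters, so under model mismatch (as with the noisy sources above) it tends to add components to absorb the unmodeled structure.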

Figure 6:

Simulated data. Normalized counts of number of clusters K selected by minimizing BIC over . Rows and columns correspond respectively to different noise levels (signal-to-noise ratios) of simulated sources and to different sample sizes N. The true number of clusters was five.

4  Experiments on Resting-State EEG Data

Next, we demonstrate the advantages of LCMM over two-stage methods in a real EEG data analysis. The target data are resting-state EEGs acquired before and after a BCI-related task, which we expect to contain task-relevant brain states, possibly due to mental rehearsal or retrieval. We examined the patterns (states) found in the resting-state EEGs using the labeled EEG data acquired during the task, as well as their spatial topographies over the sensor channels.

4.1  EEG Data

Five healthy subjects (three males, two females, 28 ± 11 years old) participated in our EEG experiment. We placed a head cap with EEG electrodes on their heads with the electric-conductive gel SIGNAGEL (Parker Laboratories, Fairfield, NJ, USA) to reduce the impedance of the electrodes. We positioned 64-channel active electrodes based on the international 10–20 system and connected them to an ActiveTwo amplifier (BioSemi, Amsterdam, Netherlands). The experimental protocol of this study was approved by the ethics committee at ATR.

Brain activity was measured in each subject during resting states with eyes open and while the subject performed a cued motor imagery or nonimagery task. The experiment consisted of two resting-state (RS) sessions and six task sessions between them. In each RS session (5 minutes), the subject was instructed to relax without thinking of anything in particular and without sleeping, and to focus on a fixation point at the screen's center. In each task session, the subjects performed a number of task trials, in each of which they took one of the following three randomly assigned actions for 3 seconds after a visually cued onset: (1) left (covert imagination of a left-hand movement), (2) right (covert imagination of a right-hand movement), and (3) idle (no imagination of hand movements).

The EEG data were acquired at a sampling frequency of 256 Hz, bandpass-filtered offline to 1 to 50 Hz (fourth-order Butterworth, zero phase), and re-referenced to the common average. Some of the RS data and some trials in the task data were rejected due to gross contamination. Typical ocular, cardiac, and muscular artifacts were also identified and removed by FastICA (Hyvärinen, 1999) with visual inspection and frequency analysis, which was done separately for each subject and also for the RS and task data. The data were further bandpass-filtered in certain frequency bands of interest and then Hilbert-transformed. We focused on two frequency bands of interest, 8 to 12 Hz (alpha) and 13 to 30 Hz (beta), in line with previous neurophysiological studies on motor imagery (Pfurtscheller & Neuper, 1997) as well as those on resting-state brain networks (Brookes et al., 2011).
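The band-pass filtering and Hilbert transform described above can be sketched for a single synthetic channel; the filter order, band edges, and sampling rate follow the text, while the test signal itself is illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

# Synthetic 10 Hz "alpha" oscillation plus noise, sampled at 256 Hz.
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t) \
    + 0.1 * np.random.default_rng(0).standard_normal(t.size)

# Fourth-order Butterworth band-pass (8-12 Hz), applied forward and
# backward with filtfilt for zero-phase filtering.
b, a = butter(4, [8.0 / (fs / 2), 12.0 / (fs / 2)], btype="band")
x_band = filtfilt(b, a, x)

# Hilbert transform gives the complex analytic signal;
# its magnitude is the amplitude envelope modeled by LCMM.
analytic = hilbert(x_band)
envelope = np.abs(analytic)
```

The complex analytic samples are what enter the complex-valued model, while their envelopes carry the coactivation structure.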

4.2  Method

All the LCMM parameters were estimated from the RS data alone, with the two (pre- and post-) RS sessions combined. We emphasize that the task data and labels were not used for the parameter estimation, only for validating the learned model in a post hoc manner. Before the parameter estimation, the RS data were spatially prewhitened and dimensionality-reduced by PCA so that a prescribed percentage of the sample variance was retained. Note that the effective dimensionality of the data had already been reduced by removing the artifactual dimensions using ICA. As a result, the number of sources d was selected individually for each of the five subjects in the alpha and beta bands. For comparison, we also applied the two-stage method ICA+MixT, as explained in section 3.1, to the same data to examine the effect of unifying the two stages of the analysis in LCMM. For both methods, we ran the algorithm 10 times to convergence, each time from different initial parameters, for every preselected number of states K. The best K in LCMM was chosen so that the median of the 10 BIC values was minimized, and the same K was also used in ICA+MixT.
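The prewhitening step can be sketched as follows. The retained-variance fraction (0.99 here) is a hypothetical placeholder, since the exact percentage used is not reproduced in this text, and the data are random surrogates.

```python
import numpy as np
from sklearn.decomposition import PCA

# Surrogate sensor data: 5000 samples on 64 channels with correlated columns.
rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 64)) @ rng.standard_normal((64, 64))

# Spatial prewhitening with dimensionality reduction: passing a float in
# (0, 1) to n_components keeps the leading components explaining that
# fraction of the sample variance; whiten=True rescales them to unit variance.
pca = PCA(n_components=0.99, whiten=True).fit(X)
X_white = pca.transform(X)
```

After this step, the retained components are uncorrelated with unit variance, which is the standard starting point for ICA-type estimation.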

We then evaluated how well each coactivation pattern found in the RS data discriminates the two physiologically different brain states corresponding to motor imagery (left and right) and nonimagery (idle) as labeled in the task EEG data. We used a standard performance criterion for binary discrimination, the area under the receiver operating characteristic (ROC) curve, or AUC, evaluated as follows. We used the kth signal model to detect whether a task trial was in the kth state. This state was deemed detected if the log likelihood of the state, averaged over the imagery or nonimagery period after the cued onset (the initial 0.5 second of this 3-second period was discarded to avoid transient effects), was above or below a certain threshold. The log likelihood of the kth state, up to irrelevant scaling and additive constants, can be evaluated at each time point as derived from equation 2.5 by removing the time-invariant constant terms from its logarithm. The occurrence of the kth state may be associated with either motor imagery or nonimagery; for each case, the ROC curve was drawn by plotting the true-positive rate against the false-positive rate at many different threshold values. We thus had two AUC values (computed by a trapezoidal rule) for each state, of which the greater one was chosen; the AUC is close to one if the state occurrence discriminates well between motor imagery and nonimagery, and close to 0.5 if it does not discriminate at all. Note that this AUC does not distinguish between left and right because that distinction seemed too difficult in these data based on our preliminary analysis (not shown here).
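This evaluation can be sketched as follows, with hypothetical trial-averaged log-likelihoods standing in for the model outputs; `roc_auc_score` integrates the ROC curve by the trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-trial labels (0: nonimagery, 1: imagery) and
# trial-averaged log-likelihoods of one state.
rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 50)
scores = rng.normal(labels.astype(float), 1.0)

# AUC of the state occurrence against the labels; since the state may be
# associated with either condition, the greater of AUC and 1 - AUC is kept.
auc = roc_auc_score(labels, scores)
auc = max(auc, 1.0 - auc)
```

By construction this folded AUC lies in [0.5, 1], with 0.5 meaning no discrimination at all.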

To compare the states that exhibited high AUC values, we further examined the time courses of their log likelihoods and the corresponding sensor-level topographies. The topographies were drawn by evaluating how the occurrence of each state changed the power in the frequency band of interest at each electrode site. We first obtained a robust estimate of the state-conditional variance (power) for each sensor channel from the mixing coefficients acj (from the jth source to the cth sensor channel) and the state-conditional source powers, where the total variance was simply normalized to one because its absolute scale was left undefined by the model. The topographies were then plotted by spatially interpolating the percentage deviations from the "grand variance" over electrodes (see Figure 7 for the layout and channel names). The grand variance was set to the expectation of the state-conditional variances during the task period, where the state probabilities were reestimated from the posterior state probabilities during the imagery and nonimagery periods of the task trials.
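A sketch of this topography computation is given below. Modeling the state-conditional variance at channel c as the sum over sources j of acj squared times the state-conditional source power bjk is our reading of the linear mixing model, and all matrices here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
C, d, K = 64, 12, 10
A = rng.standard_normal((C, d))   # mixing coefficients a_cj
B = rng.random((d, K))            # state-conditional source powers b_jk
pi = np.full(K, 1.0 / K)          # reestimated state probabilities (uniform here)

# State-conditional channel variances: sum_j a_cj^2 * b_jk (assumed form).
var_state = (A ** 2) @ B

# Grand variance: expectation of the state-conditional variances over states.
grand_var = var_state @ pi

# Percentage deviation from the grand variance, per channel and state.
topo_pct = 100.0 * (var_state - grand_var[:, None]) / grand_var[:, None]
```

Each column of `topo_pct` is then spatially interpolated over the electrode layout to produce one scalp map per state; by construction the pi-weighted deviation vanishes at every channel.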

Figure 7:

Real EEG analysis. Channel names and spatial layout of 64 electrodes. Layout corresponds to the spatial topographies on the scalp shown in Figures 9 and 10.

4.3  Results

The discriminability of each state between the motor imagery and idling trials was examined in terms of the sample distribution of the AUC values. Figure 8 shows the normalized histograms of the AUC collected from the 10 different runs of the parameter estimation, where each panel shows the two normalized histograms from LCMM ("unified") and ICA+MixT ("two-stage") for a particular subject and frequency band. The number of samples (AUC values) summarized in each panel was thus 10 times the number of states K, which was selected by BIC separately for each subject and frequency band. The figure shows that in most cases, the unified method exhibited greater variation in AUC than the two-stage method. For example, the range of AUC clearly expanded to both sides of the distribution in subjects 1 (beta), 3 (beta), and 5 (alpha), and mostly to the right (greater) side in subjects 3 (alpha), 4 (alpha, beta), and 5 (beta). The greater variation in the AUC distribution, even when two-sided, implies a greater chance of containing brain states that exhibit higher AUC values. The two vertical lines in each panel indicate the upper quartiles of AUC for the unified (solid line) and two-stage (dashed line) methods, showing consistent increases of the upper quartiles by LCMM over ICA+MixT. In other words, the lower bound of the top 25% of the AUC values consistently shifted to the right as a consequence of unifying the two stages of the analysis.

Figure 8:

Real EEG analysis. Distributional difference of AUC values evaluated for each obtained state between the unified and two-stage methods. Each panel shows two histograms corresponding to LCMM (unified) and ICA+MixT (two-stage) for particular subjects and frequency bands, as indicated on the left- and right-sides of the figure, respectively. In each panel, two normalized histograms are superimposed, with their overlap shown by transparency. The vertical axis is scaled in each panel, and the horizontal axis denotes AUC. Two vertical lines indicate the upper quartiles for two methods: LCMM (solid line) and ICA+MixT (dashed line).

To gain insight into the difference of these high-AUC states between the unified and two-stage methods, we further analyzed the average dynamics of the states that exhibited the best (highest) AUC for each subject and frequency band. Figures 9 and 10 display the trial-averaged time courses of the log likelihood of those states obtained for the alpha and beta bands, respectively, around the 3-second period of motor imagery or nonimagery. For the alpha band (see Figure 9), the dynamics clearly differ between the motor imagery (left and right) and nonimagery (idle) conditions in LCMM (left column), especially in subjects 1, 2, and 5, with smaller differences in ICA+MixT (right column). For the beta band (see Figure 10), the dynamics exhibited similar differences between the imagery and nonimagery conditions in every subject. The results of LCMM (left column) and ICA+MixT (right column) are very similar in subjects 2, 3, and 4, but in subjects 1 and 5, the dynamics again clearly differ between the imagery and nonimagery conditions in LCMM, with smaller differences in ICA+MixT.

Figure 9:

Real EEG analysis. Temporal dynamics of log likelihood of states that exhibited best (highest) AUC in Figure 8 for the alpha band (8–12 Hz). Each panel displays time courses of log likelihood, trial-averaged in each of three task conditions— left (blue), right (red), and idle (green)—for a particular subject and by either LCMM (unified) or ICA+MixT (2-stage), as indicated on the left and top of the figure. Solid lines indicate moving averages using time windows of 0.5 seconds, where shaded intervals indicate the standard deviation in each moving window. Corresponding scalp topographies show percentage deviations in frequency-band power at each electrode (see Figure 7 for channel names) interpolated spatially, which represents how the occurrence of state changes the power from the grand mean at the sensor level.

Figure 10:

Real EEG analysis. Temporal dynamics of the log likelihood of states that exhibited the best (highest) AUC in Figure 8 for the beta band (13–30 Hz). See Figure 9 for more details. The moving average was taken by time windows of 0.25 seconds.

The corresponding spatial topographies of the power deviations, given in Figures 9 and 10, provide further neurophysiological insights in conjunction with the averaged dynamics. For example, Figure 9 (left column) suggests that the best-AUC states of subjects 1 and 2 are both associated with an event-related decrease of frontal alpha power in the idle condition but not in the two motor imagery conditions. This is readily seen in the decrease of the log likelihood during the idle condition (green line) from the baseline level, together with topographies that exhibit positive values over the central areas. The figure also suggests that the best-AUC state of subject 5 is associated with an event-related decrease of alpha power during the left and right conditions at the bilateral Rolandic (central) areas, the major regions of interest related to motor imagery (Pfurtscheller & Neuper, 1997). This is seen in the log-likelihood increase during motor imagery in conjunction with the negative deviations in alpha power around those areas, as indicated by the topographies. The topographies obtained for the beta band (see Figure 10) seem more difficult to interpret. Further neurophysiological interpretation is beyond the scope of this article.

These topographic changes in power, seen at the sensor level, are caused by different coactivation patterns at the level of the underlying sources. Figure 11 shows an example of the coactivation patterns (bjk) and the topographies (mixing coefficients acj and their element-wise squares) corresponding to each source obtained by LCMM. The top row shows the coactivation pattern of the 12 sources, obtained for subject 2 in the alpha band, which corresponds to the best-AUC state shown in Figure 9 (left column, subject 2). In this state, sources 3 and 10 have the largest powers, followed by 5, 6, and 7, and the rest have relatively small powers. Comparing with the coactivation patterns of the other states, these high-AUC states can perhaps be characterized by the relative deactivation of sources 1 and 2 or the relative activation of sources 5, 6, and 7, for example. Although we omit the details, the topographies given at the bottom will be useful for further interpretation.

Figure 11:

Real EEG analysis. Examples of coactivation patterns of sources and spatial topographies corresponding to each source obtained by LCMM (subject 2, alpha band). The 10 bar plots display coactivations associated with 10 different states k. The five from the top are those with the highest AUC values, and the rest those with the lowest, as indicated at the right of each row. The vertical length of the jth bar (from the left) in the kth row represents the relative value of bjk, with the vertical axis scaled separately in each row. The scalp topographies at the bottom show the values of the mixing coefficients acj (lower) and their squares (upper), where blue and red correspond to negative and positive signs, respectively. Note that the signs of the coefficients acj may be arbitrarily flipped for each j due to an indeterminacy inherent in LCMM (or ICA). The numbers j at the bottom index the 12 sources.

5  Discussion

We presented a novel unsupervised method for nonstationary functional connectivity analysis of EEG/MEG sources. Our simulation studies confirmed that the proposed unified method often outperformed the conventional two-stage method in terms of both source separation and clustering performances. Real EEG data analysis also showed that the unified method finds coactivity patterns that discriminate well between motor imagery and idling states with a higher probability than the two-stage method. In addition, the proposed method performs clustering in the source space to give further neurophysiological insights beyond sensor-space clustering (Britz, Ville, & Michel, 2010; Shi & Lu, 2008), as demonstrated in section 4.3.

Our real EEG data analysis suggested that the log-likelihood values of the states obtained by our unsupervised analysis may provide better higher-order signal features relevant to BCI (e.g., for the onset detection of motor imagery) than conventional methods. However, as seen in section 4.3, LCMM did not consistently improve the discriminability (AUC) of every state but rather strengthened the contrasts among states: the states obtained by LCMM may contain both more discriminative (higher-AUC) and less discriminative (lower-AUC) ones than those from the two-stage methods. Hence, to apply our method successfully to BCI, we must carefully select the discriminative features (states) while avoiding the indiscriminative ones. Such feature selection could be based, for example, on neurophysiological interpretations or on additional task-based calibration experiments. This remains open for future investigation.

Some recent studies have combined BSS with effective (directional) connectivity analysis to analyze the causality between neural activities, using, for example, autoregressive (AR) models (Gómez-Herrero, Atienza, Egiazarian, & Cantero, 2008; Haufe et al., 2010; see also Fukushima, Yamashita, Knösche, & Sato, 2015, which solves the inverse problem rather than BSS), generalized autoregressive conditional heteroskedasticity (GARCH) models (Zhang & Hyvärinen, 2010), or a structural equation model (SEM) (Hirayama & Hyvärinen, 2012). Our idea of unifying BSS and EEG/MEG connectivity analysis in a hierarchical statistical model is thus not completely novel, but in this study, we focused on a different type of statistical connectivity: functional connectivity based on envelope correlations. More important, we focused on analyzing nonstationary functional connectivity in terms of underlying patterns or states and their dynamics. This is a conceptually crucial difference from previous studies, which were concerned with static connectivity (i.e., connectivity unchanging over time).

The LCMM proposed here was shown to be a reasonable statistical model for our specific analysis of resting-state EEG/MEG signals, but it obviously has some limitations from the perspective of generative signal modeling. First, each LCMM state can describe only positive correlations in the source envelopes, as is readily seen from the interpretation of equation 2.12, where the latent factor and the coefficients are both positive. The states themselves are negatively correlated since they cannot occur simultaneously under the finite mixture assumption. Hence, our method cannot analyze brain states characterized by negative envelope correlations or states that are positively correlated with each other. Second, the i.i.d. assumption is too simplistic since brain activity is obviously not independent over time. More general models corresponding to equation 2.13, possibly with a nonlinearity in equation 2.11, as well as a temporal dynamics model on the latent variables, would make the model more realistic. However, the model complexity must be carefully controlled to achieve good predictability and maintain the model's tractability.

Another practical limitation of our method is that it can currently handle only a single frequency band of interest. Although it is very typical to limit an EEG/MEG analysis to a certain frequency band, cross-frequency interactions are sometimes of particular interest. Some recent studies have actually combined a complex-valued ICA with time-frequency decomposition for spontaneous EEG/MEG analysis with more flexibility on the spectral nature (Hyvärinen, Ramkumar et al., 2010; Ramkumar et al., 2012). A similar technique can probably be used to extend our method to analyze the data in multiple frequency bands.

Appendix:  Maximum Likelihood Estimation of LCMM

In our LCMM, the maximum likelihood estimates of the parameters of interest are given by the minimizer of the negative (average) log likelihood, whose tth term is given by
formula
A.1
where the state-conditional density follows equations 2.5 and 2.6. We minimize this objective by a quasi-Newton method with respect to the model parameters, using the reparameterization of equation 2.8. This requires the first derivatives, given by
formula
A.2
formula
A.3
formula
A.4
where the asterisk denotes the complex conjugate, and the derivative involves the elements of the transposed inverse of the mixing matrix. The posterior probability of the kth state is given by
formula
A.5
where ∝ means that the left-hand side is proportional to the right-hand side up to a constant factor independent of k. Notice that equation A.5 is equivalent to equation 2.9. To obtain equation A.3 above, we used the relations
formula
A.6
formula
A.7
formula
A.8
formula
A.9
where Re(z) and Im(z) denote the real and imaginary parts of a complex number z, respectively.
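The quasi-Newton estimation can be illustrated on a toy objective. Below, the negative average log likelihood of a simple two-component Gaussian mixture, a stand-in for the LCMM objective and not the paper's actual model, is minimized with L-BFGS; the data and parameterization are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data from a two-component mixture with means -2 and +2.
rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 300)])

def neg_avg_loglik(theta):
    """Negative average log likelihood of a 1-D two-Gaussian mixture."""
    m1, m2, logit = theta
    w = 1.0 / (1.0 + np.exp(-logit))   # mixing weight reparameterized to (0, 1)
    p1 = np.exp(-0.5 * (y - m1) ** 2) / np.sqrt(2 * np.pi)
    p2 = np.exp(-0.5 * (y - m2) ** 2) / np.sqrt(2 * np.pi)
    return -np.mean(np.log(w * p1 + (1 - w) * p2 + 1e-300))

# Quasi-Newton (L-BFGS) minimization from a rough starting point.
res = minimize(neg_avg_loglik, x0=np.array([-1.0, 1.0, 0.0]), method="L-BFGS-B")
```

The logit reparameterization of the mixing weight plays the same role as the reparameterization of equation 2.8: it turns a constrained likelihood problem into an unconstrained one suitable for quasi-Newton updates.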

Acknowledgments

We gratefully thank Motoaki Kawanabe and Okito Yamashita for their helpful comments and discussions and Takayuki Suyama, who provided the opportunity for this research. This work was supported by a contract with the Ministry of Internal Affairs and Communications “Novel and Innovative R&D Making Use of Brain Structures” and by JSPS KAKENHI Grant Number 25730155. J.H. was partially supported by Strategic International Research Cooperative Program, Japan Science and Technology Agency (JST). A.H. was supported by the Academy of Finland, Centre-of-Excellence in Inverse Problems Research and Centre-of-Excellence Algorithmic Data Analysis.

References

Allen, E. A., Damaraju, E., Plis, S. M., Erhardt, E., Eichele, T., & Calhoun, V. D. (2014). Tracking whole-brain connectivity dynamics in the resting state. Cerebral Cortex, 24(3), 663–676.

Amari, S., Cichocki, A., & Yang, H. H. (1996). A new learning algorithm for blind signal separation. In D. Touretzky, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 8 (pp. 757–763). Cambridge, MA: MIT Press.

Bingham, E., & Hyvärinen, A. (2000). A fast fixed-point algorithm for independent component analysis of complex valued signals. International Journal of Neural Systems, 10(1), 1–8.

Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.

Britz, J., Ville, D. V. D., & Michel, C. M. (2010). BOLD correlates of EEG topography reveal rapid resting-state network dynamics. NeuroImage, 52, 1162–1170.

Brookes, M. J., Woolrich, M., Luckhoo, H., Price, D., Hale, J. R., Stephenson, M. C., … Morris, P. G. (2011). Investigating the electrophysiological basis of resting state networks using magnetoencephalography. Proceedings of the National Academy of Sciences of the United States of America, 108(40), 16783–16788.

Cadieu, C. F., & Olshausen, B. A. (2012). Learning intermediate-level representations of form and motion from natural movies. Neural Computation, 24(4), 827–866.

Dähne, S., Nikulin, V. V., Ramirez, D., Schreier, P. J., Müller, K. R., & Haufe, S. (2014). Finding brain oscillations with power dependencies in neuroimaging data. NeuroImage, 96, 334–348.

de Pasquale, F., Della Penna, S., Snyder, A. Z., Lewis, C., Mantini, D., Marzetti, L., … Corbetta, M. (2010). Temporal dynamics of spontaneous MEG activity in brain networks. Proceedings of the National Academy of Sciences of the United States of America, 107(13), 6040–6045.

Doesburg, S. M., & Ward, L. M. (2009). Synchronization between sources: Emerging methods for understanding large-scale functional networks in the human brain. In J. L. Perez-Velazquez & R. Wennberg (Eds.), Coordinated activity in the brain (pp. 25–42). New York: Springer.

Friston, K. J. (1994). Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping, 2, 56–78.

Fukushima, M., Yamashita, O., Knösche, T., & Sato, M. (2015). MEG source reconstruction based on identification of directed source interactions on whole-brain anatomical networks. NeuroImage, 105, 408–427.

Gómez-Herrero, G., Atienza, M., Egiazarian, K., & Cantero, J. (2008). Measuring directional coupling between EEG sources. NeuroImage, 43, 497–508.

Grosse-Wentrup, M. (2009). Understanding brain connectivity patterns during motor imagery for brain-computer interfacing. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems, 21 (pp. 561–568). Cambridge, MA: MIT Press.

Grosse-Wentrup, M. (2011). What are the causes of performance variation in brain-computer interfacing? International Journal of Bioelectromagnetism, 13(3), 115–116.

Haufe, S., Tomioka, R., Nolte, G., Müller, K.-R., & Kawanabe, M. (2010). Modeling sparse connectivity between underlying brain sources for EEG/MEG. IEEE Transactions on Biomedical Engineering, 57(8), 1954–1963.

Hirayama, J., & Hyvärinen, A. (2012). Structural equations and divisive normalization for energy-dependent component analysis. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 24 (pp. 1872–1880). Red Hook, NY: Curran.

Hirayama, J., Ogawa, T., & Hyvärinen, A. (2014). Simultaneous blind separation and clustering of coactivated EEG/MEG sources for analyzing spontaneous brain activity. In Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4932–4935). Piscataway, NJ: IEEE.

Hironaga, N., & Ioannides, A. A. (2007). Localization of individual area neuronal activity. NeuroImage, 34(4), 1519–1534.

Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626–634.

Hyvärinen, A., & Hoyer, P. O. (2000). Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7), 1705–1720.

Hyvärinen, A., Hoyer, P. O., & Inki, M. (2001). Topographic independent component analysis. Neural Computation, 13(7), 1527–1558.

Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis. Hoboken, NJ: Wiley.

Hyvärinen, A., Ramkumar, P., Parkkonen, L., & Hari, R. (2010). Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis. NeuroImage, 49, 257–271.

Hyvärinen, A., Zhang, K., Shimizu, S., & Hoyer, P. O. (2010). Estimation of a structural vector autoregression model using non-gaussianity. Journal of Machine Learning Research, 11, 1709–1731.

Karklin, Y., & Lewicki, M. S. (2005). A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Computation, 17, 397–423.

Kawanabe, M., & Müller, K.-R. (2005). Estimating functions for blind separation when sources have variance dependencies. Journal of Machine Learning Research, 6, 453–482.

Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhyā: Indian Journal of Statistics, Series A, 62.
(
1
),
49
66
.
Köster
,
U.
, &
Hyvärinen
,
A.
(
2010
).
A two-layer model of natural stimuli estimated with score matching
.
Neural Computation
,
22
,
2308
2333
.
Lance
,
B. J.
,
Kerick
,
S. E.
,
Ries
,
A. J.
,
Oie
,
K. S.
, &
McDowell
,
K.
(
2012
).
Brain–computer interface technologies in the coming decades
.
Proceedings of the IEEE
,
100
,
1585
1599
.
Leonardi
,
N.
,
Richiardi
,
J.
,
Gschwind
,
M.
,
Simioni
,
S.
,
Annoni
,
J.-M.
,
Schluep
,
M.
, … 
Van De Ville
,
D.
(
2013
).
Principal components of functional connectivity: A new approach to study dynamic brain connectivity during rest
.
NeuroImage
,
83
,
937
950
.
Liu
,
X.
,
Chang
,
C.
, &
Duyn
,
J. H.
(
2013
).
Decomposition of spontaneous brain activity into distinct fMRI co-activation patterns
.
Frontiers in Systems Neuroscience
,
7
,
101
.
Mahot
,
M.
,
Pascal
,
F.
,
Forster
,
P.
, &
Ovarlez
,
J.-P.
(
2013
).
Asymptotic properties of robust complex covariance matrix estimates
.
IEEE Transactions on Signal Processing
,
61
(
13
),
3348
3356
.
Makeig
,
S.
,
Kothe
,
C.
,
Mullen
,
T.
,
Bigdeley-Shamlo
,
N.
,
Zhang
,
Z.
, &
Kreutz-Delgado
,
K.
(
2012
).
Evolving signal processing for brain-computer interfaces
.
Proceedings of the IEEE
,
100
,
1567
1584
.
Ollila
,
E.
, &
Koivunen
,
V.
(
2003
).
Robust antenna array processing using M-estimators of pseudo-covariance
. In
Proceedings of the 14th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications
(pp.
7
10
).
Piscataway, NJ
:
IEEE
.
Onton
,
J.
, &
Makeig
,
S.
(
2009
).
High-frequency broadband modulations of electroencephalographic spectra
.
Frontiers in Neuroscience
,
159
,
99
120
.
Osindero
,
S.
,
Welling
,
M.
, &
Hinton
,
G. E.
(
2006
).
Topographic product models applied to natural scene statistics
.
Neural Computation
,
18
,
381
414
.
Pfurtscheller
,
G.
, &
Neuper
,
C.
(
1997
).
Motor imagery activates primary sensorimotor area in humans
.
Neuroscience Letters
,
239
,
65
68
.
Ramkumar
,
P.
,
Parkkonen
,
L.
,
Hari
,
R.
, &
Hyvärinen
,
A.
(
2012
).
Characterization of neuromagnetic brain rhythms over time scales of minutes using spatial independent component analysis
.
Human Brain Mapping
,
33
(
7
),
1648
1662
.
Ramkumar
,
P.
,
Parkkonen
,
L.
, &
Hyvärinen
,
A.
(
2014
).
Group-level spatial independent component analysis of Fourier envelopes of resting-state MEG data
.
NeuroImage
,
86
,
480
491
.
Salakhutdinov
,
R.
,
Roweis
,
S. T.
, &
Ghahramani
,
Z.
(
2003
).
Optimization with EM and expectation-conjugate-gradient
. In
Proceedings of the 20th International Conference on Machine
(pp.
672
679
).
Cambridge, MA
:
AAAI Press
.
Schreier
,
P. J.
, &
Scharf
,
L. L.
(
2010
).
Statistical signal processing of complex-valued data
.
Cambridge
:
Cambridge University Press
.
Shi
,
L. C.
, &
Lu
,
B. L.
(
2008
).
Dynamic clustering for vigilance analysis based on EEG
. In
Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society
, (pp.
54
57
).
Piscataway, NJ
:
IEEE
.
Smith
,
S. M.
,
Miller
,
K. L.
,
Moeller
,
S.
,
Xu
,
J.
,
Auerbach
,
E. J.
,
Woolrich
,
M. W.
, … 
Ugurbil
,
K.
(
2012
).
Temporally-independent functional modes of spontaneous brain activity
.
Proceedings of the National Academy of Sciences of the United States of America
,
109
(
8
),
3131
3136
.
Steele
,
R. J.
, &
Raftery
,
A. E.
(
2010
).
Performance of Bayesian model selection criteria for gaussian mixture models
. In
M. H.
Chen
,
D. K.
Dey
,
P.
Müller
,
D.
Sun
, &
K.
 
Ye
(Eds.),
Frontiers of statistical decision making and bayesian analysis
(pp.
113
130
).
New York
:
Springer
.
Valpola
,
H.
,
Harva
,
M.
, &
Karhunen
,
J.
(
2004
).
Hierarchical models of variance sources
.
Signal Processing
,
84
(
2
),
267
282
.
Vinh
,
N. X.
,
Epps
,
J.
, &
Bailey
,
J.
(
2010
).
Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance
.
Journal of Machine Learning Research
,
11
,
2837
2854
.
Zander
,
T. O.
, &
Kothe
,
C.
(
2011
).
Towards passive braincomputer interfaces: applying braincomputer interface technology to humanmachine systems in general
.
Journal of Neural Engineering
,
8
(
2
),
025005
.
Zhang
,
K.
, &
Hyvärinen
,
A.
(
2010
).
Source separation and higher-order causal analysis of MEG and EEG
. In
Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence
(pp.
709
716
).
N.p.
:
AUAI Press
.

Notes

1. To be precise, the exact relation between and the source variances in equation 2.4 is no longer valid if , since the variance is infinite or undefined. However, even if , the estimated can still be interpreted as modeling the variance levels, while a small value implies canceling the effects of outliers, from the viewpoint of robust estimation (as explained in the text).

2. Note that other heavy-tailed distributions in the complex elliptical family (Schreier & Scharf, 2010) commonly share this robustness property and thus could also be used as the source model in LCMM. However, examining this option is beyond the scope of this article.

3. We used a Matlab implementation of the limited-memory BFGS algorithm by Mark Schmidt, available at http://www.di.ens.fr/~mschmidt/Software/minFunc.html.

4. The pdf is given by if .

5. We used the Matlab code available at https://sites.google.com/site/vinhnguyenx/softwares.