## Abstract

Blind source separation—the extraction of independent sources from a mixture—is an important problem for both artificial and natural signal processing. Here, we address a special case of this problem when sources (but not the mixing matrix) are known to be nonnegative—for example, due to the physical nature of the sources. We seek a solution to this problem that can be implemented using biologically plausible neural networks. Specifically, we consider the online setting where the data set is streamed to a neural network. The novelty of our approach is that we formulate blind nonnegative source separation as a similarity matching problem and derive neural networks from the similarity matching objective. Importantly, synaptic weights in our networks are updated according to biologically plausible local learning rules.

## 1 Introduction

Extraction of latent causes, or sources, from complex stimuli is essential for making sense of the world. Such stimuli could be mixtures of sounds, mixtures of odors, or natural images. If supervision, or ground truth, about the causes is lacking, the problem is known as blind source separation.

In independent component analysis (ICA), the stimuli are modeled as linear mixtures of independent sources,

$$\mathbf{x}_t = \mathbf{A}\,\mathbf{s}_t, \tag{1.1}$$

where $\mathbf{A}$ is an unknown mixing matrix. The goal of ICA is to infer the source signals, $\mathbf{s}_t$, from the stimuli, $\mathbf{x}_t$. Whereas many ICA algorithms have been developed by the signal processing community (Comon & Jutten, 2010), most of them cannot be implemented by biologically plausible neural networks. Yet our brains can solve the blind source separation problem effortlessly (Bronkhorst, 2000; Asari, Pearlmutter, & Zador, 2006; Narayan et al., 2007; Bee & Micheyl, 2008; McDermott, 2009; Mesgarani & Chang, 2012; Golumbic et al., 2013; Isomura, Kotani, & Jimbo, 2015). Therefore, discovering a biologically plausible ICA algorithm is an important problem.

For an algorithm to be implementable by biological neural networks, it must satisfy (at least) the following requirements. First, it must operate in the online (or streaming) setting. In other words, the input data set is not available as a whole but is streamed one data vector at a time, and the corresponding output must be computed before the next data vector arrives. Second, the output of most neurons in the brain (either a firing rate or the synaptic vesicle release rate) is nonnegative. Third, the weights of synapses in a neural network must be updated using local learning rules; they depend on the activity of only the corresponding pre- and postsynaptic neurons.

Given the nonnegative nature of neuronal output, we consider a special case of ICA where sources are assumed to be nonnegative, termed nonnegative independent component analysis (NICA; Plumbley, 2001, 2002). Of course, to recover the sources, one can use standard ICA algorithms that do not rely on the nonnegativity of sources, such as fastICA (Hyvärinen & Oja, 1997, 2000; Hyvarinen, 1999). Neural learning rules have been proposed for ICA (e.g., Linsker, 1997; Eagleman et al., 2001; Isomura & Toyoizumi, 2016). However, taking into account nonnegativity may lead to simpler and more efficient algorithms (Plumbley, 2001, 2003; Plumbley & Oja, 2004; Oja & Plumbley, 2004; Yuan & Oja, 2004; Zheng, Huang, Sun, Lyu, & Lok, 2006; Ouedraogo, Souloumiac, & Jutten, 2010; Li, Liang, & Risteski, 2016).

While most of the existing NICA algorithms have not met the biological plausibility requirements, in terms of online setting and local learning rules, there are two notable exceptions. First, Plumbley (2001) successfully simulated a neural network on a small data set, yet no theoretical analysis was given. Second, Plumbley (2003) and Plumbley and Oja (2004) proposed a nonnegative PCA algorithm for a streaming setting; however, its neural implementation requires nonlocal learning rules. Further, this algorithm requires prewhitened data, yet no streaming whitening algorithm was given.

Here, we propose a biologically plausible NICA algorithm. The novelty of our approach is that the algorithm is derived from the similarity matching principle, which postulates that neural circuits map more similar inputs to more similar outputs (Pehlevan, Hu, & Chklovskii, 2015). Previous work proposed various objective functions to find similarity matching neural representations and solved these optimization problems with biologically plausible neural networks (Pehlevan et al., 2015; Pehlevan & Chklovskii, 2014, 2015a, 2015b; Hu, Pehlevan, & Chklovskii, 2014). Here we apply these networks to NICA.

The rest of the letter is organized as follows. In section 2, we show that blind source separation, after a generalized prewhitening step, can be posed as a nonnegative similarity matching (NSM) problem (Pehlevan & Chklovskii, 2014). In section 3, using results from Pehlevan and Chklovskii (2014, 2015a), we show that both the generalized prewhitening step and the NSM step can be solved online by neural networks with local learning rules. Stacking these two networks leads to the two-layer NICA network. In section 4, we compare the performance of our algorithm to other ICA and NICA algorithms for various data sets.

## 2 Offline NICA via NSM

In this section, we first review Plumbley's analysis of NICA and then reformulate NICA as an NSM problem.

### 2.1 Review of Plumbley's Analysis

When source signals are nonnegative, the source separation problem simplifies. It can be solved in two straightforward steps: noncentered prewhitening and orthonormal rotation (Plumbley, 2002).

Noncentered prewhitening transforms $\mathbf{x}_t$ to $\mathbf{h}_t = \mathbf{F}\mathbf{x}_t$, where $\mathbf{F}$ is a whitening matrix,^{1} such that $\left\langle \left(\mathbf{h} - \langle\mathbf{h}\rangle\right)\left(\mathbf{h} - \langle\mathbf{h}\rangle\right)^\top \right\rangle = \mathbf{I}$, where angled brackets denote an average over the source distribution and $\mathbf{I}$ is the identity matrix. Note that the mean of $\mathbf{h}$ is not removed in the transformation; otherwise, one would not be able to use the constraint that the sources are nonnegative (Plumbley, 2003).
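Offline, such a noncentered whitening matrix can be estimated from samples as sketched below. This is only an illustration: the symmetric inverse square root of the sample covariance is one of several valid choices of whitening matrix, and the variable names are ours.

```python
import numpy as np

def noncentered_whitening_matrix(X):
    """Return a whitening matrix F such that the *centered* outputs
    F @ (x - <x>) have identity covariance. The mean itself is kept,
    so the whitened outputs F @ x remain noncentered."""
    Xc = X - X.mean(axis=1, keepdims=True)        # centering used only to estimate covariance
    C = Xc @ Xc.T / X.shape[1]                    # sample covariance
    evals, evecs = np.linalg.eigh(C)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric (ZCA-style) inverse square root

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((3, 5000)))        # nonnegative sources
A = rng.standard_normal((3, 3))                   # mixing matrix may have negative entries
X = A @ S
F = noncentered_whitening_matrix(X)
H = F @ X                                         # the mean of H is *not* removed
Hc = H - H.mean(axis=1, keepdims=True)
print(np.allclose(Hc @ Hc.T / X.shape[1], np.eye(3)))  # centered covariance is the identity
```

Any left-multiplication of this `F` by an orthonormal matrix is also a valid whitening matrix; the orthonormal rotation is exactly the remaining freedom that the second step of NICA resolves.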

The second step of NICA relies on the following observation (Plumbley, 2002):

Suppose sources are independent, nonnegative, and well grounded, that is, $\mathrm{Prob}(s_i < \delta) > 0$ for any $\delta > 0$. Consider an orthonormal transformation $\mathbf{y} = \mathbf{Q}\mathbf{h}$ of the prewhitened mixture. Then $\mathbf{Q}\mathbf{F}\mathbf{A}$ is a permutation matrix with probability 1 if and only if $\mathbf{y}$ is nonnegative.

In the second step, we look for an orthonormal $\mathbf{Q}$ such that $\mathbf{y} = \mathbf{Q}\mathbf{h}$ is nonnegative. When found, Plumbley's theorem guarantees that $\mathbf{y}$ is a permutation of the sources. Several algorithms have been developed based on this observation (Plumbley, 2003; Plumbley & Oja, 2004; Oja & Plumbley, 2004; Yuan & Oja, 2004).

Note that only the sources but not necessarily the mixing matrix must be nonnegative. Therefore, NICA allows generative models, where features not only add up but also cancel each other, as in the presence of a shadow in an image (Plumbley, 2002). In this respect, NICA is more general than nonnegative matrix factorization (NMF; Lee & Seung, 1999; Paatero & Tapper, 1994) where both the sources and the mixing matrix are required to be nonnegative.

### 2.2 NICA as NSM

Next we reformulate NICA as an NSM problem. This reformulation will allow us to derive an online neural network for NICA in section 3. Our main departure from Plumbley's analysis is to work with similarity matrices rather than covariance matrices and a finite number of samples rather than the full probability distribution of the sources.

First, let us switch to the matrix notation, where data matrices are formed by concatenating data column vectors, for example, $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_T]$ and $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_T]$, so that equation 1.1 becomes $\mathbf{X} = \mathbf{A}\mathbf{S}$. In this notation, we introduce a time-centering operation such that, for example, time-centered stimuli are $\mathbf{X}^c = \mathbf{X} - \bar{\mathbf{x}}\,\mathbf{1}_T^\top$, where $\bar{\mathbf{x}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{x}_t$ and $\mathbf{1}_T$ is a $T$-dimensional column vector whose elements are all 1's.

Our goal is to recover $\mathbf{S}$ from $\mathbf{X}$, where $\mathbf{A}$ is unknown. We make two assumptions. First, sources are nonnegative and decorrelated with unit variance, $\frac{1}{T}\,\mathbf{S}^c {\mathbf{S}^c}^\top = \mathbf{I}$. Note that while general ICA and NICA problems are stated with the independence assumption on the sources, it is sufficient for our purposes that they are only decorrelated. Second, the mixing matrix, $\mathbf{A}$, is full rank.

We propose that the source matrix, $\mathbf{S}$, can be recovered from $\mathbf{X}$ in two steps, illustrated in Figure 1. The first step is generalized prewhitening: transform $\mathbf{X}$ to $\mathbf{H} = \mathbf{F}\mathbf{X}$, where $\mathbf{F}$ is $m \times n$ with $m \ge k$, so that $\frac{1}{T}\mathbf{H}^c {\mathbf{H}^c}^\top$ has $k$ unit eigenvalues and $m - k$ zero eigenvalues. When $m = k$, $\mathbf{H}$ is whitened; otherwise, channels of $\mathbf{H}$ are correlated. Such prewhitening is useful because it implies $\mathbf{H}^\top \mathbf{H} = \mathbf{S}^\top \mathbf{S}$ according to the following theorem:

If $m > k$, the channels of $\mathbf{H}$ are correlated, except in special cases. The whitening used in Plumbley's analysis (Plumbley, 2002) requires $m = k$.
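Offline, generalized prewhitening can be illustrated with an SVD-based construction that maps `n` mixture channels to `m` output channels, `m` at least the number of sources. The orthonormal factor `O` below is an arbitrary illustrative choice (any matrix with orthonormal columns works), so this is a sketch, not the paper's specific construction (equation 3.2):

```python
import numpy as np

def generalized_prewhitening(X, m, seed=1):
    """Map n-dimensional mixtures X (n x T) to m output channels so that the
    centered-output covariance has k unit and m - k zero eigenvalues,
    where k is the numerical rank of the centered data."""
    T = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.sum(sv > 1e-10 * sv[0]))                 # numerical rank
    rng = np.random.default_rng(seed)
    O, _ = np.linalg.qr(rng.standard_normal((m, k)))    # m x k, orthonormal columns
    F = np.sqrt(T) * O @ np.diag(1.0 / sv[:k]) @ U[:, :k].T
    return F @ X, F

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((3, 4000)))              # k = 3 nonnegative sources
A = rng.standard_normal((5, 3))                         # n = 5 mixture channels
H, F = generalized_prewhitening(A @ S, m=4)             # m = 4 output channels
Hc = H - H.mean(axis=1, keepdims=True)
eig = np.sort(np.linalg.eigvalsh(Hc @ Hc.T / S.shape[1]))
print(np.round(eig, 6))                                 # one zero and three unit eigenvalues
```

With `m` equal to the rank, the same construction reduces to ordinary whitening; with `m` larger, the output channels are correlated but the eigenvalue condition above still holds.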

The second step is NSM: find a nonnegative output matrix, $\mathbf{Y}$, whose similarity matrix matches that of the prewhitened mixture, equation 2.6:

$$\min_{\mathbf{Y} \ge 0} \left\| \mathbf{H}^\top \mathbf{H} - \mathbf{Y}^\top \mathbf{Y} \right\|_F^2 .$$

Since both $\mathbf{Y}$ and $\mathbf{S}$ are nonnegative, rank-$k$ matrices,^{2} $\mathbf{Y} = \mathbf{P}\mathbf{S}$, where $\mathbf{P}$ is a permutation matrix, is a solution to this optimization problem, and the sources are successfully recovered.
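The offline NSM problem, equation 2.6 (matching the output similarity matrix $\mathbf{Y}^\top\mathbf{Y}$ to the input similarity matrix), can be attacked by projected gradient descent, the approach of the offline comparison algorithm in section 4 (equation 2.8). A minimal sketch, with illustrative step size and rectified gaussian initialization rather than the paper's tuned settings:

```python
import numpy as np

def nsm_offline(H, k, n_steps=300, seed=0):
    """Minimize || H^T H - Y^T Y ||_F^2 over Y >= 0 by projected gradient
    descent: take a gradient step, then project onto the nonnegative orthant."""
    G = H.T @ H                                   # T x T input similarity matrix
    T = G.shape[0]
    rng = np.random.default_rng(seed)
    Y = np.maximum(rng.standard_normal((k, T)), 0.0)
    eta = 0.05 / np.linalg.norm(G, 2)             # conservative step size
    losses = []
    for _ in range(n_steps):
        R = G - Y.T @ Y                           # similarity mismatch
        losses.append(float(np.sum(R ** 2)))
        Y = np.maximum(Y + eta * (Y @ R), 0.0)    # descent step, then projection
    return Y, losses

rng = np.random.default_rng(1)
S = np.abs(rng.standard_normal((3, 200)))         # ground-truth nonnegative sources
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # stand-in for what prewhitening delivers:
H = Q @ S                                         # a rotated copy of the sources
Y, losses = nsm_offline(H, k=3)
print(losses[0] > losses[-1])                     # the objective decreased
```

As the uniqueness discussion below notes, reaching the global minimum (a permutation of the sources) is not guaranteed; this sketch only demonstrates monotone descent on the objective.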

Uniqueness of the solutions (up to permutations) is hard to establish. While both sufficient conditions and necessary and sufficient conditions for uniqueness exist, these are nontrivial to verify, and the verification is usually NP-complete (Donoho & Stodden, 2003; Laurberg, Christensen, Plumbley, Hansen, & Jensen, 2008; Huang, Sidiropoulos, & Swami, 2014). A review of related uniqueness results can be found in Huang et al. (2014). A necessary condition for uniqueness given in Huang et al. (2014) states that if the factorization is unique (up to permutations), then each row of the nonnegative factor contains at least one element that is equal to 0. This necessary condition is similar to Plumbley's well-groundedness requirement used in proving theorem 1.

## 3 Derivation of NICA Neural Networks from Similarity Matching Objectives

Our analysis in the previous section revealed that the NICA problem can be solved in two steps: generalized prewhitening and nonnegative similarity matching. Here, we derive neural networks for each of these steps and stack them to give a biologically plausible two-layer neural network that operates in a streaming setting.

In a departure from the previous section, the number of output channels is reduced to the number of sources at the prewhitening stage rather than at the later NSM stage ($m = k$). This assumption simplifies our analysis significantly. The full problem, with $m > k$, is addressed in appendix B.

### 3.1 Noncentered Prewhitening in a Streaming Input Setting

To derive a neurally plausible online algorithm for prewhitening, we pose generalized prewhitening, equation 2.2, as an optimization problem. Online minimization of this optimization problem gives an algorithm that can be mapped to the operation of a biologically plausible neural network.

The theorem implies that, first, the optimal output satisfies the generalized prewhitening condition, equation 2.2. Second, the linear mapping between the input and the output can be constructed using a singular value decomposition of the input data matrix and equation 3.2.

Equation 3.9 describes the dynamics of a single-layer neural network with two populations (see Figure 2). One weight matrix represents the feedforward synaptic connections, and two others represent the synaptic connections between the two populations. Remarkably, synaptic weights appear in the online algorithm despite their absence in the optimization problem formulations 3.3 and 3.4. Furthermore, the first population can be associated with the principal neurons of a biological circuit and the second population with interneurons.

Equations 3.9 and 3.10 define a neural algorithm that proceeds in two phases. After each stimulus presentation, the neuronal activity dynamics, equation 3.9, is iterated until convergence. Then synaptic weights are updated according to local learning rules, equation 3.10: anti-Hebbian for synapses from interneurons and Hebbian for all other synapses. Biologically, synaptic weights are updated on a slower timescale than neuronal activity dynamics.

Our algorithm can be viewed as a special case of the algorithm proposed by Plumbley (1994, 1996). Plumbley (1994) analyzed the convergence of synaptic weights in a stochastic setting by a linear stability analysis of the stationary point of the synaptic weight updates. His results are directly applicable to our algorithm and show that if the synaptic weights of our algorithm converge to a stationary state, they whiten the input.
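The stationarity argument can be illustrated with a toy single-matrix decorrelating rule. This is a simplification of the two-population network of equations 3.9 and 3.10: the update below, of the form $\Delta \mathbf{W} = \eta(\mathbf{I} - \mathbf{y}\mathbf{y}^\top)\mathbf{W}$, is stationary exactly when the output covariance equals the identity, that is, when $\mathbf{W}$ whitens the input:

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[1.0, 0.0],
              [0.8, 0.6]])                        # mixing: inputs x = L @ n are correlated
W = np.eye(2)                                     # feedforward weights
eta = 0.002                                       # illustrative learning rate
for _ in range(50000):                            # one streamed input at a time
    x = L @ rng.standard_normal(2)
    y = W @ x
    W += eta * (np.eye(2) - np.outer(y, y)) @ W   # stationary iff <y y^T> = I

# estimate the output covariance with the learned weights
Y = W @ L @ rng.standard_normal((2, 20000))
C = Y @ Y.T / Y.shape[1]
print(np.round(C, 2))                             # approximately the identity matrix
```

With a finite learning rate, the weights fluctuate around the whitening solution rather than converging exactly, which is why the text's analysis is phrased in terms of stationary states of the averaged updates.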

#### 3.1.1 Computing

The optimization problem, equation 3.3, and the corresponding neural algorithm, equations 3.9 and 3.10, almost achieve what is needed for noncentered prewhitening, but we still need to find , since for the NSM step, we need . We now discuss how can be learned along with using the same online algorithm.

### 3.2 Online NSM

Next, we derive the second-layer network, which solves the NSM optimization problem 2.6 in an online setting (Pehlevan & Chklovskii, 2014).

After the arrival of each data vector, the operation of the complete two-layer network algorithm, Figure 2, is as follows. First, the dynamics of the prewhitening network runs until convergence. Then the output of the prewhitening network is fed to the NSM network, and the NSM network dynamics runs until convergence to a fixed point. Synaptic weights are updated in both networks for processing the next data vector.
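The per-sample loop just described can be sketched for the NSM layer as follows. This is schematic: the rectified dynamics and the Hebbian/anti-Hebbian form follow the text, but the specific update rules, learning rate, and names are illustrative stand-ins, not the exact equations 3.26:

```python
import numpy as np

def nsm_dynamics(h, W, M, n_iter=100):
    """Phase 1: iterate the rectified neural dynamics to a fixed point,
    holding the feedforward (W) and lateral (M) weights fixed."""
    y = np.zeros(W.shape[0])
    for _ in range(n_iter):
        y = np.maximum(W @ h - M @ y, 0.0)   # rectification <-> nonnegative output
    return y

def nsm_update(h, y, W, M, eta=0.01):
    """Phase 2: local weight updates of Hebbian (W) and anti-Hebbian (M)
    form. Each update uses only pre- and postsynaptic activity."""
    W += eta * (np.outer(y, h) - (y ** 2)[:, None] * W)
    M += eta * (np.outer(y, y) - (y ** 2)[:, None] * M)
    np.fill_diagonal(M, 0.0)                 # no self-coupling
    return W, M

rng = np.random.default_rng(0)
k, T = 3, 1000
H = np.abs(rng.standard_normal((k, T)))      # stand-in for prewhitened inputs
W = 0.1 * rng.standard_normal((k, k))
M = np.zeros((k, k))
for t in range(T):                           # one data vector at a time (streaming)
    y = nsm_dynamics(H[:, t], W, M)
    W, M = nsm_update(H[:, t], y, W, M)
print(y.shape, bool((y >= 0).all()))         # (3,) True
```

In the full two-layer algorithm, `H[:, t]` would be the converged output of the prewhitening network for the current data vector rather than a precomputed matrix.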

#### 3.2.1 NICA Is a Stationary State of Online NSM

Suppose the stimuli obey the NICA generative model, equation 1.1, and the observed mixture, $\mathbf{x}_t$, is whitened with the exact (generalized) prewhitening matrix, $\mathbf{F}$, described in theorem 2. Then the input to the network at time $t$ is $\mathbf{h}_t = \mathbf{F}\mathbf{x}_t$. Our claim is that there exist synaptic weight configurations for which (1) for any mixed input, $\mathbf{h}_t$, the output of the network is the source vector, that is, $\mathbf{y}_t = \mathbf{P}\mathbf{s}_t$, where $\mathbf{P}$ is a permutation matrix, and (2) this synaptic configuration is a stationary state.

Furthermore, these weights define a stationary state in the sense of equation 3.27, assuming a fixed learning rate.^{3} To see this, substitute the weights from equation 3.28 into the last two equations of 3.26 and average over the source distribution. The fixed learning rate assumption is valid in the large-$t$ limit, when changes to the weights become small (see Pehlevan et al., 2015).

## 4 Numerical Simulations

Here we present numerical simulations of our two-layer neural network using various data sets and compare the results to those of other algorithms.

In all our simulations, , except in Figure 5B, where . Our networks were initialized as follows:

In the prewhitening network, and were chosen to be random orthonormal matrices. is initialized as because of its definition in equation 3.8 and the fact that this choice guarantees the convergence of the neural dynamics, equation 3.19 (see appendix A).

In the NSM network, was initialized to a random orthonormal matrix and was set to zero.

The learning rates were chosen as follows:

- For the prewhitening network, we generalized the time-dependent learning rate, equation 3.21, to and performed a grid search over and to find the combination with the best performance. Our performance measures will be introduced below.
- For the NSM network, we generalized the activity-dependent learning rate, equation 3.26, to, and performed a grid search over several values of and to find the combination with the best performance. The parameter introduces “forgetting” to the system (Pehlevan et al., 2015). We hypothesized that forgetting will be beneficial in the two-layer setting because the prewhitening layer output changes over time and the NSM layer has to adapt. Further, for comparison purposes, we also implemented this algorithm with a time-dependent learning rate of the form 4.1 and performed a grid search with and to find the combination with best performance.

For the NSM network, to speed up our simulations, we implemented a procedure from Plumbley & Oja (2004). At each iteration, we checked whether there is any output neuron that has not fired up until that iteration. If so, we flipped the sign of its feedforward inputs. In practice, the flipping occurred only within the first 10 or 50 iterations.
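A minimal sketch of this bookkeeping follows; the function name and the history-buffer representation are ours, and the procedure is adapted from Plumbley and Oja (2004):

```python
import numpy as np

def flip_silent_neurons(W, fired_ever, output_history):
    """Flip the sign of the feedforward weights of any output neuron that
    has not fired in any iteration so far."""
    fired_ever |= (output_history > 0).any(axis=1)   # update which neurons ever fired
    W[~fired_ever] *= -1.0                           # flip rows of silent neurons
    return W, fired_ever

W = np.array([[-1.0, -2.0],
              [ 0.5,  0.3]])
fired = np.zeros(2, dtype=bool)
ys = np.array([[0.0, 0.0, 0.0],                      # neuron 0 has never fired
               [0.1, 0.0, 0.4]])                     # neuron 1 has fired
W, fired = flip_silent_neurons(W, fired, ys)
print(W[0])                                          # [1. 2.] -- row 0 was sign-flipped
```

Because a neuron with entirely misaligned feedforward weights would otherwise stay silent forever under rectification, this flip merely accelerates escape from such dead configurations.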

For comparison, we implemented five other algorithms. The first is the offline algorithm, equation 2.8; the others are chosen to represent major online algorithm classes:

- *Offline projected gradient descent:* We simulated the projected gradient descent algorithm, equation 2.8. We used variable step sizes of the form 4.1 and performed a grid search with and to find the combination with the best performance. We initialized elements of the matrix, , by drawing a gaussian random variable with zero mean and unit variance and rectifying it. The input data set was whitened offline before being passed to projected gradient descent.
- *fastICA:* fastICA (Hyvärinen & Oja, 1997, 2000; Hyvärinen, 1999) is a popular ICA algorithm that does not assume nonnegativity of sources. We implemented an online version of fastICA (Hyvärinen & Oja, 1998) using the same parameters except for feedforward weights. We used the time-dependent learning rate, equation 4.1, and performed a grid search with and to find the combination with the best performance. fastICA requires whitened and centered input (Hyvärinen & Oja, 1998) and computes a decoding matrix that maps mixtures back to sources. We ran the algorithm with whitened and centered input. To recover nonnegative sources, we applied the decoding matrix to noncentered but whitened input.
- *Infomax ICA:* Bell and Sejnowski (1995) proposed a blind source separation algorithm that maximizes the mutual information between inputs and outputs—the Infomax principle (Linsker, 1988). We simulated an online version due to Amari, Cichocki, and Yang (1996). We chose cubic neural nonlinearities, compatible with subgaussian input sources. This differs from our fastICA implementation, where the nonlinearity is also learned online. Infomax ICA computes a decoding matrix using centered, but not whitened, data. To recover nonnegative sources, we applied the decoding matrix to noncentered inputs. Finally, we rescaled the sources so that their variance is 1. We experimented with several learning rate parameters to find optimal performance.
- *Linsker's network:* Linsker (1997) proposed a neural network with local learning rules for Infomax ICA. We simulated this algorithm with cubic neural nonlinearities, with preprocessing and decoding done as in our Infomax ICA implementation.
- *Nonnegative PCA:* The nonnegative PCA algorithm (Plumbley & Oja, 2004) solves the NICA task and makes explicit use of the nonnegativity of sources. We used the online version given in Plumbley and Oja (2004). To speed up our simulations, we implemented a procedure from Plumbley and Oja (2004): at each iteration, we checked for any output neuron that had not fired up until that iteration, and if so, we flipped the sign of its feedforward inputs. For this algorithm, we again used the time-dependent learning rate of equation 4.1 and performed a grid search with and to find the combination with the best performance. Nonnegative PCA assumes whitened but not centered input (Plumbley & Oja, 2004).

Next, we present the results of our simulations on three data sets.

### 4.1 Mixture of Random Uniform Sources

The source samples were independent and identically distributed; each sample was set to zero with probability 0.5 and sampled uniformly from an interval with probability 0.5. The dimensions of source vectors were . The mixing matrices are given in appendix C. For each run, source vectors were generated. (For a sample of the original and mixed signals, see Figure 3A.)
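Such sparse nonnegative sources can be generated as follows. The interval bound `high` is an illustrative stand-in, since the exact interval used in the simulations is not reproduced here:

```python
import numpy as np

def sparse_uniform_sources(k, T, high=1.0, seed=0):
    """Each sample is 0 with probability 0.5 and otherwise drawn
    uniformly from (0, high). Returns a k x T source matrix."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(0.0, high, size=(k, T))
    S[rng.random((k, T)) < 0.5] = 0.0            # zero out half the samples
    return S

S = sparse_uniform_sources(k=4, T=100000)
print(round(float(np.mean(S == 0)), 2))          # about 0.5
print(bool(S.min() >= 0.0))                      # sources are nonnegative
```

Sources of this kind are well grounded in Plumbley's sense: each channel takes values arbitrarily close to zero with positive probability.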

The inputs to the fastICA and nonnegative PCA algorithms were prewhitened offline, and in the case of fastICA, they were also centered. We ran our NSM network both as a single-layer algorithm with prewhitening done offline and as part of our two-layer algorithm with whitening done online.

In Figure 3B, we show the performances of all the algorithms we implemented. Our algorithms perform as well as or better than the others, especially as the dimensionality of the input increases. Offline whitening is better than online whitening; however, as dimensionality increases, online whitening becomes competitive with offline whitening. In fact, our two-layer and single-layer networks perform better than online fastICA and nonnegative PCA, for which whitening was done offline.

We also simulated a fully offline algorithm by taking projected gradient descent steps until the residual error plateaued (see Figure 3B). The performance of the offline algorithm provides two important reference points. First, it quantifies the loss in performance due to online (as opposed to offline) processing. Second, it establishes the lowest error that could be achieved by the NSM method for the given data set; this error is not necessarily zero due to the finite size of the data set. The bound is not exact because projected gradient descent may get stuck in a local minimum of equation 2.6.

We also tested whether the learned synaptic weights of our network match our theoretical predictions. In Figure 4A, we show examples of learned feedforward and recurrent synaptic weights at and what is expected from our theory, equation 3.28. We observed an almost perfect match between the two. In Figure 4B, we quantify the convergence of simulated synaptic weights to the theoretical prediction by plotting a normalized error metric defined by .

### 4.2 Mixture of Random Uniform and Exponential Sources

Our algorithm can demix sources sampled from different statistical distributions. To illustrate this point, we generated a data set with two uniform and three exponential source channels. The uniform sources were sampled as before. The exponential sources were either zero (with probability 0.5) or sampled from an exponential distribution, scaled so that the variance of the channel is 1. In Figure 5A, we show that the algorithm successfully recovers the sources.

To test the denoising capabilities of our algorithm, we created a data set where source signals are accompanied by background noise. Sources to be recovered were three exponential channels that were sampled as before. Background noises were two uniform channels that were sampled as before, except scaled to have variance 0.1. To denoise the resulting five-dimensional mixture, the prewhitening layer reduced its five input dimensions to three. Then the NSM layer successfully recovered sources (see Figure 5B). Hence, the prewhitening layer can act as a denoising stage.

### 4.3 Mixture of Natural Scenes

Next, we consider recovering images from their mixtures (see Figure 6A), where each image is treated as one source. Four image patches of size pixels were chosen from a set of images of natural scenes previously used in Hyvärinen and Hoyer (2000) and Plumbley and Oja (2004). The preprocessing was as in Plumbley and Oja (2004): (1) images were downsampled by a factor of 4 to obtain patches, (2) pixel intensities were shifted to have a minimum of zero, and (3) pixel intensities were scaled to have unit variance. Hence, in this data set, there are four sources, corresponding to image patches, and a total of samples. These samples were presented to the algorithm 5000 times, with randomly permuted order in each presentation. The mixing matrix, which was generated randomly, is given in appendix C.
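The three preprocessing steps can be sketched on a single patch as follows. Block averaging is our illustrative choice of downsampling (the original method may differ), and the 252-pixel patch size is arbitrary:

```python
import numpy as np

def preprocess_patch(img, factor=4):
    """Downsample by `factor` via block averaging, shift the minimum
    to zero, and scale to unit variance."""
    h, w = img.shape
    img = img[: h - h % factor, : w - w % factor]     # crop to a multiple of factor
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    small = small - small.min()                       # minimum becomes 0
    small = small / small.std()                       # unit variance
    return small

patch = np.random.default_rng(0).random((252, 252))   # stand-in for an image patch
out = preprocess_patch(patch)
print(out.shape, round(float(out.min()), 6), round(float(out.var()), 6))  # (63, 63) 0.0 1.0
```

Note that only the minimum, not the mean, is removed: as with noncentered prewhitening, keeping nonnegative intensities is what lets the NICA machinery apply.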

In Figure 6B, we show the performances of all algorithms we implemented in this task. We see that our algorithms, when compared to fastICA and nonnegative PCA, perform much better.

## 5 Discussion

In this letter, we presented a new neural algorithm for blind nonnegative source separation. We started by assuming the nonnegative ICA generative model (Plumbley, 2001, 2002) where inputs are linear mixtures of independent and nonnegative sources. We showed that the sources can be recovered from inputs by two sequential steps: (1) generalized whitening and (2) NSM. In fact, our argument requires sources to be only uncorrelated, and not necessarily independent. Each of the two steps can be performed online with single-layer neural networks with local learning rules (Pehlevan & Chklovskii, 2014, 2015a). Stacking these two networks yields a two-layer neural network for blind nonnegative source separation (see Figure 2). Numerical simulations show that our neural network algorithm performs well.

Because our network is derived from optimization principles, its biologically realistic features can be given meaning. The network is multilayered because each layer performs a different optimization. Lateral connections create competition between principal neurons, forcing them to differentiate their outputs. Interneurons clamp the activity dimensions of principal neurons (Pehlevan & Chklovskii, 2015a). Rectifying neural nonlinearity is related to nonnegativity of sources. Synaptic plasticity (Malenka & Bear, 2004), implemented by local Hebbian and anti-Hebbian learning rules, achieves online learning. While Hebbian learning is famously observed in neural circuits (Bliss & Lømo, 1973; Bliss & Gardner-Medwin, 1973), our network also makes heavy use of anti-Hebbian learning, which can be interpreted as the long-term potentiation of inhibitory postsynaptic potentials. Experiments show that such long-term potentiation can arise from pairing action potentials in inhibitory neurons with subthreshold depolarization of postsynaptic pyramidal neurons (Komatsu, 1994; Maffei, Nataraj, Nelson, & Turrigiano, 2006). However, plasticity in inhibitory synapses does not have to be Hebbian, that is, require correlation between pre- and postsynaptic activity (Kullmann, Moreau, Bakiri, & Nicholson, 2012).

For improved biological realism, the network should respond to a continuous stimulus stream by continuous and simultaneous changes to its outputs and synaptic weights. Presumably this requires neural timescales to be faster and synaptic timescales to be slower than that of changes in stimuli. To explore this possibility, we simulated some of our data sets with a limited number of neural activity updates (not shown) and found that about 10 updates per neuron are sufficient for successful recovery of sources without significant loss in performance. With a neural time scale of 10 ms, this should take about 100 ms, which is sufficiently fast given that, for example, the temporal autocorrelation timescale of natural image sequences is about 500 ms (David, Vinje, & Gallant, 2004; Bull, 2014).

It is interesting to compare the two-layered architecture we present to the multilayer neural networks of deep learning approaches (LeCun, Bengio, & Hinton, 2015):

- For each data presentation, our network performs recurrent dynamics to produce an output, while deep networks have a feedforward architecture.
- The first layer of our network has multiple neuron types, principal neurons and interneurons, and only principal neurons project to the next layer. In deep learning, all neurons in a layer project to the next layer.
- Our network operates with local learning rules, while deep learning uses backpropagation, which is not local.
- We derived the architecture, the dynamics, and the learning rules of our network from a principled cost function. In deep learning, the architecture and the dynamics of a neural network are designed by hand; only the learning rule is derived from a cost function.
- In building a neural algorithm, we started with a generative model of inputs, from which we inferred algorithmic steps to recover latent sources. These algorithmic steps guided us in deciding which single-layer networks to stack. In deep learning, no such generative model is assumed, and network architecture design is more of an art.

We believe that starting from a generative model might lead to a more systematic way of network design. In fact, the question of which generative model is appropriate for deep networks is already being asked (Patel, Nguyen, & Baraniuk, 2016).

## Notes

^{1}

In his analysis, Plumbley (2002) assumed that the number of mixture channels equals the number of source channels, but this assumption can be relaxed, as we show.

^{2}

Without loss of generality, a scalar factor that multiplies a source can always be absorbed into the corresponding column of the mixing matrix.

^{3}

**Proof:** The net input to a neuron at the claimed fixed point is . Plugging in equation 3.26 and , and using equation 2.5, one finds that the net input is , which is also the output, since sources are nonnegative. This fixed point is unique and globally stable because the NSM neural dynamics performs coordinate descent on a strictly convex cost given by .

## Appendix A: Convergence of the Gradient Descent-Ascent Dynamics

. This implies that and . is in the null space of . Since is with , the null space is -dimensional, and one has degenerate eigenvalues.

. Substituting for in the first equation of equation A.3, this implies that . Hence, is an eigenvector of . For each eigenvalue of , there are two corresponding eigenvectors . can be solved uniquely from the first equation in equation A.3.

Hence, there are degenerate eigenvalues and pairs of conjugate eigenvalues , one pair for each eigenvalue of . Since are real and positive (we assume is full rank and by definition ), real parts of all are negative and hence the neural dynamics, equation 3.9, is globally convergent.

## Appendix B: Modified Objective Function and Neural Network for Generalized Prewhitening

While deriving our online neural algorithm, we assumed that the number of output channels is reduced to the number of sources at the prewhitening stage ($m = k$). However, our offline analysis did not need such a reduction; one could keep $m > k$ for generalized prewhitening. Here we provide an online neural algorithm that allows $m > k$.

First, we point out why the prewhitening algorithm given in the main text is not adequate for $m > k$. In appendix A, we proved that the neural dynamics described by equation 3.9 converges to the saddle point of the objective function, equation 3.6. This proof assumes a full-rank weight matrix. However, if $m > k$, this assumption breaks down as the network learns: perfectly prewhitened output has rank $k$ (low rank), and a perfectly prewhitening network would have a correspondingly low-rank weight matrix. We simulated this network with $m > k$ and observed that the condition number of the weight matrix increased over time and the neural dynamics took longer to converge. Although the algorithm still functioned well for practical purposes, we present a modification that fully resolves the problem.

Using this cost function, we will derive a neural algorithm that does not suffer from the described convergence issues even if . We now need to choose the parameter , and for that we need to know the spectral properties of .

Synaptic weight updates are the same as before, equation 3.10.^{4} Finally, this network can be modified, following the same steps as before, to compute the additional quantities needed for the NSM step.

## Appendix C: Mixing Matrices for Numerical Simulations

## Appendix D: Learning Rate Parameters for Numerical Simulations

For Figures 3 to 6, the following parameters were found to perform best in our grid search:

|  | fastICA | NPCA | NSM (activity) | NSM (time) |
| --- | --- | --- | --- | --- |
|  | (10, 0.01) | (10, 0.1) | (10, 0.8) | (10, 0.1) |
|  | (100, 0.01) | (10, 0.01) | (10, 0.9) | (10, 0.01) |
|  | (100, 0.01) | (100, 0.01) | (10, 0.9) | (10, 0.1) |
|  | (100, 0.01) | (1000, 0.01) | (10, 0.9) | (10, 0.01) |
| Images | (, 0.01) | (1000, 0.01) | (100, 0.9) | NA |

|  | Two-Layer | Offline | Infomax ICA | Linsker's Algorithm |
| --- | --- | --- | --- | --- |
|  | (100, 1, 10, 0.8) | (, 0.001) | (1000, 0.2) | (1000, 0.2) |
|  | (100, 1, 10, 0.9) | (, 0.001) | (1000, 0.2) | (1000, 0.2) |
|  | (100, 1, 10, 0.9) | (, 0.01) | (1000, 0.2) | (1000, 0.2) |
|  | (100, 1, 10, 0.9) | (, 0.01) | (1000, 0.2) | (1000, 0.2) |
| Images | (100, 1, 100, 0.9) | NA | NA | NA |


## Acknowledgments

We thank Andrea Giovannucci, Eftychios Pnevmatikakis, Anirvan Sengupta, and Sebastian Seung for useful discussions. D.C. is grateful to the IARPA MICRONS program for support.