## Abstract

As an extension of prior work, we studied inspecific Hebbian learning using the classical Oja model. We used a combination of analytical tools and numerical simulations to investigate how the effects of synaptic cross talk (which we also refer to as synaptic inspecificity) depend on the input statistics. We investigated a variety of patterns that appear in dimensions higher than two (and classified them based on covariance type and input bias). We found that the effects of cross talk on learning dynamics and outcome are highly dependent on the input statistics and that cross talk may lead in some cases to catastrophic effects on learning or development. Arbitrarily small levels of cross talk are able to trigger bifurcations in learning dynamics or to bring the system close enough to a critical state that the effects are indistinguishable from a real bifurcation. We also investigated how cross talk behaves toward unbiased (“competitive”) inputs and in which circumstances it can help the system productively resolve the competition. Finally, we discuss the idea that sophisticated neocortical learning requires accurate synaptic updates (similar to polynucleotide copying, which requires highly accurate replication). Since it is unlikely that the brain can completely eliminate cross talk, we support the proposal that it uses a neural mechanism that “proofreads” the accuracy of the updates, much as DNA proofreading lowers copying error rate.

## 1. Introduction

### 1.1. Synaptic Plasticity and Cross Talk.

It is generally believed that synaptic plasticity (i.e., activity-dependent adjustments of synaptic connection strengths) is the basis of most processes in the nervous system, such as development, learning, creation and storage of memories, cognition, and ultimately behavior (Katz & Shatz, 1996). The term plasticity may reflect a variety of phenomena, from actual new synapse creation and deletion, to silencing and unsilencing of existing synapses, to only changes in existing synapse strengths. In 1949, Hebb proposed that learning occurs in response to local signals, such as the conjoint activity of pre- and postsynaptic neurons: “When an axon of cell A is near enough to excite cell B or repeatedly or consistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased” (Hebb, 2002).

Those who interpret and use Hebb's rule generally assume that synaptic modifications act in a local, connection-specific manner (i.e., only synapses between the neurons presenting correlated activity are modified, independent of activity at other synaptic sites). In the literature, the most representative models for long-term changes in synaptic efficacy (Malenka & Bear, 2004; Elliott, 2012) are long-term potentiation (LTP; Bliss & Lømo, 1973) and long-term depression (LTD; Lynch, Dunwiddie, & Gribkoff, 1977). Early studies of long-term potentiation and depression reported synaptic updates to be local (i.e., “specific”) (Isaac, Nicoll, & Malenka, 1995; Dudek & Bear, 1992). However, later data failed to replicate synaptic specificity (Chevaleyre & Castillo, 2004; Matsuzaki, Honkura, Ellis-Davies, & Kasai, 2004). Rather, they started to suggest that “cross talk” likely occurs during Hebbian plasticity (Kossel, Bonhoeffer, & Bolz, 1990; Bonhoeffer, Staiger, & Aertsen, 1989; Engert & Bonhoeffer, 1997; Schuman & Madison, 1994; Bi, 2002; Bi & Poo, 2001): activity-induced synaptic modification may trigger changes in other, unstimulated synapses (possibly the ones that are geometrically close to or adjacent to the target ones). More recent experimental work (Harvey & Svoboda, 2007) has shown quite unequivocally that induction of LTP at one synapse increases the likelihood that LTP will be induced at closely neighboring synapses.

This source of “error,” or noise, is believed to be due to the imperfection of chemical synaptic transmission, in which some degree of diffusion of neuromessengers combines with the high synapse density (especially for highly connected neurons), making it difficult, or even impossible, for a triggered synaptic change to remain completely connection specific.

A proposed list of such factors that contribute to cross talk (Elliott, 2012) includes early-phase LTP/LTD presynaptic (Bonhoeffer et al., 1989; Kossel et al., 1990; Schuman & Madison, 1994) or postsynaptic (Engert & Bonhoeffer, 1997; Harvey & Svoboda, 2007) diffusion of intracellular (Harvey & Svoboda, 2007; Harvey, Yasuda, Zhong, & Svoboda, 2008) and extracellular messengers (Lemann, Gottmann, & Heumann, 1994; Korte et al., 1995; Levine, Dreyfus, Black, & Plummer, 1995), as well as late-phase LTP and LTD factors, on longer timescales (Frey & Morris, 1998; Navakkode, Sajikumar, & Frey, 2004; see also section 4). The necessity for close synaptic packing (DeFelipe, Marco, Busturia, & Merchán-Pérez, 1999) creates a geometric conflict. In NMDA-mediated sites, for example, the spine neck must be sufficiently narrow to reduce Ca escape to other sites (Koch & Zador, 1993; Sabatini, Oertner, & Svoboda, 2002), but also sufficiently wide to allow synaptic currents through. In this light, complete chemical isolation and accuracy seem, and may indeed be, impossible to achieve in the brain.

### 1.2. Plasticity Models and the Effects of Cross Talk.

A variety of models have been used to investigate the effects of synaptic cross talk on brain function. Since many different models can produce the same behavior, it is not possible to use behavior to test whether a model is correct; rather, models can be used to determine whether certain types of interactions are capable of replicating certain outcomes, generating testable hypotheses. In our context, modeling is used to predict in principle whether and when cross talk can lead to a complete breakdown in the outcome otherwise obtained in the synapse-specific case.

In most mathematical models of synaptic plasticity, the system develops, or learns, one or more patterns of synaptic configurations, which are typically stable equilibria but could also be cycles or more complex invariant sets in the case of nonlinear models (Wiskott & Sejnowski, 1998; Elliott, 2003). In this framework, synaptic cross talk can be regarded as an internal noise parameter, whose increase may not only alter performance but, past a critical value, may trigger radical crashes (bifurcations) in the system's dynamics, actually destroying its capacity to reach the stable states (the desired developmental or learning outcomes). It has been argued that in order to avoid such crashes, very accurate connection strength adjustments must be required but that such levels of accuracy are biophysically impossible (Cox & Adams, 2009). Furthermore, it has been shown that the critical level of cross talk sufficient to induce bifurcations in these models is very sensitive to the input statistics and postsynaptic connectivity, and in some cases, it can be made arbitrarily small (Elliott, 2012). Either way, many nonlinear models of synaptic plasticity are fatally compromised by even tiny amounts of cross talk (Elliott, 2012), supporting the idea that some parallel circuitry (proofreading) might be necessary to boost robustness to synaptic inspecificity, and thus permit or facilitate useful development and learning, even in the presence of cross talk (see section 4.3 for additional comments on proofreading).

The possibility that synaptic cross talk can have such catastrophic effects makes it very important for us to assess its impact on nonlinear models of synaptic plasticity as a way toward understanding its actual impact in the brain. One cannot expect, however, a generic proof of principle for all learning models, especially given the vastness of the field; rather, one can point out relevant examples of such behavior in models that are biologically plausible.

We study here the effect of cross talk in the Oja rule, a very simple, multiplicative normalization of Hebbian learning. Oja's model is driven only by second-order statistics, hence works as a principal component (PCA) rather than an independent component analyzer (ICA; Cox & Adams, 2009). We are not proposing that the brain actually does PCA, but we consider this very simple particular case of the general unsupervised learning problem because it is completely tractable by a combination of analytical and numerical tools. While our approach incorporates some aspects of biological realism, many simplifications are made along the way (described in the following sections) with the goal to investigate cross talk in a simple and relevant context rather than to propose a detailed model of biological learning. Although the existence of stable equilibria relates here only to second-order input statistics, this model captures a feature observed in other nonlinear, more elaborate models: synaptic cross talk is able to induce catastrophic breakdowns in learning in a manner that is highly idiosyncratic, depending in a very input-specific and model-specific manner on the learning rule.

The rest of the letter is organized as follows. In section 2, we present the model (the Oja rule in the presence
of cross-talk, or “inspecificity”) and some properties of the
input patterns to be learned, and we provide an overview of the basics of the
rule's dynamic behavior. In section 3.1 we investigate numerically the three-dimensional Oja inspecific network; we
focus in particular on how it processes different classes of input
distributions, preserving some of the dynamical aspects found in the
two-dimensional phase plane (Rădulescu & Adams, 2013), but also introducing new features specific to
higher dimensions. In section 3.2, we
study analytically, in an *n*-dimensional example, the behavior
observed numerically in the previous section. In section 4, we put the numerical and analytical results in the
biological context of a learning cortical network. Section 4.1 focuses on the meaning and importance of input
bias and on its effects in conjunction with cross talk. Section 4.2 discusses the biological plausibility of an
Oja-type learning model and reviews a possible biophysical implementation of the
rule, as described in the literature. Section 4.3 briefly discusses the analogy between neural cross talk and DNA
copying errors, and the necessity of a proofreading mechanism in both cases.

## 2. Methods

### 2.1. The Oja Model with Synaptic Cross Talk.

Oja (1982) showed that a simple neuronal model can perform unsupervised learning based on Hebbian synaptic weight updates incorporating an implicit “multiplicative” weight normalization to prevent unlimited weight growth (von der Malsburg, 1973). Oja's rule has been extensively studied and used (Hertz, Krogh & Palmer, 1991; Taylor & Coombes, 1993) in its original or modified forms (Oja & Karhunen, 1985; Diamantaras & Kung, 1996).

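The model consists of *n* input neurons, which receive *n* signals $\mathbf{x} = (x_1, \ldots, x_n)^T$ drawn from an input distribution $P(\mathbf{x})$, transmitted via synaptic connections of strengths $\mathbf{w} = (w_1, \ldots, w_n)^T$. The resulting scalar output *y* is generated as the weighted sum of the inputs, $y = \mathbf{w}^T\mathbf{x}$. The synaptic weights are modified by implementing first a Hebb-like strengthening proportional to the product of $x_i$ and *y* ($\Delta w_i = \gamma y x_i$, with learning rate $\gamma$), followed by an approximate “normalization” step, maintaining the Euclidean norm of the weight vector close to one:

$$\mathbf{w}(t+1) = \mathbf{w}(t) + \gamma\, y(t)\,[\mathbf{x}(t) - y(t)\,\mathbf{w}(t)], \tag{2.1}$$

where $\gamma y x_i$ is the effective change in the synaptic strength $w_i$, while $-\gamma y^2 w_i$ can be interpreted as a “decay,” or “forgetting,” term. The input covariance matrix $\mathbf{C} = \langle \mathbf{x}\mathbf{x}^T \rangle$ can be used as an appropriate long-term characterization of the inputs to study the asymptotic convergence of the expected weight vector $\langle \mathbf{w} \rangle$. Then equation 2.1 becomes

$$\mathbf{w}(t+1) = \mathbf{w}(t) + \gamma\,[\mathbf{C}\mathbf{w}(t) - (\mathbf{w}(t)^T\mathbf{C}\mathbf{w}(t))\,\mathbf{w}(t)] \tag{2.2}$$

or, in continuous time, the case studied in this letter,

$$\frac{d\mathbf{w}}{dt} = \gamma\,[\mathbf{C}\mathbf{w} - (\mathbf{w}^T\mathbf{C}\mathbf{w})\,\mathbf{w}]. \tag{2.3}$$

Since it depends only on second-order statistics of the incoming input, this model acts as a principal component analyzer for the input distribution (Oja, 1982), one simplified way of modeling data compression and transmission in the brain. Although the normalization is implemented in this equation via an approximation, one can easily check that $\|\mathbf{w}\| \to 1$ when $t \to \infty$, so that the unit sphere in the *n*-dimensional weight space is an attracting hypersurface for the system (in particular, the stable equilibria are the two normalized principal eigenvectors of **C**, which lie on this sphere).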
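The convergence claim can be illustrated with a minimal numerical sketch. It assumes the standard averaged Oja flow $d\mathbf{w}/dt = \gamma[\mathbf{C}\mathbf{w} - (\mathbf{w}^T\mathbf{C}\mathbf{w})\mathbf{w}]$ and integrates it with plain Euler steps. The covariance entries, the initial condition, and the function name `oja_mean_field` are hypothetical choices made only for illustration.

```python
import numpy as np

def oja_mean_field(C, w0, gamma=0.05, steps=5000):
    """Euler steps of dw/dt = gamma * (C w - (w^T C w) w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        Cw = C @ w
        w = w + gamma * (Cw - (w @ Cw) * w)
    return w

# Illustrative (hypothetical) diagonally dominant covariance matrix.
C = np.array([[1.3, 0.2, 0.2],
              [0.2, 1.1, 0.2],
              [0.2, 0.2, 1.0]])

w_final = oja_mean_field(C, w0=[0.3, -0.2, 0.5])

# eigh returns eigenvalues in ascending order, so the last column
# is the principal eigenvector of C.
principal = np.linalg.eigh(C)[1][:, -1]

norm_final = np.linalg.norm(w_final)      # expected to approach 1
alignment = abs(w_final @ principal)      # |cosine| with the principal component
print(norm_final, alignment)
```

Starting from a weight vector of norm well below one, the trajectory should settle on the unit sphere along one of the two normalized principal eigenvectors; which of the two is reached depends on the initial condition.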

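In previous work, we (Rădulescu, Cox, & Adams, 2009; Rădulescu & Adams, 2013) and others (Botelho & Jamison, 2002, 2004) have examined how cross talk affects the Oja model. We formalized the effects of synaptic cross talk via a time-dependent (but not input- or weight-dependent) error matrix $\mathbf{E}(t)$, whose elements $E_{ij}(t)$ reflect at each time *t* the fractional contribution that the activity across weight $w_j$ makes to the update of $w_i$. The error matrix enters only the Hebbian part of the update,

$$\mathbf{w}(t+1) = \mathbf{w}(t) + \gamma\,[y\,\mathbf{E}(t)\,\mathbf{x} - y^2\,\mathbf{w}(t)], \tag{2.4}$$

and not the normalization step, which involves the computation of $y^2$ and its multiplication by $\mathbf{w}$ (see section 4.2 for a more extensive discussion of the biophysical implementation of these steps). The average (mean field) form of the rule becomes (since $\mathbf{E}(t)$ is input and weight independent)

$$\frac{d\mathbf{w}}{dt} = \gamma\,[\mathbf{E}\mathbf{C}\mathbf{w} - (\mathbf{w}^T\mathbf{C}\mathbf{w})\,\mathbf{w}], \tag{2.5}$$

where $\gamma$ is the learning rate, $\mathbf{E} = \langle \mathbf{E}(t) \rangle$ is the average error matrix, and, as before, $\mathbf{C} = \langle \mathbf{x}\mathbf{x}^T \rangle$. The average error matrix has positive entries, and is symmetric and equal to the identity matrix for zero cross talk. To fix our ideas, we considered the error matrix **E** to be isotropic, that is, of the form

$$\mathbf{E} = \begin{pmatrix} q & \epsilon & \cdots & \epsilon \\ \epsilon & q & \cdots & \epsilon \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon & \epsilon & \cdots & q \end{pmatrix},$$

where $\epsilon$ represents synaptic cross talk, or “error,” and *q* is the synaptic “quality,” satisfying $q + (n-1)\epsilon = 1$.

One can easily show that equation 2.5 preserves the dot product $\langle \cdot, \cdot \rangle_{\mathbf{C}}$ (where $\langle \mathbf{v}, \mathbf{u} \rangle_{\mathbf{C}} = \mathbf{v}^T\mathbf{C}\mathbf{u}$, for all $\mathbf{v}, \mathbf{u} \in \mathbb{R}^n$). Furthermore, an equilibrium for equation 2.5 is an eigenvector **w** of **EC**, normalized so that $\mathbf{w}^T\mathbf{C}\mathbf{w} = \lambda_{\mathbf{w}}$, where $\lambda_{\mathbf{w}}$ is its corresponding eigenvalue of **EC**.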
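As a hedged illustration of the equilibrium condition, the sketch below builds an isotropic error matrix and checks that a suitably normalized leading eigenvector of **EC** is a fixed point. It assumes the mean-field form $d\mathbf{w}/dt \propto \mathbf{E}\mathbf{C}\mathbf{w} - (\mathbf{w}^T\mathbf{C}\mathbf{w})\mathbf{w}$ and an error matrix with quality *q* on the diagonal and error $\epsilon = (1-q)/(n-1)$ elsewhere. The helper names and the numerical values are hypothetical. Because **E** is symmetric positive definite for $q > 1/n$, the eigenproblem of **EC** is solved through the similar symmetric matrix $\mathbf{L}^T\mathbf{C}\mathbf{L}$, where $\mathbf{E} = \mathbf{L}\mathbf{L}^T$.

```python
import numpy as np

def isotropic_error(n, q):
    """Quality q on the diagonal, error eps = (1 - q)/(n - 1) elsewhere."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def leading_eigenpair(E, C):
    """Leading eigenpair of E @ C via the similar symmetric matrix L^T C L."""
    L = np.linalg.cholesky(E)                 # E = L L^T
    vals, vecs = np.linalg.eigh(L.T @ C @ L)  # ascending eigenvalues
    return vals[-1], L @ vecs[:, -1]          # eigenvector of E @ C is L u

def mean_field_rhs(w, E, C):
    """Right-hand side of the mean-field rule, up to the learning rate."""
    Cw = C @ w
    return E @ Cw - (w @ Cw) * w

# Illustrative (hypothetical) covariance matrix and quality.
C = np.array([[1.3, 0.2, 0.2],
              [0.2, 1.1, 0.2],
              [0.2, 0.2, 1.0]])
E = isotropic_error(3, q=0.9)

lam, v = leading_eigenpair(E, C)
w_eq = v * np.sqrt(lam / (v @ C @ v))   # rescale so that w^T C w = lambda
residual = np.linalg.norm(mean_field_rhs(w_eq, E, C))
print(lam, residual)
```

With this normalization the two terms of the rule cancel exactly, so the residual should vanish up to rounding.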

Notice that equation 2.5 has equilibria that are tightly related to those of the averaged corresponding form of equation 2.4; our working form is, however, simpler computationally, in the sense that stability of equilibria is more easily tractable. We have shown that the eigenvalues of the Jacobian matrix at an equilibrium **w** are given, up to the positive learning-rate factor, by $-2\lambda_{\mathbf{w}}$ and $\lambda_j - \lambda_{\mathbf{w}}$, where $\lambda_{\mathbf{w}}$ and $\lambda_j$ ($j = 1, \ldots, n-1$) are the *n* eigenvalues of **EC** (noting first that the completion of **w** to a basis of eigenvectors of **EC**, orthogonal with respect to the dot product $\langle \cdot, \cdot \rangle_{\mathbf{C}}$ induced by **C**, also forms an eigenvector basis for the Jacobian). We concluded that if **EC** has a unique largest eigenvalue (which is generically true), then a normalized eigenvector **w** is a local hyperbolic attracting equilibrium for equation 2.5 iff it corresponds to this maximal eigenvalue. If **EC** has a multiple largest eigenvalue, the system will have a set of nonisolated, neutrally attracting equilibria (all normalized eigenvectors spanning the principal eigenspace, in this case of dimension $\geq 2$). Some of the computations are summarized in appendix A (e.g., a description of the attraction basins, supporting the absence of cycles in the phase space) and are expanded in more detail in our previous work (Rădulescu et al., 2009; Rădulescu & Adams, 2013).
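The stability statement can be checked numerically with a short sketch. It again assumes the mean-field form $\mathbf{E}\mathbf{C}\mathbf{w} - (\mathbf{w}^T\mathbf{C}\mathbf{w})\mathbf{w}$ with the learning rate set to one. It compares a finite-difference Jacobian at the leading equilibrium with the predicted spectrum $\{-2\lambda_{\mathbf{w}},\ \lambda_j - \lambda_{\mathbf{w}}\}$. All names and numerical values are hypothetical.

```python
import numpy as np

def isotropic_error(n, q):
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def rhs(w, E, C):
    """Mean-field right-hand side with the learning rate set to one."""
    Cw = C @ w
    return E @ Cw - (w @ Cw) * w

def numerical_jacobian(f, w, h=1e-6):
    """Central finite-difference Jacobian of f at w."""
    n = len(w)
    J = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        J[:, k] = (f(w + step) - f(w - step)) / (2 * h)
    return J

C = np.array([[1.3, 0.2, 0.2],
              [0.2, 1.1, 0.2],
              [0.2, 0.2, 1.0]])
E = isotropic_error(3, q=0.9)

# Eigenvalues of E @ C via the similar symmetric matrix L^T C L (E = L L^T).
L = np.linalg.cholesky(E)
lams, U = np.linalg.eigh(L.T @ C @ L)   # ascending order
lam_w = lams[-1]
v = L @ U[:, -1]
w_eq = v * np.sqrt(lam_w / (v @ C @ v))  # equilibrium: w^T C w = lambda_w

J = numerical_jacobian(lambda w: rhs(w, E, C), w_eq)
observed = np.sort(np.linalg.eigvals(J).real)
predicted = np.sort(np.concatenate(([-2.0 * lam_w], lams[:-1] - lam_w)))
print(observed, predicted)
```

All Jacobian eigenvalues are negative here because the equilibrium was built from the largest eigenvalue of **EC**; repeating the check with a non-leading eigenvector would be expected to produce at least one positive eigenvalue.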

Since the nature and position of the equilibria depend on the spectral properties
of **EC**, the next task is to study the spectral changes of **EC** when perturbing the system by increasing cross talk. In our
previous work on the model, we investigated the effects of cross talk on the
system's dynamics and their dependence on the characteristics of the input
distribution (correlation sign, degree of bias). However, in our first study, we
considered learning only of positively correlated *n*-dimensional
input distributions; we found a smooth degradation of the learning outcome with
increasing error but no sudden changes in dynamics (Rădulescu et al., 2009). In our second study, we showed
that negatively correlated inputs can induce a bifurcation (stability swap of
equilibria, through a critical stage) when increasing the error, even in a case
as simple as a two-dimensional system. This bifurcation occurred only in the
case of unbiased inputs (Rădulescu & Adams, 2013), and we interpreted it in the context of ocular
dominance and input segregation.

For our general computations, we assume that the inputs have mutual covariances *c* uniform in absolute value, and small with respect to the
diagonal variances. More precisely, we assume , making the matrix diagonally dominant (see section 2.2 for considerations on the input
statistics). Throughout the letter, will be called the input biases. Without loss of generality,
we set . For any , we say that the input has bias loss of order *k* if . In particular, we say that the input is unbiased if it has
bias loss of order *n*, that is, if . Although the background covariance is taken for simplicity to be uniform in absolute value, we
expect the inspecific learning rule to lead to interesting dynamics, in
particular when the inputs exhibit a certain degree of mutual correlation.

### 2.2. Oja's Rule and the Input Statistics.

The goal of this work is to investigate the effects of one particular aspect of biological realism (cross talk) in the context of a model that is otherwise as transparent as possible. We chose the Oja principal component analyzer as a widely known and simple example of a Hebbian model of unsupervised learning, important in cortical processing (Hinton & Sejnowski, 1999) and involving repeated adjustment driven only by statistical properties of the input. While a connectionist model may capture some of the desired basic aspects of learning dynamics, the situation in the brain is far from being this simple.

To begin, the Oja model may appear rather unbiological by its very use of a rate-coding scheme and a simple multiplicative Hebbian learning rule, in conjunction with a local (and controversially plausible) normalization procedure (section 4.2 gives more detail on possible empirical bases of the rule and their implementation). While in our approach we incorporated cross talk, we neglected many other biological aspects inherent in synaptic transmission (e.g., timed spikes, external noise, temporal correlations, synaptic homeostasis; Cox & Adams, 2009); a more biologically realistic model would use spike-timing-dependent plasticity and natural inputs (Hyvärinen, Hurri, & Hoyer, 2009). We simply used positive or negative continuous-time activations and weights (one can interpret negative weights as disconnections), and we assumed the input patterns to be zero mean and have identical mutual correlations (Rădulescu et al., 2009). More elaborate models, incorporating detailed spiking patterns, may automatically learn the principal component of the zero-mean inputs, without explicit centering or normalization. Gerstner and Kistler (2002) have developed a model that assumes an Oja-type rate-coding scheme, with Poisson spikes and spike-timing-dependent plasticity with LTP and LTD lobes, and postsynaptic spikes triggered by presynaptically generated EPSPs. One could in principle study the effects of cross talk on such a model by applying an error matrix to the LTP or LTD parts; a direct analysis, however, might turn out to be much more difficult than in the case at hand here.

Since our analysis focuses on symmetric matrices **C** with positive or
negative off-diagonal elements, we have to ask whether and when such a matrix
can constitute the covariance matrix of a centered *n*-dimensional distribution. While establishing equivalent
conditions may be difficult even for small dimensions (Vasudeva, 1998), one can find sufficient criteria (e.g., any
positive semidefinite **C** is a covariance matrix).^{1}

In our initial computations, we assumed sufficiently weak pairwise correlations
to make **C** diagonally dominant (in this case, equivalent to ). Any symmetric diagonally dominant matrix with nonnegative
diagonal entries is automatically positive semidefinite, hence a covariance
matrix. Such segregated inputs can be found in a variety of contexts in the
brain. For example, studies of cortico-striate projections (Yim, Aertsen, &
Kumar, 2011) have observed weak pairwise
correlations within the pool of inputs to individual striatal neurons, which are
believed to enhance the saliency of signal representation in the striatum. On
the other hand, **C** will not remain diagonally dominant for strong
pairwise correlations, which are also likely to occur biologically. A known
example of cells with strongly correlated activity is that of retinal ganglion
cells, placed in topographic proximity of each other and innervating the same
cell in the LGN (Mastronarde, 1989; Trong
& Rieke, 2008). Our work in sections 3.1 and 3.2 assumes diagonal dominance (as a mathematically
convenient assumption that allows us to establish a useful classification and
illustrate typical behaviors that can occur in the system). In appendix C, we complete our analysis with a
numerical approach to a larger collection of matrices, with extended parameter
ranges.
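The role of diagonal dominance as a sufficient (but not necessary) condition can be seen in a small sketch. The variances, the covariance magnitudes, and the helper names below are hypothetical values chosen for illustration. The off-diagonal signs are listed in the order $(C_{12}, C_{13}, C_{23})$.

```python
import numpy as np

def covariance_candidate(variances, signs, c):
    """Symmetric 3x3 matrix with the given diagonal and off-diagonal
    entries signs * c, ordered as (C12, C13, C23)."""
    C = np.diag(np.asarray(variances, dtype=float))
    for (i, j), s in zip([(0, 1), (0, 2), (1, 2)], signs):
        C[i, j] = C[j, i] = s * c
    return C

def is_diagonally_dominant(C):
    off_diagonal_sum = np.sum(np.abs(C), axis=1) - np.abs(np.diag(C))
    return bool(np.all(np.abs(np.diag(C)) >= off_diagonal_sum))

unit = [1.0, 1.0, 1.0]
weak_neg = covariance_candidate(unit, (-1, -1, -1), c=0.4)    # 2c < 1
strong_neg = covariance_candidate(unit, (-1, -1, -1), c=0.8)  # 2c > 1
strong_pos = covariance_candidate(unit, (+1, +1, +1), c=0.8)  # 2c > 1

for name, C in [("weak_neg", weak_neg),
                ("strong_neg", strong_neg),
                ("strong_pos", strong_pos)]:
    smallest = np.linalg.eigvalsh(C)[0]   # eigvalsh sorts ascending
    print(name, is_diagonally_dominant(C), smallest)
```

In this example the weakly correlated matrix is diagonally dominant and positive definite, the strongly negatively correlated one has a negative eigenvalue and so cannot be a covariance matrix, and the strongly positively correlated one is still positive definite even though diagonal dominance fails.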

### 2.3. The Error Matrix.

Together with the uniform magnitude of input cross-correlations (i.e., uniform
absolute value of the off-diagonal elements of **C**), we also
assumed, for simplicity, uniform error (the Hebbian adjustment of any weight was
equally affected by error and did not depend on either the strength of that
weight or on geometry). Such “isotropicity” seems like a
reasonable basic assumption and has been discussed in our previous work
(Rădulescu et al., 2009;
Rădulescu & Adams, 2013).
Furthermore, it allowed us to identify other features of the input distribution,
crucially consequential on the learning dynamics and outcome: the sign of the
mutual correlations and the input bias. However, cross talk has been documented
experimentally, for technical reasons, mostly between synapses that are
anatomically neighboring each other (Harvey & Svoboda, 2007; Bi, 2002;
Bonhoeffer et al., 1989). In previous work
(Rădulescu et al., 2009), we have
justified isotropicity based on the fact that individual cortical connections
are composed of multiple synapses scattered over the dendritic tree (Varga, Jia,
Sakmann, & Konnerth, 2011; Chen,
Leischner, Rochefort, Nelken, & Konnerth, 2011; Jia, Rochefort, Chen, & Konnerth, 2010), but we have also considered other (more
metric-dependent although all symmetric) forms of **E**. Cross-talk
effects could probably be captured when using more general, nonisotropic forms
for **E** without affecting the main conclusions. In this letter, the
distinction between local and global cross talk is not that relevant, since our
main results concern a low- (three-)dimensional network.

## 3. Results

### 3.1. Classes of Inputs and Bias Effects on Three-Dimensional Dynamics.

In this section, we study how input patterns can influence the effects of cross talk in driving the dynamics of a three-dimensional network, the lowest dimension for which the question applies but one that seems to capture the essence of this problem even in higher-dimensional systems. We inspect all combinatorial possibilities of input bias and correlation sign (as defined below) and determine the effect of increasing cross talk on dynamics in each case. In section 3.2 and the appendixes, we support with rigorous proofs some of the results obtained through numerical simulations (we used the Matlab software package, version 7.2.1).

We found that in special highly unbiased cases, cross talk has no effect on the presence and position of the asymptotic attractors (see Figure 1E). In other cases, the degradation of the asymptotic outcome with error is so slow that small levels of cross talk have virtually no effect on learning (see Figure 1A and 1B; also see Figure 2 for a phase-space illustration). Other significant classes of inputs, however, showed a sudden change of the attractor states, from a reliable principal component estimator to an almost orthogonal direction. This occurred either in the form of an eigenvalue swapping bifurcation in dynamics (producing the instantaneous loss of learning accuracy at a critical error value; see Figures 1C and 3 for an illustration of phase-space transitions) or in the milder form of an eigenvalue “avoided crossing” (inducing a smooth yet very steep degradation of the learned direction at a specific error; see Figures 1D and 1G). As discussed in our previous work, bifurcations and avoided crossings can be practically indistinguishable: learning works reasonably well for small enough errors. For errors past the crash value, the outcome becomes irrelevant to the input statistics, and the system is essentially encoding information on the cross-talk pattern itself.

None of these possibilities is a priori excluded in the brain, but previous work has suggested that nature may favor bias. Segregated outcomes (disconnected completely, forming wiring patterns that are then subject to more subtle synaptic learning) are considered to be an important part of normal development. In our previous work, we argued that cross talk seems to act against this desymmetrizing tendency and prevent segregation, especially for inputs close to unbiased. We viewed this as a limitation of symmetry-breaking mechanisms that generate specific wiring, and we further argued that other factors, such as strong mutual inhibition (large negative correlations) or special specificity-enhancing circuitry (“proofreading”), might act to overcome the equalizing effect of cross talk. The current study completes this idea with new aspects.

One can say, then, that efficient cross-talk-induced segregation happens in our
model for a balance of positive and negative correlations in the input
distribution. Since the presence, number, and strength of the negative
correlations appeared to be crucial in determining the behavior of the system,
we defined a formal classification of all possible correlation matrices based on
the number of negative upper-diagonal entries of **C** and then used
the resulting classes to understand the corresponding behavior with respect to cross
talk.

We distinguished four combinatorial classes: **Class (+, +, +)**,
comprising the unique matrix configuration with all positive entries; **Class (+, +, −)**, made of the three matrix configurations
with one negative upper-diagonal entry; **Class (+, −,
−)**, for the three configurations with two negative
upper-diagonal entries; **Class (−, −, −)**, for
the one configuration with all negative off-diagonal entries. We studied the
matrix **EC**, and the differences that occur in its spectrum when
considering different classes of input, in conjunction with different degrees of
bias: from fully biased () to partly biased () to fully unbiased (). In this section, *q* will be restricted to
the interval (1/3, 1] (representing quality higher than error). Based on these
combinatorial classes of input, we distinguished three main qualitative
behaviors: separated leading eigenvalues, crossing leading eigenvalues and
“avoided crossing.”^{2}
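The combinatorial count behind the four classes can be reproduced with a few lines. The sketch enumerates the sign patterns of the three upper-diagonal entries, taken in the assumed order $(C_{12}, C_{13}, C_{23})$, and groups them by the number of negative signs.

```python
from collections import defaultdict
from itertools import product

# Group the 2^3 sign patterns of (C12, C13, C23) by how many are negative.
classes = defaultdict(list)
for signs in product("+-", repeat=3):
    classes[signs.count("-")].append("".join(signs))

counts = {k: len(patterns) for k, patterns in sorted(classes.items())}
for k, patterns in sorted(classes.items()):
    print(k, "negative entries:", patterns)
```

The grouping yields one, three, three, and one configurations for zero, one, two, and three negative entries, respectively, matching the four classes listed above.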

#### 3.1.1. Separated Leading Eigenvalues.

The largest eigenvalue remains separated from the second largest eigenvalue
for the whole range of *q* (as illustrated in Figure 1A, top panel), determining the
corresponding leading eigenvector to gradually drift from the direction of
the principal component of **C**, as *q* decreases
(blue curve in Figure 1A, bottom
panel). For any value of *q*, the system has two
hyperbolically attracting equilibria: the normalized principal eigenvectors
of **EC**, whose basins are separated by an invariant plane. In
Figure 2, we show the evolution of a
set of trajectories to illustrate convergence to the two attractors in the
phase space, as well as the dynamics within the separating plane.

In the presence of cross talk, the network will process the input in a very
similar qualitative fashion as in the absence of cross talk, observing the main
statistical trends, even though the quantitative outcome might be slightly
or more substantially altered, depending on the input pattern and the degree
of cross talk. Depending on parameters, the eigenvalue curves with respect
to *q* may exhibit a significant point of minimal separation,
where the learning outcome (leading eigenvector of **EC**)
deteriorates very fast (see section 3.1.3).
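A minimal sketch of the separated-eigenvalue regime follows, for a biased input with all positive correlations. The covariance values are hypothetical, and the isotropic error matrix is assumed to have quality *q* on the diagonal and error $(1-q)/2$ elsewhere. The script tracks the gap between the two leading eigenvalues of **EC** and the cosine between the leading eigenvectors of **EC** and **C** as *q* decreases.

```python
import numpy as np

def isotropic_error(n, q):
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def leading_two(E, C):
    """Two largest eigenvalues of E @ C and its unit leading eigenvector,
    computed via the similar symmetric matrix L^T C L with E = L L^T."""
    L = np.linalg.cholesky(E)
    vals, vecs = np.linalg.eigh(L.T @ C @ L)
    w = L @ vecs[:, -1]
    return vals[-1], vals[-2], w / np.linalg.norm(w)

# Hypothetical biased covariance matrix of class (+, +, +).
C = np.array([[1.3, 0.2, 0.2],
              [0.2, 1.1, 0.2],
              [0.2, 0.2, 1.0]])
pc = np.linalg.eigh(C)[1][:, -1]   # principal component of C

qs = np.linspace(1.0, 0.4, 13)
gaps, cosines = [], []
for q in qs:
    lam1, lam2, w = leading_two(isotropic_error(3, q), C)
    gaps.append(lam1 - lam2)
    cosines.append(abs(w @ pc))

for q, gap, cos in zip(qs, gaps, cosines):
    print(f"q={q:.2f}  gap={gap:.4f}  cos={cos:.4f}")
```

Since all entries of **EC** are positive in this example, the leading eigenvalue is simple for every *q* in the range, so the gap never closes and the learned direction can only drift gradually.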

This case is generally associated with biased inputs (the only possible
behavior when ). That is, no negative correlations are required to
maintain segregated inputs in their segregated state when cross talk is
introduced. However, this behavior can be found in conjunction with loss of
bias, provided the mutual negative correlations are limited: it also appears
in partial loss of bias () for class **(+, +, +)** (see Figures 1B and 2), as well as in full loss of bias () for classes **(+, +, +)** and **(+, +,
−)** (see Figures 1G, 1E, and 2).

An interesting, quite extreme case of separated eigenvalues occurs for
symmetric inputs that are fully unbiased and all positively correlated: the
leading eigenvalue is separated from the second eigenvalue (which has
multiplicity two), but neither the leading eigenvalue nor the corresponding
eigenvector of **EC** changes when the cross talk is increased.
Hence, in this case, the learning is fully accurate for any degree of cross
talk (see Figure 1E); one may argue
that this particular class of input statistics is completely error
proof.
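This error-proof property can be verified in a few lines. As hypothetical notation for this sketch, write the unbiased, positively correlated covariance matrix as $\mathbf{C} = (v - c)\mathbf{I} + c\mathbf{J}$, with common variance $v$, common covariance $0 < c < v$, and $\mathbf{J}$ the all-ones matrix. Assume also that the isotropic error matrix is $\mathbf{E} = (q - \epsilon)\mathbf{I} + \epsilon\mathbf{J}$ with $q + (n-1)\epsilon = 1$ and $q > \epsilon$. Then, with $\mathbf{1} = (1, \ldots, 1)^T$ and any $\mathbf{u} \perp \mathbf{1}$:

$$
\begin{aligned}
\mathbf{C}\mathbf{1} &= (v + (n-1)c)\,\mathbf{1}, &
\mathbf{E}\mathbf{1} &= (q + (n-1)\epsilon)\,\mathbf{1} = \mathbf{1}
&&\Longrightarrow\quad \mathbf{E}\mathbf{C}\mathbf{1} = (v + (n-1)c)\,\mathbf{1}, \\
\mathbf{C}\mathbf{u} &= (v - c)\,\mathbf{u}, &
\mathbf{E}\mathbf{u} &= (q - \epsilon)\,\mathbf{u}
&&\Longrightarrow\quad \mathbf{E}\mathbf{C}\mathbf{u} = (q - \epsilon)(v - c)\,\mathbf{u}.
\end{aligned}
$$

Since $0 < q - \epsilon \leq 1$, the second eigenvalue satisfies $(q - \epsilon)(v - c) \leq v - c < v + (n-1)c$. Under these assumptions the leading eigenvalue $v + (n-1)c$ and its eigenvector $\mathbf{1}/\sqrt{n}$ do not depend on *q*, while the remaining eigenvalue has multiplicity $n - 1$ (two when $n = 3$).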

#### 3.1.2. Crossing of Leading Eigenvalues.

This behavior sits, in a sense, at the opposite pole of the “separated
eigenvalues” case, and in its most standard form, it is typical to
partial loss of bias () in combination with all negative correlations, that is,
class **(−, −, −)**; see Figure 1C and section 3.2. The term describes an instantaneous swap of the attractors
from one eigendirection to another direction that could be as much as
orthogonal to the original principal component, a swap that produces a crash
in the learning outcome. This behavior occurs when the two leading
eigenvalue branches cross and switch at a critical value of the quality . (We have described this phenomenon in a two-dimensional
model in Rădulescu & Adams, 2013.) Very small levels of cross talk () in fact have very little effect on learning in this case.
Although the leading eigenvalue changes, the direction of the leading and
attracting eigenvector is preserved, so that the system will converge to the
same outcome as in the absence of error.
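The swap can be reproduced in a small numerical sketch. It assumes a covariance matrix of class (−, −, −) in which the two largest variances are equal (one possible reading of partial loss of bias) and an isotropic error matrix with quality *q* and error $(1-q)/2$. All numerical values and helper names are hypothetical.

```python
import numpy as np

def isotropic_error(n, q):
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def leading_direction(E, C):
    """Unit leading eigenvector of E @ C, via L^T C L with E = L L^T."""
    L = np.linalg.cholesky(E)
    _, vecs = np.linalg.eigh(L.T @ C @ L)
    w = L @ vecs[:, -1]
    return w / np.linalg.norm(w)

# Hypothetical class (-, -, -) covariance; the last two variances are equal.
c = 0.2
C = np.array([[1.0, -c, -c],
              [-c, 1.1, -c],
              [-c, -c, 1.1]])
pc = np.linalg.eigh(C)[1][:, -1]   # proportional to (0, 1, -1)

qs = np.linspace(1.0, 0.4, 601)    # decreasing quality, step 0.001
alignment = np.array(
    [abs(leading_direction(isotropic_error(3, q), C) @ pc) for q in qs]
)

# First quality value at which the learned direction has left the
# principal component of C.
q_crash = qs[np.argmax(alignment < 0.5)]
print("alignment at q=1.0:", alignment[0])
print("alignment at q=0.4:", alignment[-1])
print("estimated critical quality:", q_crash)
```

In this example the learned direction coincides with the principal component of **C** down to the critical quality and is orthogonal to it afterward, with no intermediate drift. Shrinking *c* relative to the variances would be expected to push the critical quality toward 1.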

This may seem like a very desirable input distribution to learn in the
presence of low cross talk; however, one has to keep in mind that if the
cross-correlations are small in absolute value with respect to the variance *v*, then the
critical quality gets arbitrarily close to 1. Such perfect learning will
therefore happen only when inspecificity is infinitesimally small, which
makes this scenario lose its appeal, especially when we recall that at the
end of the “good” interval lies the bifurcation, crashing the
equilibrium to a direction completely irrelevant to the input statistics. In
this light, one might expect the network to have an additional, quite
precise estimator of the degree of cross talk involved, so that when
learning an irrelevant outcome, it would at least be aware of it. Any slight
error of the system toward miscalculating the limits for the permissible
error could have dire consequences.

In Figure 3, we represent three phase-space plots: before, at, and after the bifurcation point . While Figures 3A and 3C illustrate the typical phase space with two hyperbolically stable equilibria (one representing accurate, error-free learning and the other inaccurate learning for a postcritical error), the phase space at the bifurcation point is qualitatively different: the system has no hyperbolic attractors but rather a closed curve (ellipse) of half-stable equilibria (neutral along the direction of the curve). Clearly, the outcome of learning is in this case extremely dependent on the initial conditions (although, as we commented in Rădulescu & Adams, 2013, the stochastic version of the system will have noise-driven stationary solutions that drift around this neutrally attracting ellipse).

The neutrally attracting ellipse phase-plane dynamics is not specific to this
critical bifurcation state (and thus it cannot be ignored as improbable in
the context of generic behavior). For some classes of inputs, such an
attracting-ellipse slice represents the natural state of the cross-talk free
system and persists for an entire inspecificity range (see Figure 4). This is the case for bias of order
two when occurring in conjunction with substantial negative
correlations, that is, classes **(+, −, −)** and **(−, −, −)**. The computations are quite
simplified in the absence of any bias, so for the case of fully unbiased
inputs we carried out analytically a complete classification in theorem 1 in
appendix A. We describe these two
fully unbiased cases in more detail below.

We found that in instances of highly unbiased inputs, learning may lead to an
ambiguous outcome even in the absence of cross talk (see Figures 1F, 1H, and 4). Indeed, in the
cross-talk-free class **(+, −, −)**, the matrix **C** has a double leading eigenvalue to begin, and the system
has a whole closed curve of neutrally attracting equilibria (in the
eigenplane spanned by the corresponding eigenvectors). When cross talk is
introduced, the two leading eigenvalues segregate, and one of the
eigenvectors takes over, which determines an immediate complete switch in
the learning outcome. In this case, even the smallest degree of
inspecificity leads to favoring one specific direction, slightly detaching
off the plane that contains the curve of accurate equilibria (notice that
the cosine of the accuracy angle, represented by the blue curve in Figure 1F, does not fall too far off the
perfect value of 1).

We may interpret this as the error helping the system “make up its mind” in the presence of too much ambiguity in the input statistics. This is an occurrence we have not encountered in our previous, more restrictive versions of the model, since it requires inputs with concomitant negative cross-correlations and loss of bias of order >2. This ambiguity can be interpreted as the basis of a competitive process in which any input channel has equal chances to win. Competitive dynamics has been studied at large in developmental and learning models in the context of imposed (by means of multiplicative or subtractive normalization) or emergent competition. It has become clear that a linear Hebb rule, even when coupled with a multiplicative normalization or winner-takes-all type nonlinearities, is not able to produce segregation of positively correlated inputs (von der Malsburg, 1973; Goodhill & Barrow, 1994; Miller & MacKay, 1994). When used in conjunction with unbiased inputs, it will lead to an equal-weight outcome (Dayan & Abbott, 2002). A variety of known nonlinear mechanisms can break the inherent symmetry, even when the input per se does not favor segregated outcomes (Elliott, 2003), including subtractive normalization (Miller & MacKay, 1994; Goodhill & Barrow, 1994), the BCM rule (Bienenstock, Cooper, & Munro, 1982) and spike-time-dependent-plasticity (Elliott, 2008). As interpreted in one of our previous discussions on ocular dominance wiring (Rădulescu & Adams, 2013), such mechanisms may lead, for example, to ocular segregation under unbiased statistics (the two eyes are likely receiving similar, positively correlated inputs from the visual field). One context that permits segregation under multiplicative normalization is having negatively correlated inputs.

Our current analysis illustrates this issue and shows that when sufficient
negative correlations are present, the fashion in which the cross talk
handles inherent input ambiguity or competition depends quite significantly
on the number (and, to a lesser extent, the positions) of the negative
mutual correlations within the input. In our model, at least two negative
mutual correlations are necessary for cross talk to produce segregation of
symmetric inputs. For two out of three negative correlations, even the
smallest degree of cross talk helps the system make an asymptotic selection
for one particular direction in the eigenspace spanned by the multiple
eigenvalue. For all negative correlations, no small degree of cross talk can
resolve this competitive state. The level of critical cross talk that can
finally destroy the curve of neutrally stable equilibria also pushes the
system to learn an orthogonal direction, hence becomes irrelevant to the
main features of the original input statistics. Indeed, in the
cross-talk-free class **(−, −, −)**, the
matrix **C** has a double leading eigenvalue, and the system again
has a whole ellipse of neutral equilibria, contained in the corresponding
eigenplane. When subject to errors up to a critical value, the two larger eigenvalues change but remain equal;
furthermore, the subspace spanned by the two corresponding eigenvectors
remains unchanged, hence the learning process preserves the original
ambiguity. Past the critical error value, the eigenvalues swap, and the
eigendirection for the new leading eigenvalue (of multiplicity one) is
orthogonal to the previous plane (see Figure 1H). In other words, past the critical error value, the system
will finally choose a particular direction to learn, but this direction will
be highly inaccurate, and thus the task of learning the input statistics
will be performed very poorly.
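The eigenvalue swap described above for the fully unbiased class **(−, −, −)** can be checked numerically. The following is a minimal sketch, not the computation used for the figures. It assumes an "error-onto-all" cross-talk matrix with quality *q* on the diagonal and the remaining 1−*q* spread equally over the other synapses; the helper names `crosstalk_matrix`, `uniform_covariance`, and `spectrum`, and the values *v*=1 and *c*=−0.2, are illustrative choices. Under this assumption the swap occurs at *q*=(*v*+*c*)/(*v*−*c*).

```python
import numpy as np

def crosstalk_matrix(q, n=3):
    """Assumed error-onto-all matrix: quality q on the diagonal,
    the remaining 1 - q shared equally by the other n - 1 synapses."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def uniform_covariance(v, c, n=3):
    """Unbiased covariance: variance v on the diagonal, covariance c elsewhere."""
    return (v - c) * np.eye(n) + c * np.ones((n, n))

v, c = 1.0, -0.2                       # illustrative values, class (-, -, -)
C = uniform_covariance(v, c)
q_crit = (v + c) / (v - c)             # where (q - eps)(v - c) equals v + 2c

def spectrum(q):
    """Eigenvalues (ascending) and a leading unit eigenvector of EC."""
    EC = crosstalk_matrix(q) @ C       # symmetric here, since E and C commute
    eigvals, eigvecs = np.linalg.eigh(EC)
    return eigvals, eigvecs[:, -1]

diagonal = np.ones(3) / np.sqrt(3.0)   # direction orthogonal to the eigenplane
for q in (0.9, q_crit, 0.5):
    eigvals, leading = spectrum(q)
    overlap = abs(leading @ diagonal)
    print(f"q={q:.3f}  eigenvalues={np.round(eigvals, 3)}  "
          f"|cos(leading, diagonal)|={overlap:.3f}")
```

Above the critical quality, the leading eigenvalue is double and any leading eigenvector lies in the original eigenplane; below it, the simple eigenvalue takes over and the learned direction is the diagonal, orthogonal to that plane.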

#### 3.1.3. “Avoided Crossing” of Leading Eigenvalues.

This can be seen as a hybrid case in which the principal eigenvalues never
actually swap but get very close (arbitrarily close, depending on the values
of *v* and ), so that learning has a significantly rapid depreciation
around the critical value (which also depends on all other parameter values; see the
blue curve in Figure 1D). This
situation can be observed when the input has partial bias loss in mixed
cases from classes (+, +, −) and (+, −, −).

Biologically, such a “pseudobifurcation,” if occurring over a
narrow enough range of *q*, is indistinguishable from a real
bifurcation, induced by crossing eigenvalues; for this reason, we refer to
it as a for-all-practical-purposes (fapp) bifurcation. Since it represents a
sudden (although smooth) depreciation of the principal direction, one may
consider calculating the “susceptibility” or
“sensitivity” of the angle with respect to the quality *q*.

In Figure 5, we illustrate the
difference between the discontinuous breakdown of the derivative in the case of a real bifurcation (a discontinuity of the sensitivity) and its continuous blow-up in the case of a fapp bifurcation (the sensitivity has a significant although finite variation over a narrow
interval of *q*). One may regard this dichotomy to be in
principle analogous to the difference between discontinuous and continuous
phase transitions. Formally, an avoided crossing can be defined to produce a
fapp bifurcation if the size of the blow-up exceeds a certain threshold
(which may depend on the particular network and the accuracy level desired
for learning).
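The sensitivity introduced above can be estimated numerically by finite differences. The sketch below is only an illustration under stated assumptions: the cross-talk matrix is taken to be of error-onto-all form, the covariance matrix `C` is a hypothetical biased example (not one of the inputs used in Figure 5), and the helper names are ours. It computes the cosine of the accuracy angle over the range of *q* between 1/3 and 1 and differentiates the angle with respect to *q*.

```python
import numpy as np

def crosstalk_matrix(q, n=3):
    """Assumed error-onto-all matrix with quality q."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def leading_direction(A):
    """Unit eigenvector for the largest eigenvalue of A (real spectrum assumed)."""
    eigvals, eigvecs = np.linalg.eig(A)
    u = eigvecs[:, np.argmax(eigvals.real)].real
    return u / np.linalg.norm(u)

def accuracy_cosine(C, q):
    """|cos| of the angle between the leading directions of EC and C."""
    E = crosstalk_matrix(q, C.shape[0])
    return abs(leading_direction(E @ C) @ leading_direction(C))

# Hypothetical biased covariance: distinct variances, mixed-sign covariances.
C = np.array([[1.30, 0.10, -0.10],
              [0.10, 1.15, -0.10],
              [-0.10, -0.10, 1.00]])

qs = np.linspace(1.0 / 3.0, 1.0, 401)
cosines = np.array([accuracy_cosine(C, q) for q in qs])
angles = np.arccos(np.clip(cosines, 0.0, 1.0))
sensitivity = np.gradient(angles, qs)          # finite-difference d(angle)/dq

q_peak = qs[np.argmax(np.abs(sensitivity))]
print(f"largest |d(angle)/dq| on the grid occurs near q = {q_peak:.3f}")
```

A threshold on the height of the peak of `sensitivity` would then give one concrete way to decide whether an avoided crossing counts as a fapp bifurcation; the threshold itself remains a modeling choice.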

With this definition, there are circumstances in which fapp bifurcations can
occur even at arbitrarily small cross talk (*q* arbitrarily
close to 1). For example, Figure 5 shows the difference between the effect of cross talk in the case of two
input distributions, both with loss of bias of degree one. For the first
type of distribution, class (−, −, −), the all-negative
mutual cross-correlations determine eigenvalue crossing (the blue curves,
which exhibit discontinuous blow-ups). The second type, class (+, −,
−), can lead to avoided crossing. We compared the behavior of the
network in these two situations, inspecting a few values of the bias (left panel versus right panel), and mutual
cross-correlation values (different curves in the same panel, as explained in the
caption). We found that increasing the bias and decreasing the cross-correlations transports the point of maximum sensitivity (the location
of the blow-ups) closer to *q*=1. Moreover, the size
of the continuous blow-up (the height of the finite peak in the case of
avoided crossing) gets larger as *q* migrates toward 1, so
that the smaller the cross-correlations, the lower the level of cross talk sufficient to produce a
blow-up, and the more indistinguishable the fapp bifurcation looks from the
bifurcation-induced discontinuity. This reiterates the idea that a fapp
bifurcation can be as detrimental to learning as a real bifurcation,
especially since it can arise at arbitrarily small levels of cross talk,
just like an actual bifurcation.

In appendix C, we consider inputs with
stronger pairwise correlations (so that **C** is no longer
diagonally dominant). When we consider high negative mutual correlations,
the fapp bifurcation, associated with arbitrarily small levels of cross
talk, appears in conjunction with an actual bifurcation, at very high
cross-talk levels. This suggests that for such inputs, after undergoing the
fapp degradation in outcome, the system may suddenly reverse to accurate
computation of the learning attractor at very high cross-talk levels.

### 3.2. An Analytical Application in Higher Dimensions.

In this section, we present an *n*-dimensional computation, with the aim of showing that the phenomena described in section 3.1 may also describe behavior in a higher-dimensional Oja learning model with cross talk. In previous work (Rădulescu, Cox, & Adams, 2009), we have investigated the *n*-dimensional case for all positively cross-correlated inputs and showed that it does not induce stability swapping bifurcations, even in higher dimensions. Since negative correlations are the key ingredient for the presence of bifurcations, we consider for our application in this section the case of all negative cross-correlations −*c*, where *c*>0. Our three-dimensional numerical results suggest that combined covariance matrices that encompass other patterns of positive and negative mutual cross-correlations are expected to produce hybrid dynamics between these two extreme ends. In a higher-dimensional network, the dynamics may depend strongly not only on the number of negative correlations, but also on their distribution and geometry within the covariance matrix. A random matrix approach may help classify the behavior for all input patterns, but this is not within the scope of this study.

In this section, we present only the main analytical results we obtained for our
application; proofs of the statements and additional comments can be found in
appendix D. Propositions 1 and 2
differentiate between behaviors in response to biased versus unbiased *n*-dimensional negatively correlated inputs, and illustrate
a situation that extends the behavior found in the three-dimensional model. As
before, in the case of biased inputs, the eigenvalues remain separated, and the
attracting direction degrades smoothly as the cross talk increases. Moreover,
also similar to the three-dimensional case, order one loss of bias is not enough
to trigger an eigenvalue-crossing bifurcation (for which bias loss of order at least two is required), but may be enough to produce fapp bifurcations.
Depending on the parameter values, both actual and fapp bifurcations can occur
for arbitrarily small levels of cross talk (see Figure 6).
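The separation of eigenvalues in the biased case can be illustrated with a short numerical check. This is a sketch under assumptions that are ours, not a restatement of the propositions: "biased" is taken to mean pairwise distinct variances on the diagonal of **C**, all off-diagonal entries equal −*c*, and the cross-talk matrix is of error-onto-all form. The function names and parameter values are hypothetical.

```python
import numpy as np

def crosstalk_matrix(q, n):
    """Assumed error-onto-all matrix with quality q in n dimensions."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def negatively_correlated_cov(variances, c):
    """Covariance with the given variances on the diagonal and -c elsewhere."""
    d = np.asarray(variances, dtype=float)
    return np.diag(d + c) - c * np.ones((d.size, d.size))

variances = [1.4, 1.3, 1.2, 1.1, 1.0]   # distinct variances: a biased input
c = 0.1                                  # common magnitude of the correlations
C = negatively_correlated_cov(variances, c)
n = C.shape[0]

max_imag, min_gaps = 0.0, []
for q in np.linspace(0.3, 1.0, 8):       # quality stays above the error 1/n
    eigvals = np.linalg.eigvals(crosstalk_matrix(q, n) @ C)
    max_imag = max(max_imag, float(np.max(np.abs(np.imag(eigvals)))))
    ordered = np.sort(np.real(eigvals))
    min_gaps.append(float(np.min(np.diff(ordered))))
    print(f"q={q:.2f}  eigenvalues={np.round(ordered, 4)}")

print("largest imaginary part:", max_imag)
print("smallest gap between consecutive eigenvalues:", min(min_gaps))
```

For this example the spectrum of **EC** stays real and simple at every sampled quality, so the leading direction varies continuously with *q*; making some of the variances equal is what opens the door to crossings.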

#### 3.2.1. Fully Biased Case.

We consider ; clearly: . In appendix B, we show how these values can be used to partition the real line and separate the roots of . This leads to:

*In the biased case , the matrix EC has n real distinct eigenvalues , for any error *.

*f*>0 iff . As increases from 0 to 1/

_{j}*n*, it traverses the values . When is in the intervals between two consecutive critical values , each two consecutive roots of are separated by at least one . When reaches each critical value , the root crosses from one interval to another through the stage .

#### 3.2.2. Losing the Bias.

Suppose now that for , , and allow some of the ; in the limit, this results in a loss of bias in the
covariance matrix **C** ( for some index *j*). In consequence, . It follows that in the limit of and , so that the maximal eigenvalue of **EC** preserves its multiplicity =1. This situation changes if we introduce
an order two bias loss (i.e., if we make both and approach zero simultaneously). Then and , so that the two leading roots collide into a double root . This justifies the following proposition:

*Suppose . An order k bias loss of the
covariance matrix C of the type results in a leading eigenvalue of multiplicity k−1 for the modified covariance matrix EC*.

## 4. Discussion

### 4.1. Specific Comments on Our Model.

In this study, we considered a learning network based on the classical unsupervised learning model of Oja, extended to incorporate synaptic cross talk; we aimed to show how different input patterns can exacerbate or, on the contrary, efface the effects of cross talk on the asymptotic outcome of learning. We gave central attention to differences in second-order input statistics, studied how cross talk affected the outcome in each case, and observed that the effects can vary widely depending on these second-order statistics.

Efficient cross-talk-induced segregation happens in our model for a balance of positive and negative correlations. It could be argued that the model itself may artificially impose such a condition by being linear Hebbian, with multiplicative normalization. To address this critique, one may choose to study an equivalent model with subtractive normalization; that would, however, produce a different collection of issues, since subtractive normalization may be less biologically plausible. A better solution would be performing a similar cross-talk analysis on an extended nonlinear model with multiplicative normalization. The fact that certain nonlinear Hebbian models are reducible to linear Hebbian models (Miller, 1990; Elliott & Shadbolt, 2002) has led to a general belief that no Hebbian model, linear or nonlinear, can segregate positively correlated afferents under multiplicative normalization. Recently, Elliott and Shadbolt (2002) offered an explicit counterexample.

In this letter, we focus on a rule that is based only on second-order statistics, but the concept of unbiased distribution can be generalized for nonlinear Hebbian rules, sensitive to a lack of bias of higher order. The work of Elliott and others has shown that segregated outcomes are quite typical of nonlinear Hebbian rules with unbiased statistics (Elliott, 2003), and that cross talk can induce bifurcations in these cases (Elliott, 2012). We have suggested before the example of radially symmetric distributions considered by Lyu and Simoncelli (2009), with joint PDF equal density contour lines being nested hyperspheres with nongaussian spacings. We expect that in this setup, completely unbiased (spherical) input statistics would favor no particular direction in the weight space, so that the outcomes would be signed combinations of equal magnitude weights, nontrivially determined by the higher-order correlations. The presence of enough cross talk in the processing of such inputs may amount to suddenly switching the outcome between two such states.

### 4.2. Some Biophysical Aspects of Oja's Rule.

Since our focus is on a biologically realistic phenomenon (cross talk), it may seem
odd to study a linear Hebbian model with multiplicative normalization, which may
appear to be very formal and unbiological. But as argued in Rădulescu and
Adams (2013), Oja's rule is not as
biophysically implausible as it first appears.^{3}

In our analysis of the Oja rule, we allowed both inputs and weights to be negative. However, if only positive patterns are allowed, the Hebbian part of the rule would always be positive (and correspond to LTP only), and the normalizing part of the rule would always be negative (and represent LTD only). It seems that in the brain, the negative and positive parts of signals are represented using different neurons, such that the two halves of the Oja rule would operate biologically with fixed and opposite polarities (LTP and LTD). However, the overall effect of the biological implementation would be the same as in our version of Oja's rule, which allows either polarity in both parts of the rule.

Experimental studies at single synapses suggest that reliable LTP may be implemented through repeated pairing of correctly timed pre- and postsynaptic spikes, which occur in an all-or-none manner (Petersen, Malenka, Nicoll, & Hopfield, 1998; Markram, Lübke, Frotscher, Roth, & Sakmann, 1997). Averaged over the many synapses comprising a connection, the overall outcome would be the multiplicative Hebbian rule. A simple mechanism for such batching would be if the coincidence-induced calcium increase at a synapse activated (by binding of Ca-Calmodulin) some fraction of its CaMKinase molecules, as follows: after each calcium pulse, Ca-Calmodulin would dissociate but leave some of the CaMKinase molecules phosphorylated; with successive pulses, enough would eventually be activated that the entire set of CaMKinases would fully autophosphorylate, triggering strengthening (Lisman, 1989, 1994; De Koninck & Schulman, 1998).

The normalizing (LTD) part of the Oja rule is, on the other hand, an elegant
implementation of an approximate nonlocal normalization step that leads to a
purely local online rule. Two obvious requirements of its biophysical
implementation are the calculation of *y*^{2} and the multiplication by the synaptic weight. Recent work in neocortex (Sjöström, Turrigiano,
& Nelson, 2003, 2004) suggests that LTD occurs in the following way:
backpropagating spikes lead to a synapse-related calcium signal that triggers
endocannabinoid release from the local dendrite, which then diffuses back to the
presynaptic specialization, where it activates a G-protein-coupled
endocannabinoid receptor. If there is near-simultaneous activation of
presynaptic NMDARs by spike-release glutamate, transmitter release is depressed.
This dismisses a previously favored theory (Nevian & Sakmann, 2006) that the level of the spine calcium
achieved by LTP or LTD is a sign determinant of the strength change (Lisman, 1994; Shouval, Bear, & Cooper, 2002). This explanation of LTD seems
well suited to meet the two biophysical requirements of the normalizing part of
the Oja rule (and in this sense, the rule would be more than a formal
description). The calcium-dependent endocannabinoid enzyme triggered by calcium
entering through voltage-dependent channels activated by backpropagating spikes
would implement *y*^{2}, and the multiplication would be achieved by the requirement for
simultaneous activation of the NMDAR. The dependence on the synaptic weight could be achieved in two ways: the endocannabinoid signal
might be proportional to the postsynaptic strength of the synapse, or the extent
of activation of the presynaptic NMDAR could depend on the amount of glutamate
released, which would depend on the extent of the active zone, which is known in
the long term to adjust to match the PSD area (and hence presumably the synaptic
strength). Thus, the synaptic strength would slowly adjust, by a combination of
matched but distinct post- and presynaptic adjustments, to reflect the arriving
spikes, in the way required by the Oja rule (Rădulescu & Adams, 2013).

**F** so the averaged rule would become

At first glance, it appears that the normalization errors could cancel out the
Hebbian errors if **F** is appropriately matched to **E** (i.e., both “error-onto-all” with adjustment of quality). Such
cancelation would correspond to a weight erroneously “forgetting”
exactly what it erroneously learns for each pattern. The problem is that while
the averaged values of **E** and **F** are simple and closely
related, their instantaneous values can be, at least locally, quite different, because one
involves intracellular diffusion and the other extracellular diffusion.
Furthermore, the stability of the algorithm will also be affected. The observed
biological implementation appears to avoid these problems in an elegant way.

### 4.3. General Comments.

In previous work (Rădulescu et al., 2009; Rădulescu & Adams, 2013), we have suggested an analogy between the Oja rule (even without cross talk) and Eigen's equation of DNA replication and mutation. Indeed, biologically, Darwinian evolution and neural learning are both adaptive processes, encoding inputs based on repeated interactions with the environment (Baum, 2004; Volkenshte, 1991; Adami, 1998), and mathematically, both models describe normalized growth. However, we have argued that unlike Eigen's model, Oja's equation shows a bifurcation at a critical cross-talk value in only very narrow conditions. We have further suggested that while there may not be an actual “isomorphism” (Fernando & Szathmáry, 2009; Fernando, Goldstein, & Szathmáry, 2010) (or other formal mathematical equivalence) between the two models in all parameter ranges, their analogy resides in their common need for accuracy in the adaptation process. While biology is well known for instances in which it affords to be inaccurate, polynucleotide copying requires superaccuracy, and neural learning also seems to require superaccurate synaptic updates (Elliott, 2012; Adams & Cox, 2012).

Indeed, successful and effective reproduction requires copying the entire genome, with an appropriately small error per base rate. The known “proofreading” operation of this replication process is essential in lowering the copying error rate to acceptable levels. The proofreading mechanism copies bases twice, and replication is allowed only when coincidence of the two results is detected. Since proofreading seems to be in general an effective strategy for overcoming physical limitations, it has been proposed that the same operation is being performed in the neocortex in order to ensure the synaptic specificity necessary for effective learning. The mechanism underlying “neural proofreading,” as proposed by Adams and Cox (2012), assigns to each thalamocortical connection (responsible for the tuned responses of cortical neurons) a corticothalamic “proofreading neuron,” which receives and detects “coincidence” between the input and output spikes arriving at that connection and then sends a double signal to both sides of the connection, confirming the validity of the synaptically detected coincidence. Other aspects, consequences, challenges, and limitations of this elaborate neocortical proofreading circuitry are further investigated in Adams and Cox (2012).

## 5. Conclusion

A lot of work has been aimed recently toward finding key biological factors that may explain the network architectures and computational algorithms that the brain develops to perform learning. The fact that the activity-dependent processes that lead to synaptic strength adjustments cannot be completely synapse specific constitutes a central problem for biological learning. While this model considers only a very simple setup, it helps us better illustrate an important idea, which we have formulated previously (Rădulescu et al., 2009; Rădulescu & Adams, 2013): a performant synaptic updating algorithm may not suffice for accurate learning, and the process may fail (partly or completely, depending on the input pattern to be learned) even when faced with only infinitesimal amounts of synaptic cross talk. It appears therefore increasingly possible that high-level (e.g., neocortical) learning may require not only performant learning algorithms but also special apparatus for enhancing specificity (Adams & Cox, 2006). The brain may thus have to dedicate comparable effort to developing proofreading for its plasticity machinery (all the more necessary in the face of inaccuracy that seems to not merely degrade learning but rather is able to prevent it altogether). Our model does not exclude either possibility but suggests that learning problems (and perhaps, more generally, all problems of survival or reproduction) are so diverse that no single algorithm can solve them all, so that no universal or canonical cortical circuit should be expected.

## Appendix A: Stability of Equilibria in the Oja Model

In consequence, **EC** has a basis of eigenvectors, orthogonal with respect
to the dot product .

The following theorem, describing the equilibria of system 2.5, is immediate.

*An equilibrium for the system is any vector* **w** *such that* **ECw** = (**w**^{T}**Cw**)**w**, *that is, an eigenvector of* **EC** *with corresponding eigenvalue* **w**^{T}**Cw** *(which fixes its normalization). If we additionally assume (generically) that* **EC** *has a strictly positive maximal eigenvalue of multiplicity one, then the corresponding eigendirection is orthogonal, with respect to the dot product above, to all other eigenvectors of* **EC**.

Take **w** to be an equilibrium of system 2.5, that is, an eigenvector of **EC** normalized as in theorem 1. To establish stability, we calculate the Jacobian matrix at **w**. Then we get the following:

*Suppose* **EC** *has a largest eigenvalue of multiplicity one. An equilibrium* **w** *(i.e., by theorem 1, a suitably normalized eigenvector of* **EC** *) is a local hyperbolic attractor for equation 2.5 iff it is an eigenvector corresponding to the maximal eigenvalue of* **EC**.

Fix an eigenvector **w** of **EC**, normalized as in theorem 1. Recall that the vector **w** can be completed to a basis of eigenvectors of **EC**, orthogonal with respect to the dot product introduced above. A direct calculation, applied to **w** and to any other vector in this basis, shows that the same basis is also a basis of eigenvectors for *Df*^{E}_{w}, with one eigenvalue corresponding to **w** and one corresponding to each of the other eigenvectors. An equivalent condition for **w** to be a hyperbolic attractor for system 2.5 is that all the eigenvalues of *Df*^{E}_{w} are <0. Since the eigenvalue of **EC** corresponding to **w** is positive, this condition is equivalent to having that eigenvalue exceed all the other eigenvalues of **EC**. In conclusion, an equilibrium **w** is a hyperbolic attractor if and only if its eigenvalue is the maximal eigenvalue of **EC**; in other words, if **w** is in the direction of the principal eigenvector of **EC**.
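The stability criterion can also be checked numerically. The sketch below rests on assumptions that we state explicitly: system 2.5 is taken in the form d**w**/dt = **ECw** − (**w**^{T}**Cw**)**w** with unit learning rate (consistent with the equilibrium condition of theorem 1), which gives the Jacobian **EC** − (**w**^{T}**Cw**)**I** − 2**ww**^{T}**C**; the cross-talk matrix is of error-onto-all form; and the covariance matrix is a hypothetical example. With these assumptions, the Jacobian at an equilibrium with eigenvalue λ has eigenvalues −2λ and μ−λ, where μ runs over the other eigenvalues of **EC**.

```python
import numpy as np

def crosstalk_matrix(q, n=3):
    """Assumed error-onto-all matrix with quality q."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

# Hypothetical biased covariance matrix (symmetric, positive definite).
C = np.array([[1.30, 0.20, 0.10],
              [0.20, 1.10, 0.15],
              [0.10, 0.15, 1.00]])
EC = crosstalk_matrix(0.9) @ C

eigvals, eigvecs = np.linalg.eig(EC)
order = np.argsort(eigvals.real)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

def equilibrium(k):
    """k-th eigenvector of EC, scaled so that w^T C w equals its eigenvalue."""
    u = eigvecs[:, k]
    return u * np.sqrt(eigvals[k] / (u @ C @ u))

def jacobian(w):
    """Jacobian of w -> ECw - (w^T C w) w, using the symmetry of C."""
    return EC - (w @ C @ w) * np.eye(len(w)) - 2.0 * np.outer(w, C @ w)

jac_spectra = [np.sort(np.linalg.eigvals(jacobian(equilibrium(k))).real)
               for k in range(3)]
is_attractor = [bool(spec.max() < 0.0) for spec in jac_spectra]

for k in range(3):
    print(f"EC eigenvalue {eigvals[k]:.4f}: Jacobian spectrum "
          f"{np.round(jac_spectra[k], 4)}, attractor: {is_attractor[k]}")
```

Only the equilibrium in the principal direction has an entirely negative Jacobian spectrum, in agreement with the theorem.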

Such attractors always exist provided that the condition of theorem 2 is met (i.e., **EC** has a maximal eigenvalue of multiplicity one). Then the network
learns, depending on its initial state, one of the two stable equilibria, which are
the two (opposite) maximal eigenvectors of the modified input distribution,
normalized as in theorem 1. Next, we aim to show that these two attractors are the system's
only hyperbolic attractors.

*Suppose the modified covariance matrix EC has a unique maximal eigenvalue. Then the two corresponding eigenvectors, normalized as in theorem 1, are the only two attractors of the system. More
precisely, the phase space is divided into two basins of attraction, of w_{EC} and −w_{EC}, respectively, separated by the subspace orthogonal to them with respect to the dot product above*.

**EC**. More precisely, **w** is an eigenvector of **EC** iff the corresponding transformed vector is an eigenvector of **A**, with a corresponding eigenvalue; hence, any two distinct eigenvectors of **A** are orthogonal in the regular Euclidean dot product.

Take **v** to be the leading eigenvector of **A**, and let **u**=**u**(*t*) be a trajectory of the system A.1. We want to observe the evolution in time of the angle between the variable vector **u** and the fixed vector **v**. We differentiate the expression that measures this angle with respect to time and denote the numerator of the result by *h*(**u**). We are interested in the sign of *h*(**u**). To make our computations simpler, we can diagonalize **A** in a basis of orthogonal eigenvectors, **A**=**P**^{t}**DP**, where **D** is the diagonal matrix of eigenvalues and **P** is an orthogonal matrix whose rows are the eigenvectors. We then rewrite *h* in terms of **y**=**Pv** and **z**=**Pu** (recall that the largest eigenvalue of **EC** is assumed to have multiplicity one). Hence, if **y**^{t}**z**>0, then *h*(**u**)>0. In other words, if **v**^{t}**u**>0, the angle between **u** and **v** decreases in time. For our original system, this means that any trajectory starting at a **w** on the same side of the separating subspace as **w**_{EC} converges in time toward the principal eigenvector **w**_{EC} of the matrix **EC**.
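The convergence statement can be illustrated by integrating the averaged dynamics directly. This is a minimal sketch, assuming system 2.5 has the form d**w**/dt = **ECw** − (**w**^{T}**Cw**)**w** with unit learning rate, an error-onto-all cross-talk matrix, and a hypothetical covariance matrix; the forward Euler scheme, the step size, and the function name `simulate` are illustrative choices only.

```python
import numpy as np

def crosstalk_matrix(q, n=3):
    """Assumed error-onto-all matrix with quality q."""
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def simulate(C, E, w0, dt=0.01, steps=20000):
    """Forward Euler integration of dw/dt = ECw - (w^T C w) w."""
    EC = E @ C
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w + dt * (EC @ w - (w @ C @ w) * w)
    return w

# Hypothetical biased covariance matrix (symmetric, positive definite).
C = np.array([[1.30, 0.20, 0.10],
              [0.20, 1.10, 0.15],
              [0.10, 0.15, 1.00]])
E = crosstalk_matrix(0.9)

w0 = np.array([0.3, 0.1, 0.2])
w_plus = simulate(C, E, w0)       # starts on one side of the separating subspace
w_minus = simulate(C, E, -w0)     # mirror-image start on the other side

eigvals, eigvecs = np.linalg.eig(E @ C)
k = np.argmax(eigvals.real)
lam_max, principal = eigvals.real[k], eigvecs[:, k].real

cosine = abs(w_plus @ principal) / (np.linalg.norm(w_plus) * np.linalg.norm(principal))
print("alignment with the principal eigenvector of EC:", cosine)
print("w^T C w at the end:", w_plus @ C @ w_plus, " leading eigenvalue:", lam_max)
```

Starting points on opposite sides of the separating subspace end at opposite equilibria, both aligned with the principal eigenvector of **EC** and normalized so that **w**^{T}**Cw** equals the leading eigenvalue.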

## Appendix B: A Direct Computation for Unbiased Inputs

*For order two input bias , the dynamic behavior of the system is classified by the
classification of the input covariance sign: (+, +, +), (+, +, −),
(+, −, −) and (−, −, −)*.

Class (+, +, +) represents Structure **C**_{1} with *c*>0, and class (−, −, −) represents Structure **C**_{1} with *c*<0. Class (+, +, −) can be obtained from Structures **C**_{2} and **C**_{3} with *c*>0, while class (+, −, −) can be obtained from Structures **C**_{2} and **C**_{3} with *c*<0.

Computing directly the spectrum for **C**_{1}, we get one simple error-independent eigenvalue *v*+2*c* (whose eigenvector is also error independent) and one double
eigenvalue (1−3*e*)(*v*−*c*). If *c*>0 (class (+, +, +)), the simple eigenvalue always dominates (see Figure 1E). If *c*<0 (class (−, −,
−)), the double eigenvalue takes over for error smaller than the critical value *e*=−*c*/(*v*−*c*) (see Figure 1H).

Also by direct computation, one notices that **C**_{2} and **C**_{3} have the same spectral decomposition. One eigenvalue is given by (1−3*e*)(*v*+*c*), while the other two are the roots of the quadratic polynomial *P*(*X*)=*X*^{2}+(*c*−2*v*−5*ec*+3*ev*)*X*+(6*ec*^{2}−*cv*−3*ev*^{2}−2*c*^{2}+*v*^{2}+3*ecv*). It is easy to see that . If *c*>0 (class (+, +, −)), then ; hence, , with equality at , and , with equality when (see Figure 1G). If *c*<0 (class (+, −, −)), then and , hence , with equality when and (see Figure 1F).

## Appendix C: A Numerical Extension to Weakly Correlated Inputs

In this section, we loosen the assumption of weakly mutually correlated
three-dimensional inputs (i.e., of a diagonally dominant input covariance matrix **C**) and investigate numerically the behavior of the system under a
wider class of input schemes, corresponding to larger ranges for the parameters *c*, , , and *q*. We will be studying sensitivity to these
parameters in all four combinatorial input classes: **(+, +, +)**, **(+, +, −)**, **(+, −, −)**, and **(−, −, −)**.

Without losing generality, we will be normalizing our matrix **C** so that *v*=1, which will be considered fixed throughout this
analysis. The range for the mutual covariance *c* will be extended in
each case to the largest interval for which **C** remains positive
definite. While the parameter *q* was restricted before to live in
the interval [1/3, 1] (representing the constraint for the quality to be larger than
the error), in the following illustrations, we will allow *q* to
change within [0, 1]. This allows us to better understand how bifurcations and fapp
bifurcations appear in the more plausible biological interval [1/3, 1] and also
reveals interesting behavior that occurs in the poor-quality range for strongly
negatively correlated inputs.

As before, in order to quantify and illustrate the effects of cross talk (error) on
the outcome of learning, we use the cosine of the angle between the system's attractors with and without cross talk (i.e.,
between the directions of the leading eigenvectors of the matrices **EC** and **C**, respectively). Generally the behavior of the system with respect
to error, as observed in section 3.1, extends
naturally to the range of high mutual correlations within the input distribution.
The learning outcome depreciates when gradually increasing the error (decreasing *q*). As discussed in section 3.1, this decay is smooth for some types of input distributions, but for
others, it exhibits jump discontinuities (corresponding to bifurcations in the
dynamics) or just smooth but very sharp drops (fapp bifurcations) with very steep
but bounded slope at the inflection point. We have discussed, in the context of
small mutual correlations *c* (**C** had been assumed to be
diagonally dominant, i.e., with ), that both fapp and actual bifurcations can appear at arbitrarily
small cross-talk values (*q* arbitrarily close to 1). While these
effects still occur for higher values of , the presence of highly negatively correlated inputs introduces an
interesting new effect that is not accounted for by the analysis in the main
text.

Figure 7 shows a few instances of bifurcations
and fapp bifurcations for one negative pairwise correlation and the slight
differences between its two possible off-diagonal positions (next to the diagonal or
in the corner of the matrix **C**). When increasing past the value , while keeping it within the range that preserves positive
definiteness of **C**, the behavior of with respect to *q* remains qualitatively the same,
whether it is a smooth degradation of the output as *q* decreases (for biased inputs) or a sharp drop (some unbiased inputs trigger bifurcations; see
the pink curve in Figure 7A). Only the
position and shape of the transitions are altered in the process.

When increasing the number of negative pairwise correlations, the results change
qualitatively, in particular for very high levels of cross talk, as shown in Figures 8 and 9. Typically for **(+, −, −)**, there is a fapp
bifurcation at low values of cross talk, which in fact can shift to arbitrarily
small levels of cross talk depending on the bias parameters. When increasing past in class **(+, −, −)**, a bifurcation
appears in the low *q* range, so that after having passed the
inflection point (fapp) in its degradation from the correct attractor, the system
suddenly reverses, for very large levels of cross talk, to computing the principal
direction of **C** more accurately (the cosine is close to 1 for small
values of *q*). While this jump discontinuity also exists in class **(+, +, −)**, it does not appear in Figure 7 because it occurs for *q*<0. For class **(+, −, −)**, this high cross-talk bifurcation is
brought within the interval by the increase in the number of negative correlations, together
with increasing the pairwise-correlation strength.

The effect is exacerbated when increasing the number of negative pairwise
correlations further and observing class **(−, −, −)**. The high cross-talk bifurcations shown in Figure 9 are more pronounced and occur for higher values of *q* (i.e., more biologically plausible levels of cross talk).

## Appendix D: An Extension to Higher Dimensions

**EC**. To begin, we can express the matrices **E** and **C** individually as and , where **I** is the identity matrix, **M** is the matrix with uniform unit entries, and, for any , **A**_{j} is the matrix with zero entries except *A*_{j}(*j*, *j*)=1. Note, for future computations, that **M**^{2}=*n***M** and that **MA**_{j} is the matrix whose only nonzero entries are ones along the *j*th column. Unless otherwise specified, the summations are for . The product **EC** will then be In matrix form, this translates as where , we called and .
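The two matrix identities noted above are easy to confirm numerically. The following is a small sanity check rather than part of the derivation; the size `n` and the (0-based) index `j` are arbitrary choices made for illustration.

```python
import numpy as np

n, j = 4, 2                  # illustrative size and 0-based index
M = np.ones((n, n))          # matrix with uniform unit entries
A_j = np.zeros((n, n))
A_j[j, j] = 1.0              # single unit entry at position (j, j)

# M^2 = n M
assert np.array_equal(M @ M, n * M)

# M A_j has ones along column j and zeros elsewhere
expected = np.zeros((n, n))
expected[:, j] = 1.0
assert np.array_equal(M @ A_j, expected)
```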

**D.1** Fully Biased Case. We first consider the covariance
biases ’s to be distinct: . We will prove that the polynomial has *n* real roots , and we will find approximating bounds for their positions on the
real line.

Recall that ; hence, *f*_{1}>*f*_{2}>⋅⋅⋅>*f*_{n}. To continue our discussion and establish the signs of at all partition points , we need to establish the index *j* for which the values *f*_{j} switch sign.

The diagonal dominance assumption allows us to study all cases that may appear, since it
guarantees , . This ensures a complete discussion, since then is allowed to reach and cross over all the critical values , creating a possible swap in the order of the eigenvalues of **EC**, as we will show later. The proof for the other cases will
be omitted, since it is just a simplification of the argument. In fact, the only
crossover of true interest to us is , where the eigenvalue swap involves the two largest
eigenvalues and thus affects the position of the system's attracting equilibria,
corresponding to the normalized eigenvectors of the maximal eigenvalue. The
other critical values , for , affect only the stable and unstable spaces of the
saddle-equilibria. In this light, the condition on the entries of the covariance
matrix can be loosened to .

We distinguish the following cases:

In particular, we have proved the following proposition in the main text:

*In the biased case , the matrix EC has n real distinct eigenvalues , for any error *.
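The proposition can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not a substitute for the proof: it assumes **E** has *q* on the diagonal and (1−*q*)/(*n*−1) elsewhere, and that **C** has a uniform off-diagonal correlation `c` and diagonal entries `v + delta_j`. The helper names and all parameter values are hypothetical.

```python
import numpy as np

def error_matrix(n, q):
    # Assumed form of E: quality q on the diagonal, (1 - q)/(n - 1) elsewhere.
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def covariance(v, c, deltas):
    # Assumed form of C: uniform correlation c, variances v + delta_j.
    n = len(deltas)
    return (v - c) * np.eye(n) + c * np.ones((n, n)) + np.diag(deltas)

def eigenvalues_are_real(A, tol=1e-9):
    # True when every eigenvalue of A has negligible imaginary part.
    return bool(np.all(np.abs(np.linalg.eigvals(A).imag) < tol))

# Distinct biases (fully biased case), diagonally dominant C.
C = covariance(1.0, 0.1, [0.3, 0.2, 0.1, 0.0])

for q in np.linspace(0.3, 1.0, 8):
    EC = error_matrix(4, q) @ C
    vals = np.sort(np.linalg.eigvals(EC).real)
    print(f"q={q:.2f} real={eigenvalues_are_real(EC)} "
          f"min gap={np.diff(vals).min():.4f}")
```

One way to see why the spectrum is real in such examples: when **C** is symmetric positive definite, **EC** is similar to the symmetric matrix **C**^{1/2}**EC**^{1/2}.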

**D.2** Losing the Bias. Suppose now that for , , and allow some of the ; in the limit, this results in a loss of bias in the covariance
matrix **C** ( for some index *j*). In consequence, .

**C**). Suppose . This calculation can be extended to the other intervals for ; however, we will discuss here only the case , since it is the only one that relates directly to the position and multiplicity of the leading root of . It also agrees with our goal to study the behavior of the system for small enough errors. According to equation D.5, we have

Since , it follows that in the limit of and , so that the maximal eigenvalue of **EC** preserves its
multiplicity = 1. This situation changes if we introduce an order two bias
loss (i.e., if we make both and approach zero simultaneously). Then and , so that the two leading roots collide into a double root . This justifies the following proposition:

*Suppose . An* order *k* bias loss *of
the covariance matrix C of the type results in a leading eigenvalue of multiplicity k−1 for the modified covariance matrix EC.*
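A small numerical example can illustrate the multiplicity statement. It is a sketch under assumptions that go beyond the text: *k* is read as the number of coinciding leading biases, **E** is taken with *q* on the diagonal and (1−*q*)/(*n*−1) elsewhere, and **C** is taken with uniform correlation `c` and variances `v + delta_j`. All numerical values are hypothetical. Any vector supported on the *k* coinciding inputs whose entries sum to zero is annihilated by **M**, which gives the predicted eigenvalue `lam` below.

```python
import numpy as np

n, k = 5, 3
q, v, c = 0.9, 1.0, -0.1                      # illustrative parameters
deltas = np.array([0.4, 0.4, 0.4, 0.2, 0.1])  # first k biases coincide

eps = (1.0 - q) / (n - 1)
E = (q - eps) * np.eye(n) + eps * np.ones((n, n))                 # assumed form of E
C = (v - c) * np.eye(n) + c * np.ones((n, n)) + np.diag(deltas)   # assumed form of C

# Predicted repeated eigenvalue of EC on the sum-zero subspace.
lam = (q - eps) * (v - c + deltas[0])

vals = np.linalg.eigvals(E @ C).real
multiplicity = int(np.sum(np.isclose(vals, lam, atol=1e-8)))

print("predicted eigenvalue:", lam)
print("multiplicity found:", multiplicity)    # k - 1 for these parameters
print("largest eigenvalue:", vals.max())
```

For these particular parameters (negative correlation), the repeated eigenvalue is also the largest eigenvalue of **EC**; other parameter choices may place it elsewhere in the spectrum.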

This proposition can be generalized to encompass bias loss anywhere in the inputs and any interval for the error . Below, we give a more general statement, which follows by repeating the argument for the case we already analyzed but could also be proved more directly.

*Suppose that the matrix* **C** *is allowed to exhibit bias loss in all possible ways, so that it can be written in block form as equation 3.2, where there exist , with and such that with . Then the characteristic polynomial of* **EC** *has all real eigenvalues. More precisely, these eigenvalues are with multiplicity k*_{j}−1 *for all , and N additional eigenvalues .*

The order of these eigenvalues, which depends on the error value relative to the critical error values , is the same as described in cases 1 to 3.

## Notes

^{1}

If **X** is an column vector-valued random variable whose covariance matrix
is the identity matrix, then .

^{2}

Since the spectra depend qualitatively on all parameter values, we present here the results of a numerical investigation rather than a rigorous analytical study, which would be extremely cumbersome. The only case in which the computations are more tractable and for which we preferred an analytical approach is the fully unbiased case, presented in appendix A.

^{3}

Thanks to Paul Adams for the useful conversations and generous contributions to this section.

## Author notes

Color versions of all figures in this letter are presented in the online supplement available at http://www.mitpressjournals.org/doi/suppl/10.1162/NECO_a_00565.