Abstract
As an extension of prior work, we studied inspecific Hebbian learning using the classical Oja model. We used a combination of analytical tools and numerical simulations to investigate how the effects of synaptic cross talk (which we also refer to as synaptic inspecificity) depend on the input statistics. We investigated a variety of patterns that appear in dimensions higher than two (and classified them based on covariance type and input bias). We found that the effects of cross talk on learning dynamics and outcome are highly dependent on the input statistics and that cross talk may in some cases have catastrophic effects on learning or development. Arbitrarily small levels of cross talk are able to trigger bifurcations in learning dynamics, or to bring the system close enough to a critical state that the effects are indistinguishable from a real bifurcation. We also investigated how cross talk acts on unbiased ("competitive") inputs and in which circumstances it can help the system productively resolve the competition. Finally, we discuss the idea that sophisticated neocortical learning requires accurate synaptic updates (much as polynucleotide copying requires highly accurate replication). Since it is unlikely that the brain can completely eliminate cross talk, we support the proposal that it uses a neural mechanism that "proofreads" the accuracy of the updates, much as DNA proofreading lowers the copying error rate.
1. Introduction
1.1. Synaptic Plasticity and Cross Talk.
It is generally believed that synaptic plasticity (i.e., activity-dependent adjustments of synaptic connection strengths) is the basis of most processes in the nervous system, such as development, learning, creation and storage of memories, cognition, and ultimately behavior (Katz & Shatz, 1996). The term plasticity may reflect a variety of phenomena, from actual new synapse creation and deletion, to silencing and unsilencing of existing synapses, to only changes in existing synapse strengths. In 1949, Hebb proposed that learning occurs in response to local signals, such as the conjoint activity of pre- and postsynaptic neurons: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" (Hebb, 2002).
Those who interpret and use Hebb's rule generally assume that synaptic modifications act in a local, connection-specific manner (i.e., only synapses between the neurons presenting correlated activity are modified, independent of activity at other synaptic sites). In the literature, the most representative models for long-term changes in synaptic efficacy (Malenka & Bear, 2004; Elliott, 2012) are long-term potentiation (LTP; Bliss & Lømo, 1973) and long-term depression (LTD; Lynch, Dunwiddie, & Gribkoff, 1977). A variety of initial studies of long-term potentiation and depression reported synaptic updates to be local (i.e., "specific") (Isaac, Nicoll, & Malenka, 1995; Dudek & Bear, 1992). However, subsequent data failed to replicate synaptic specificity (Chevaleyre & Castillo, 2004; Matsuzaki, Honkura, Ellis-Davies, & Kasai, 2004). Rather, these data suggested that "cross talk" likely occurs during Hebbian plasticity (Kossel, Bonhoeffer, & Bolz, 1990; Bonhoeffer, Staiger, & Aertsen, 1989; Engert & Bonhoeffer, 1997; Schuman & Madison, 1994; Bi, 2002; Bi & Poo, 2001)—that activity-induced synaptic modification may trigger changes in other, unstimulated synapses (possibly the ones that are geometrically close to or adjacent to the target ones). More recent experimental work (Harvey & Svoboda, 2007) has shown quite unequivocally that induction of LTP at one synapse increases the likelihood that LTP will be induced at closely neighboring synapses.
This source of “error,” or noise, is believed to be due to the imperfection of chemical synaptic transmission, in which some degree of diffusion of neuromessengers combines with the high synapse density (especially for highly connected neurons), making it difficult, or even impossible, for a triggered synaptic change to remain completely connection specific.
A proposed list of factors that contribute to cross talk (Elliott, 2012) includes, for early-phase LTP/LTD, presynaptic (Bonhoeffer et al., 1989; Kossel et al., 1990; Schuman & Madison, 1994) or postsynaptic (Engert & Bonhoeffer, 1997; Harvey & Svoboda, 2007) diffusion of intracellular (Harvey & Svoboda, 2007; Harvey, Yasuda, Zhong, & Svoboda, 2008) and extracellular messengers (Lemann, Gottmann, & Heumann, 1994; Korte et al., 1995; Levine, Dreyfus, Black, & Plummer, 1995), as well as late-phase LTP and LTD factors acting on longer timescales (Frey & Morris, 1998; Navakkode, Sajikumar, & Frey, 2004; see also section 4). The necessity for close synaptic packing (DeFelipe, Marco, Busturia, & Merchán-Pérez, 1999) creates a geometric conflict. At NMDA-mediated sites, for example, the spine neck must be sufficiently narrow to reduce Ca escape to other sites (Koch & Zador, 1993; Sabatini, Oertner, & Svoboda, 2002), but also sufficiently wide to allow synaptic currents through. In this light, complete chemical isolation and accuracy seem, and may indeed be, impossible to achieve in the brain.
1.2. Plasticity Models and the Effects of Cross Talk.
A variety of models have been used to investigate the effects of synaptic cross talk on brain function. Since many different models can produce the same behavior, it is not possible to use behavior to test whether a model is correct; rather, models can be used to determine whether certain types of interactions are capable of replicating certain outcomes, generating testable hypotheses. In our context, modeling is used to predict in principle whether and when cross talk can lead to a complete breakdown in the outcome otherwise obtained in the synapse-specific case.
In most mathematical models of synaptic plasticity, the system develops, or learns, one or more patterns of synaptic configurations, which are typically stable equilibria but could also be cycles or more complex invariant sets in the case of nonlinear models (Wiskott & Sejnowski, 1998; Elliott, 2003). In this framework, synaptic cross talk can be regarded as an internal noise parameter, whose increase may not only alter performance but, past a critical value, may trigger radical crashes (bifurcations) in the system's dynamics, actually destroying its capacity to reach the stable states (the desired developmental or learning outcomes). It has been argued that avoiding such crashes would require very accurate connection strength adjustments, but that such levels of accuracy are biophysically impossible (Cox & Adams, 2009). Furthermore, it has been shown that the critical level of cross talk sufficient to induce bifurcations in these models is very sensitive to the input statistics and postsynaptic connectivity, and in some cases, it can be made arbitrarily small (Elliott, 2012). Either way, many nonlinear models of synaptic plasticity are fatally compromised by even tiny amounts of cross talk (Elliott, 2012), supporting the idea that some parallel circuitry (proofreading) might be necessary to boost robustness to synaptic inspecificity, and thus permit or facilitate useful development and learning, even in the presence of cross talk (see section 4.3 for additional comments on proofreading).
The possibility that synaptic cross talk can have such catastrophic effects makes it very important for us to assess its impact on nonlinear models of synaptic plasticity as a way toward understanding its actual impact in the brain. One cannot expect, however, a generic proof of principle for all learning models, especially given the vastness of the field; rather, one can point out relevant examples of such behavior in models that are biologically plausible.
We study here the effect of cross talk in the Oja rule, a very simple, multiplicative normalization of Hebbian learning. Oja's model is driven only by second-order statistics, and hence works as a principal component analyzer (PCA) rather than an independent component analyzer (ICA; Cox & Adams, 2009). We are not proposing that the brain actually does PCA, but we consider this very simple particular case of the general unsupervised learning problem because it is completely tractable by a combination of analytical and numerical tools. While our approach incorporates some aspects of biological realism, many simplifications are made along the way (described in the following sections) with the goal of investigating cross talk in a simple and relevant context rather than proposing a detailed model of biological learning. Although the existence of stable equilibria relates here only to second-order input statistics, this model captures a feature observed in other nonlinear, more elaborate models: synaptic cross talk is able to induce catastrophic breakdowns in learning in a manner that is highly idiosyncratic, depending in a very input-specific and model-specific manner on the learning rule.
The rest of the letter is organized as follows. In section 2, we present the model (the Oja rule in the presence of cross-talk, or “inspecificity”) and some properties of the input patterns to be learned, and we provide an overview of the basics of the rule's dynamic behavior. In section 3.1 we investigate numerically the three-dimensional Oja inspecific network; we focus in particular on how it processes different classes of input distributions, preserving some of the dynamical aspects found in the two-dimensional phase plane (Rădulescu & Adams, 2013), but also introducing new features specific to higher dimensions. In section 3.2, we study analytically, in an n-dimensional example, the behavior observed numerically in the previous section. In section 4, we put the numerical and analytical results in the biological context of a learning cortical network. Section 4.1 focuses on the meaning and importance of input bias and on its effects in conjunction with cross talk. Section 4.2 discusses the biological plausibility of an Oja-type learning model and reviews a possible biophysical implementation of the rule, as described in the literature. Section 4.3 briefly discusses the analogy between neural cross talk and DNA copying errors, and the necessity of a proofreading mechanism in both cases.
2. Methods
2.1. The Oja Model with Synaptic Cross Talk.
Oja (1982) showed that a simple neuronal model can perform unsupervised learning based on Hebbian synaptic weight updates incorporating an implicit “multiplicative” weight normalization to prevent unlimited weight growth (von der Malsburg, 1973). Oja's rule has been extensively studied and used (Hertz, Krogh & Palmer, 1991; Taylor & Coombes, 1993) in its original or modified forms (Oja & Karhunen, 1985; Diamantaras & Kung, 1996).
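For reference, the rule in its standard single-unit form (a textbook statement with generic notation, which may differ slightly from the letter's own equations): the output is \(y = \mathbf{w}^{\top}\mathbf{x}\), and the weights evolve as

\[
\frac{d\mathbf{w}}{dt} \;=\; \gamma\, y\,\bigl(\mathbf{x} - y\,\mathbf{w}\bigr),
\qquad\text{which, averaged over the inputs, becomes}\qquad
\frac{d\mathbf{w}}{dt} \;=\; \gamma\,\bigl(C\mathbf{w} - (\mathbf{w}^{\top} C\mathbf{w})\,\mathbf{w}\bigr),
\]

where \(C = \langle \mathbf{x}\mathbf{x}^{\top} \rangle\) is the input covariance matrix. The stable equilibria of the averaged flow are the normalized principal eigenvectors of C, so the unit learns the principal component of its inputs.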













In previous work, we (Rădulescu, Cox, & Adams, 2009; Rădulescu & Adams, 2013) and others (Botelho & Jamison, 2002, 2004) have examined how cross talk affects the Oja model. We formalized the effects of synaptic cross talk via a time-dependent (but not input- or weight-dependent) error matrix E(t), whose elements e_ij(t) reflect at each time t the fractional contribution that the activity across weight w_j makes to the update of w_i.
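As a sketch of how this enters the dynamics (a hedged reconstruction consistent with the properties stated below, not a verbatim copy of the letter's equations): premultiplying the Hebbian term by E and averaging over the input distribution, with \(C = \langle \mathbf{x}\mathbf{x}^{\top} \rangle\), gives a flow of the form

\[
\frac{d\mathbf{w}}{dt} \;=\; \gamma\,\bigl( E\,\mathbf{x}\,y - y^{2}\,\mathbf{w} \bigr)
\quad\longrightarrow\quad
\frac{d\mathbf{w}}{dt} \;=\; \gamma\,\bigl( E C \mathbf{w} - (\mathbf{w}^{\top} C \mathbf{w})\,\mathbf{w} \bigr).
\]

Under this assumed form, the set \(\{\mathbf{w} : \mathbf{w}^{\top} E^{-1} \mathbf{w} = 1\}\) is invariant, and equilibria satisfy \(EC\mathbf{w} = \lambda\mathbf{w}\) with \(\lambda = \mathbf{w}^{\top} C \mathbf{w}\); that is, they are eigenvectors of EC, consistent with the statements that follow.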













One can easily show that equation 2.5 preserves the dot product (where
, for all
). Furthermore, an equilibrium for equation 2.5 is an eigenvector of EC,
normalized so that
, where
is its corresponding eigenvalue of EC.
Notice that equation 2.5 has
equilibria that are tightly related to those of the averaged corresponding form
of equation 2.4; our working
form is, however, simpler computationally, in the sense that stability of
equilibria is more easily tractable. We have shown that the eigenvalues of the
Jacobian matrix at an equilibrium w are given by and
, where
and
are the n eigenvalues of EC (noting first that
, the completion of w to a basis of eigenvectors
of EC, orthogonal with respect to the dot product
, also forms an eigenvector basis for the Jacobian). We
concluded that if EC has a unique largest eigenvalue (which is
generically true), then a normalized eigenvector w is a local
hyperbolic attracting equilibrium for equation 2.5 iff it corresponds to this maximal
eigenvalue. If EC has a multiple largest eigenvalue, the system
will have a set of nonisolated, neutrally attracting equilibria (all normalized
eigenvectors spanning the principal eigenspace in this case of dimension
). Some of the computations are summarized in appendix A (e.g., a description of the attraction
basins, supporting the absence of cycles in the phase space) and are expanded in
more detail in our previous work (Rădulescu et al., 2009; Rădulescu & Adams, 2013).
Since the nature and position of the equilibria depend on the spectral properties of EC, the next task is to study the spectral changes of EC when perturbing the system by increasing cross talk. In our previous work on the model, we investigated the effects of cross talk on the system's dynamics and their dependence on the characteristics of the input distribution (correlation sign, degree of bias). However, in our first study, we considered learning only of positively correlated n-dimensional input distributions; we found a smooth degradation of the learning outcome with increasing error but no sudden changes in dynamics (Rădulescu et al., 2009). In our second study, we showed that negatively correlated inputs can induce a bifurcation (stability swap of equilibria, through a critical stage) when increasing the error, even in a case as simple as a two-dimensional system. This bifurcation occurred only in the case of unbiased inputs (Rădulescu & Adams, 2013), and we interpreted it in the context of ocular dominance and input segregation.
For our general computations, we assume that the inputs have mutual covariances c uniform in absolute value, and small with respect to the
diagonal variances. More precisely, we assume , making the matrix diagonally dominant (see section 2.2 for considerations on the input
statistics). Throughout the letter,
will be called the input biases. Without loss of generality,
we set
. For any
, we say that the input has bias loss of order k if
. In particular, we say that the input is unbiased if it has
bias loss of order n, that is, if
. Although the background covariance
is taken for simplicity to be uniform in absolute value, we
expect the inspecific learning rule to lead to interesting dynamics, in
particular when the inputs exhibit a certain degree of mutual correlation.
2.2. Oja's Rule and the Input Statistics.
The goal of this work is to investigate the effects of one particular aspect of biological realism (cross talk) in the context of a model that is otherwise as transparent as possible. We chose the Oja principal component analyzer as a widely known and simple example of a Hebbian model of unsupervised learning, important in cortical processing (Hinton & Sejnowski, 1999) and involving repeated adjustment driven only by statistical properties of the input. While a connectionist model may capture some of the desired basic aspects of learning dynamics, the situation in the brain is far from being this simple.
To begin with, the Oja model may appear rather unbiological in its very use of a rate-coding scheme and a simple multiplicative Hebbian learning rule, in conjunction with a local normalization procedure of debated plausibility (section 4.2 gives more detail on possible empirical bases of the rule and their implementation). While in our approach we incorporated cross talk, we neglected many other biological aspects inherent in synaptic transmission (e.g., timed spikes, external noise, temporal correlations, synaptic homeostasis; Cox & Adams, 2009); a more biologically realistic model would use spike-timing-dependent plasticity and natural inputs (Hyvärinen, Hurri, & Hoyer, 2009). We simply used positive or negative continuous-time activations and weights (one can interpret negative weights as disconnections), and we assumed the input patterns to be zero mean and have identical mutual correlations (Rădulescu et al., 2009). More elaborate models, incorporating detailed spiking patterns, may automatically learn the principal component of the zero-mean inputs, without explicit centering or normalization. Gerstner and Kistler (2002) have developed a model that assumes an Oja-type rate-coding scheme, with Poisson spikes and spike-timing-dependent plasticity with LTP and LTD lobes, and postsynaptic spikes triggered by presynaptically generated EPSPs. One could in principle study the effects of cross talk on such a model by applying an error matrix to the LTP or LTD parts; a direct analysis, however, might turn out to be much more difficult than in the case at hand here.
Since our analysis focuses on symmetric matrices C with positive or negative off-diagonal elements, we have to ask whether and when such a matrix can constitute the covariance matrix of a centered n-dimensional distribution. While establishing equivalent conditions may be difficult even for small dimensions (Vasudeva, 1998), one can find sufficient criteria (e.g., any positive semidefinite C is a covariance matrix).1
In our initial computations, we assumed sufficiently weak pairwise correlations
to make C diagonally dominant (in this case, equivalent to ). Any symmetric diagonally dominant matrix with nonnegative
diagonal entries is automatically positive semidefinite, hence a covariance
matrix. Such segregated inputs can be found in a variety of contexts in the
brain. For example, studies of cortico-striate projections (Yim, Aertsen, &
Kumar, 2011) have observed weak pairwise
correlations within the pool of inputs to individual striatal neurons, which are
believed to enhance the saliency of signal representation in the striatum. On
the other hand, C will not remain diagonally dominant for strong
pairwise correlations, which are also likely to occur biologically. A known
example of cells with strongly correlated activity is that of retinal ganglion
cells, placed in topographic proximity of each other and innervating the same
cell in the LGN (Mastronarde, 1989; Trong
& Rieke, 2008). Our work in sections 3.1 and 3.2 assumes diagonal dominance (as a mathematical
convenient assumption that allows us to establish a useful classification and
illustrate typical behaviors that can occur in the system). In appendix C, we complete our analysis with a
numerical approach to a larger collection of matrices, with extended parameter
ranges.
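As a quick numerical illustration of this point (a sketch with an arbitrary example matrix rather than the letter's exact parameter values; Python is used here for concreteness):

import numpy as np

# Example 3x3 symmetric matrix with uniform |c| off-diagonal entries, two of them
# negative (class (+, -, -) in the notation of section 3.1); diagonal dominance
# holds because v > 2|c|.
v, c = 1.0, 0.2
C = np.array([[ v,  c, -c],
              [ c,  v, -c],
              [-c, -c,  v]])

# Check diagonal dominance row by row.
assert all(C[i, i] >= np.abs(C[i]).sum() - np.abs(C[i, i]) for i in range(3))

# A symmetric, diagonally dominant matrix with nonnegative diagonal entries is
# positive semidefinite, hence a valid covariance matrix; verify numerically.
print(np.linalg.eigvalsh(C))   # all eigenvalues come out nonnegative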
2.3. The Error Matrix.
Together with the uniform magnitude of input cross-correlations (i.e., uniform
absolute value of the off-diagonal elements of C), we also
assumed, for simplicity, uniform error (the Hebbian adjustment of any weight was
equally affected by error and did not depend on either the strength of that
weight or on geometry). Such “isotropicity” seems like a
reasonable basic assumption and has been discussed in our previous work
(Rădulescu et al., 2009;
Rădulescu & Adams, 2013).
Furthermore, it allowed us to identify other features of the input distribution
that crucially affect the learning dynamics and outcome: the sign of the
mutual correlations and the input bias. However, cross talk has been documented
experimentally, for technical reasons, mostly between synapses that are
anatomically neighboring each other (Harvey & Svoboda, 2007; Bi, 2002;
Bonhoeffer et al., 1989). In previous work
(Rădulescu et al., 2009), we have
justified isotropicity based on the fact that individual cortical connections
are composed of multiple synapses scattered over the dendritic tree (Varga, Jia,
Sakmann, & Konnerth, 2011; Chen,
Leischner, Rochefort, Nelken, & Konnerth, 2011; Jia, Rochefort, Chen, & Konnerth, 2010), but we have also considered other (more
metric-dependent although all symmetric) forms of E. Cross-talk
effects could probably be captured when using more general, nonisotropic forms
for E without affecting the main conclusions. In this letter, the
distinction between local and global cross talk is not that relevant, since our
main results concern a low- (three-)dimensional network.
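For concreteness, the isotropic error matrix we have in mind can be sketched as follows (a reconstruction under our stated assumptions: each weight retains a fraction q of its intended update, the "quality," and the remaining fraction 1 − q is spread uniformly over the other n − 1 weights):

\[
E \;=\;
\begin{pmatrix}
q & \epsilon & \cdots & \epsilon\\
\epsilon & q & \cdots & \epsilon\\
\vdots &  & \ddots & \vdots\\
\epsilon & \epsilon & \cdots & q
\end{pmatrix},
\qquad
\epsilon \;=\; \frac{1-q}{n-1},
\]

so that each row sums to 1, q = 1 corresponds to error-free (specific) updates, and q ranges over (1/n, 1]; for n = 3 this is the interval (1/3, 1] used in section 3.1.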
3. Results
3.1. Classes of Inputs and Bias Effects on Three-Dimensional Dynamics.
In this section, we study how input patterns can influence the effects of cross talk in driving the dynamics of a three-dimensional network—the lowest dimension for which the question applies but which seems to capture the essence of this problem even in higher-dimensional systems. Here, we inspect all combinatorial possibilities of input bias and correlation sign (as defined below) and determine the effect of increasing cross talk on dynamics in each case. In section 3.2 and the appendixes, we support with rigorous proofs some of the results obtained through numerical simulations (we used the Matlab software package, version 7.2.1).



We found that in special highly unbiased cases, cross talk has no effect on the presence and position of the asymptotic attractors (see Figure 1E). In other cases, the depreciation of the asymptotic outcome with error is so slow that small levels of cross talk have virtually no effect on learning (see Figures 1A and 1B; also see Figure 2 for a phase-space illustration). Other significant classes of inputs, however, showed a sudden change of the attractor states, from a reliable principal component estimator to an almost orthogonal direction. This occurred either in the form of an eigenvalue swapping bifurcation in dynamics (producing the instantaneous loss of learning accuracy at a critical error value; see Figures 1C and 3 for an illustration of phase-space transitions) or in the milder form of an eigenvalue "avoided crossing" (inducing a smooth yet very steep depreciation of the learned direction at a specific error; see Figures 1D and 1G). As discussed in our previous work, bifurcations and avoided crossings can be practically indistinguishable: learning works reasonably well for small enough errors. For errors past the crash value, the outcome becomes irrelevant to the input statistics, and the system is essentially encoding information about the cross-talk pattern itself.
Spectral changes induced by increasing inspecificity, for various inputs
schemes. In all panels, we show, as a function of the quality q, the evolution of the eigenvalues, with black for the
largest eigenvalue, red for the second largest, and green for the lowest
(top subplot); the cosine of the angle between the inspecific stable
vector and the correct attracting direction(s) (bottom subplot). In all
panels, v=1,
. The classification is as follows. (A) For fully
biased inputs (
,
), the three eigenvalues remain separated. For partly
biased inputs (
), there are three cases, depending on the number of
negative cross-correlations and on their placement: the leading
eigenvalues can remain separated (B). They can cross at a critical value
of
(C) or approach significantly for some value of q but “avoid” crossing (D). For fully
unbiased inputs, we found four cases, classified simply by the number of
negative off-diagonal cross-correlations: all positive
cross-correlations, and leading eigenvalues remain separated (E); one
negative cross-correlation, where leading eigenvalues coincide only at q=1 and immediately separate (F); two
negative cross-correlations, where leading eigenvalues may approach each
other in an avoided crossing of magnitude depending on parameters, but
remain separated (G); all negative cross-correlations, where leading
eigenvalues coincide on a whole interval, as quality depreciates from q=1 to a critical value (H). In panel H, the
system has a curve of half-neutral attractors, which persists until q reaches the critical value, when a different,
orthogonal eigenvector takes over as the stable direction. (Please refer
to online supplement for color version of this figure.)
Phase-Space trajectories for fully biased inputs. (A) In the absence of
error, the system converges generically to the two normalized vectors in
the principal direction wC of the covariance matrix C. The attraction basins
are separated by the subspace (the shaded plane). (B) For error
, the system converges generically to the two
normalized vectors in the principal direction wEC of the modified covariance matrix EC. The attraction
basins are separated by the subspace
(the shaded plane). Parameters: v=1, c=0.2,
,
. Color coding: trajectories evolve in time from darker
to lighter shades. (Please refer to online supplement for color version
of this figure.)
Bifurcation in attractor dynamics for partly biased inputs, all negative
cross-correlations. (A) For small error, the attractors (the two
normalized principal eigenvectors of EC) do not differ much
from the correct attractors (the two normalized principal eigenvectors
of C). The attraction basins are separated by the
subspace (the shaded plane). (B) For critical error
, the system exhibits an ellipse of neutrally stable
equilibria (yellow curve contained in the shaded plane). (C) For error
past the critical value, the attractors have moved significantly far
from the correct positions. Parameters: v=1, c=0.2,
. Color coding: trajectories evolve in time from darker
to lighter shades. (Please refer to online supplement for color version
of this figure.)
None of these possibilities is a priori excluded in the brain, but previous work has suggested that nature may favor bias. Segregated outcomes (in which some connections are eliminated completely, forming wiring patterns that are then subject to more subtle synaptic learning) are considered to be an important part of normal development. In our previous work, we argued that cross talk seems to act against this desymmetrizing tendency and prevent segregation, especially for inputs close to unbiased. We viewed this as a limitation of symmetry-breaking mechanisms that generate specific wiring, and we further argued that other factors, such as strong mutual inhibition (large negative correlations) or special specificity-enhancing circuitry ("proofreading"), might act to overcome the equalizing effect of cross talk. The current study completes this idea with new aspects.
One can say, then, that efficient cross-talk-induced segregation happens in our model for a balance of positive and negative correlations in the input distribution. Since the presence, number, and strength of the negative correlations appeared to be crucial in determining the behavior of the system, we defined a formal classification of all possible correlation matrices based on the number of negative upper-diagonal entries of C and then used the three classes to understand the corresponding behavior with respect to cross talk.
We distinguished four combinatorial classes: Class (+, +, +),
comprising the unique matrix configuration with all positive entries; Class (+, +, −), made of the three matrix configurations
with one negative upper-diagonal entry; Class (+, −,
−), for the three configurations with two negative
upper-diagonal entries; Class (−, −, −), for
the one configuration with all negative off-diagonal entries. We studied the
matrix EC, and the differences that occur in its spectrum when
considering different classes of input, in conjunction with different degrees of
bias: from fully biased () to partly biased (
) to fully unbiased (
). In this section, q will be restricted to
the interval (1/3, 1] (representing quality higher than error). Based on these
combinatorial classes of input, we distinguished three main qualitative
behaviors: separated leading eigenvalues, crossing leading eigenvalues and
“avoided crossing.”2
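The numerical procedure behind Figure 1 can be sketched as follows (a minimal Python version; the covariance matrix below is an arbitrary representative of one class, with bias represented simply as a larger variance on the first channel, and is not meant to reproduce the figure's exact parameters):

import numpy as np

def error_matrix(q, n=3):
    # Isotropic error matrix: diagonal q, off-diagonal (1 - q)/(n - 1).
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

# Representative covariance matrix: all-negative cross-correlations, partly biased.
v, c, b = 1.0, 0.2, 0.3
C = np.array([[v + b, -c, -c],
              [-c,     v, -c],
              [-c,    -c,  v]])

w_true = np.linalg.eigh(C)[1][:, -1]          # principal eigenvector of C

for q in np.linspace(1.0, 0.4, 13):
    lams, vecs = np.linalg.eig(error_matrix(q) @ C)
    order = np.argsort(lams.real)
    w_q = vecs[:, order[-1]].real             # leading eigenvector of EC
    cos_angle = abs(w_true @ w_q)             # cosine of the accuracy angle
    print(f"q={q:.2f}  leading eigenvalues={np.sort(lams.real)[-2:]}  cos={cos_angle:.3f}")

Tracking the two largest eigenvalues of EC and the cosine of the angle between the leading eigenvectors of EC and C as q decreases corresponds to the quantities plotted in the top and bottom subplots of Figure 1.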
3.1.1. Separated Leading Eigenvalues.
The largest eigenvalue remains separated from the second largest eigenvalue for the whole range of q (as illustrated in Figure 1A, top panel), so that the corresponding leading eigenvector gradually drifts from the direction of the principal component of C as q decreases (blue curve in Figure 1A, bottom panel). For any value of q, the system has two hyperbolically attracting equilibria: the normalized principal eigenvectors of EC, whose basins are separated by an invariant plane. In Figure 2, we show the evolution of a set of trajectories to illustrate convergence to the two attractors in the phase space, as well as the dynamics within the separating plane.
In the presence of cross talk, the network will process the input in a qualitatively similar fashion to the cross-talk-free case, observing the main statistical trends, even though the quantitative outcome might be slightly or more substantially altered, depending on the input pattern and the degree of cross talk. Depending on parameters, the eigenvalue curves with respect to q may exhibit a significant point of minimal separation, where the learning outcome (leading eigenvector of EC) deteriorates very fast (see section 3.1.3).
This case is generally associated with biased inputs (the only possible
behavior when ). That is, no negative correlations are required to
maintain segregated inputs in their segregated state when cross talk is
introduced. However, this behavior can be found in conjunction with loss of
bias, provided the mutual negative correlations are limited: it also appears
in partial loss of bias (
) for class (+, +, +) (see Figures 1B and 2), as well as in full loss of bias (
) for classes (+, +, +) and (+, +,
−) (see Figures 1G, 1E, and 2).
An interesting, quite extreme case of separated eigenvalues occurs for symmetric inputs that are fully unbiased and all positively correlated: the leading eigenvalue is separated from the second eigenvalue (which has multiplicity two), but neither the leading eigenvalue nor the corresponding eigenvector of EC changes when the cross talk is increased. Hence, in this case, the learning is fully accurate for any degree of cross talk (see Figure 1E); one may argue that this particular class of input statistics is completely error proof.
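This error-proof behavior can be checked directly under the isotropic forms sketched in section 2 (a hedged calculation using those assumed forms rather than the letter's exact parameterization). With all variances equal to v and all cross-correlations equal to c > 0, C = (v − c)I + cJ, where J is the all-ones matrix; with E = (q − ε)I + εJ and ε = (1 − q)/(n − 1), and using J² = nJ,

\[
EC \;=\; (q-\epsilon)(v-c)\,I \;+\; \bigl[(q-\epsilon)c + \epsilon(v-c) + n\epsilon c\bigr]\,J .
\]

The uniform vector \((1,\dots,1)/\sqrt{n}\) is therefore an eigenvector of EC for every q, with eigenvalue \([q+(n-1)\epsilon]\,[v+(n-1)c] = v+(n-1)c\), independent of q, while its orthogonal complement carries the eigenvalue \((q-\epsilon)(v-c)\) with multiplicity n − 1. Since \(v+(n-1)c > (q-\epsilon)(v-c)\) for c > 0 and q ≤ 1, the leading eigenvalue and eigenvector are unaffected by cross talk, as in Figure 1E.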
3.1.2. Crossing of Leading Eigenvalues.
This behavior sits, in a sense, at the opposite pole of the “separated
eigenvalues” case, and in its most standard form, it is typical of
partial loss of bias () in combination with all negative correlations, that is,
class (−, −, −); see Figure 1C and section 3.2. The term describes an instantaneous swap of the attractors
from one eigendirection to another direction that could be as much as
orthogonal to the original principal component; this swap produces a crash
in the learning outcome. This behavior occurs when the two leading
eigenvalue branches cross and switch at a critical value of the quality
. (We have described this phenomenon in a two-dimensional
model in Rădulescu & Adams, 2013.) Very small levels of cross talk (
) in fact have very little effect on learning in this case.
Although the leading eigenvalue changes, the direction of the leading and
attracting eigenvector is preserved, so that the system will converge to the
same outcome as in the absence of error.
This may seem like a very desirable input distribution to learn in the
presence of low cross talk; however, one has to keep in mind that if the
cross-correlations are small in absolute value with respect to the variance v, then the
critical value of the quality q gets arbitrarily close to 1. Such perfect learning will
therefore happen only when inspecificity is infinitesimally small, which
makes this scenario lose its appeal, especially when we recall that at the
end of the “good” interval lies the bifurcation, crashing the
equilibrium to a direction completely irrelevant to the input statistics. In
this light, one might expect the network to have an additional, quite
precise estimator of the degree of cross talk involved, so that when
learning an irrelevant outcome, it would at least be aware of it. Any slight
miscalculation of the limits of permissible error could have dire consequences.
In Figure 3, we represent three
phase-space plots: before, at, and after the bifurcation point . While Figures 3A and 3C illustrate the typical phase
space with two hyperbolically stable equilibria (one representing accurate,
error-free learning and the other inaccurate learning for a postcritical
error), the phase space at the bifurcation point is qualitatively different:
the system has no hyperbolic attractors but rather a closed curve (ellipse)
of half-stable equilibria (neutral along the direction of the curve).
Clearly, the outcome of learning is in this case extremely dependent on the
initial conditions (although, as we commented in Rădulescu &
Adams, 2013, the stochastic version
of the system will have noise-driven stationary solutions that drift around
this neutrally attracting ellipse).
The neutrally attracting ellipse phase-plane dynamics is not specific to this
critical bifurcation state (and thus it cannot be ignored as improbable in
the context of generic behavior). For some classes of inputs, such an
attracting-ellipse slice represents the natural state of the cross-talk free
system and persists for an entire inspecificity range (see Figure 4). This is the case for bias of order
two () when occurring in conjunction with substantial negative
correlations, that is, classes (+, −, −) and (−, −, −). The computations
simplify considerably in the absence of any bias, so for the case of fully unbiased
inputs we carried out a complete analytical classification in theorem 1 in
appendix A. We describe these two
fully unbiased cases in more detail below.
Bifurcation in attractor dynamics for partly biased inputs, all
negative cross-correlations. (A) For small error, the system has an
ellipse of neutrally stable equilibria (yellow curve). This ellipse
is stable in the sense that it persists for a whole interval of
errors, from to
. (B) For error past the critical value, the
ellipse is destroyed, but the new attractors are significantly far
from the plane of the ellipse. Parameters: v=1, c=0.2,
. Color coding: trajectories evolve in time from
darker to lighter shades. (Please refer to online supplement for
color version of this figure.)
We found that in instances of highly unbiased inputs, learning may lead to an
ambiguous outcome even in the absence of cross talk (see Figures 1F, 1H, and 4). Indeed, in the
cross-talk-free class (+, −, −), the matrix C has a double leading eigenvalue to begin with, and the system
has a whole closed curve of neutrally attracting equilibria (in the
eigenplane spanned by the corresponding eigenvectors). When cross talk is
introduced, the two leading eigenvalues segregate, and one of the
eigenvectors takes over, which determines an immediate complete switch in
the learning outcome. In this case, even the smallest degree of
inspecificity leads to favoring one specific direction, slightly detaching
from the plane that contains the curve of accurate equilibria (notice that
the cosine of the accuracy angle, represented by the blue curve in Figure 1F, does not fall too far off the
perfect value of 1).
We may interpret this as the error helping the system “make up its mind” in the presence of too much ambiguity in the input statistics. This is an occurrence we have not encountered in our previous, more restrictive versions of the model, since it requires inputs with concomitant negative cross-correlations and loss of bias of order >2. This ambiguity can be interpreted as the basis of a competitive process in which any input channel has equal chances to win. Competitive dynamics has been studied extensively in developmental and learning models in the context of imposed (by means of multiplicative or subtractive normalization) or emergent competition. It has become clear that a linear Hebb rule, even when coupled with a multiplicative normalization or winner-takes-all type nonlinearities, is not able to produce segregation of positively correlated inputs (von der Malsburg, 1973; Goodhill & Barrow, 1994; Miller & MacKay, 1994). When used in conjunction with unbiased inputs, it will lead to an equal-weight outcome (Dayan & Abbott, 2002). A variety of known nonlinear mechanisms can break the inherent symmetry, even when the input per se does not favor segregated outcomes (Elliott, 2003), including subtractive normalization (Miller & MacKay, 1994; Goodhill & Barrow, 1994), the BCM rule (Bienenstock, Cooper, & Munro, 1982), and spike-timing-dependent plasticity (Elliott, 2008). As interpreted in one of our previous discussions on ocular dominance wiring (Rădulescu & Adams, 2013), such mechanisms may lead, for example, to ocular segregation under unbiased statistics (the two eyes are likely receiving similar, positively correlated inputs from the visual field). One context that permits segregation under multiplicative normalization is having negatively correlated inputs.
Our current analysis illustrates this issue and shows that when sufficient
negative correlations are present, the fashion in which the cross talk
handles inherent input ambiguity or competition depends quite significantly
on the number (and, to a lesser extent, the positions) of the negative
mutual correlations within the input. In our model, at least two negative
mutual correlations are necessary for cross talk to produce segregation of
symmetric inputs. For two out of three negative correlations, even the
smallest degree of cross talk helps the system make an asymptotic selection
for one particular direction in the eigenspace spanned by the multiple
eigenvalue. For all negative correlations, no small degree of cross talk can
resolve this competitive state. The level of critical cross talk that can
finally destroy the curve of neutrally stable equilibria also pushes the
system to learn an orthogonal direction, hence becomes irrelevant to the
main features of the original input statistics. Indeed, in the
cross-talk-free class (−, −, −), the
matrix C has a double leading eigenvalue, and the system again
has a whole ellipse of neutral equilibria, contained in the corresponding
eigenplane. When subject to errors up to a critical value , the two larger eigenvalues change but remain equal;
furthermore, the subspace spanned by the two corresponding eigenvectors
remains unchanged, hence the learning process preserves the original
ambiguity. Past the critical error value, the eigenvalues swap, and the
eigendirection for the new leading eigenvalue (of multiplicity one) is
orthogonal to the previous plane (see Figure 1H). In other words, past the critical error value, the system
will finally choose a particular direction to learn, but this direction will
be highly inaccurate, and thus the task of learning the input statistics
will be performed very poorly.
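Under the same assumed forms, this case can be made explicit (again a sketch using the isotropic C and E above, not the letter's exact computation). With all variances v and all cross-correlations equal to −|c|, C = (v + |c|)I − |c|J, so at q = 1 the leading eigenvalue v + |c| has multiplicity n − 1, with eigenspace orthogonal to the uniform vector, while the uniform vector carries the smaller eigenvalue v − (n − 1)|c|. Because E acts as the identity on the uniform vector and as multiplication by q − ε on its orthogonal complement, these eigenspaces are preserved:

\[
\lambda_{\perp}(q) \;=\; (q-\epsilon)\,(v+|c|) \quad (\text{multiplicity } n-1),
\qquad
\lambda_{\mathbf 1} \;=\; v-(n-1)|c| \quad (\text{independent of } q).
\]

As q decreases from 1, \(\lambda_{\perp}(q) = \frac{nq-1}{n-1}(v+|c|)\) shrinks while \(\lambda_{\mathbf 1}\) stays fixed; the two coincide at the critical quality

\[
q^{*} \;=\; \frac{1}{n}\left[\,1 + (n-1)\,\frac{v-(n-1)|c|}{v+|c|}\,\right]
\]

(for n = 3, v = 1, |c| = 0.2 this gives q* = 2/3), past which the uniform direction, orthogonal to the original eigenplane, takes over as the attractor, as described above and illustrated in Figure 1H.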
3.1.3. “Avoided Crossing” of Leading Eigenvalues.
This can be seen as a hybrid case in which the principal eigenvalues never
actually swap but get very close (arbitrarily close, depending on the values
of v and ), so that the learning outcome depreciates very rapidly
around the critical value
(which also depends on all other parameter values; see the
blue curve in Figure 1D). This
situation can be observed when the input has partial bias loss in mixed
cases from classes (+, +, −) and (+, −, −).
Biologically, such a “pseudobifurcation,” if occurring over a
narrow enough range of q, is indistinguishable from a real
bifurcation, induced by crossing eigenvalues; for this reason, we refer to
it as a for-all-practical-purposes (fapp) bifurcation. Since it represents a
sudden (although smooth) depreciation of the principal direction, one may
consider calculating the “susceptibility” or
“sensitivity” of the angle θ between the attracting directions with and without cross talk,
with respect to the quality q.
In Figure 5, we illustrate the
difference between the discontinuous breakdown of the derivative in the case of a real bifurcation (a discontinuity of dθ/dq)
and the continuous blow-up of dθ/dq
in the case of a fapp bifurcation (dθ/dq
has a significant although finite variation over a narrow
interval of q). One may regard this dichotomy to be in
principle analogous to the difference between discontinuous and continuous
phase transitions. Formally, an avoided crossing can be defined to produce a
fapp bifurcation if the size of the blow-up exceeds a certain threshold
(which may depend on the particular network and the accuracy level desired
for learning).
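Numerically, this sensitivity can be estimated by finite differences on the angle curve (a self-contained sketch under the same illustrative assumptions as the earlier snippet; the function names and the choice of C are ours):

import numpy as np

def error_matrix(q, n=3):
    # Isotropic error matrix: diagonal q, off-diagonal (1 - q)/(n - 1).
    eps = (1.0 - q) / (n - 1)
    return (q - eps) * np.eye(n) + eps * np.ones((n, n))

def leading_angle(q, C):
    # Angle between the leading eigenvectors of EC and of C.
    w_true = np.linalg.eigh(C)[1][:, -1]
    lams, vecs = np.linalg.eig(error_matrix(q, C.shape[0]) @ C)
    w_q = vecs[:, np.argmax(lams.real)].real
    return np.arccos(np.clip(abs(w_true @ w_q), 0.0, 1.0))

v, c, b = 1.0, 0.2, 0.3
C = np.array([[v + b, -c, -c], [-c, v, -c], [-c, -c, v]])

# Sensitivity d(theta)/dq by differentiating the angle curve; a sharp but finite
# peak signals a fapp bifurcation, a divergence signals a true eigenvalue crossing.
qs = np.linspace(0.99, 0.40, 200)
theta = np.array([leading_angle(q, C) for q in qs])
sensitivity = np.gradient(theta, qs)
print(qs[np.argmax(np.abs(sensitivity))])    # location of maximum sensitivity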
Comparison between real and fapp bifurcations and their dependence on
input patterns. We illustrate the network performance, or
“sensitivity,” with respect to q (measured as the derivative of the angle
between the network attracting direction with and
without cross talk) for two types of input statistics with order one
bias loss (
): one of class (−, −, −)
(giving rise to eigenvalue swap bifurcations, plotted in blue) and
one of class (+, −, −) (producing fapp bifurcations,
plotted in red in panel A and green in panel B, respectively). The
input base variance was fixed to v=1 in both
panels, but the bias was
(right panel) and
(left panel). We also inspected several values of
the mutual cross-correlations: c = 0.4
(thick solid curve), c = 0.2 (thin solid
curve), and c = 0.05 (dotted curve). As c decreases, the fapp bifurcations for the (+,
−, −) input are getting arbitrarily close to q=1, and approximate better and better
the discontinuous blow-up of the corresponding real bifurcation
obtained for the (−, −, −) input. This effect
is more evident when the partial bias is increased (from
to
). (Please refer to online supplement for color
version of this figure.)
With this definition, there are circumstances in which fapp bifurcations can
occur even at arbitrarily small cross talk (q arbitrarily
close to 1). For example, Figure 5 shows the difference between the effect of cross talk in the case of two
input distributions, both with loss of bias of degree one. For the first
type of distribution, class (−, −, −), the all-negative
mutual cross-correlations determine eigenvalue crossing (the blue curves,
which exhibit discontinuous blow-ups). The second type, class (+, −,
−), can lead to avoided crossing. We compared the behavior of the
network in these two situations, inspecting a few values of the bias (left panel versus right panel), and mutual
cross-correlation values
(different curves in the same panel, as explained in the
caption). We found that increasing the bias
and decreasing the cross-correlations
transports the point of maximum sensitivity (the location
of the blow-ups) closer to q=1. Moreover, the size
of the continuous blow-up (the height of the finite peak in the case of
avoided crossing) gets larger as q migrates toward 1, so
that the smaller the values of
, the lower the level of cross talk sufficient to produce a
blow-up, and the more indistinguishable the fapp bifurcation looks from the
bifurcation-induced discontinuity. This reiterates the idea that a fapp
bifurcation can be as detrimental to learning as a real bifurcation,
especially since it can arise at arbitrarily small levels of cross-talk,
just like an actual bifurcation.
In appendix C, we consider inputs with stronger pairwise correlations (so that C is no longer diagonally dominant). For large negative mutual correlations, the fapp bifurcation, associated with arbitrarily small levels of cross talk, appears in conjunction with an actual bifurcation at very high cross-talk levels. This suggests that for such inputs, after undergoing the fapp degradation in outcome, the system may suddenly revert to accurate computation of the learning attractor when cross talk becomes very high.
3.2. An Analytical Application in Higher Dimensions.
In this section, we present only the main analytical results we obtained for our
application; proofs of the statements and additional comments can be found in
appendix D. Propositions 1 and 2
differentiate between behaviors in response to biased versus unbiased n-dimensional negatively correlated inputs, and illustrate
a situation that extends the behavior found in the three-dimensional model. As
before, in the case of biased inputs, the eigenvalues remain separated, and the
attracting direction degrades smoothly as the cross talk increases. Moreover,
also similar to the three-dimensional case, order one loss of bias is not enough
to trigger an eigenvalue-crossing bifurcation (for which bias loss of order is required), but may be enough to produce fapp bifurcations.
Depending on the parameter values, both actual and fapp bifurcations can occur
for arbitrarily small levels of cross talk (see Figure 6).
A simple example of how the characteristic polynomial of EC and its roots change as the quality q decreases, for dimension n=3 and fixed parameters v=1, c=−0.2,
, for
, so that
,
,
. Each different color represents a different value of q: q=0.98 (red), q=0.805 (blue), q=0.76 (green), and q=0.6 (pink). The continuous curves correspond
to the graph of the polynomial for different q’s, and the bullets represent (along the x-axis) the points
, for
. The figure shows how the order of the roots of
changes with respect to the points of the partition
(which in turn travel down the axis as q decreases). For q=0.98
(i.e.,
),
. For q=0.805 (i.e.,
),
. For q=0.76 (i.e.,
),
. For q=0.6 (where
),
. (Please refer to online supplement for color version
of this figure.)
3.2.1. Fully Biased Case.
We consider ; clearly:
. In appendix B, we
show how these values can be used to partition the real line and separate
the roots of
. This leads to:
In the biased case , the matrix EC has n real distinct eigenvalues
, for any error
.















3.2.2. Losing the Bias.
Suppose now that for ,
, and allow some of the
; in the limit, this results in a loss of bias in the
covariance matrix C (
for some index j). In consequence,
. It follows that in the limit of
and
, so that the maximal eigenvalue of EC preserves its multiplicity =1. This situation changes if we introduce
an order two bias loss
(i.e., if we make both
and
approach zero simultaneously). Then
and
, so that the two leading roots collide into a double root
. This justifies the following proposition:
Suppose . An order k bias loss of the
covariance matrix C of the type
results in a leading eigenvalue of multiplicity k−1 for the modified covariance matrix EC.
4. Discussion
4.1. Specific Comments on Our Model.
In this study, we considered a learning network based on the classical unsupervised learning model of Oja, extended to incorporate synaptic cross talk; we aimed to show how different input patterns can exacerbate or, on the contrary, efface the effects of cross talk on the asymptotic outcome of learning. We gave central attention to differences in second-order input statistics, studied how cross talk affected the outcome in each case, and observed that the effects can vary widely depending on these second-order statistics.
Efficient cross-talk-induced segregation happens in our model for a balance of positive and negative correlations. It could be argued that the model itself may artificially impose such a condition by being linear Hebbian, with multiplicative normalization. To address this critique, one may choose to study an equivalent model with subtractive normalization; that would, however, produce a different collection of issues, since subtractive normalization may be less biologically plausible. A better solution would be to perform a similar cross-talk analysis on an extended nonlinear model with multiplicative normalization. The fact that certain nonlinear Hebbian models are reducible to linear Hebbian models (Miller, 1990; Elliott & Shadbolt, 2002) has led to a general belief that no Hebbian model, linear or nonlinear, can segregate positively correlated afferents under multiplicative normalization. Recently, Elliott and Shadbolt (2002) offered an explicit counterexample.
In this letter, we focus on a rule that is based only on second-order statistics, but the concept of an unbiased distribution can be generalized to nonlinear Hebbian rules, which are sensitive to a lack of bias of higher order. The work of Elliott and others has shown that segregated outcomes are quite typical of nonlinear Hebbian rules with unbiased statistics (Elliott, 2003) and that cross talk can induce bifurcations in these cases (Elliott, 2012). We have suggested before the example of radially symmetric distributions considered by Lyu and Simoncelli (2009), whose joint PDF has equal-density contour lines that are nested hyperspheres with non-Gaussian spacings. We expect that in this setup, completely unbiased (spherical) input statistics would favor no particular direction in the weight space, so that the outcomes would be signed combinations of equal-magnitude weights, nontrivially determined by the higher-order correlations. The presence of enough cross talk in the processing of such inputs may amount to suddenly switching the outcome between two such states.
4.2. Some Biophysical Aspects of Oja's Rule.
Since our focus is on a biologically realistic phenomenon (cross talk), it may seem odd to study a linear Hebbian model with multiplicative normalization, which may appear to be very formal and unbiological. But as argued in Rădulescu and Adams (2013), Oja's rule is not as biophysically implausible as it first appears.3
In our analysis of the Oja rule, we allowed both inputs and weights to be negative. However, if only positive patterns are allowed, the Hebbian part of the rule would always be positive (and correspond to LTP only), and the normalizing part of the rule would always be negative (and represent LTD only). It seems that in the brain, the negative and positive parts of signals are represented using different neurons, such that the two halves of the Oja rule would operate biologically with fixed and opposite polarities (LTP and LTD). However, the overall effect of the biological implementation would be the same as in our version of Oja's rule, which allows either polarity in both parts of the rule.
Experimental studies at single synapses suggest that reliable LTP may be implemented through repeated pairing of correctly timed pre- and postsynaptic spikes, which occur in an all-or-none manner (Petersen, Malenka, Nicoll, & Hopfield, 1998; Markram, Lübke, Frotscher, Roth, & Sakmann, 1997). Averaged over the many synapses comprising a connection, the overall outcome would be the multiplicative Hebbian rule. A simple mechanism for such batching would be if the coincidence-induced calcium increase at a synapse activated (by binding of Ca-Calmodulin) some fraction of its CaMKinase molecules, as follows: after each calcium pulse, Ca-Calmodulin would dissociate but leave some of the CaMKinase molecules phosphorylated; with successive pulses, enough would eventually be activated that the entire set of CaMKinases would fully autophosphorylate, triggering strengthening (Lisman, 1989, 1994; De Koninck & Schulman, 1998).
The normalizing (LTD) part of the Oja rule is, on the other hand, an elegant
implementation of an approximate nonlocal normalization step that leads to a
purely local online rule. Two obvious requirements of its biophysical
implementation are the calculation of y^2 and the multiplication by the weight w_i. Recent work in neocortex (Sjöström, Turrigiano,
& Nelson, 2003, 2004) suggests that LTD occurs in the following way:
backpropagating spikes lead to a synapse-related calcium signal that triggers
endocannabinoid release from the local dendrite, which then diffuses back to the
presynaptic specialization, where it activates a G-protein-coupled
endocannabinoid receptor. If there is near-simultaneous activation of
presynaptic NMDARs by spike-release glutamate, transmitter release is depressed.
This dismisses a previously favored theory (Nevian & Sakmann, 2006) that the level of the spine calcium
achieved by LTP or LTD is a sign determinant of the strength change (Lisman, 1994; Shouval, Bear, & Cooper, 2002). This explanation of LTD seems
well suited to meet the two biophysical requirements of the normalizing part of
the Oja rule (and in this sense, the rule would be more than a formal
description). The calcium-dependent endocannabinoid enzyme triggered by calcium
entering through voltage-dependent channels activated by backpropagating spikes
would implement y2, and the multiplication would be achieved by the requirement for
simultaneous activation of the NMDAR. The dependence on
could be achieved in two ways: the endocannabinoid signal
might be proportional to the postsynaptic strength of the synapse, or the extent
of activation of the presynaptic NMDAR could depend on the amount of glutamate
released, which would depend on the extent of the active zone, which is known in
the long term to adjust to match the PSD area (and hence presumably the synaptic
strength). Thus, the synaptic strength would slowly adjust, by a combination of
matched but distinct post- and presynaptic adjustments, to reflect the arriving
spikes, in the way required by the Oja rule (Rădulescu & Adams, 2013).

At first glance, it appears that the normalization errors could cancel out the Hebbian errors if F is appropriately matched to E (i.e., both “error-onto-all,” with the quality adjusted appropriately). Such cancelation would correspond to a weight erroneously “forgetting” exactly what it erroneously learns for each pattern. The problem is that while the averaged values of E and F are simple and closely related, their instantaneous values can be, at least locally, quite different, because one involves intracellular diffusion and the other extracellular diffusion. Furthermore, the stability of the algorithm would also be affected. The observed biological implementation appears to avoid these problems in an elegant way.
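To illustrate the averaged version of this cancellation argument, here is a short sketch; it assumes (for illustration only) that the averaged dynamics take the form dw/dt = ECw − (w^T C w) w when only the Hebbian term is corrupted, and dw/dt = E(Cw − (w^T C w) w) when the normalizing term is filtered by the same “error-onto-all” matrix, F = E. In the matched case the equilibrium directions are eigenvectors of C itself, so the error-free outcome is recovered on average.

import numpy as np

def error_matrix(q, n=3):
    # Assumed "error-onto-all" form: quality q on the diagonal,
    # the remaining error (1 - q) spread evenly over the other synapses.
    e = (1.0 - q) / (n - 1)
    return (q - e) * np.eye(n) + e * np.ones((n, n))

C = np.array([[1.2, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 0.9]])
E = error_matrix(0.6)
v_C = np.linalg.eigh(C)[1][:, -1]        # error-free outcome direction

def attractor(rhs, w=np.array([0.3, -0.2, 0.5]), dt=0.01, steps=100000):
    for _ in range(steps):
        w = w + dt * rhs(w)
    return w / np.linalg.norm(w)

w_hebb_only = attractor(lambda w: E @ C @ w - (w @ C @ w) * w)
w_matched = attractor(lambda w: E @ (C @ w - (w @ C @ w) * w))
print(abs(w_hebb_only @ v_C))   # reduced below 1 by cross talk on the Hebbian term
print(abs(w_matched @ v_C))     # close to 1: matched errors cancel on average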
4.3. General Comments.
In previous work (Rădulescu et al., 2009; Rădulescu & Adams, 2013), we have suggested an analogy between the Oja rule (even without cross talk) and Eigen's equation of DNA replication and mutation. Indeed, biologically, Darwinian evolution and neural learning are both adaptive processes, encoding inputs based on repeated interactions with the environment (Baum, 2004; Volkenshtein, 1991; Adami, 1998), and mathematically, both models describe normalized growth. However, we have argued that unlike Eigen's model, Oja's equation exhibits a bifurcation at a critical cross-talk value only under very narrow conditions. We have further suggested that while there may not be an actual “isomorphism” (Fernando & Szathmáry, 2009; Fernando, Goldstein, & Szathmáry, 2010) (or other formal mathematical equivalence) between the two models in all parameter ranges, their analogy resides in their common need for accuracy in the adaptation process. While biology is well known for instances in which it can afford to be inaccurate, polynucleotide copying requires superaccuracy, and neural learning also seems to require superaccurate synaptic updates (Elliott, 2012; Adams & Cox, 2012).
Indeed, successful and effective reproduction requires copying the entire genome with an appropriately small per-base error rate. The known “proofreading” operation of this replication process is essential in lowering the copying error rate to acceptable levels. The proofreading mechanism copies bases twice, and replication is allowed to proceed only when coincidence of the two results is detected. Since proofreading seems in general to be an effective strategy for overcoming physical limitations, it has been proposed that a similar operation is performed in the neocortex in order to ensure the synaptic specificity necessary for effective learning. The mechanism underlying “neural proofreading,” as proposed by Adams and Cox (2012), assigns to each thalamocortical connection (responsible for the tuned responses of cortical neurons) a corticothalamic “proofreading neuron,” which detects “coincidence” between the input and output spikes arriving at that connection and then sends a double signal to both sides of the connection, confirming the validity of the synaptically detected coincidence. Other aspects, consequences, challenges, and limitations of this elaborate neocortical proofreading circuitry are further investigated in Adams and Cox (2012).
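As a toy numerical illustration of why copying twice and demanding coincidence is such an effective error filter (this is only a caricature of the copy-and-compare idea, not of the neural circuit proposed by Adams and Cox), two independent noisy reads of the same symbol are accepted only when they agree; the residual error then scales roughly as the square of the single-read error rate.

import numpy as np

rng = np.random.default_rng(0)
p, n_trials, n_symbols = 0.05, 200000, 4     # per-read error rate, trials, alphabet size
true = rng.integers(0, n_symbols, n_trials)

def noisy_read(symbols):
    # With probability p, replace the symbol by one of the other n_symbols - 1.
    wrong = rng.random(symbols.size) < p
    other = (symbols + rng.integers(1, n_symbols, symbols.size)) % n_symbols
    return np.where(wrong, other, symbols)

r1, r2 = noisy_read(true), noisy_read(true)
accepted = r1 == r2                              # proofreading: keep only coincident reads
err_single = np.mean(r1 != true)
err_proofread = np.mean(r1[accepted] != true[accepted])
print(err_single, err_proofread)                 # roughly 0.05 versus 0.001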
5. Conclusion
Much recent work has been aimed at finding the key biological factors that may explain the network architectures and computational algorithms that the brain develops to perform learning. The fact that the activity-dependent processes that lead to synaptic strength adjustments cannot be completely synapse specific constitutes a central problem for biological learning. While our model considers only a very simple setup, it helps illustrate an important idea, which we have formulated previously (Rădulescu et al., 2009; Rădulescu & Adams, 2013): a performant synaptic updating algorithm may not suffice for accurate learning, and the process may fail (partly or completely, depending on the input pattern to be learned) even when faced with only infinitesimal amounts of synaptic cross talk. It therefore appears increasingly possible that high-level (e.g., neocortical) learning may require not only performant learning algorithms but also special apparatus for enhancing specificity (Adams & Cox, 2006). The brain may thus have to dedicate comparable effort to developing proofreading for its plasticity machinery (all the more necessary given that inaccuracy seems not merely to degrade learning but can prevent it altogether). Our model does not exclude either possibility but suggests that learning problems (and perhaps, more generally, all problems of survival or reproduction) are so diverse that no single algorithm can solve them all, so that no universal or canonical cortical circuit should be expected.
Appendix A: Stability of Equilibria in the Oja Model
In consequence, EC has a basis of eigenvectors, orthogonal with respect to an appropriate inner product. The following theorem, describing the equilibria of system 2.5, is immediate.
Theorem 1. The equilibria of equation 2.5 are precisely the eigenvectors of EC, suitably normalized.
Theorem 2. Suppose EC has a largest eigenvalue of multiplicity one. An equilibrium w (i.e., by theorem 1, a suitably normalized eigenvector of EC) is a local hyperbolic attractor for equation 2.5 if and only if it is an eigenvector corresponding to the maximal eigenvalue of EC.
Such attractors always exist provided that the condition of theorem 2 is met (i.e., EC has a maximal eigenvalue of multiplicity one). The network then learns, depending on its initial state, one of the two stable equilibria, which are the two (opposite) suitably normalized maximal eigenvectors of the modified input distribution. Next, we aim to show that these two attractors are the system's only hyperbolic attractors.
Suppose the modified covariance matrix EC has a unique maximal eigenvalue $\lambda_{\max}$. Then the two suitably normalized eigenvectors $\pm w_{EC}$ corresponding to $\lambda_{\max}$ are the only two attractors of the system. More precisely, the phase space is divided into two basins of attraction, of $w_{EC}$ and $-w_{EC}$, respectively, separated by an invariant subspace.
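A quick numerical check of this characterization is straightforward. The sketch below assumes, for illustration only, that the averaged dynamics of equation 2.5 take the form dw/dt = ECw − (w^T C w) w, with an “error-onto-all” cross-talk matrix E; starting from a generic initial condition, the weight vector should align with a leading eigenvector of EC.

import numpy as np

def error_matrix(q, n):
    # Assumed "error-onto-all" form: quality q on the diagonal, the rest spread evenly.
    e = (1.0 - q) / (n - 1)
    return (q - e) * np.eye(n) + e * np.ones((n, n))

rng = np.random.default_rng(1)
n, q = 3, 0.8
C = np.array([[1.2, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 0.9]])
EC = error_matrix(q, n) @ C

w = rng.standard_normal(n)
dt = 0.01
for _ in range(100000):
    w += dt * (EC @ w - (w @ C @ w) * w)

# Leading eigenvector of EC (EC need not be symmetric, so use eig rather than eigh).
evals, evecs = np.linalg.eig(EC)
v_max = np.real(evecs[:, np.argmax(np.real(evals))])
print(abs(w @ v_max) / (np.linalg.norm(w) * np.linalg.norm(v_max)))   # close to 1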
Appendix B: A Direct Computation for Unbiased Inputs
For an order-two input bias loss, the dynamic behavior of the system is classified by the signs of the input covariances: (+, +, +), (+, +, −), (+, −, −), and (−, −, −).
Computing directly the spectrum of C1, we get one simple, error-independent eigenvalue (whose eigenvector is also error independent) and one double eigenvalue. If c > 0 (class (+, +, +)), the simple eigenvalue always dominates (see Figure 1E). If c < 0 (class (−, −, −)), the double eigenvalue takes over for errors smaller than a critical value (see Figure 1H).
Also by direct computation, one notices that C1 and C2 have the same spectral decomposition. One eigenvalue is obtained directly, while the other two are the roots of the quadratic polynomial $P(X) = X^2 + (c - 2v - 5ec + 3ev)X + (6ec^2 - cv - 3ev^2 - 2c^2 + v^2 + 3ecv)$. The ordering of these eigenvalues then follows by comparing the roots. If c > 0 (class (+, +, −)), the dominant eigenvalue changes only at isolated critical parameter values, where two of the eigenvalues coincide (see Figure 1G). If c < 0 (class (+, −, −)), the analogous comparisons hold, again with equality only at critical values of the error (see Figure 1F).
Appendix C: A Numerical Extension to Weakly Correlated Inputs
In this section, we loosen the assumption of weakly mutually correlated three-dimensional inputs (i.e., of a diagonally dominant input covariance matrix C) and investigate numerically the behavior of the system under a wider class of input schemes, corresponding to larger ranges of the parameters c, the bias parameters, and q. We study sensitivity to these parameters in all four combinatorial input classes: (+, +, +), (+, +, −), (+, −, −), and (−, −, −).
Without loss of generality, we normalize the matrix C so that v = 1, which is considered fixed throughout this analysis. The range of the mutual covariance c is extended in each case to the largest interval for which C remains positive definite. While the parameter q was previously restricted to the interval [1/3, 1] (representing the constraint that the quality be larger than the error), in the following illustrations we allow q to vary within [0, 1]. This allows us to better understand how bifurcations and fapp bifurcations appear in the more biologically plausible interval [1/3, 1] and also reveals interesting behavior that occurs in the poor-quality range for strongly negatively correlated inputs.
As before, in order to quantify and illustrate the effects of cross talk (error) on
the outcome of learning, we use the cosine of the angle between the system's attractors with and without cross talk (i.e.,
between the directions of the leading eigenvectors of the matrices EC and C, respectively). Generally the behavior of the system with respect
to error, as observed in section 3.1, extends
naturally to the range of high mutual correlations within the input distribution.
The learning outcome depreciates when gradually increasing the error (decreasing q). As discussed in section 3.1, this decay is smooth for some types of input distributions, but for
others, it exhibits jump discontinuities (corresponding to bifurcations in the
dynamics) or just smooth but very sharp drops (fapp bifurcations) with very steep
but bounded slope at the inflection point. We have discussed, in the context of small mutual correlations c (C had been assumed to be diagonally dominant), that both fapp and actual bifurcations can appear at arbitrarily small cross-talk values (q arbitrarily close to 1). While these effects still occur for higher values of c, the presence of highly negatively correlated inputs introduces an interesting new effect that is not accounted for by the analysis in the main text.
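For concreteness, the following sketch reproduces the bare bones of this numerical procedure (assuming, as in the other sketches, an “error-onto-all” cross-talk matrix with quality q on the diagonal and the remaining error spread evenly off the diagonal; the covariance entries below are arbitrary representatives of class (+, −, −)).

import numpy as np

def error_matrix(q, n=3):
    # Assumed "error-onto-all" cross-talk matrix: quality q on the diagonal,
    # the remaining error (1 - q) spread evenly over the other synapses.
    e = (1.0 - q) / (n - 1)
    return (q - e) * np.eye(n) + e * np.ones((n, n))

def leading_eigvec(M):
    evals, evecs = np.linalg.eig(M)          # M = E C need not be symmetric
    return np.real(evecs[:, np.argmax(np.real(evals))])

c = 0.4                                      # class (+, -, -): one positive, two negative correlations
C = np.array([[1.0,   c,  -c],
              [  c, 1.0,  -c],
              [ -c,  -c, 1.0]])
assert np.all(np.linalg.eigvalsh(C) > 0)     # keep C positive definite

v_C = leading_eigvec(C)
for q in np.linspace(1.0, 0.05, 20):
    v_EC = leading_eigvec(error_matrix(q) @ C)
    cos = abs(v_C @ v_EC) / (np.linalg.norm(v_C) * np.linalg.norm(v_EC))
    print(f"q = {q:.2f}   cos = {cos:.3f}")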
Figure 7 shows a few instances of bifurcations and fapp bifurcations for one negative pairwise correlation, together with the slight differences between its two possible off-diagonal positions (next to the diagonal or in the corner of the matrix C). When increasing the correlation strength beyond the weakly correlated regime, while keeping it within the range that preserves positive definiteness of C, the behavior of the cosine with respect to q remains qualitatively the same, whether it is a smooth depreciation of the output when decreasing q (for biased inputs) or a sharp drop (some unbiased inputs trigger bifurcations; see the pink curve in Figure 7A), with only the position and shape of the transitions being altered in the process.
When increasing the number of negative pairwise correlations, the results change qualitatively, in particular for very high levels of cross talk, as shown in Figures 8 and 9. Typically for (+, −, −), there is a fapp bifurcation at low values of cross talk, which can in fact shift to arbitrarily small levels of cross talk depending on the bias parameters. When increasing the correlation strength sufficiently in class (+, −, −), a bifurcation appears in the low-q range, so that after having passed the inflection point (fapp) in its degradation from the correct attractor, the system suddenly reverses, for very large levels of cross talk, to computing the principal direction of C more accurately (the cosine is close to 1 for small values of q). While this jump discontinuity also exists in class (+, +, −), it does not appear in Figure 7 because it occurs for q < 0. For class (+, −, −), this high cross-talk bifurcation is brought within the interval [0, 1] by the increase in the number of negative correlations, together with the increase in pairwise-correlation strength.
Figure 7: Processing three-dimensional inputs with one negative pairwise correlation. The two panels represent the two distinct possibilities for the off-diagonal position of the negative entry within C. In both panels, the solid lines represent the cosine of the angle between the leading eigenvectors of EC and C as a function of q. We considered biased inputs for c = 0 (green), c = 0.1 (blue), and c = 0.55 (red), as well as an instance of input bias loss of order one, for c = 0.55 (pink). The dotted lines measure the sensitivity of the cosine to changes in q by illustrating its derivative with respect to q; they are convenient for locating fapp bifurcations. (Please refer to the online supplement for the color version of this figure.)
Figure 8: Processing three-dimensional inputs with two negative pairwise correlations. The two panels represent the two distinct possibilities for the off-diagonal configuration of the negative entries within C. In both panels, the solid lines represent the cosine for biased inputs, and the dotted lines represent the cosine for a bias loss of order one. The color coding is c = 0.1 (green), c = 0.8 (blue), and c = 1 (red). (Please refer to the online supplement for the color version of this figure.)
Figure 9: Processing three-dimensional inputs with all negative pairwise correlations. (A) The solid lines represent the cosine for biased inputs; the dotted lines represent the cosine for a bias loss of order one. The color coding is c = 0.1 (green), c = 0.2 (blue), c = 0.4 (red), and c = 0.55 (cyan). (B) The solid lines represent the cosine for biased inputs; the dotted lines represent the cosine for a bias loss of order one. The color coding is c = 0.1 (green), c = 0.2 (blue), c = 0.5 (red), and c = 0.65 (cyan). (Please refer to the online supplement for the color version of this figure.)
The effect is exacerbated when the number of negative pairwise correlations is increased further, in class (−, −, −). The high cross-talk bifurcations shown in Figure 9 are more pronounced and occur for higher values of q (i.e., more biologically plausible levels of cross talk).
Appendix D: An Extension to Higher Dimensions
D.1 Fully Biased Case. We first consider the covariance biases to be distinct. We will prove that the characteristic polynomial of EC has n real roots and will find approximating bounds for their positions on the real line.
Recall that $f_1 > f_2 > \cdots > f_n$. To continue our discussion and establish the signs of the polynomial at all partition points, we need to establish the index j at which the values $f_j$ switch sign.
The diagonal dominance assumption allows us to study all cases that may appear, since it guarantees the necessary bounds on the critical values. This ensures a complete discussion, since the error is then allowed to reach and cross over all the critical values, creating a possible swap in the order of the eigenvalues of EC, as we will show later. The proof for the other cases will be omitted, since it is just a simplification of the argument. In fact, the only crossover of true interest to us is the one at which the eigenvalue swap involves the two largest eigenvalues and thus affects the position of the system's attracting equilibria, corresponding to the normalized eigenvectors of the maximal eigenvalue. The other critical values affect only the stable and unstable spaces of the saddle equilibria. In this light, the condition on the entries of the covariance matrix can be loosened accordingly.
We distinguish the following cases:
In particular, we have proved the following proposition in the main text:
In the biased case, the matrix EC has n real distinct eigenvalues for any admissible error.
D.2 Losing the Bias. Suppose now that we allow some of the biases to approach zero; in the limit, this results in a loss of bias in the covariance matrix C (the bias vanishes for some index j).
It then follows that, in the limit of an order-one bias loss, the maximal eigenvalue of EC preserves its multiplicity of one. This situation changes if we introduce an order-two bias loss (i.e., if we make two of the biases approach zero simultaneously). Then the two leading roots collide into a double root. This justifies the following proposition:
Under the hypotheses above, an order-k bias loss of the covariance matrix C (of the type considered above) results in a leading eigenvalue of multiplicity k − 1 for the modified covariance matrix EC.
This proposition can be generalized to encompass bias loss anywhere in the inputs and any interval for the error. Below, we give a more general statement, which follows by repeating the argument for the case we already analyzed but could also be proved more directly.
The order of these eigenvalues, depending on the error value relative to the critical error values, is the same as described in cases 1 to 3.
References
Notes
If X is a column-vector-valued random variable whose covariance matrix is the identity matrix, then the property referred to in the text holds.
Since the spectra depend qualitatively on all parameter values, we present here the results of a numerical investigation rather than a rigorous analytical study, which would be extremely cumbersome. The only case in which the computations are more tractable and for which we preferred an analytical approach is the fully unbiased case, presented in appendix A.
Thanks to Paul Adams for the useful conversations and generous contributions to this section.
Author notes
Color versions of all figures in this letter are presented in the online supplement available at http://www.mitpressjournals.org/doi/suppl/10.1162/NECO_a_00565.