Abstract

Integrate-and-express models of synaptic plasticity propose that synapses may act as low-pass filters, integrating synaptic plasticity induction signals in order to discern trends before expressing synaptic plasticity. We have previously shown that synaptic filtering strongly controls destabilizing fluctuations in developmental models. When applied to palimpsest memory systems that learn new memories by forgetting old ones, we have also shown that with binary-strength synapses, integrative synapses lead to an initial memory signal rise before its fall back to equilibrium. Such an initial rise is in dramatic contrast to nonintegrative synapses, in which the memory signal falls monotonically. We now extend our earlier analysis of palimpsest memories with synaptic filters to consider the more general case of discrete state, multilevel synapses. We derive exact results for the memory signal dynamics and then consider various simplifying approximations. We show that multilevel synapses enhance the initial rise in the memory signal and then delay its subsequent fall by inducing a plateau-like region in the memory signal. Such dynamics significantly increase memory lifetimes, defined by a signal-to-noise ratio (SNR). We derive expressions for optimal choices of synaptic parameters (filter size, number of strength states, number of synapses) that maximize SNR memory lifetimes. However, we find that with memory lifetimes defined via mean-first-passage times, such optimality conditions do not exist, suggesting that optimality may be an artifact of SNRs.

1  Introduction

The Hopfield model (Hopfield, 1982) provides the general foundation for many approaches to associative memory. However, its catastrophic forgetting above a threshold memory loading renders it implausible even as a toy model of biological memory. Imposing bounds on synaptic strengths overcomes this catastrophic forgetting by turning the network into a palimpsest, storing new memories by forgetting old ones (Nadal, Toulouse, Changeux, & Dehaene, 1986; Parisi, 1986). One biophysically plausible way of implementing bounds on synaptic strength is to suppose that synapses exist in only a finite set of states of synaptic strength. While some experimental evidence supports the possibility of binary-strength synapses (Petersen, Malenka, Nicoll, & Hopfield, 1998; O’Connor, Wittenberg, & Wang, 2005b), other evidence suggests the existence of ternary-strength (Montgomery & Madison, 2002, 2004) or ternary-state (O’Connor, Wittenberg, & Wang, 2005a) synapses, while yet further evidence indicates that changes in synaptic strength may be discrete, step-like processes without necessarily addressing any possible limit on the number of states of synaptic strength (Yasuda, Sabatini, & Svoboda, 2003; Bagal, Kao, Tang, & Thompson, 2005; Sobczyk & Svoboda, 2007). Many models have considered memory formation with both binary-strength synapses and more general, multilevel, discrete synapses (see, e.g., Willshaw, Buneman, & Longuet-Higgins, 1969; Tsodyks, 1990; Amit & Fusi, 1994; Fusi, Drew, & Abbott, 2005; Leibold & Kempter, 2006, 2008; Rubin & Fusi, 2007; Fusi & Abbott, 2007; Barrett & van Rossum, 2008; Huang & Amit, 2010, 2011). All of these related models share one feature: the fidelity of recall of a memory falls monotonically in time, often exponentially fast. Much work has been devoted to extending the resulting rather short memory lifetimes in these models, but the underlying problem of monotonic memory trace decay always remains.

In previous work, we have proposed that synapses may integrate synaptic plasticity induction signals before expressing synaptic plasticity during both development (Elliott, 2008; Elliott & Lagogiannis, 2009) and memory formation (Elliott & Lagogiannis, 2012). By integrating plasticity induction signals, synapses behave as low-pass filters, suppressing high-frequency noise and responding only to low-frequency signals. In this way, the fluctuations in synaptic strength that destabilize both developmentally relevant states and states of memory can be controlled (Elliott, 2011b). We applied such a filtering mechanism to memory formation in a feedforward framework with binary-strength synapses and showed that in radical contrast to nonintegrative models of memory, synaptic filtering and ongoing memory storage actually facilitate an initial increase in the fidelity of recall of a stored memory (Elliott & Lagogiannis, 2012). Such a model outperforms cascade-type models (Fusi et al., 2005) in most biologically relevant regions of parameter space.

Here, we extend our earlier analysis from binary-strength to more general, discrete synapses. After discussing our general formalism in section 2, we derive exact results for the tracked memory signal in the presence of n discrete strength states in section 3. Various approximations to these results may be obtained for the large time limit or the large n limit, which facilitate the extraction of expressions for memory lifetimes, defined via a signal-to-noise ratio (SNR). In section 4, we first explore the dynamics of the tracked memory signal for general, discrete synapses in the presence of a synaptic filter. We show that the signal rise that occurs for binary-strength synapses is present for general, discrete synapses and that, indeed, this signal rise is logarithmically enhanced as a function of n. We also show that the signal then essentially plateaus for large n, with the duration of this plateauing increasing quadratically with n. We then turn to considering memory lifetimes explicitly and explore the dependence of memory lifetimes on the number of synapses, number of states of strength per synapse, and synaptic filter size, θ. We find that for SNR memory lifetimes, we can trade n against θ, significantly reducing to biophysically realistic ranges the optimal values of n or θ that generate maximum SNR memory lifetimes. However, when using a mean-first-passage time (MFPT) definition of memory lifetimes (Elliott, 2014), we do not see maxima in memory lifetimes, so optimality conditions do not exist in this case. Finally, in section 5, we discuss our results and the issues they raise.

2  General Formalism

We provide an outline of our general formalism here. Further details may be found elsewhere (Elliott & Lagogiannis, 2012; Elliott, 2014). Table 1 summarizes the main parameters and quantities used throughout.

Table 1:
Summary of Main Parameters and Quantities Used Throughout.

Parameter or Quantity | Description
n | Number of states of synaptic strength per synapse
N | Number of synapses
θ | Filter size
s_A | Strength of strength state A, A ∈ {1, …, n}
s | Vector of strengths s_A
S_i(t) | Strength of synapse i at time t
h(t) | Tracked memory signal
μ(t), σ(t) | Mean and standard deviation of h(t)
π, π_I | Equilibrium distribution of filter states, vector, and components
p^∞ | Joint distribution of filter and strength states in equilibrium
P⁺, P⁻ | Matrices implementing potentiation and depression steps on the joint distribution of filter and strength states
M⁺, M⁻ | Matrices incrementing or decrementing filter state without threshold processes
T⁺, T⁻ | Matrices implementing filter threshold processes
G⁺_J(t), G⁻_J(t) | Densities for first escapes through filter thresholds from filter state J in time t
H_J(t) | Probability of not having reached either filter threshold from filter state J in time t
— | Probability of a transition from filter state J to filter state I in time t
— | Probability of a transition from filter state J and strength state B to filter state I and strength state A in time t
— | Matrix with elements given by the above transition probabilities for given filter states I and J
— | Matrix of escape densities from filter states J for transitions to adjacent strength states
— | Auxiliary but key matrix of filter escape densities determining transitions between strength states
— | Stochastic matrix implementing a symmetric random walk between two reflecting boundaries
— | Eigenvalues and eigenvectors of that stochastic matrix
— | Vector capturing the change in the equilibrium distribution of strengths at the storage of the tracked memory

2.1  Perceptron Formulation

We consider the possibility of n states of synaptic strength, with n ≥ 2, and examine the dependence of memory lifetimes on this parameter. We index these strength states by letters such as A and B, and we define strength state A to correspond to strength

$$s_A = -1 + \frac{2(A-1)}{n-1}, \tag{2.1}$$

with A ∈ {1, …, n}, so that s_1 = −1 and s_n = +1. We have scaled the strengths, regardless of n, into the interval [−1, +1] in order to facilitate comparison of results for different n. We discuss the biological relevance of this scaling in section 5. For simplicity and mathematical tractability, we consider a single perceptron. The perceptron has N synapses with strengths S_i(t), i ∈ {1, …, N}, where t denotes time, with S_i(t) ∈ {s_1, …, s_n}. As standard, the perceptron is assumed to have binary-valued inputs x_i ∈ {−1, +1} through these N synapses. The activation on presentation of input vector x is then

$$h = \frac{1}{N} \sum_{i=1}^{N} S_i(t)\, x_i. \tag{2.2}$$

For our purposes here, we are interested only in this activation and not in any thresholding of the perceptron’s activation that generates the perceptron’s binary-valued output.
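As a concrete check of this scaling, the strengths can be tabulated directly. The helper below is a minimal sketch of equation 2.1 as reconstructed above (the function name is ours):

```python
# Minimal sketch of the strength scaling of equation 2.1: n states mapped
# linearly onto [-1, +1], with s_1 = -1 and s_n = +1 (function name ours).
def strengths(n):
    """Return [s_1, ..., s_n] for n >= 2 strength states."""
    return [-1.0 + 2.0 * (A - 1) / (n - 1) for A in range(1, n + 1)]

print(strengths(2))  # [-1.0, 1.0] (binary-strength synapses)
print(strengths(5))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

For n = 2 this recovers the binary-strength case of our earlier work, and adjacent states are always separated by 2/(n − 1).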

The perceptron is required to store “memories” ξ^α, α ∈ {0, 1, 2, …}. In a discrete time formalism, memory ξ^α is stored at time step α. From a biological perspective, however, a discrete time formalism for memory storage is not particularly realistic. Furthermore, we have previously shown that driving memory storage as a discrete time process eliminates covariance terms that have a detrimental impact on memory dynamics (Elliott & Lagogiannis, 2012). Using a continuous time process to drive memory storage is biologically more realistic and allows a full consideration of the resulting impact of covariance terms on memory dynamics. We therefore employ a continuous time formalism to drive memory storage. The simplest continuous time process to consider is the Poisson process. Memories are therefore stored as a Poisson process of rate r, which we may, without loss of generality, take as r = 1 Hz, since r may be restored in formulas by the replacement t → rt. Despite using a Poisson process, memory ξ^0 is nevertheless always stored at t = 0 s; in fact, we consider it to be stored at time t = 0⁻ s so that the time immediately after the storage of ξ^0 can be referred to simply as t = 0 s. We need not specify a target perceptron output associated with memory ξ^0 because for an isolated perceptron, we can always, without loss of generality, consider instead the storage of −ξ^0 rather than ξ^0. We then always consider the target output for any memory to be +1, so that the corresponding perceptron activation is above firing threshold. With this convention, ξ_i^α = ±1 is the plasticity induction signal to synapse i on storage of memory ξ^α: ξ_i^α = +1 requires the synapse to potentiate (strengthen), while ξ_i^α = −1 requires it to depress (weaken). We discuss the implementation of synaptic plasticity in response to these induction signals below. As usual, the memories are assumed to be random and uncorrelated across synapses and between different memories, so that ξ_i^α = ±1 with probability 1/2, independent of i and α. We do not consider the possibility of a sparse coding framework here. We discuss sparse coding in section 5.

Although the use of strengths s_A in the range [−1, +1] and binary-valued inputs x_i ∈ {−1, +1} may appear biologically problematic, we can always translate these ranges so that they become nonnegative under an associated change in the perceptron’s firing threshold. These issues are discussed elsewhere (Elliott & Lagogiannis, 2012; Elliott, 2014).

We are interested in the fidelity of recall of the first memory ξ^0 by the perceptron in the face of the ongoing storage of the subsequent memories ξ^α, α ≥ 1. The perceptron’s activation upon re-presentation of ξ^0 at some later time t is just h(t) = (1/N) Σ_i S_i(t) ξ_i^0, and if this activation is above firing threshold, then the memory is still stored by the perceptron. We refer to ξ^0 as the tracked memory and to h(t) as the tracked memory signal, or just the memory signal. Memory lifetimes may be defined in many different ways (for examples, see Tsodyks, 1990; Leibold & Kempter, 2006; Huang & Amit, 2010; Elliott, 2014). Here, we mostly employ the SNR definition (Tsodyks, 1990; Amit & Fusi, 1994). If μ(t) is the mean memory signal and σ(t) the standard deviation in the memory signal, then the SNR is defined as SNR(t) = μ(t)/σ(t). Memory lifetime is then defined as the time at which SNR(t) falls below some defined point, which is typically taken to be unity, so that the memory lifetime τ is the solution of SNR(τ) = 1. For simplicity, we mostly use this definition here. Specifically, we assume that a perceptron’s firing threshold can always be chosen so that the mean memory signal does not become inaccessible by ever dropping below the perceptron’s firing threshold. For most models, this requirement amounts to choosing a firing threshold of zero, because μ(t) asymptotes to zero at large times. Without this assumption, memory lifetimes are severely and disastrously shortened and become independent of N as N increases (Elliott, 2014). We will also consider for comparison a definition of memory lifetimes based on MFPTs (Elliott, 2014). In this formulation, memory lifetime is defined as the average time at which the stochastic memory signal first falls below firing threshold. Such a definition provides a more natural definition of memory lifetime, but it is analytically much more difficult to study for nontrivial models of synaptic plasticity.
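The SNR lifetime definition can be sketched in a few lines. The signal statistics below are toy, illustrative placeholders (an exponentially decaying mean over a constant noise floor), not the model's actual μ(t) and σ(t):

```python
import numpy as np

# Sketch of the SNR definition of memory lifetime: given mu(t) and
# sigma(t) on a time grid, the lifetime is the first time at which
# SNR(t) = mu(t)/sigma(t) falls below a criterion (typically unity).
def snr_lifetime(t, mu, sigma, threshold=1.0):
    snr = mu / sigma
    below = np.nonzero(snr < threshold)[0]
    return t[below[0]] if below.size else np.inf

t = np.linspace(0.0, 50.0, 5001)
mu = np.exp(-t / 5.0)             # toy decaying mean signal
sigma = np.full_like(t, 0.1)      # toy constant noise floor
tau = snr_lifetime(t, mu, sigma)  # SNR = 10*exp(-t/5) -> tau ~ 5*ln(10)
print(tau)
```

With these toy statistics, the SNR starts at 10 and decays exponentially, so the lifetime is 5 ln 10 ≈ 11.5 s; in the model proper, μ(t) and σ(t) come from the analysis of section 3.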

2.2  Filter-Based Synaptic Plasticity

Upon the storage of memory ξ^α, the component ξ_i^α is the induction signal to synapse i, indicating whether the synapse should potentiate (ξ_i^α = +1) or depress (ξ_i^α = −1). We have proposed that synaptic plasticity induction signals should be integrated by synapses before synaptic plasticity is expressed (Elliott, 2008), generating what we have termed integrate-and-express models of synaptic plasticity (Elliott & Lagogiannis, 2009) in analogy with integrate-and-fire models of neuronal firing. Specifically, we have proposed that a synapse may implement a discrete low-pass filter that attenuates high-frequency noise while passing a low-frequency signal (Elliott, 2011a; Elliott & Lagogiannis, 2012). The synapse essentially decides whether to express synaptic plasticity depending on whether a synaptic filter mechanism reaches upper or lower filter thresholds, for potentiation or depression, respectively.

Figure 1 represents this synaptic filter as a continuous-time Markov process. Because potentiating and depressing induction signals are equiprobable, we need only consider a symmetric filter with equal upper and lower thresholds, ±θ. Filter states are indexed by letters such as I and J, with I ∈ {−(θ − 1), …, +(θ − 1)}, and are represented in the figure by the circles enclosing the filter states. Rightward transitions represent potentiating induction signals that cause the filter state to increment by one; conversely, leftward transitions represent depressing induction signals. The rates of these signals, indicated on the transitions between filter states in the figure, are just the rate of memory storage (r, which we set to unity) multiplied by the probabilities that ξ_i = ±1 (which are 1/2 in both cases). If the filter is in state I = +(θ − 1) and receives a potentiating induction signal, then the filter reaches threshold, is returned to the I = 0 filter state, and a potentiation step is expressed, so that the synapse’s strength increases from s_A to s_{A+1}. If A = n, then of course a potentiation step cannot be expressed since the synapse is already saturated at its upper strength limit. In this case, the strength remains at s_n. Similarly, if the filter is in state I = −(θ − 1) and receives a depressing induction signal, it is returned to state I = 0 and a depression step is expressed, so that the synapse’s strength decreases from s_A to s_{A−1}. If A = 1, then a depression step cannot be expressed, as the synapse is already saturated at its lower strength limit. In this case, the strength remains at s_1. The synapse thus performs a random walk on its allowed strength states in the presence of reflecting boundaries at A = 1 and A = n, implementing saturation of strength. The random walk between these strength states is driven by the underlying filter threshold events.
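The dynamics just described are easy to simulate directly. The Monte Carlo sketch below is our own illustrative implementation (function and variable names are ours), using the state conventions of this section; started from a uniform distribution over strength states, the simulated strength marginal remains uniform, anticipating the equilibrium result derived below:

```python
import random

# Monte Carlo sketch of the filter mechanism of Figure 1: a synapse with
# n strength states and a symmetric filter with thresholds +/-theta.
# Each induction signal is potentiating or depressing with probability
# 1/2; strength changes only at filter threshold crossings, with
# reflecting boundaries at the saturated states A = 1 and A = n.
def run_synapse(n, theta, n_signals, rng):
    A = rng.randrange(1, n + 1)   # strength state, A in {1, ..., n}
    I = 0                         # filter state, |I| <= theta - 1
    for _ in range(n_signals):
        if rng.random() < 0.5:    # potentiating induction signal
            I += 1
            if I == theta:        # upper threshold: reset and potentiate
                I = 0
                A = min(A + 1, n)
        else:                     # depressing induction signal
            I -= 1
            if I == -theta:       # lower threshold: reset and depress
                I = 0
                A = max(A - 1, 1)
    return A

rng = random.Random(1)
n, runs = 6, 5000
counts = [0] * n
for _ in range(runs):
    counts[run_synapse(n, 3, 200, rng) - 1] += 1
print([c / runs for c in counts])  # each fraction close to 1/n
```

By the up/down symmetry of the induction signals, the uniform strength distribution is preserved exactly, so the empirical fractions fluctuate around 1/n.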

Figure 1:

A filter-based mechanism for the integration of synaptic plasticity induction signals leading to the expression of synaptic plasticity at filter thresholds. Synaptic filter states are represented by the circled numbers. Plasticity induction signals occur at Poisson rate r, with potentiation signals and depression signals being equiprobable, each with probability 1/2. Potentiating induction signals acting on filter states below the upper threshold lead only to increments in filter state, while a potentiating induction signal acting on filter state I = +(θ − 1) causes the filter to reach its upper threshold, leading to the expression of a potentiation step if possible and resetting the filter state to zero. Similarly, depressing induction signals acting on states above the lower threshold decrement the filter state, while a depressing induction signal acting on state I = −(θ − 1) leads to the expression of a depression step if possible and resetting the filter state to zero.

Let M⁺ be the matrix that increments the filter state by one unit but without taking the filter state back to I = 0, so without the filter upper threshold process. M⁺ has entries of unity on its lower diagonal and zeros elsewhere. Let T⁺ be the matrix that takes the filter state back to I = 0. This matrix has zeros everywhere except for its entry of unity at the (I, J) = (0, +(θ − 1)) position in filter indices (or position (θ, 2θ − 1) with conventional indexing of matrix entries). For θ = 2, for example, we have

$$M^+ = \begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}, \qquad T^+ = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}.$$

Let the matrices M⁻ and T⁻ be the corresponding matrices for a decrement in filter state. M⁻ = (M⁺)ᵀ, where the superscript denotes the transpose, and the matrix T⁻ is zero everywhere except for unity at the (I, J) = (0, −(θ − 1)) position in filter indices (or position (θ, 1) with conventional indexing). For θ = 2, for example,

$$M^- = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}, \qquad T^- = \begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}.$$
As there are n strength states and 2θ − 1 filter states, the joint probability distribution of the strength and filter states of a synapse is represented by an n(2θ − 1)-dimensional vector. We order the entries of such vectors so that the Ath batch of 2θ − 1 entries corresponds to the distribution of filter states when the synapse is in strength state A. Let P⁺ denote the matrix that implements a potentiation step and P⁻ the corresponding matrix implementing a depression step. For example, with n = 3, P⁺ and P⁻ are given schematically by

$$P^+ = \begin{pmatrix} M^+ & 0 & 0\\ T^+ & M^+ & 0\\ 0 & T^+ & M^+ + T^+ \end{pmatrix} \tag{2.3}$$

and

$$P^- = \begin{pmatrix} M^- + T^- & T^- & 0\\ 0 & M^- & T^-\\ 0 & 0 & M^- \end{pmatrix}, \tag{2.4}$$

where all entries are zero unless explicitly specified. The appearance of the submatrices T⁺ or T⁻ in the relevant saturated subblocks in P⁺ or P⁻ reflects the fact that a saturated synapse cannot potentiate or depress, respectively, any further, but that its filter state is nevertheless returned to zero when the appropriate threshold is reached.
The matrices P⁺ and P⁻ implement potentiation and depression steps on synapses. The matrix superposition Ω = (P⁺ + P⁻)/2 implements a plasticity operation on a synapse that is potentiation with probability 1/2 and depression with probability 1/2. For n = 3, we schematically have

$$\Omega = \frac{1}{2}\begin{pmatrix} M^+ + M^- + T^- & T^- & 0\\ T^+ & M^+ + M^- & T^-\\ 0 & T^+ & M^+ + M^- + T^+ \end{pmatrix}. \tag{2.5}$$
The equilibrium joint probability distribution of strength and filter states for a synapse is determined by the eigenvector of the averaged plasticity operator in equation 2.5 with unit eigenvalue. In equilibrium, for binary-strength synapses, we previously found that the probability distribution of filter states has components

$$\pi_I = \frac{\theta - |I|}{\theta^2}, \tag{2.6}$$

regardless of the strength state (Elliott & Lagogiannis, 2012). This distribution just corresponds to the (suitably normalized) eigenvector, with unit eigenvalue, of the full filter matrix including both threshold processes. Because the distribution is symmetric about its central, I = 0, component, we have that π_{+(θ−1)} = π_{−(θ−1)}. The vector π is therefore also an eigenvector, with unit eigenvalue, of the two matrices in which the averaged, threshold-free filter dynamics are supplemented by either the upper or the lower threshold reset process alone. If we consider the probability distribution

$$p^\infty = \frac{1}{n}\, u \otimes \pi, \tag{2.7}$$

with u being an n-dimensional vector with unity occurring once for each strength state A, then it is clear from the block structure of the averaged plasticity operator that p^∞ is an eigenvector of it with unit eigenvalue. The vector p^∞ is therefore the equilibrium joint distribution of filter and strength states for general n. All filter states are therefore distributed according to π in equilibrium, regardless of the value of n, and we see that the strength states are themselves uniformly distributed with probability 1/n in equilibrium because of the common scaling factor 1/n in p^∞.
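These equilibrium properties can be verified numerically. The construction below is our own sketch of the operators described above (the function and matrix names, and the parameter choices n = 4, θ = 3, are ours): it builds the block-structured potentiation and depression operators, extracts the unit-eigenvalue eigenvector of their average, and checks that strengths are uniform with a tent-shaped filter distribution in every strength block.

```python
import numpy as np

def plasticity_operators(n, theta):
    d = 2 * theta - 1                     # filter states -(theta-1)..+(theta-1)
    Mp = np.diag(np.ones(d - 1), -1)      # increment filter state, no threshold
    Mm = Mp.T.copy()                      # decrement filter state, no threshold
    Tp = np.zeros((d, d)); Tp[theta - 1, d - 1] = 1.0  # upper threshold -> I = 0
    Tm = np.zeros((d, d)); Tm[theta - 1, 0] = 1.0      # lower threshold -> I = 0
    In = np.eye(n)
    L = np.diag(np.ones(n - 1), -1)       # shifts strength state A -> A + 1
    Etop = np.zeros((n, n)); Etop[-1, -1] = 1.0        # saturation at A = n
    Ebot = np.zeros((n, n)); Ebot[0, 0] = 1.0          # saturation at A = 1
    Pp = np.kron(In, Mp) + np.kron(L, Tp) + np.kron(Etop, Tp)
    Pm = np.kron(In, Mm) + np.kron(L.T, Tm) + np.kron(Ebot, Tm)
    return Pp, Pm

n, theta = 4, 3
Pp, Pm = plasticity_operators(n, theta)
Omega = 0.5 * (Pp + Pm)                   # average over potentiation/depression
w, v = np.linalg.eig(Omega)
p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
p /= p.sum()                              # equilibrium joint distribution
blocks = p.reshape(n, 2 * theta - 1)      # one filter distribution per strength
print(blocks.sum(axis=1))                 # uniform: 1/n for each strength state
print(blocks[0] * n)                      # tent shape: (theta - |I|)/theta**2
```

The check confirms equations 2.6 and 2.7 for a multilevel case: every strength block carries the same tent-shaped filter distribution, and the strength marginal is uniform.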
It is against the background of this equilibrium distribution that the definite memory ξ^0 is stored at time t = 0⁻ s. Synapse i will have one of two possible distributions at time t = 0 s after the storage of ξ^0, depending on the definite sign of ξ_i^0. For all subsequent memories ξ^α, α ≥ 1, however, the relevant matrix operator is the equally weighted superposition of the potentiation and depression operators, which averages over both possible induction signals for later memories at any given synapse, allowing us to average over all possible subsequent memories, as we are not interested in any particular realization of subsequent memories. With symmetric filters (equal upper and lower thresholds, ±θ), the two distributions are exact mirror images of each other. For n = 2, we previously showed that synapses experiencing an initial potentiating induction signal (ξ_i^0 = +1) and those experiencing an initial depressing induction signal (ξ_i^0 = −1) contribute identically to the tracked memory signal because the roles of weak (strength −1) and strong (strength +1) synapses are reversed in their contributions to h(t), depending on the signs of the induction signals (Elliott & Lagogiannis, 2012). Specifically, if we define S̃_i(t) = ξ_i^0 S_i(t), so that

$$h(t) = \frac{1}{N} \sum_{i=1}^{N} \tilde{S}_i(t), \tag{2.8}$$

then the various S̃_i(t) are all identically distributed random variables, regardless of i. Furthermore, since all synapses subsequently experience only a superposition of induction signals via the same matrix operator in order to average over all possible later memories ξ^α, α ≥ 1, if the S̃_i(0) are identically distributed at t = 0 s, then they remain identically distributed for all time. It is then a statement in elementary probability that

$$\mu(t) = m(t), \tag{2.9}$$

$$\sigma(t)^2 = \frac{1}{N}\, v(t) + \left(1 - \frac{1}{N}\right) c(t), \tag{2.10}$$

where m(t) and v(t) denote the mean and variance, respectively, of any one of the S̃_i(t) variables, and c(t) denotes the covariance between any two of them. This equivalence and resulting simplification arises because, for n = 2 strength states, the two strengths are treated symmetrically. It is therefore clear that the same holds for general n, provided that the various strengths are symmetrically distributed around zero (or, in general, around their mean value), so that if for some strength state A there exists a strength state B such that s_B = −s_A, as is true for equation 2.1, then the same arguments go through. In fact, these arguments also go through for any model of synaptic plasticity in which potentiation and depression processes are treated completely symmetrically (Elliott, 2014), and not just for filter-based mechanisms of synaptic plasticity as considered here and elsewhere.
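The elementary identities in equations 2.9 and 2.10 are easy to illustrate numerically. The toy construction below is entirely ours: exchangeability among identically distributed variables is modeled by adding a shared Gaussian component to independent ones, so that the variance and pairwise covariance are known exactly.

```python
import numpy as np

# Illustration of the mean/variance identities for a sum of N identically
# distributed, exchangeable variables X_i with common variance v and
# pairwise covariance c: Var[sum_i X_i] = N*v + N*(N-1)*c.
rng = np.random.default_rng(0)
N, trials = 8, 200000
Z = rng.normal(0.0, 1.0, size=(trials, 1))   # shared component (gives c = 1)
Y = rng.normal(0.0, 0.5, size=(trials, N))   # independent components
X = Z + Y                                    # v = 1 + 0.25 = 1.25, c = 1
S = X.sum(axis=1)
predicted = N * 1.25 + N * (N - 1) * 1.0     # = 66
print(S.var(), predicted)
```

Dividing through by N² gives the corresponding statement for the normalized signal: the covariance contribution does not vanish as N grows, which is why it must be handled carefully in the memory dynamics.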
This equivalence between the two distributions in terms of their contributions to h(t) therefore means that we need only consider the joint probability distribution of the filter and tilded-strength (rather than strength) states, and we can therefore restrict without loss of generality to considering only, say, the distribution of synapses receiving an initial potentiating induction signal at time t = 0 s. Writing the joint distribution batch by batch of filter states, one batch for each strength state A, we then have that

$$p(0)\big|_A = \frac{1}{n} \times \begin{cases} M^+\pi, & A = 1,\\[2pt] \left(M^+ + T^+\right)\pi, & 1 < A < n,\\[2pt] \left(M^+ + 2\,T^+\right)\pi, & A = n, \end{cases} \tag{2.11}$$

for the probability distribution of states immediately after the storage of memory ξ^0, where M⁺ is the matrix incrementing the filter state without the threshold process and T⁺ is the matrix implementing the upper filter threshold process. This follows directly from the block structure of the potentiation operator. The contributions from T⁺π to the intermediate strength states with 1 < A < n arise from the upper filter threshold processes occurring in states with strength index A − 1. There is no such contribution to the A = 1 state because there is no lower, A = 0 state. There are two such contributions to the A = n state because one arises from the upper threshold process from the A = n − 1 state and the other from the state’s own upper threshold process, which cannot induce an increment in strength because of saturation. Summing over the filter states for each strength state in equation 2.11, we see that

$$p_A(0) = \begin{cases} \dfrac{1}{n} - \dfrac{1}{n\theta^2}, & A = 1,\\[4pt] \dfrac{1}{n}, & 1 < A < n,\\[4pt] \dfrac{1}{n} + \dfrac{1}{n\theta^2}, & A = n, \end{cases} \tag{2.12}$$

which give the probabilities of the various strength states at time t = 0 s, since the equilibrium probability of a filter sitting at the threshold state is π_{θ−1} = 1/θ². The intermediate (tilded-)strength states therefore continue to have probability 1/n immediately after the storage of ξ^0, while the probability of state A = 1 decreases to 1/n − 1/(nθ²) and that of A = n increases to 1/n + 1/(nθ²). The initial mean memory signal is therefore

$$\mu(0) = \frac{2}{n\theta^2}. \tag{2.13}$$
Finally, we note that the averaged matrix operator acts on the state in such a way that the probabilities of the intermediate states remain unchanged, at 1/n, and this remains true for all time. Only the probabilities of the A = 1 and A = n states change over time. These always differ equally but oppositely from 1/n, because the total probability of all strength states must sum to unity.1 It is therefore straightforward to see that

$$\mathrm{E}\!\left[\tilde{S}_i(t)^2\right] = \frac{1}{n} \sum_{A=1}^{n} s_A^2, \tag{2.14}$$

independent of t, because the intermediate states retain probability 1/n and any deviations of the A = 1 and A = n states from probability 1/n cancel out in this second moment because s_1² = s_n².
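The full time course of the mean signal can also be obtained by brute force, evolving the joint distribution under Poisson-driven storage with a matrix exponential. The sketch below is our own numerical check (the operator construction and the parameter choices n = 4, θ = 3 are ours); it reproduces μ(0) = 2/(nθ²) and exhibits the initial signal rise analyzed later:

```python
import numpy as np
from scipy.linalg import expm

# Storage events arrive as a rate-1 Poisson process and each applies the
# averaged operator Omega, so p(t) = expm(t*(Omega - I)) p(0), with p(0)
# the equilibrium distribution hit by one potentiation step.
def plasticity_operators(n, theta):
    d = 2 * theta - 1
    Mp = np.diag(np.ones(d - 1), -1)
    Mm = Mp.T.copy()
    Tp = np.zeros((d, d)); Tp[theta - 1, d - 1] = 1.0
    Tm = np.zeros((d, d)); Tm[theta - 1, 0] = 1.0
    In = np.eye(n)
    L = np.diag(np.ones(n - 1), -1)
    Etop = np.zeros((n, n)); Etop[-1, -1] = 1.0
    Ebot = np.zeros((n, n)); Ebot[0, 0] = 1.0
    Pp = np.kron(In, Mp) + np.kron(L, Tp) + np.kron(Etop, Tp)
    Pm = np.kron(In, Mm) + np.kron(L.T, Tm) + np.kron(Ebot, Tm)
    return Pp, Pm

n, theta = 4, 3
d = 2 * theta - 1
Pp, Pm = plasticity_operators(n, theta)
Omega = 0.5 * (Pp + Pm)
tent = (theta - np.abs(np.arange(-(theta - 1), theta))) / theta**2
p0 = Pp @ np.kron(np.full(n, 1.0 / n), tent)  # equilibrium + one potentiation
s = -1.0 + 2.0 * np.arange(n) / (n - 1)       # strengths on [-1, +1]
weights = np.kron(s, np.ones(d))              # sums s_A over each filter batch
G = Omega - np.eye(n * d)                     # generator at unit storage rate
ts = [0.0, 2.0, 5.0, 10.0, 30.0, 100.0, 400.0]
mu = [float(weights @ expm(t * G) @ p0) for t in ts]
print(mu[0])   # 2/(n*theta^2)
print(mu)      # rises above mu[0] before decaying back toward zero
```

The rise occurs because the tracked memory pushes every filter toward its upper threshold, biasing the first wave of subsequent threshold crossings toward further potentiation of the tilded strengths.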

3  Analytical Results

With our general formalism established, we may now derive an analytical expression for the mean memory signal μ(t). Analytical expressions for σ(t) are very much harder to derive because of the covariance term (Elliott & Lagogiannis, 2012). Even for n = 2, the generating matrix defining the Markov process lacks a complete set of eigenvectors (i.e., the matrix is defective), so tensor product methods cannot be used to extract higher-order statistics (Elliott & Lagogiannis, 2012). This difficulty remains for n > 2. For our purposes here, it is sufficient to approximate the variance by neglecting the covariance term or, even more simply, by using its equilibrium value, but where necessary, we employ numerical matrix methods to compute the full result.

3.1  Laplace Transform of the Mean Memory Signal

Let the probability of a transition from filter state J to filter state I in time t, without filter thresholds being reached, be as derived in Elliott and Lagogiannis (2012); its explicit form is not required here. Let G⁺_J(t) and G⁻_J(t) be the densities for first escapes through the upper and lower filter thresholds, respectively, at time t, explicitly including the rate factor r. For a symmetric filter, we have that G⁻_J(t) = G⁺_{−J}(t). Let H_J(t) be the probability of not having reached either filter threshold, starting from filter state J, in time t. We have that H_J(t) is the sum over final filter states I of the threshold-free transition probabilities, but also

$$H_J(t) = 1 - \int_0^t dt'\, \big[ G^+_J(t') + G^-_J(t') \big] \tag{3.1}$$

is the probability that a random walk on filter states, starting from state J, has yet to reach threshold (or absorbing boundaries) in time t. The sum G⁺_J(t) + G⁻_J(t) is the probability density for reaching either threshold, so the integral in equation 3.1 gives the integrated probability density or just the probability of having reached either threshold in time t. The probability of not having reached either threshold then follows directly. Simple expressions may be derived for the Laplace transforms of G^±_J(t) (Elliott, 2011a). If G̃^±_J(s) denote these Laplace transforms with transformed variable s, then

$$\tilde{G}^{\pm}_J(s) = \frac{\lambda_+^{\theta \pm J} - \lambda_-^{\theta \pm J}}{\lambda_+^{2\theta} - \lambda_-^{2\theta}}, \tag{3.2}$$

where λ± are the two solutions of λ² − 2(1 + s/r)λ + 1 = 0, where the rate factor r is retained for generality, so that λ₊λ₋ = 1 and λ₊ + λ₋ = 2(1 + s/r).
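Among other things, these escape densities encode the mean first-passage time through the thresholds, which for a unit-rate symmetric walk started at filter state J is the standard gambler's-ruin value θ² − J². The quick simulation below is our own check of this for J = 0:

```python
import random

# Simulation of first escapes through the filter thresholds: a unit-rate
# symmetric random walk started at filter state J runs until it reaches
# +theta or -theta.  The mean escape time is theta^2 - J^2.
def escape_time(J, theta, rng):
    I, t = J, 0.0
    while -theta < I < theta:
        t += rng.expovariate(1.0)      # Poisson waiting time at rate r = 1
        I += rng.choice((+1, -1))      # equiprobable induction signals
    return t

rng = random.Random(7)
theta = 3
mean_t = sum(escape_time(0, theta, rng) for _ in range(20000)) / 20000
print(mean_t)  # close to theta^2 = 9
```

This quadratic dependence on θ is the basic timescale set by the filter: larger filters express plasticity more rarely, which is what controls the destabilizing fluctuations.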
Let be the probability of a transition from filter state J and strength state B to filter state I and strength state A in time t. Then, generalizing the argument in Elliott and Lagogiannis (2012), we may write down the system of renewal equations for in terms of and :
formula
3.3
where the middle equation applies only for 1 < A < n, so that it is the general case rather than the boundary cases at A = 1 or A = n. The inhomogeneous terms on the right-hand sides (RHSs) arise only when A = B and thus allow the possibility of changes in filter state without any threshold processes arising. The homogeneous terms consider threshold processes leading to changes in strength state (with or without saturation or reflecting boundary dynamics), followed by further state transition processes after the first threshold process. Let be a matrix with components , so a matrix of strength change probabilities in time t for given initial and final filter states, and define the matrix
formula
3.4
Then, Laplace-transforming equation 3.3, we have
formula
3.5
where is the identity matrix. The first term on the right-hand side of this equation for describes changes in filter state, via , without any changes in strength state, via the identity matrix . The second term describes a single increment or decrement in strength (subject to possible saturation) via the matrix , followed by further possible changes in both strength and filter states, but starting from the zero filter state after the threshold process associated with the strength change represented by . The advantage of the Laplace-transformed representation is that these sequential temporal processes reduce to simple products. Setting and defining where is the Dirac delta function, or its Laplace transform,
formula
3.6
we have , so that
formula
3.7
where we drop the Laplace argument s for notational simplicity. The second term in equation 3.5 has transformed into . We note that, formally, . A general term in this second term on the right-hand side of equation 3.7 is therefore of the form . This product of matrices in Laplace-transform space represents a first change in strength starting from initial filter state J, followed by precisely m changes in strength associated with transitions from the zero filter state back to the zero filter state via filter threshold processes, and finally a change in filter state from the zero state to state I without any change in strength. Equation 3.7 therefore decomposes the overall transition matrix into elementary, fundamental steps and sums over all possibilities.
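The expansion invoked here is just the Neumann (geometric) series for a matrix inverse. A generic numerical illustration (ours, using a random contraction matrix rather than the model's actual Laplace-domain matrices):

```python
import numpy as np

# For a matrix A with spectral radius below 1, (I - A)^(-1) = sum_m A^m,
# so each power A^m corresponds to a path containing exactly m
# threshold-driven strength changes in the decomposition of the
# transition matrix.
rng = np.random.default_rng(3)
A = 0.1 * rng.random((4, 4))             # spectral radius well below 1
exact = np.linalg.inv(np.eye(4) - A)
series = np.zeros((4, 4))
term = np.eye(4)
for _ in range(60):                      # truncated geometric series
    series += term
    term = term @ A
print(np.max(np.abs(exact - series)))    # essentially zero
```

Truncating the series at m terms corresponds to keeping contributions with at most m intermediate strength changes, which is also a useful practical approximation at early times.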
To compute , we must sum over the initial and final states with suitable weighting factors. The final filter state I is irrelevant to , so we directly sum over I. The final strength state A must be weighted by sA. Defining the vector as the vector of strengths sA, we then have
formula
3.8
To sum over the initial state, we must sum over the components of the vector because this gives the state of the system immediately after the storage of the tracked memory , whose mean tracked memory signal we wish to determine. Let be the n-dimensional vector of probabilities of a synapse’s strength at time t = 0 s for each particular filter state J. Then we have that , so that
formula
3.9
Structurally, this equation for the mean memory signal should be compared to the equation for in equation 3.7. The first term on the right-hand side describes the contribution to that arises from the initial storage of the tracked memory but without any subsequent changes in strength. This contribution decays away monotonically, controlled by HJ, because the probability of no changes in strength drops to zero as time increases. The second term on the right-hand side describes the contributions arising from subsequent changes in strength after the storage of the tracked memory. Again, we may expand as before and observe contributions from definite numbers of strength changes.
The vectors can be read off from equation 2.11. Noting that the matrix is just a shift operator on filter states, so that , we can write
formula
3.10
where and , both being n-dimensional vectors. The J = 0 form accounts for the various contributions arising in equation 2.11. Note that . We then have that , so that the first term on the right-hand side of equation 3.9 becomes . Evaluating , we readily find that
formula
3.11
where .2 To compute , we first write , setting as for a symmetric filter, where we define the matrix by
formula
3.12
We may then formally write . We observe that if some n-dimensional vector is antisymmetric about its center, so that for any component A, then so is the vector . Of course, is such a vector. Thus, the vector is antisymmetric about its center and so . Hence, all the terms in in equation 3.11 are killed by and only the and terms survive. For the terms involving , we require
formula
3.13
For the term involving , we observe that and that
formula
3.14
Putting all this together, we may finally write
formula
3.15
For n = 2, it is easy to see that (this is true for any density ), so that we obtain
formula
3.16
which is precisely the expression that we obtained before (Elliott & Lagogiannis, 2012). We may then write
formula
3.17
We will interpret the terms in this equation for shortly.
It is striking that the form for in equation 3.15 factorizes into a form involving and the new contribution . Immediately after the storage of the tracked memory at time  s, synapses may be in different filter states, with probability distribution governed by equation 2.11. Consider, however, a scenario in which all synapses at time t = 0 s are in filter state I = 0. Over time they escape through the upper or lower filter thresholds and are returned to the I = 0 filter state, with associated steps in synaptic strength where possible. In this scenario, synapses therefore perform a non-Markovian but renewing random walk on the strength states with waiting times between transitions governed by the densities (which for symmetric filters are equal, ). The matrix or, more correctly, the matrix , is precisely a stochastic matrix that implements an unbiased (i.e., symmetric) random walk on n discrete states between two reflecting boundaries. In section 4 of Elliott (2010b), we derived general results for renewal processes on bounded intervals in the presence of nonexponential waiting times between transitions. Adapting those results to our notation here, if is the matrix of transition probabilities with elements for transitions from strength state B to strength state A in time t (dropping filter indices because they are irrelevant to the argument here), then we would have for the scenario considered here that (cf. equation 4.8 in Elliott (2010b))
formula
3.18
so that
formula
3.19
The interpretation of this equation for can be seen clearly by expanding in powers of the stochastic matrix ,
formula
3.20
and then taking the inverse Laplace transform using the convolution theorem,
formula
3.21
The third term on the right-hand side of this equation, for example, represents a change in strength occurring at time with probability density , followed by a second change in strength at later time with probability density , and then no subsequent changes in strength, indicated by the presence of the waiting time function over the remaining time interval. The changes in strength are signaled by the presence of the stochastic matrix , which implements a single step in strength space subject to possible saturation. These changes in strength are governed by filter transitions from the zero state through either filter threshold back to the zero state, so are governed by the probability density . Similarly, the general term in corresponds to the occurrence of precisely m filter threshold escape processes and therefore m possible strength changes, giving rise to the m occurrences of , followed by no filter threshold escape processes, giving rise to the waiting time factor. To compute in this scenario, suppose that the initial probability distribution of strength states at t = 0 s is governed by that generated by the storage of , despite considering all filter states to be I = 0 at time t = 0 s. This distribution, from equation 2.12, is just
formula
3.22
In this scenario, the mean memory signal is just , or
formula
3.23
This result is obtained essentially by integrating out synapses’ internal filter states and instead dealing directly with transitions in synaptic strength (cf. Elliott, 2010b). Comparing this result to equation 3.15, we see that we have all the terms except for that in square brackets. Most of the terms in the full form of therefore arise from the unbiased, renewing random walk in bounded strength space governed by the nonexponential waiting times . The remaining term, in square brackets, must arise because of the preparation of the initial filter states, and specifically because the initial filter states are not, in general, immediately after the storage of . In appendix A, we provide an alternative derivation of equation 3.15 using the argument in this paragraph, but taking into account the correct distribution of initial filter states at time t = 0 s.
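The renewal expansion in equation 3.21 can be evaluated numerically on a time grid. In the sketch below, the filter escape density is replaced by a stand-in Erlang-2 waiting density r^2 t e^(-rt), with survival function (1 + rt)e^(-rt) (a hypothetical choice purely for illustration; the true density follows from the filter dynamics). The m-fold convolutions are accumulated term by term, and conservation of probability (columns summing to one) checks the series.

```python
import math

def reflecting_walk_matrix(n):
    """Unbiased one-step walk on n strength states with reflecting boundaries."""
    M = [[0.0] * n for _ in range(n)]
    for b in range(n):
        M[max(b - 1, 0)][b] += 0.5
        M[min(b + 1, n - 1)][b] += 0.5
    return M

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def renewal_transition(t, n, r, K=15, N=400):
    """P(t) from the renewal series: Phi(t) I + sum_m (f^{*m} * Phi)(t) M^m,
    with stand-in Erlang-2 density f(tau) = r^2 tau exp(-r tau) and survival
    Phi(tau) = (1 + r tau) exp(-r tau).  K and N are modest defaults chosen
    so the pure-Python convolutions stay fast."""
    dt = t / N
    grid = [k * dt for k in range(N + 1)]
    f = [r * r * tau * math.exp(-r * tau) for tau in grid]
    Phi = [(1.0 + r * tau) * math.exp(-r * tau) for tau in grid]

    def conv(g, h):
        """Trapezoid-rule convolution (g * h) on the grid."""
        out = [0.0] * (N + 1)
        for k in range(1, N + 1):
            s = 0.5 * (g[0] * h[k] + g[k] * h[0])
            s += sum(g[j] * h[k - j] for j in range(1, k))
            out[k] = s * dt
        return out

    M = reflecting_walk_matrix(n)
    P = [[Phi[N] * float(i == j) for j in range(n)] for i in range(n)]
    gm = f[:]                    # f^{*m}, starting at m = 1
    Mm = [row[:] for row in M]   # M^m
    for m in range(1, K + 1):
        w = conv(gm, Phi)[N]     # probability weight of exactly m strength steps
        for i in range(n):
            for j in range(n):
                P[i][j] += w * Mm[i][j]
        gm = conv(gm, f)
        Mm = matmul(M, Mm)
    return P
```

The identity Phi(t) + sum_m (f^(*m) * Phi)(t) = 1 guarantees that the accumulated matrix is stochastic, which is what the test below verifies to within grid and truncation error.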

With these considerations in hand, we may now interpret the two factors, and , appearing in equation 3.17 for . Up to overall parameters and the function , the factor arises purely from the change in the distribution of filter states induced by the storage of the tracked memory and is independent of the number of states of synaptic strength, n. This change in the distribution of filter states is therefore already captured entirely by the particular case of binary synapses, , studied before (Elliott & Lagogiannis, 2012). The contribution to the tracked memory signal from multiple strength states and therefore general n is essentially confined to the second factor, . It is important to note, however, that the filter dynamics continue to exert an influence in this factor through the probability density for escape through either filter threshold from the zero state, . As we have seen, the inverse matrix essentially generates a random walk on strength states between reflecting barriers by summing over all possible processes. The vector in weights the final states by their strengths, and the vector is proportional to the shift in the distribution of strength states induced by the storage of the tracked memory , since only the and strength states change their probabilities. It is indeed striking, and even remarkable, that factorizes in this way. However, this factorization of course reflects the separation of the underlying dynamics into an initial phase governed by the change in filter distributions induced by the storage of , and a later phase governed by transitions in synaptic strength driven by a renewing but non-Markovian random walk regulated by the probability density . 
This probability density arises because once the state of a filter immediately after the storage of has been forgotten due to a filter threshold process leading to a return of the filter to the zero state, later processes naturally decompose into dynamics starting from the zero filter state and returning to the zero filter state via a filter threshold process, leading (possibly) to a change in strength. We must now determine the contribution from these later processes by explicitly computing .

3.2  Computation of

In order to obtain the full form for , we must now compute . We may do this using two different methods, both of which lead to different but useful approximations, which we shall examine below.

3.2.1  Method 1: Eigenanalysis

The first method is a direct computation of . As the matrix is tridiagonal, its eigenstructure may be computed explicitly. We give the details in appendix B. The stochastic matrix has eigenvalues given by
formula
3.24
and corresponding normalized eigenvectors with components given by
formula
3.25
The matrix inverse is therefore given by
formula
3.26
We find that
formula
3.27
and
formula
3.28
where strictly this latter equation is valid only for , but it also generates the required result (of zero) for m = 0. Finally, then, we have that
formula
3.29
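The eigenstructure quoted in equations 3.24 and 3.25 can be verified directly. Assuming the standard form of the unbiased reflecting-boundary walk matrix (steps attempted past either end state leave the strength unchanged), each vector with components cos((A + 1/2)k pi / n) is an eigenvector with eigenvalue cos(k pi / n):

```python
import math

def reflecting_walk_matrix(n):
    """Unbiased one-step walk on n states with reflecting boundaries."""
    M = [[0.0] * n for _ in range(n)]
    for b in range(n):
        M[max(b - 1, 0)][b] += 0.5
        M[min(b + 1, n - 1)][b] += 0.5
    return M

def check_eigenpairs(n):
    """Return the worst residual |M v_k - lambda_k v_k| over all k,
    for lambda_k = cos(k pi / n) and v_k(A) = cos((A + 1/2) k pi / n)."""
    M = reflecting_walk_matrix(n)
    worst = 0.0
    for k in range(n):
        lam = math.cos(k * math.pi / n)
        v = [math.cos((a + 0.5) * k * math.pi / n) for a in range(n)]
        Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        worst = max(worst, max(abs(Mv[i] - lam * v[i]) for i in range(n)))
    return worst
```

The eigenvectors form a discrete-cosine-type basis, so the spectral sum in equation 3.26 for the matrix inverse follows directly from their orthogonality.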

3.2.2  Method 2: Generating Function

The first method above for computing involves a direct eigenanalysis of the stochastic matrix . We may instead use a generating function approach that implicitly determines all the matrix powers without a direct eigenanalysis and in so doing obtain a very different representation of .

The stochastic matrix generates a random walk between reflecting boundaries, so consider the probability distribution after m discrete time steps, starting from some initial distribution . Then . Let the components of be , , the probability of being in state A after m time steps. Define a generating function for over both states A and time steps m by writing
formula
3.30
If we set , then essentially determines , because . By writing out the set of equations for in terms of and taking into account the reflecting boundary conditions, a standard calculation (see, e.g., van Kampen, 1992) produces
formula
3.31
where and . The function is the generating function for the initial state at . Taking the initial state to be B with probability unity, we have . F(w, z) involves the two unknown functions and , which arise from the reflecting boundary conditions. However, these functions can be uniquely determined via analyticity arguments in the complex z plane (Cox & Miller, 1965). Specifically, F(w, z) is by construction a polynomial of degree n in z and so cannot contain any singularities in z. Yet the denominator in equation 3.31 has zeros at the two roots, call them , of the equation . The numerator must therefore also have zeros at these two locations. Hence, we may deduce that
formula
3.32
formula
3.33
although we do not in fact need to know .
By writing in the denominator of F(w, z), it is straightforward to obtain an expression for the coefficient of the zA term in F(w, z).3 With , this coefficient is just . Because we want to compute with , it suffices to set . Setting must by symmetry produce the same final result, up to a sign. The required coefficient is then
formula
3.34
with given by equation 3.32, and we need
formula
3.35
After a great deal of tedious but routine algebra, we eventually obtain the remarkably simple form,
formula
3.36
We note that for a definite choice of sign conventions.

3.3  Extraction of

We now invert the Laplace transform to obtain for the two forms of derived above. The two expressions for in equations 3.29 and 3.36 are very different in structure, but each is useful for extracting different approximations.

3.3.1  from equations 3.15 and 3.29

Inserting into equation 3.29 and expanding in , we find that the leading-order behavior is . For , . It is therefore convenient to define
formula
3.37
so that the Dirac delta function present in is isolated and made explicit. By the convolution theorem, we may directly invert equation 3.17, writing in the form
formula
3.38
which defines the scaled form of to be , removing an overall factor of , and where denotes the Laplace convolution,
formula
3.39
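The Laplace convolution just defined pairs with the convolution theorem used throughout this section: a product of transforms inverts to a convolution in time. A quick numerical sanity check with illustrative densities e^(-t) and e^(-2t), whose transforms are 1/(s + 1) and 1/(s + 2), so that their product inverts to e^(-t) - e^(-2t):

```python
import math

def laplace_convolution(f, g, t, steps=2000):
    """Trapezoid-rule evaluation of (f * g)(t) = integral_0^t f(tau) g(t - tau) dtau."""
    dt = t / steps
    total = 0.5 * (f(0.0) * g(t) + f(t) * g(0.0))
    total += sum(f(k * dt) * g(t - k * dt) for k in range(1, steps))
    return total * dt

# Illustrative pair: transforms 1/(s+1) and 1/(s+2), product inverts to
# exp(-t) - exp(-2t).
f = lambda tau: math.exp(-tau)
g = lambda tau: math.exp(-2.0 * tau)
```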
Using allows us to directly compare for different values of n. The form of equation 3.38 allows us to see that deviates from and indeed exceeds by a lagged response that is determined by via the convolution . We of course have that , so that for . The response is lagged because rises from zero at  s. To obtain an explicit formula for , we first reproduce here the result for derived elsewhere (Elliott & Lagogiannis, 2012):
formula
3.40
where is the floor function. We must also determine explicitly from equation 3.37 by computing the inverse Laplace transform. We have that
formula
3.41
so there are poles in s at the roots determined by the solutions of . Because and , if and are a root and its complex conjugate in , then
formula
3.42
so a root and its complex conjugate in combine to create a simple pole in s at . With this observation, the inverse Laplace transform is routine, and we obtain
formula
3.43
whence
formula
3.44
Plugging equations 3.40 and 3.44 into equation 3.38 and explicitly computing the convolution, we obtain a messy expression for .
We do not reproduce this expression in full here because below we will obtain a much simpler expression for via equation 3.36. However, the advantage of the convolution representation is that we may obtain an approximation to that is valid at both small and large times, but not at intermediate times. Because is at small times very close to due to the lagged response from , any sufficiently good approximation to will maintain the behavior for small t. Furthermore, if we approximate with a form that improves and becomes asymptotically exact at large t, then under this approximation will become asymptotically exact. Replacing by just its slowest decaying mode ( and in equation 3.44) will achieve this approximation, so we write
formula
3.45
A different approximation can be achieved by retaining the full sum over l in equation 3.44 and retaining only the contribution from the sum over m, but this approximation is necessarily more complicated than retaining just the slowest decaying, and mode in equation 3.44.
In Figure 2 we illustrate the result for in equation 3.44 by plotting as a function of t for various choices of n and . In Figure 2A, for varying n and fixed , we see explicitly the lagged onset of , rising from zero, reaching a peak, and then falling back to zero. As n increases, the peak increases in amplitude, is displaced somewhat later in time, and also broadens, becoming plateau-like. In Figure 2B we fix n and vary . As increases, is significantly scaled down overall (note the log scale), and its peak is increasingly displaced toward later times. Comparing to the probability density for escape through either filter threshold starting from the zero filter state, plotted in Figure 2C, we see that tracks quite closely for smaller times, with the overall scale of being set by . In fact, if we expand the form for in equation 3.36 in terms of , we find that the leading-order behavior is governed by
formula
3.46
so that for small times. The escape density therefore explicitly sets the scale for , up to the overall factor of . As n increases, this factor approaches unity, and indeed we see all the curves for for increasing n in Figure 2A converging for smaller times.
Figure 2:

The function as a function of time, for different choices of n and . (A) for , , , , and (moving right to left in the graph, with smaller n corresponding to overall smaller ) for the particular choice, , as indicated. (B) for , 4, 6, 8, and 10 (moving top to bottom in the graph, with smaller corresponding to larger maxima for ) for the particular choice, n = 8, as indicated. (C) For comparison, we show the probability density for escape through either filter threshold starting from the zero filter state, , for the same values of used in panel B, with again smaller values of corresponding to larger maxima for . The initial small time profile of in panel B follows very closely that for .

3.3.2  from equations 3.15 and 3.36

We now obtain using the form for in equation 3.36. Rather than exploiting the convolution structure explicit in the product of two Laplace transforms, we instead reduce equation 3.15 to its simplest form before evaluating the inverse transform. We know that , and from equation A.50 in Elliott and Lagogiannis (2012), we have that
formula
3.47
Writing in equation 3.36 out in terms of , we obtain
formula
3.48
so we have
formula
3.49
The inverse Laplace transform is routine, and we obtain
formula
3.50
This general n form is striking in its similarity to the form in equation 3.40. For , reduces identically to because . By again taking the slowest decaying mode from each of the two sums in equation 3.50, we obtain an extremely simple approximation to ,
formula
3.51
or, taking only the slowest decaying term, the even simpler
formula
3.52
This latter form is especially useful for determining SNR memory lifetimes because it gives an extremely simple expression for at large times.
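The character of such slowest-mode truncations can be seen with a toy multi-exponential sum. The coefficients 8/(pi l)^2 over odd l below are hypothetical, chosen only so that they sum to unity; they are not the coefficients of equation 3.50. The truncation is asymptotically exact at large times but undershoots by about 19% (that is, by 1 - 8/pi^2) near t = 0.

```python
import math

def full_sum(t, L=999):
    """Toy multi-exponential decay: sum over odd l of (8 / (pi l)^2) exp(-l^2 t).
    The coefficients sum to 1, so the sum approaches 1 as t -> 0."""
    return sum(8.0 / (math.pi * l) ** 2 * math.exp(-l * l * t)
               for l in range(1, L + 1, 2))

def slowest_mode(t):
    """Keep only the slowest decaying (l = 1) term."""
    return 8.0 / math.pi ** 2 * math.exp(-t)
```

Because the l = 3 mode decays as e^(-9t), the one-term truncation becomes exponentially accurate beyond moderate times, which is why it suffices for SNR memory-lifetime estimates that depend only on large-time behavior.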
Finally, we may extract some large n limits by considering the limiting behavior of equation 3.36. Throwing away terms that are exponentially suppressed in n, we have
formula
3.53
as an approximation, and as an approximation we have
formula
3.54
The two corresponding forms of are then
formula
3.55
and
formula
3.56
formula
3.57
where and are modified Bessel functions of the first kind. Note the remarkable behavior that asymptotes to a constant, , for large t in the formal, limit. We may compare this approximation to the large n form of the two-decay approximation in equation 3.51:
formula
3.58
This large n form of the two-decay approximation therefore underestimates the asymptotic behavior by around 19%, since .

3.4  Results for a Stochastic Updater Synapse

To facilitate comparison, we also consider a simple, stochastic updater synapse (Tsodyks, 1990). Such a synapse expresses potentiation or depression steps with a fixed probability p on receipt of a plasticity induction signal. Previously we compared such a synapse to a filter-based synapse, but only for (Elliott & Lagogiannis, 2012).

The transition matrix for single, one-step changes in synaptic strength for a stochastic updater synapse is simply , so the transition matrix for changes in synaptic strength in time t is
formula
3.59
where the second line follows immediately from the eigenstructure of considered in section 3.2.1. The mean memory signal is, as usual, , where is the probability distribution of strength states immediately after the storage of memory . This distribution is just (cf. equation 3.22)
formula
3.60
We then obtain
formula
3.61
For , reduces to , as it should (Elliott & Lagogiannis, 2012).
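As a numerical cross-check on this transition matrix, the sketch below computes it two ways: as a truncated-series matrix exponential and via the eigenstructure of section 3.2.1. It assumes (hedged, since the notation is not reproduced here) that plasticity induction signals arrive as a Poisson process, so that the time-t matrix is exp(rpt (M - I)) with the rate and update probability entering only through the product rpt.

```python
import math

def reflecting_walk_matrix(n):
    """Unbiased one-step walk on n states with reflecting boundaries."""
    M = [[0.0] * n for _ in range(n)]
    for b in range(n):
        M[max(b - 1, 0)][b] += 0.5
        M[min(b + 1, n - 1)][b] += 0.5
    return M

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=60):
    """Matrix exponential by truncated Taylor series (adequate for the
    modest matrix norms used here)."""
    n = len(A)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(A, term)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

def updater_transition(n, rpt):
    """P(t) = exp(rpt (M - I)) for a stochastic updater, with rpt the
    product of induction rate, update probability, and elapsed time."""
    M = reflecting_walk_matrix(n)
    A = [[rpt * (M[i][j] - float(i == j)) for j in range(n)] for i in range(n)]
    return expm_taylor(A)

def updater_transition_spectral(n, rpt):
    """Same matrix via the eigenstructure of section 3.2.1:
    lambda_k = cos(k pi / n), v_k(A) proportional to cos((A + 1/2) k pi / n)."""
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        lam = math.cos(k * math.pi / n)
        norm = n if k == 0 else n / 2.0
        v = [math.cos((a + 0.5) * k * math.pi / n) for a in range(n)]
        w = math.exp(rpt * (lam - 1.0))
        for i in range(n):
            for j in range(n):
                P[i][j] += w * v[i] * v[j] / norm
    return P
```

Because M is doubly stochastic, the exponential is also doubly stochastic and relaxes toward the uniform distribution over strength states, consistent with the mean memory signal decaying to its equilibrium value.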
Because a stochastic updater synapse has no internal states, the argument leading to equation 3.23 applies with the replacements and , where is just the density for the expression of either a potentiation or a depression step. We may then use the large n approximations to in equations 3.53 and 3.54 to obtain large n forms for for a stochastic updater. We obtain
formula
3.62
and
formula
3.63
respectively. We note that , in the formal limit, , so that the scaled mean memory signal remains fixed for all time in this limit.
For a stochastic updater, we may also readily compute