Hopfield attractor networks are robust distributed models of human memory, but they lack a general mechanism for effecting state-dependent attractor transitions in response to input. We propose construction rules such that an attractor network may implement an arbitrary finite state machine (FSM), where states and stimuli are represented by high-dimensional random vectors and all state transitions are enacted by the attractor network’s dynamics. Numerical simulations show the capacity of the model, in terms of the maximum size of implementable FSM, to be linear in the size of the attractor network for dense bipolar state vectors and approximately quadratic for sparse binary state vectors. We show that the model is robust to imprecise and noisy weights, and so a prime candidate for implementation with high-density but unreliable devices. By endowing attractor networks with the ability to emulate arbitrary FSMs, we propose a plausible path by which FSMs could exist as a distributed computational primitive in biological neural networks.

Hopfield attractor networks are one of the most celebrated models of robust neural autoassociative memory, as from a simple Hebbian learning rule they display emergent attractor dynamics that allow for reliable pattern recall, completion, and correction even in situations with considerable nonidealities imposed (Amit, 1989; Hopfield, 1982). Attractor models have since found widespread use in neuroscience as a functional and tractable model of human memory (Chaudhuri & Fiete, 2016; Eliasmith, 2005; Khona & Fiete, 2022; Little, 1974; Rolls, 2013; Schneidman et al., 2006). The assumption of these models is that the network represents different states by different, usually uncorrelated, global patterns of persistent activity. When the network is presented with an input that closely resembles one of the stored states, the network state converges to the corresponding fixed-point attractor.

This process of switching between discrete attractor states is thought to be fundamental both for describing biological neural activity and for modeling higher cognitive decision-making processes (Brinkman et al., 2022; Daelli & Treves, 2010; Mante et al., 2013; Miller, 2016; Tajima et al., 2017). What attractor models currently lack, however, is the ability to perform state-dependent computation, a hallmark of human cognition (Buonomano & Maass, 2009; Dayan, 2008; Granger, 2020). That is, when the network is presented with an input, the attractor state to which the network switches ought to depend on both the input stimulus and the state the network currently inhabits, rather than on the input alone.

We thus seek to endow a classical neural attractor model, the Hopfield network, with the ability to perform state-dependent switching between attractor states, without resorting to the use of biologically implausible mechanisms, such as training via backpropagation algorithms. The resulting attractor networks will then be able to robustly emulate any arbitrary finite state machine (FSM), considerably improving their usefulness as a neural computational primitive.

We achieve this by leaning heavily on the framework of vector symbolic architectures (VSAs), also known as hyperdimensional computing (HDC). VSAs treat computation in an entirely distributed manner by letting symbols be represented by high-dimensional random vectors, hypervectors (Gayler, 1998; Kanerva, 1997; Kleyko et al., 2022; Plate, 1995). When equipped with a few basic operators for binding and superimposing hypervectors together, corresponding often either to component-wise multiplication or addition, respectively, these architectures are able to store primitives such as sets, sequences, graphs, and arbitrary data bindings, as well as enabling more complex relations, such as analogical and figurative reasoning (Kanerva, 2009; Kleyko et al., 2021). Although different VSA models often have differing representations and binding operations (Kleyko et al., 2022), they all share the need for an autoassociative cleanup memory, which can recover a clean version of the most similar stored hypervector, given a noisy version of itself. We here use the recurrent dynamics of a Hopfield-like attractor neural network as a state-holding autoassociative memory (Gritsenko et al., 2017).

Symbolic FSM states will thus each be represented by a hypervector and stored within the attractor network as a fixed-point attractor. Stimuli will also be represented by hypervectors, which, when input to the attractor network, will trigger the network dynamics to transition between the correct attractor states. We make use of common VSA techniques to construct a weights matrix to achieve these dynamics, where we use the Hadamard product between bipolar hypervectors $\{-1,1\}^N$ as the binding operation (the multiply-add-permute (MAP) VSA model; Gayler, 1998). We thus claim that attractor-based FSMs are a plausible biological computational primitive insofar as Hopfield networks are.

This represents a computational paradigm that is a departure from conventional von Neumann architectures, wherein the separation of memory and computation is a major limiting factor in current advances in conventional computational performance (the von Neumann bottleneck—Backus, 1978; Indiveri & Liu, 2015). Similarly, the high redundancy and lack of reliance on individual components makes this architecture fit for implementation with novel in-memory computing technologies such as resistive RAM (RRAM) or phase-change memory (PCM) devices, which could perform the network’s matrix-vector-multiplication (MVM) step in a single operation (Ielmini & Wong, 2018; Xia & Yang, 2019; Zidan & Lu, 2020).

2.1  Hypervector Arithmetic

Throughout this article, symbols are represented by high-dimensional, randomly generated dense bipolar hypervectors,
$$\mathbf{x} \in \{-1, 1\}^N, \tag{2.1}$$
where the number of dimensions $N$ is generally taken to be greater than 10,000. Unless explicitly stated otherwise, any bold lowercase Latin letter may be assumed to be a new, independently generated hypervector, with the value $x_i$ at any index $i$ in $\mathbf{x}$ generated according to
$$P(x_i = 1) = P(x_i = -1) = \tfrac{1}{2}. \tag{2.2}$$
For a summary of the notation used throughout this article, see Table A.1 in section A.6 in the appendix. For any two arbitrary hypervectors $\mathbf{a}$ and $\mathbf{b}$, we define the similarity between the two hypervectors by the normalized inner product,
$$d(\mathbf{a}, \mathbf{b}) := \frac{1}{N}\,\mathbf{a}^\top \mathbf{b} = \frac{1}{N}\sum_{i=1}^{N} a_i b_i, \tag{2.3}$$
where the similarity between a hypervector and itself is $d(\mathbf{a},\mathbf{a})=1$, and $d(\mathbf{a},-\mathbf{a})=-1$. Due to the high dimensionality of the hypervectors, the similarity between any two unrelated (and so independently generated) hypervectors is the mean of an unbiased random sequence of $-1$ and $+1$ values,
$$d(\mathbf{a}, \mathbf{b}) = \frac{1}{N}\sum_{i=1}^{N} a_i b_i = 0 + \mathcal{O}\!\left(\frac{1}{\sqrt{N}}\right), \tag{2.4}$$
which tends to 0 for $N \to \infty$. It is from this result that we get the requirement of high dimensionality, as it ensures that the inner product between two random hypervectors is approximately 0. We can thus say that independently generated hypervectors are pseudo-orthogonal (Kleyko et al., 2021). For a set of independently generated states $\{\mathbf{x}^\mu\}$, these results can be summarized by
$$d(\mathbf{x}^\mu, \mathbf{x}^\nu) \approx \delta^{\mu\nu}, \tag{2.5}$$
where $\delta^{\mu\nu}$ is the Kronecker delta. Hypervectors may be combined via a so-called binding operation to produce a new hypervector that is dissimilar to both of its constituents. We here choose the Hadamard product, or component-wise multiplication, as our binding operation, denoted $\circ$:
$$(\mathbf{a} \circ \mathbf{b})_i = a_i b_i. \tag{2.6}$$
The statement that the binding of two hypervectors is dissimilar to its constituents is written as
$$d(\mathbf{a} \circ \mathbf{b}, \mathbf{a}) \approx 0, \qquad d(\mathbf{a} \circ \mathbf{b}, \mathbf{b}) \approx 0, \tag{2.7}$$
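where we implicitly assume that $N$ is large enough that we can ignore the $\mathcal{O}(1/\sqrt{N})$ noise terms. If we wish to recover a similarity between the hypervectors $\mathbf{a} \circ \mathbf{b}$ and $\mathbf{a}$, we could bind the hypervector $\mathbf{b}$ to $\mathbf{a}$ to produce $\mathbf{a} \circ \mathbf{b}$, in which case we would have $d(\mathbf{a} \circ \mathbf{b}, \mathbf{a} \circ \mathbf{b})=1$. For reasons of ease and robustness of implementation in an asynchronous neural system, we focus instead on another method to recover the similarity (see sections 3.1 and A.3). If we mask the system using $\mathbf{b}$, such that only components where $b_i=1$ remain, then we have
$$\begin{aligned} d(\mathbf{a} \circ \mathbf{b},\, \mathbf{a} \circ H(\mathbf{b})) &= \frac{1}{N}\sum_{i=1}^{N} a_i b_i\, a_i H(b_i) = \frac{1}{N}\sum_{i=1}^{N} b_i H(b_i) \\ &= \frac{1}{N}\Big[\sum_{i:\, b_i = 1} (1)(1) + \sum_{i:\, b_i = -1} (-1)(0)\Big] \approx \frac{1}{2}, \end{aligned} \tag{2.8}$$
where we have used the Heaviside step function $H(\cdot)$, applied component-wise and defined by
$$H(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{2.9}$$
to create a multiplicative mask $H(\mathbf{b})$, setting to 0 all components where $b_i=-1$. In the second line, we have split the summation over all components into summations over components where $b_i=1$ and $b_i=-1$, respectively. The final similarity of $\tfrac{1}{2}$ is a consequence of approximately half of all values in any hypervector being $+1$ (see equation 2.2).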
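The arithmetic of this section can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `random_hypervector`, `similarity`, and `heaviside` are illustrative and do not come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # hypervector dimension

def random_hypervector(n=N):
    """Dense bipolar hypervector: each entry is +1 or -1 with equal probability."""
    return rng.choice(np.array([-1, 1]), size=n)

def similarity(a, b):
    """Normalized inner product d(a, b)."""
    return float(a @ b) / len(a)

def heaviside(v):
    """Component-wise step function: 1 where v > 0, else 0."""
    return (v > 0).astype(int)

a, b = random_hypervector(), random_hypervector()
bound = a * b  # Hadamard product as the binding operation

print(similarity(a, a))                      # exactly 1
print(similarity(a, b))                      # near 0, fluctuations of order 1/sqrt(N)
print(similarity(bound, a))                  # near 0: binding hides the constituents
print(similarity(bound, a * heaviside(b)))   # near 1/2: masking by b recovers similarity
```

The last line is the masked similarity: it equals the fraction of components of `b` that are +1, which is why it concentrates around one half.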

2.2  Hopfield Networks

A Hopfield network is a dynamical system defined by its internal state vector $\mathbf{z}$ and fixed recurrent weights matrix $W$, with a state update rule given by
$$\mathbf{z}_{t+1} = \mathrm{sgn}(W \mathbf{z}_t), \tag{2.10}$$
where $\mathbf{z}_t$ is the network state at discrete time step $t$ and $\mathrm{sgn}(\cdot)$ is a component-wise sign function, with zeroes resolving to $+1$ (Hopfield, 1982). If we want to store $P$ uncorrelated patterns $\{\mathbf{x}^\nu\}_{\nu=1}^{P}$ within a Hopfield network, we can construct the weights matrix $W$ according to
$$W = \frac{1}{N}\sum_{\nu=1}^{P} \mathbf{x}^\nu \mathbf{x}^{\nu\top}. \tag{2.11}$$
Then, as long as not too many patterns are stored ($P<0.14N$; Hopfield, 1982), the patterns will become fixed-point attractors of the network’s dynamics, and the network can perform robust autoassociative pattern completion and correction.
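As an illustration of autoassociative recall, the sketch below stores a few random patterns with the outer-product rule and recovers one of them from a corrupted copy. It assumes NumPy, synchronous updates, and small illustrative sizes (`N = 1000`, `P = 20`) chosen only to keep the example light.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 1000, 20  # P is well below the 0.14 * N capacity limit

patterns = rng.choice(np.array([-1, 1]), size=(P, N))
W = patterns.T @ patterns / N  # sum of outer products x x^T, scaled by 1/N

def sgn(v):
    """Component-wise sign, with zeroes resolving to +1."""
    return np.where(v >= 0, 1, -1)

# Corrupt 10% of the components of the first stored pattern.
z = patterns[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
z[flip] *= -1

# Iterate the synchronous update rule z <- sgn(W z).
for _ in range(5):
    z = sgn(W @ z)

overlap = float(z @ patterns[0]) / N
print(overlap)  # close to 1 when the pattern has been cleaned up
```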

2.3  Finite State Machines

A finite state machine (FSM) $M$ is a discrete system with a finite state set $X_{\mathrm{FSM}}=\{\chi_1,\chi_2,\ldots,\chi_{N_Z}\}$, a finite input stimulus set $S_{\mathrm{FSM}}=\{\varsigma_1,\varsigma_2,\ldots,\varsigma_{N_S}\}$, and a finite output response set $R_{\mathrm{FSM}}=\{\rho_1,\rho_2,\ldots,\rho_{N_R}\}$. The FSM $M$ is then fully defined with the addition of the transition function $F(\cdot): X_{\mathrm{FSM}} \times S_{\mathrm{FSM}} \to X_{\mathrm{FSM}}$ and the output response function $G(\cdot): X_{\mathrm{FSM}} \times S_{\mathrm{FSM}} \to R_{\mathrm{FSM}}$,
$$x_{t+1} = F(x_t, s_t), \qquad r_{t+1} = G(x_t, s_t), \tag{2.12}$$
where $x_t \in X_{\mathrm{FSM}}$, $r_t \in R_{\mathrm{FSM}}$, and $s_t \in S_{\mathrm{FSM}}$ are the state, output, and stimulus at time step $t$, respectively. The transition function $F(\cdot)$ thus provides the next state for any state-stimulus pair, while $G(\cdot)$ provides the output, and both may be chosen arbitrarily. The FSM $M$ can thus be represented by a directed graph, where each node represents a different state $\chi$, and every edge has a stimulus $\varsigma$ and optional output $\rho$ associated with it.
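For reference, a symbolic FSM of this kind can be written as a lookup table from state-stimulus pairs to next states. The sketch below uses a handful of transitions that the article mentions by name; the names `TRANSITIONS` and `step` are illustrative, outputs are omitted, and leaving the state unchanged for an undefined pair is an assumption made to mirror the network behavior described later.

```python
from typing import Dict, Tuple

# (state, stimulus) -> next state; a small excerpt, not the full graph.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Hades", "father_is"): "Kronos",
    ("Kronos", "father_is"): "Uranus",
    ("Uranus", "overthrown_by"): "Kronos",
    ("Kronos", "overthrown_by"): "Zeus",
}

def step(state: str, stimulus: str) -> str:
    """Transition function F; undefined pairs leave the state unchanged (assumption)."""
    return TRANSITIONS.get((state, stimulus), state)

state = "Hades"
walk = []
for stimulus in ["father_is", "father_is", "father_is", "overthrown_by", "overthrown_by"]:
    state = step(state, stimulus)
    walk.append(state)

print(walk)  # ['Kronos', 'Uranus', 'Uranus', 'Kronos', 'Zeus']
```

The same stimulus ("father_is") leads to different targets depending on the current state, which is the state dependence the attractor network has to reproduce.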

We now show how a Hopfield-like attractor network may be constructed to emulate an arbitrary FSM, where the states within the FSM are stored as attractors in the network and the stimuli for transitions between FSM states trigger all corresponding transitions between attractors. More specifically, for every FSM state $\chi \in X_{\mathrm{FSM}}$, an associated hypervector $\mathbf{x}$ is randomly generated and stored as an attractor within the network, the set of which we denote $X_{\mathrm{AN}}$. We henceforth refer to these hypervectors as node hypervectors or node attractors. Every unique stimulus $\varsigma \in S_{\mathrm{FSM}}$ in the FSM is also now associated with a randomly generated hypervector $\mathbf{s} \in S_{\mathrm{AN}}$, where $S_{\mathrm{AN}}$ is the set of all stimulus hypervectors. For the FSM edge outputs $\rho \in R_{\mathrm{FSM}}$, a corresponding set of output hypervectors $\mathbf{r} \in R_{\mathrm{AN}}$ is similarly generated. These correspondences are summarized in Table 1.

Table 1:

A Comparison of the Notation Used to Represent States, Stimuli, and Outputs in the FSM, and the Corresponding Hypervectors Used to Represent the FSM within the Attractor Network.

FSM (Symbols)                                   Attractor Network (Hypervectors)
States $\chi \in X_{\mathrm{FSM}}$              Attractors $\mathbf{x} \in X_{\mathrm{AN}}$
Stimuli $\varsigma \in S_{\mathrm{FSM}}$        Stimuli $\mathbf{s} \in S_{\mathrm{AN}}$
Outputs $\rho \in R_{\mathrm{FSM}}$             Outputs $\mathbf{r} \in R_{\mathrm{AN}}$

3.1  Constructing Transitions

We consider the general situation in which we want to initiate a transition from source attractor state $\mathbf{x} \in X_{\mathrm{AN}}$ to target attractor state $\mathbf{y} \in X_{\mathrm{AN}}$, by imposing some stimulus hypervector $\mathbf{s} \in S_{\mathrm{AN}}$ as input onto the network:
$$\mathbf{x} \xrightarrow{\;\mathbf{s}\;} \mathbf{y}. \tag{3.1}$$
To ensure the plausible functionality of the network in a biological system, the mechanism for enacting transitions in the network should make very few timing assumptions about the system and should be robust to an arbitrary degree of asynchrony. How we model input to the network is thus of crucial importance to its functionality in these regimes. We model input to the network as a masking of the network state, such that all components where the stimulus $\mathbf{s}$ is $-1$ are set to 0. This may be likened to saying we are considering input to the network that selectively silences half of all neurons according to the stimulus hypervector. This mechanism was chosen as it allows the network to function even when the input is applied asynchronously and with random delays (see section A.3). While a stimulus hypervector $\mathbf{s}$ is being imposed on the network, the modified state update rule is given by
$$\mathbf{z}_{t+1} = \mathrm{sgn}\big(W(\mathbf{z}_t \circ H(\mathbf{s}))\big), \tag{3.2}$$
where the Hadamard product of the network state with $H(\mathbf{s})$ enacts the masking operation, and the weights matrix $W$ is constructed such that $\mathbf{z}_{t+1}$ will resemble the desired target state (see section 3.1).
For every edge in the FSM, we randomly generate an “edge state” $\mathbf{e}$, which is also stored as an attractor within the network. Each edge will use this $\mathbf{e}$ state as an intermediate attractor state, en route to $\mathbf{y}$. Additionally, each unique stimulus $\varsigma \in S_{\mathrm{FSM}}$ will now have two stimulus hypervectors associated with it, $\mathbf{s}_a$ and $\mathbf{s}_b$, which trigger transitions from source state $\mathbf{x}$ to edge state $\mathbf{e}$ and from edge state $\mathbf{e}$ to target state $\mathbf{y}$, respectively. The edge states are introduced to allow the system to function even when stimuli are input to the network for arbitrarily many time steps and to prevent unwanted effects such as skipping over certain attractor states or oscillations between states (see section A.4). A general transition now looks like
$$\mathbf{x} \xrightarrow{\;\mathbf{s}_a\;} \mathbf{e} \xrightarrow{\;\mathbf{s}_b\;} \mathbf{y}, \tag{3.3}$$
where $\mathbf{x},\mathbf{y} \in X_{\mathrm{AN}}$ are node attractor states but $\mathbf{e}$ exists purely to facilitate the transition. The weights matrix is constructed as
$$W = \frac{1}{N}\sum_{\nu=1}^{N_Z} \mathbf{x}^\nu \mathbf{x}^{\nu\top} + \frac{1}{N}\sum_{\eta=1}^{N_E} E^\eta, \tag{3.4}$$
where $\mathbf{x}^\nu \in X_{\mathrm{AN}}$ is the node hypervector corresponding to the $\nu$th node in the graph to be implemented, $N_Z$ and $N_E$ are the number of nodes and edges, respectively, and $E^\eta$ is the addition to the weights matrix required to implement an individual edge, given by
$$E^\eta = \mathbf{e}\mathbf{e}^\top + \big[H(\mathbf{s}_a) \circ (\mathbf{e} - \mathbf{x})\big](\mathbf{x} \circ \mathbf{s}_a)^\top + \big[H(\mathbf{s}_b) \circ (\mathbf{y} - \mathbf{e})\big](\mathbf{e} \circ \mathbf{s}_b)^\top, \tag{3.5}$$
where $\mathbf{x}$, $\mathbf{e}$, and $\mathbf{y}$ are the source, edge, and target states of the edge $\eta$, respectively, and $\mathbf{s}_a$ and $\mathbf{s}_b$ are the input stimulus hypervectors associated with this edge’s label. The edge index $\eta$ has been dropped on the right-hand side for brevity. The $\mathbf{e}\mathbf{e}^\top$ term is the edge state attractor we have introduced as an intermediary for the transition. The second set of terms enacts the $\mathbf{x} \xrightarrow{\mathbf{s}_a} \mathbf{e}$ transition by giving a nonzero inner product with the network state $\mathbf{z}_t$ only when the network is in state $\mathbf{x}$ and the network is being masked by the stimulus $\mathbf{s}_a$. When both of these conditions are met, the $(\mathbf{x} \circ \mathbf{s}_a)$ term will have a nonzero inner product with the network state, projecting out the $(\mathbf{e}-\mathbf{x})$ term, which “pushes” the network from the $\mathbf{x}$ to the $\mathbf{e}$ attractor state. This allows terms to be stored in $W$, where they are effectively obfuscated, not affecting network dynamics considerably until a specific stimulus is applied as a mask to the network. Likewise, the third set of terms enacts the $\mathbf{e} \xrightarrow{\mathbf{s}_b} \mathbf{y}$ transition.
In the absence of input, the network functions like a standard Hopfield attractor network,
$$W\mathbf{x} \approx \mathbf{x} + \sigma \mathbf{n}, \tag{3.6}$$
where $\mathbf{n} \in \mathbb{R}^N$ is a standard, normally distributed random vector and
$$\sigma := \sqrt{\frac{N_Z + 3N_E}{N}} \tag{3.7}$$
is the magnitude of noise due to the undesired finite inner product with other stored terms (see section A.1 for proof). Thus, as long as the magnitude of the noise is not too large, $\mathbf{x}$ will be a solution of $\mathbf{z}=\mathrm{sgn}(W\mathbf{z})$ and so a fixed-point attractor of the dynamics.
When a valid stimulus is presented as input to the network, however, masking the network state, the previously obfuscated asymmetric transition terms become significant and dominate the dynamics. Assuming there is a stored transition term $E$ corresponding to a valid edge with hypervectors $\mathbf{x},\mathbf{e},\mathbf{y},\mathbf{s}_a,\mathbf{s}_b$ having the same meaning as in equation 3.5, during a masking operation, we have
$$W(\mathbf{x} \circ H(\mathbf{s}_a)) \propto H(\mathbf{s}_a) \circ \mathbf{e} + H(-\mathbf{s}_a) \circ \mathbf{x} + \sqrt{2}\,\sigma \mathbf{n}, \tag{3.8}$$
where $\propto$ implies approximate proportionality (see section A.2 for proof). The second set of terms can be ignored, as they project only to neurons that are currently being masked. Thus, the only significant term is that containing the edge state $\mathbf{e}$, which consequently drives the network to the $\mathbf{e}$ state, enacting the $\mathbf{x} \xrightarrow{\mathbf{s}_a} \mathbf{e}$ transition. Since the state $\mathbf{e}$ is also stored as an attractor within the network, we have
$$W(\mathbf{e} \circ H(\mathbf{s}_a)) \propto \mathbf{e} + \sqrt{2}\,\sigma \mathbf{n} \tag{3.9}$$
and
$$W\mathbf{e} \approx \mathbf{e} + \sigma \mathbf{n}; \tag{3.10}$$
thus, the edge states $\mathbf{e}$ are also fixed-point attractors of the network dynamics. To complete the transition from state $\mathbf{x}$ to $\mathbf{y}$, the second stimulus $\mathbf{s}_b$ is applied, giving
$$W(\mathbf{e} \circ H(\mathbf{s}_b)) \propto H(\mathbf{s}_b) \circ \mathbf{y} + H(-\mathbf{s}_b) \circ \mathbf{e} + \sqrt{2}\,\sigma \mathbf{n}, \tag{3.11}$$
which drives the network state toward $\mathbf{y} \in X_{\mathrm{AN}}$, the desired target attractor state. By consecutive application of the inputs $\mathbf{s}_a$ and $\mathbf{s}_b$, the transition terms $E^\eta$ stored in $W$ have thus caused the network to controllably transition from the source attractor state to the target attractor state. Due to the robustness of the masking mechanism, the stimuli can be applied asynchronously and with arbitrary delays (see section A.3). Transition terms $E^\eta$ may be iteratively added to $W$ to achieve any arbitrary transition between attractor states, and so any arbitrary FSM may be implemented within a large enough attractor network.
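The mechanism for a single edge can be sketched numerically. The example below assumes NumPy and assumes that the edge term consists of an edge attractor plus two masked transition terms of the form `[H(s_a) * (e - x)] (x * s_a)^T` and `[H(s_b) * (y - e)] (e * s_b)^T`, which is how the description above reads; the dimension `N = 2000` and all helper names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000  # smaller than the article's N, to keep the dense matrix light

def hv():
    """Random dense bipolar hypervector."""
    return rng.choice(np.array([-1, 1]), size=N)

def H(v):
    """Component-wise Heaviside mask."""
    return (v > 0).astype(int)

def sgn(v):
    return np.where(v >= 0, 1, -1)

def d(a, b):
    return float(a @ b) / N

x, y, e, s_a, s_b = (hv() for _ in range(5))

# Node attractors for the source x and target y.
W = (np.outer(x, x) + np.outer(y, y)).astype(float)
# Edge term (assumed form): edge attractor plus two masked transition terms.
W += np.outer(e, e)
W += np.outer(H(s_a) * (e - x), x * s_a)
W += np.outer(H(s_b) * (y - e), e * s_b)
W /= N

def step(z, stimulus=None):
    """One synchronous update; a stimulus silences neurons where it is -1."""
    masked = z if stimulus is None else z * H(stimulus)
    return sgn(W @ masked)

z = x.copy()
history = []  # (d(z, x), d(z, e), d(z, y)) after every step
for stimulus in [None] * 3 + [s_a] * 3 + [s_b] * 3 + [None] * 3:
    z = step(z, stimulus)
    history.append((d(z, x), d(z, e), d(z, y)))

for row in history:
    print(tuple(round(v, 2) for v in row))
```

In this sketch the first masked step produces a mixed state (edge state on the unmasked half, previous state on the masked half), and the following step completes the move to the edge state; the same happens for the second stimulus and the target state.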

3.2  Edge Outputs

Until now we have not mentioned the other critical component of FSMs: the output associated with every edge. We have separated the construction of transitions and edge outputs for clarity, since the two may be effectively decoupled. Much like for the nodes and edges in the FSM to be implemented, for every unique FSM output $\rho \in R_{\mathrm{FSM}}$, we generate a corresponding hypervector $\mathbf{r} \in R_{\mathrm{AN}}$, where $R_{\mathrm{AN}}$ is the set of all output hypervectors. We then seek to embed these hypervectors into the attractor network, such that every transition between node attractor states may contain one of these hypervectors $\mathbf{r}$. A natural solution would be to embed the $\mathbf{r}$ hypervector into the edge state attractors $\mathbf{e}\mathbf{e}^\top$, since there already exists one for every edge. We can consider altering the edge state attractors from $\mathbf{e}\mathbf{e}^\top$ to $\mathbf{e}_r\mathbf{e}_r^\top$, where $\mathbf{e}_r$ resembles the original $\mathbf{e}$ state with $\mathbf{r}$ embedded within it, such that its presence can be detected via a linear projection. If multiple edges have the same $\mathbf{r}$ hypervector, however, then the $\mathbf{e}_r\mathbf{e}_r^\top$ terms for different edges will be correlated, incurring unwanted interference between attractor states and violating the assumption that the inner product between different attractor terms is small enough that it can be ignored. We avoid this by instead storing altered edge state attractors of the form $\mathbf{e}_r\mathbf{e}^\top$. We then choose $\mathbf{e}_r$ such that it is minimally different from $\mathbf{e}$ (i.e., $d(\mathbf{e}_r,\mathbf{e}) \approx 1$), so that we still retain the desired attractor dynamics. We thus choose the output hypervectors $\mathbf{r} \in R_{\mathrm{AN}}$ to be sparse ternary hypervectors $\mathbf{r} \in \{-1,0,1\}^N$ with coding level $f_r := \frac{1}{N}\sum_{i=1}^{N}|r_i|$, the fraction of nonzero components. These output hypervectors are then embedded in the edge state attractors, altering the $\mathbf{e}\mathbf{e}^\top$ terms in each $E^\eta$ term according to
$$\mathbf{e}\mathbf{e}^\top \to \mathbf{e}_r\mathbf{e}^\top, \qquad \mathbf{e}_r := \mathbf{e} \circ (\mathbf{1} - \mathbf{r} \circ \mathbf{r}) + \mathbf{r}, \tag{3.12}$$
where the composite vector $\mathbf{e}_r$ introduced above is here defined and $\mathbf{1}$ is a hypervector of all ones. As a result of this modification, the edge states $\mathbf{e}$ themselves will no longer be exact attractors of the space. The composite state $\mathbf{e}_r$ will, however, be stable, in which the presence of $\mathbf{r}$ can be easily detected by a linear projection ($\mathbf{e}_r \cdot \mathbf{r} = N f_r$). This has been achieved without incurring any similarity and thus interference between attractors, which would otherwise alter the dynamics of the previously described transitions.
A full transition term $E^\eta$, including its output, is thus given by
$$E^\eta = \mathbf{e}_r\mathbf{e}^\top + \big[H(\mathbf{s}_a) \circ (\mathbf{e} - \mathbf{x})\big](\mathbf{x} \circ \mathbf{s}_a)^\top + \big[H(\mathbf{s}_b) \circ (\mathbf{y} - \mathbf{e})\big](\mathbf{e} \circ \mathbf{s}_b)^\top, \tag{3.13}$$
which combined with the network state masking operation is solely responsible for storing the FSM connectivity and enabling the desired interattractor transition dynamics.
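The embedding of an output hypervector can be illustrated in isolation. The sketch below assumes NumPy and assumes the composite state is formed by overwriting the edge state with the output hypervector wherever the latter is nonzero, which gives the stated projection of `N * f_r`. Only the single rank-one term `e_r e^T / N` is simulated, not a full weights matrix, and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_active = 10_000, 200  # n_active = N * f_r nonzero output components

e = rng.choice(np.array([-1, 1]), size=N)  # edge state

# Sparse ternary output hypervector with exactly n_active nonzero entries.
r = np.zeros(N, dtype=int)
idx = rng.choice(N, size=n_active, replace=False)
r[idx] = rng.choice(np.array([-1, 1]), size=n_active)

# Composite state (assumed form): r where r is nonzero, e elsewhere.
e_r = e * (1 - r * r) + r

def sgn(v):
    return np.where(v >= 0, 1, -1)

def step(z):
    """One update under the rank-one term e_r e^T / N, without forming the matrix."""
    return sgn(e_r * (e @ z) / N)

z1 = step(e)    # starting from the plain edge state e
z2 = step(z1)   # the composite state maps onto itself

projection = int(e_r @ r)
print(np.array_equal(z1, e_r), np.array_equal(z2, e_r))
print(projection)            # equals n_active, i.e. N * f_r
print(float(e_r @ e) / N)    # close to 1: e_r differs from e in few components
```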

3.3  Sparse Activity States

It is well known that the memory capacity of attractor networks can be vastly increased by storing sparsely coded activity patterns, rather than dense patterns as we have done thus far (Amari, 1989; Amit, 1989; Tsodyks & Feigel’man, 1988). We therefore adapt the construction of the attractor network to the case that the network state $\mathbf{z}_t$ and its stored hypervectors $\mathbf{x}^\nu$ are binary and $f$-sparse, that is, they contain mostly zeroes, with very few entries being $+1$, to test if there are similar gains in the size of FSM that can be reliably embedded. To distinguish these hypervectors from the dense bipolar hypervectors we have been using thus far, we denote sparse binary hypervectors $\mathbf{x}_{\mathrm{sp}} \in \{0,1\}^N$ with $\|\mathbf{x}_{\mathrm{sp}}\|_1 = Nf$, where $f$ is the fixed coding level of the states, the fraction of nonzero components. Note that we here construct hypervectors that have exactly $Nf$ nonzero components, and so they may better be described as a sparse N-of-M code (Furber et al., 2004). The attractor network’s weights matrix is constructed as
$$W = \sum_{\nu=1}^{N_Z} (\mathbf{x}_{\mathrm{sp}}^\nu - f)(\mathbf{x}_{\mathrm{sp}}^\nu - f)^\top + \sum_{\eta=1}^{N_E} E^\eta, \tag{3.14}$$
where $E^\eta$ are the equivalent sparse edge terms to be defined. If the neuron state update rule (see equation 2.10) is replaced with a sparse binary variant, such as a top-k activation function or a Heaviside function with an appropriately chosen threshold, then the stored states $\mathbf{x}_{\mathrm{sp}}^\nu$ will be attractors of the network’s dynamics (Amari, 1989). The additional edge terms $E^\eta$ are analogously constructed as
$$E^\eta = (\mathbf{e}_{\mathrm{sp}} - f)(\mathbf{e}_{\mathrm{sp}} - f)^\top + \big[H(\mathbf{s}_a) \circ (\mathbf{e}_{\mathrm{sp}} - \mathbf{x}_{\mathrm{sp}})\big]\big((\mathbf{x}_{\mathrm{sp}} - f) \circ \mathbf{s}_a\big)^\top + \big[H(\mathbf{s}_b) \circ (\mathbf{y}_{\mathrm{sp}} - \mathbf{e}_{\mathrm{sp}})\big]\big((\mathbf{e}_{\mathrm{sp}} - f) \circ \mathbf{s}_b\big)^\top, \tag{3.15}$$
where the first set of terms embeds the sparse binary edge state $\mathbf{e}_{\mathrm{sp}}$ as an attractor, while the second and third terms embed the source-to-edge and edge-to-target transitions, respectively. The stimulus hypervectors $\mathbf{s}_a$ and $\mathbf{s}_b$ can also be made sparse, such that fewer than half of all neurons are masked by the stimuli, but at the cost of decreased memory capacity (see section A.5). For this reason, we here keep them as bipolar hypervectors, with an approximately equal number of $+1$ as $-1$ entries. Each set of terms within each $E^\eta$ term performs the same role as in the dense bipolar case as discussed in section 3.1. How output states should be embedded into each transition in the sparse case is unclear, because unlike in the dense case, they cannot be embedded into the edge state attractors without considerably affecting the network dynamics and thus attractor stabilities.
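The attractor part of the sparse construction can be sketched as follows. This is a minimal illustration assuming NumPy, a covariance-style learning rule on mean-centered patterns (in the spirit of Tsodyks & Feigel’man, 1988; the exact normalization is an assumption), and a top-k activation; the edge terms are not included, and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, f, P = 1000, 0.05, 20
k = int(N * f)  # every state has exactly N * f active components

def sparse_hv():
    """Sparse binary hypervector with exactly k ones."""
    v = np.zeros(N, dtype=int)
    v[rng.choice(N, size=k, replace=False)] = 1
    return v

patterns = np.array([sparse_hv() for _ in range(P)])
centered = patterns - f
W = centered.T @ centered  # sum of (x - f)(x - f)^T over stored patterns

def top_k(h):
    """Activate the k neurons with the largest input, silence the rest."""
    z = np.zeros(len(h), dtype=int)
    z[np.argpartition(h, -k)[-k:]] = 1
    return z

# Corrupt the first pattern: switch off 10 active units, switch on 10 others.
z = patterns[0].copy()
on, off = np.flatnonzero(z == 1), np.flatnonzero(z == 0)
z[rng.choice(on, size=10, replace=False)] = 0
z[rng.choice(off, size=10, replace=False)] = 1

for _ in range(3):
    z = top_k(W @ z)

recovered = np.array_equal(z, patterns[0])
print(recovered)
```

A top-k activation keeps the coding level fixed by construction, which is one reason it pairs naturally with an N-of-M code; a thresholded Heaviside function would need its threshold tuned instead.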

4.1  FSM Emulation

To show the generality of FSM construction, we chose to implement a directed graph representing the relationships between gods in ancient Greek mythology due to the graph’s dense connectivity. The graph, and thus FSM to be implemented, is shown in Figure 1. From the graph, it is clear that a state machine representing the graph must explicitly be capable of state-dependent transitions; for example, the input “overthrown_by” must result in a transition to state “Kronos” when in state “Uranus,” but to state “Zeus” when in state “Kronos.” To construct $W$, the necessary hypervectors are first generated. For every state $\chi \in X_{\mathrm{FSM}}$ in the FSM (e.g., “Zeus,” “Kronos”) a random bipolar hypervector $\mathbf{x}$ is generated according to equation 2.2. For every unique stimulus $\varsigma \in S_{\mathrm{FSM}}$ (e.g., “overthrown_by,” “father_is”) a pair of random bipolar stimulus hypervectors $\mathbf{s}_a$ and $\mathbf{s}_b$ is likewise generated. Similarly, sparse ternary output hypervectors $\mathbf{r}$ are also generated. The weights matrix $W$ is then iteratively constructed as per equations 3.4 and 3.13, with a new hypervector $\mathbf{e}$ also being generated for every edge. The matrix generated from this procedure we denote $W^{\mathrm{ideal}}$. For all of the following results, the attractor network is first initialized to be in a certain node attractor state, in this case, “Hades.” The network is then allowed to freely evolve for 10 time steps (chosen arbitrarily) as per equation 2.10, with every neuron being updated simultaneously on every time step. During this period, it is desired that the network state $\mathbf{z}_t$ remains in the attractor state in which it was initialized. An input stimulus $\mathbf{s}_a$ is then presented to the network for 10 time steps, during which time the network state is masked by the stimulus hypervector, and the network evolves synchronously according to equation 3.2. If the stimulus corresponds to a valid edge in the FSM, the network state $\mathbf{z}_t$ should then be driven toward the correct edge state attractor $\mathbf{e}$.
After these 10 time steps, the second stimulus hypervector $\mathbf{s}_b$ for a particular input is presented for 10 time steps. Again, the network evolves according to equation 3.2, and the network should be driven toward the target attractor state $\mathbf{y}$, completing the transition. This process is repeated every 30 time steps, causing the network state $\mathbf{z}_t$ to travel between node attractor states $\mathbf{x} \in X_{\mathrm{AN}}$, corresponding to a valid walk between states $\chi \in X_{\mathrm{FSM}}$ in the represented FSM. To view the resulting network dynamics, the similarity between the network state $\mathbf{z}_t$ and the edge and node attractor states is calculated as per equation 2.3, such that a similarity of 1 between $\mathbf{z}_t$ and some attractor state $\mathbf{x}^\nu$ implies $\mathbf{z}_t=\mathbf{x}^\nu$ and thus that the network is inhabiting that attractor. The similarity between the network state $\mathbf{z}_t$ and the output states $\mathbf{r} \in R_{\mathrm{AN}}$ is also calculated, but due to the output hypervectors being sparse, the maximum value that the similarity can take is $d(\mathbf{z}_t,\mathbf{r})=f_r$, which would be interpreted as that output symbol being present.
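The walk procedure can be sketched end to end on a small excerpt of the graph. The example assumes NumPy, uses only transitions named in the text (not the full graph of Figure 1), omits output hypervectors, and assumes the same masked transition terms as described in section 3.1; `N = 2000`, 5 steps per phase, and all function names are illustrative choices rather than the article's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000

def hv():
    return rng.choice(np.array([-1, 1]), size=N)

def H(v):
    return (v > 0).astype(int)

def sgn(v):
    return np.where(v >= 0, 1, -1)

edges = [  # (source, stimulus, target), an excerpt of the FSM
    ("Hades", "father_is", "Kronos"),
    ("Kronos", "father_is", "Uranus"),
    ("Uranus", "overthrown_by", "Kronos"),
    ("Kronos", "overthrown_by", "Zeus"),
    ("Kronos", "consort_is", "Rhea"),
    ("Rhea", "consort_is", "Kronos"),
]
nodes = {n: hv() for n in sorted({n for s, _, t in edges for n in (s, t)})}
stimuli = {st: (hv(), hv()) for st in sorted({st for _, st, _ in edges})}

# Node attractors, then one edge term per edge with a fresh edge state e.
W = sum(np.outer(x, x) for x in nodes.values()).astype(float)
for src, stim, tgt in edges:
    x, y, e = nodes[src], nodes[tgt], hv()
    s_a, s_b = stimuli[stim]
    W += np.outer(e, e)
    W += np.outer(H(s_a) * (e - x), x * s_a)
    W += np.outer(H(s_b) * (y - e), e * s_b)
W /= N

def run(z, mask=None, steps=5):
    """Evolve synchronously, optionally masking the state with a stimulus."""
    for _ in range(steps):
        z = sgn(W @ z) if mask is None else sgn(W @ (z * H(mask)))
    return z

def decode(z):
    """Name of the node attractor most similar to the network state."""
    return max(nodes, key=lambda name: z @ nodes[name])

def apply_stimulus(z, stim):
    s_a, s_b = stimuli[stim]
    z = run(z)        # free evolution
    z = run(z, s_a)   # source -> edge state
    z = run(z, s_b)   # edge state -> target
    return run(z)

z = nodes["Hades"].copy()
walk = []
for stim in ["father_is", "father_is", "father_is", "overthrown_by",
             "consort_is", "consort_is", "overthrown_by"]:
    z = apply_stimulus(z, stim)
    walk.append(decode(z))

print(walk)
```

The third "father_is" has no stored edge from "Uranus" in this excerpt, so the state is expected to stay put; the two "consort_is" inputs exercise the bidirectional edge between "Kronos" and "Rhea".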

Figure 1:

An example FSM that we implement within the attractor network. Each node within the graph (e.g., “Zeus”) is represented by a new hypervector $\mathbf{x}^\mu$ and stored as an attractor within the network. Every edge is labeled by its stimulus (e.g., “father_is”), for which corresponding hypervectors $\mathbf{s}_a$ and $\mathbf{s}_b$ are also generated. When a stimulus’s hypervector is input to the network, it should allow all corresponding attractor transitions to take place. Each edge may also have an associated output symbol, where we here choose the edges labeled “type” to output the generation of the god {“Primordial,” “Titans,” “Olympians”}. This graph was chosen as it displays the generality of the embedding: it contains cycles, loops, bidirectional edges, and state-dependent transitions.


An attractor network performing a walk is shown in Figure 2, with parameters $N=10{,}000$, $Nf_r=200$, $N_Z=8$, and $N_E=16$. This corresponds to the network having a per neuron noise (the finite-size effect resulting from random hypervectors having a nonzero similarity to each other) of $\sigma \approx 0.07$, calculated via equation 3.7. The magnitude of the noise is thus small compared with the desired signal of magnitude 1 (see equation 3.6), and so we are far away from reaching the memory capacity of the network. The network performs the walk as intended, transitioning between the correct node attractor states and corresponding edge states with their associated outputs. The specific sequence of inputs was chosen to show the generality of implementable state transitions. First, there is the explicit state dependence in the repeated input of “father_is, father_is.” Second, it contains an input stimulus that does not correspond to a valid edge for the currently inhabited state (“Zeus overthrown_by”), which should not cause a transition. Third, it contains bidirectional edges (“consort_is”), whose repeated application causes the network to flip between two states (between “Kronos” and “Rhea”). And fourth, it contains self-connections, whose target states and source states are identical. Since the network traverses all these edges as expected, we do not expect the precise structure of an FSM’s graph to limit whether it can be emulated by the attractor network.
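The quoted noise level can be reproduced with a short calculation. The sketch below assumes that the noise magnitude has the form sqrt((N_Z + 3 N_E) / N), that is, one unit of variance per stored node attractor and three per edge term (an edge attractor plus two transition terms); this form is an assumption that happens to reproduce the stated value.

```python
import math

N, N_Z, N_E = 10_000, 8, 16

# Assumed form of the noise magnitude: each node attractor contributes one
# noise term and each edge contributes three, all with variance 1/N.
sigma = math.sqrt((N_Z + 3 * N_E) / N)

print(round(sigma, 4))  # 0.0748, consistent with the quoted value of about 0.07
```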

Figure 2:

An attractor network transitioning through attractor states in a state-dependent manner as a sequence of input stimuli is presented to the network. (a) The input stimuli to the network, where for each unique stimulus (e.g., “father_is”) in the FSM to be implemented (see Figure 1) a pair of hypervectors $\mathbf{s}_a$ and $\mathbf{s}_b$ have been generated. No stimulus, a stimulus $\mathbf{s}_a$, then a stimulus $\mathbf{s}_b$ are input for 10 time steps each in sequence. (b, c) The similarity of the network state $\mathbf{z}_t$ to stored node attractor states $\mathbf{x} \in X_{\mathrm{AN}}$ and stored edge states $\mathbf{e}$, respectively, computed via the inner product (see equation 2.3). (d) The similarity of the network state $\mathbf{z}_t$ to the sparse output states $\mathbf{r} \in R_{\mathrm{AN}}$. All similarities have been labeled with the state they represent, and the colors are purely illustrative. The attractor transitions shown here are explicitly state dependent, as can be seen from the repeated input of the stimulus “father_is,” which results in a transition to state “Kronos” when in “Hades,” but to “Uranus” when in “Kronos.” Additionally, the network is unaffected by nonsense input that does not correspond to a stored edge, as the network remains in the attractor “Uranus” when presented with the stimulus “father_is.”


4.2  Network Robustness

One of the advantages of attractor neural networks that make them suitable as plausible biological models is their robustness to imperfect weights (Amit, 1989). That is, individual synapses may have very few bits of precision or become damaged, yet the relevant brain region must still be able to carry out its functional task. To this end, we subjected the network presented here to similar nonidealities, to check that the network retains the feature of global stability and robustness despite being implemented with low precision and noisy weights. In the first of these tests, the ideal weights matrix Wideal was binarized and then additive noise was applied, via
(4.1)

where χijR are independently sampled standard gaussian variables, sampled once during matrix construction, and σnoiseR is a scaling factor on the strength of noise being imposed. The sgn(·) function forces the weights to be bipolar, emulating that the synapses may have only one bit of precision, while the χij random variables act as a smearing on the weight state, emulating that the two weight states have a finite width. A σnoise value of 2 thus corresponds to the magnitude of the noise being equal to that of the signal (whether Wijideal0), and so, for example, for a damaged weight value of Wijnoisy=+1, there is a 38% chance that the predamaged weight Wijideal=-1. This level of degradation is far worse than is expected even from novel binary memory devices (Xia & Yang, 2019), and presumably also for biology. We used the same set of hypervectors and sequence of inputs as in Figure 2, but this time using the degraded weights matrix Wnoisy to test the network’s robustness. The results are shown in Figure 3 for weight degradation values of σnoise=2 and σnoise=5, corresponding to signal-to-noise ratios (SNRs) of 0 dB and -0.8 dB, respectively. We see that for σnoise=2, the attractor network performs the walk just as well as in Figure 2, which used the ideal weights matrix, despite the fact that here, the binary weight distributions overlap each other considerably. Furthermore, we have that d(zt,xν)1 where xν is the attractor that the network should be inhabiting at any time, indicating that the attractor stability and recall accuracy are unaffected by the nonidealities. For σnoise=5, a scenario where the realized weight carries very little information about the ideal weight’s value, we see that the network nonetheless continues to function, performing the correct walk between attractor states. However, there is a degradation in the recall of stored attractor states, with the network state no longer converging to a similarity of 1 with the stored attractor states. 
For greater values of σnoise, the network ceases to perform the correct walk, and indeed does not converge on any stored attractor state (not shown).
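As an illustration, the weight degradation can be sketched in a few lines of NumPy. This is a minimal sketch rather than the code used for the simulations: it assumes the degraded weights are the binarized ideal weights plus independent gaussian noise of standard deviation σnoise, and the function names `degrade_weights` and `flip_posterior`, as well as the random stand-in matrix, are hypothetical. The second function applies Bayes' rule with equal priors on the two weight states and recovers the 38% figure quoted above.

```python
import numpy as np

def degrade_weights(w_ideal, sigma_noise, rng):
    """Binarize the ideal weights, then add independent gaussian noise."""
    chi = rng.standard_normal(w_ideal.shape)  # sampled once, at construction
    return np.sign(w_ideal) + sigma_noise * chi

def flip_posterior(w_noisy, sigma_noise):
    """P(binarized ideal weight = -1 | observed noisy weight), equal priors."""
    # Gaussian likelihoods centred on -1 and +1 with the same width.
    like_neg = np.exp(-((w_noisy + 1.0) ** 2) / (2.0 * sigma_noise**2))
    like_pos = np.exp(-((w_noisy - 1.0) ** 2) / (2.0 * sigma_noise**2))
    return like_neg / (like_neg + like_pos)

rng = np.random.default_rng(0)
w_ideal = rng.standard_normal((200, 200))  # stand-in for a real weights matrix
w_noisy = degrade_weights(w_ideal, sigma_noise=2.0, rng=rng)
p_flip = flip_posterior(1.0, 2.0)          # roughly 0.38
```

The posterior depends only on the ratio of the two likelihoods, so it reduces to 1 / (1 + exp(2 / σnoise²)) for an observed value of +1.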

Figure 3:

The attractor network performing a walk as in Figure 2 but using the damaged weights matrix Wnoisy, whose entries have been binarized and then independent additive noise has been applied, as per equation 3.4. (a) The distribution of weights after they have been thusly damaged with noise of magnitude σnoise=2, corresponding to an SNR of 0 dB. Weights whose ideal values were positive or negative have been plotted separately. (b) The similarity of the network state zt to stored node hypervectors, with the network using the weights from panel a. Shown above is the sequence of inputs given to the network, identical to that in Figure 2. (c) The distribution of weights damaged with σnoise=5, corresponding to an SNR of -0.8 dB. (d) The similarity of the network state to stored node hypervectors, but with the network using the damaged weights from panel c. The network transitions are thus highly robust to unreliable weights and show a gradual degradation in performance, even when the network’s weights are highly imprecise and noisy. For both panels b and d, the edge state and output similarity plots have been omitted for visual clarity.

A further test of robustness was to restrict the weights matrix to be sparse, as a dense all-to-all connectivity may not be feasible in biology, where synaptic connections are spatially constrained and have an associated chemical cost. Similar to the previous test, the sparse weights matrix was generated via
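$$W_{ij}^{\text{sparse}} = \begin{cases} \operatorname{sgn}\left(W_{ij}^{\text{ideal}}\right) & \text{if } \left|W_{ij}^{\text{ideal}}\right| > \theta \\ 0 & \text{otherwise} \end{cases} \tag{4.2}$$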

where $\theta$ is a threshold set such that $W^{\text{sparse}} \in \{-1, 0, 1\}^{N \times N}$ has the desired sparsity. Through this procedure, only the most extreme weight values are allowed to be nonzero. Since the terms inside $W^{\text{ideal}}$ are symmetrically distributed around zero, there are approximately as many $+1$ entries in $W^{\text{sparse}}$ as $-1$ entries. Using the same hypervectors and sequence of inputs as before, an attractor network performing a walk using the sparse weights matrix $W^{\text{sparse}}$ is shown in Figure 4, with sparsities of 98% and 99%. We see that for the 98% sparse case, there is again very little difference from the ideal case shown in Figure 2, with the network still having a similarity of $d(z_t, x) \approx 1$ with stored attractor states and performing the correct walk. When the sparsity is pushed further to 99%, however, we see that despite the network performing the correct walk, the attractor states are again slightly degraded, with the network converging on states with $d(z_t, x^\nu) < 1$ with stored attractor states $x^\nu$. For greater sparsities, the network ceases to perform the correct walk and again does not converge on any stored attractor state (not shown).
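The thresholding procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the original simulation code: the threshold θ is taken to be a quantile of the weight magnitudes, chosen so that the requested fraction of entries falls below it, and the function name `sparsify_weights` and the random stand-in matrix are hypothetical.

```python
import numpy as np

def sparsify_weights(w_ideal, sparsity):
    """Keep only the sign of the largest-magnitude weights; zero the rest."""
    magnitude = np.abs(w_ideal)
    # theta is the magnitude below which a fraction `sparsity` of entries fall.
    theta = np.quantile(magnitude, sparsity)
    return np.where(magnitude > theta, np.sign(w_ideal), 0.0)

rng = np.random.default_rng(0)
w_ideal = rng.standard_normal((500, 500))  # stand-in for a real weights matrix
w_sparse = sparsify_weights(w_ideal, sparsity=0.98)
fraction_zero = np.mean(w_sparse == 0)     # close to 0.98
```

Choosing θ from a quantile avoids having to tune it by hand for each target sparsity.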

Figure 4:

The attractor network performing a walk as in Figure 2, but using a sparse ternary weights matrix $W^{\text{sparse}} \in \{-1, 0, 1\}^{N \times N}$, generated via equation 4.2. The weights matrices for panels a and b are 98% and 99% sparse, respectively. Shown are the similarities of the network state $z_t$ with stored node hypervectors $x \in X_{\text{AN}}$, with the applied stimulus hypervector at any time shown above. We see that even when 98% of the entries in $W$ are zeroes, the network continues to function with negligible loss in stability, as the correct walk between attractor states is performed and the network converges on stored attractors with similarity $d(z_t, x) \approx 1$. At 99% sparsity, there is a degradation in the accuracy of stored attractors, with the network converging on states with $d(z_t, x) < 1$, but with the correct walk still being performed. Beyond 99% sparsity, the attractor dynamics break down (not shown). Thus, although requiring a large number of neurons $N$ to enforce state pseudo-orthogonality, the network requires far fewer than $N^2$ nonzero weights to function robustly.


These two tests thus highlight the extreme robustness of the model to imprecise and unreliable weights. The network may be implemented with 1-bit precision weights whose distributions overlap considerably, or with 98% of its weights set to zero, and still function without any discernible loss in performance. The extent to which the weights matrix may be degraded while the network remains stable is of course a function not only of the level of degradation but also of the size of the network N, as well as the number of FSM states NZ and edges NE stored within the network. For conventional Hopfield models with Hebbian learning, weight degradation and memory load are normally treated alike theoretically, as contributing an effective noise to the postsynaptic sum as in equation 3.7, and so the magnitude of withstandable synaptic noise increases with increasing N (Amit, 1989; Sompolinsky, 1987). Although a thorough mathematical investigation into the scaling of weight degradation limits is warranted, as a first result, we have here given numerical data showing stability even in the most extreme cases of nonideal weights, and expect that any implementation of the network with novel devices would be far away from such extremities.

4.3  Asynchronous Updates

Another useful property of Hopfield networks is the ability to function robustly even with asynchronously updating neurons, wherein not every neuron experiences a simultaneous state update. This property is especially important for any architecture claiming to be biologically plausible, as biological neurons update asynchronously and largely independently of each other, without the need for global clock signals. To this end, we ran a similar experiment to that in Figure 2, using the undamaged weights matrix $W^{\text{ideal}}$ but with an asynchronous neuron update rule, wherein on each time step, every neuron has only a 10% chance of updating its state. The remaining 90% of the time, the neuron retains its state from the previous time step, regardless of its postsynaptic sum. There is thus no fixed order of neuron updates, and indeed it is not even a certainty that a neuron will update in any finite time. To account for the slower dynamics of the network state, the time for which inputs were presented to the network, as well as the periods without any input, was increased from 10 to 40 time steps. To be able to easily view the gradual state transition, three of the node hypervectors were chosen to be columns of the $N$-dimensional Hadamard matrix rather than being randomly generated. The results are shown in Figure 5 for a shorter sequence of stimulus inputs. We see that the network functions as intended, but with the network now converging on the correct attractors over multiple time steps rather than in just one. The model proposed here is thus not reliant on synchronous dynamics, which is important not only for biological plausibility but also when considering possible implementations on asynchronous neuromorphic hardware (Davies et al., 2018; Liu et al., 2014).
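The asynchronous update rule can be illustrated on a small example. The sketch below is a simplification and makes several assumptions: it uses a plain Hebbian Hopfield network with a handful of random patterns rather than the FSM construction of section 3, it applies no stimulus masking, and the function name `async_step` and all sizes are hypothetical choices for illustration.

```python
import numpy as np

def async_step(w, z, p_update, rng):
    """One time step in which each neuron updates with probability p_update."""
    proposed = np.where(w @ z >= 0, 1, -1)        # synchronous candidate state
    updating = rng.random(z.shape[0]) < p_update  # neurons chosen this step
    return np.where(updating, proposed, z)        # the others keep their state

rng = np.random.default_rng(0)
n, n_patterns = 1000, 5
patterns = rng.choice([-1, 1], size=(n_patterns, n))
w = patterns.T @ patterns / n                     # plain Hebbian weights

z = patterns[0].copy()
z[rng.choice(n, size=100, replace=False)] *= -1   # corrupt 10% of the neurons
for _ in range(200):
    z = async_step(w, z, p_update=0.1, rng=rng)
overlap = z @ patterns[0] / n                     # similarity to the stored pattern
```

Because each neuron updates independently, recovery of the stored pattern is spread over many time steps instead of completing in a single synchronous update.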

Figure 5:

An attractor network performing a shorter walk than in Figure 2, but where neurons are updated asynchronously, with each neuron having a 10% chance of updating on any time step. (a) The similarity of the network state zt to stored node hypervectors, with the stimulus hypervectors being applied to the network labeled above. (b) The evolution of a subset of neurons within the attractor network, where for visual clarity, three of the node hypervectors have been taken from columns of the N-dimensional Hadamard matrix rather than being randomly generated. The network functions largely the same as in the synchronous case, but with transitions between attractor states now taking a finite number of time steps to complete. The model is thus not dependent on the precise timing of neuron updates and should function robustly in asynchronous systems where timing is unreliable.


4.4  Storage Capacity

It is well known that the number of patterns P that can be stored and reliably retrieved in a Hopfield network is proportional to the size of the network, via P<0.14N (Amit, 1989; Hopfield, 1982). When one tries to store more than P attractors within the network, the so-called memory blackout occurs, after which no pattern can be retrieved. We thus perform numerical simulations for a large range of attractor network and FSM sizes to see if an analogous relationship exists. Said otherwise, for an attractor network of finite size N, what sizes of FSM can the network successfully emulate?

For a given $N$, number of FSM states $N_Z$, and number of edges $N_E$, a random FSM was generated and an attractor network constructed to represent it as described in section 3. To ensure a reasonable FSM was generated, the FSM’s graph was first generated to have all nodes connected in a sequential ring structure (i.e., every state $\chi_\nu \in X_{\text{FSM}}$ connects to $\chi_{(\nu+1) \bmod N_Z}$). The remaining edges between nodes were selected at random until the desired number of edges $N_E$ was reached. For each edge, an associated stimulus is then required. Although one option would be to allocate as few unique stimuli as possible, so that the state transitions are maximally state-dependent, this results in some advantageous cancellation effects between the $E^{\eta}$ transition terms and the stored attractor terms $x^\nu x^{\nu\top}$. To instead probe a worst-case scenario, each edge was assigned a unique stimulus.
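The FSM generation procedure above can be sketched as follows. This is one plausible reading rather than the original code: it assumes that the randomly added edges exclude self-loops and duplicates, which the text does not specify, and the function name `random_fsm` and the edge representation are hypothetical.

```python
import random

def random_fsm(n_states, n_edges, seed=0):
    """Return a list of (source, target, stimulus_id) edges."""
    if not n_states <= n_edges <= n_states * (n_states - 1):
        raise ValueError("need n_states <= n_edges <= n_states * (n_states - 1)")
    rng = random.Random(seed)
    # Start from a ring so that every state can be reached and left.
    edges = {(v, (v + 1) % n_states) for v in range(n_states)}
    # Add distinct random edges (assumption: no self-loops, no duplicates).
    while len(edges) < n_edges:
        source, target = rng.randrange(n_states), rng.randrange(n_states)
        if source != target:
            edges.add((source, target))
    # Worst case: every edge gets its own unique stimulus.
    return [(s, t, stim) for stim, (s, t) in enumerate(sorted(edges))]

fsm = random_fsm(n_states=8, n_edges=20)
```

Each returned triple names the source state, the target state, and the index of the stimulus that triggers the transition.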

With the FSM now generated, an attractor network with $N$ neurons was constructed as previously described. An initial attractor state was chosen at random, and then a random valid walk between states was chosen to be performed (chosen arbitrarily to be of length 6, corresponding to each run taking 180 time steps). The corresponding sequence of stimuli was input to the attractor network using the same procedure as in Figure 2, each masking the network state in turn. Each run was then evaluated to have either passed or failed, with a pass meaning that the network state inhabited the correct attractor state with overlap $d(z_t, x^\nu) > 0.5$ in the middle of all intervals when it should be in a certain node attractor state. This 0.5 criterion was chosen since, for a set of orthogonal hypervectors, at most one hypervector may satisfy the criterion at once. A pass thus corresponds to the network performing the correct walk between attractor states. The results are shown in Figure 6. We see that for a given $N$, there is a linear relationship between the number of nodes $N_Z$ and number of edges $N_E$ in the FSM that can be implemented before failure. That this trade-off exists is not surprising, since both contribute additively to the SNR within the attractor network (see equation 3.7). For each $N$, a linear support vector machine (SVM) was fitted to the data to find the separating boundary at which failure and success of the walk are approximately equiprobable. The boundary is given by $N_Z + \beta N_E = c(N)$, where $\beta$ represents the cost of adding an edge relative to that of adding a node and $c(N)$ is an offset. For all of the fitted boundaries, the value of $\beta$ was found to be approximately constant, with $\beta = 2.2 \pm 0.1$, and so is assumed to be independent of $N$. For every value of $N$, we define the capacity $C$ to be the maximum size of FSM that can be implemented before failure, for which $N_E = N_Z$. The capacity $C$ is then given by $C(N) = c(N)/(1 + \beta)$ and is also plotted in Figure 6. A linear fit reveals an approximate proportionality relationship of $C(N) \approx 0.029N$. Combining these two results, the boundary that limits the size of FSM that can be emulated is then given by
$$N_Z + 2.2\,N_E < 0.10\,N. \tag{4.3}$$
It is expected that additional edges consume more of the network’s storage capacity than additional nodes, since for every edge, five additional terms are added to W (see equation 3.13), contributing three times as much cross-talk noise as adding a node would (see equation 3.7). We can compare this storage capacity relation with that of the standard Hopfield model by considering the case NE=0, that is, there are no transition terms in the network, and so the network is identical to a standard Hopfield network. In this case, our failure boundary would become NZ<0.10N, in comparison to Hopfield’s P<0.14N.
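As a worked example, the fitted boundary can be turned into a quick feasibility check. The sketch below simply encodes the rounded coefficients quoted in the text (β = 2.2 and an offset of 0.10N); the function names `within_capacity` and `equal_size_capacity` are hypothetical, and the boundary is an empirical fit over the explored range of N rather than a guarantee.

```python
BETA = 2.2    # fitted cost of an edge relative to a node, from the text
SLOPE = 0.10  # fitted boundary offset per neuron, c(N) ~ 0.10 N

def within_capacity(n_neurons, n_states, n_edges):
    """True if the FSM lies on the success side of the fitted boundary."""
    return n_states + BETA * n_edges < SLOPE * n_neurons

def equal_size_capacity(n_neurons):
    """FSM size on the boundary when n_edges == n_states: c(N) / (1 + beta)."""
    return SLOPE * n_neurons / (1.0 + BETA)

small_fsm_ok = within_capacity(10_000, n_states=100, n_edges=100)  # 320 < 1000
large_fsm_ok = within_capacity(1_000, n_states=50, n_edges=50)     # 160 < 100
capacity_10k = equal_size_capacity(10_000)                         # 312.5
```

With these rounded coefficients the equal-size capacity comes out at roughly 0.031N, close to but not identical to the directly fitted value of 0.029N, which reflects rounding in the quoted numbers.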
Figure 6:

The capacity of the attractor network for varying size $N$ in terms of the size of FSM that can be emulated before failure. For a given $N$, a random FSM was generated with number of nodes $N_Z$ and number of edges $N_E$. An attractor network was then constructed as described in section 3 and a sequence of stimuli input to the network that should trigger a specific walk between attractor states. (a) Every colored square is a successful walk, with no unique $(N_Z, N_E, N)$ triplet being sampled more than once, and lower-$N$ squares occlude higher-$N$ squares. Since only graphs with at least as many edges as nodes were sampled, $N_E - N_Z$ is given on the y-axis rather than $N_E$. The overlaid black lines are the SVM-fitted decision boundaries, distinguishing between values that succeeded and values that failed. (b) The capacity $C$ for varying attractor network sizes $N$, where $C$ is defined to be the maximum size of FSM that can be implemented before failure, for which $N_E = N_Z$. A linear fit is overlaid and shows a linear relationship in the capacity $C$ in terms of $N$ over the range explored. Assuming that the gradients of the linear fit in panel a are equal, the boundary at which failure and success are equiprobable is given by $N_Z + 2.2N_E = 0.10N$.


4.5  Storage Capacity with Sparse States

The same FSM as shown in Figure 1 was embedded into an attractor network via the construction scheme described in section 3.3, with values N=10,000 neurons and coding level f=0.1. To enforce the correct sparsity in the neural state, the sgn(·) activation function (see equation 2.10) was replaced with a top-k activation function (also known as “k-Winners-Take-All”),
$$z_{t+1,\text{sp}} = H\left(W z_{t,\text{sp}} - \theta\right), \tag{4.4}$$
where $H(\cdot)$ is a component-wise Heaviside function and $\theta$ is chosen to be the $(Nf)$th largest value of $W z_{t,\text{sp}}$, to enforce that $z_{t+1,\text{sp}}$ is $f$-sparse. While a stimulus hypervector $s \in \{-1, 1\}^N$ is being applied as a mask to the network, the activation function is similarly
$$z_{t+1,\text{sp}} = H\left(W\left(z_{t,\text{sp}} \circ H(s)\right) - \theta\right), \tag{4.5}$$
with $\theta$ being chosen in the same manner. Note that although the introduction of this adaptive $\theta$ threshold mechanism may seem to be somewhat biologically implausible, or at least a tall order for any possible neural implementation, it may easily be implemented using a suitably connected population of inhibitory feedback neurons, which silence all attractor neurons except those that receive the greatest input (Amari, 1989; Lin et al., 2014). The sparse attractor network is shown performing a walk between the correct attractor states in Figure 7, as a sequence of stimuli is applied as input to the network. In contrast to the dense bipolar case, the maximum overlap between the network state $z_{t,\text{sp}}$ and a stored attractor state $x_{\text{sp}}^\nu$ is now $d(z_{t,\text{sp}}, x_{\text{sp}}^\nu) = f = 0.1$, while the expected overlap between unrelated states is $f^2 = 0.01$ rather than 0.
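The top-k activation can be sketched in NumPy as follows. This is a minimal illustration with several assumptions: the mask is taken to silence every neuron whose stimulus component is -1, ties are broken by selecting exactly k winners with `np.argpartition` rather than by an explicit threshold, and the function name `top_k_activation` and the random stand-in weights are hypothetical.

```python
import numpy as np

def top_k_activation(w, z, coding_level, stimulus=None):
    """Sparse binary update: the k = N*f neurons with the largest input fire."""
    if stimulus is not None:
        z = z * (stimulus > 0)                  # assumed mask: silence where s = -1
    inputs = w @ z
    n = inputs.shape[0]
    k = max(1, int(round(coding_level * n)))
    winners = np.argpartition(inputs, -k)[-k:]  # indices of the k largest inputs
    z_next = np.zeros(n, dtype=np.int8)
    z_next[winners] = 1
    return z_next

rng = np.random.default_rng(0)
n, f = 1000, 0.1
w = rng.standard_normal((n, n))                 # stand-in for a real weights matrix
z = (rng.random(n) < f).astype(np.int8)
s = rng.choice([-1, 1], size=n)
z_next = top_k_activation(w, z, f, stimulus=s)
```

Selecting the winners by index guarantees exactly k active neurons even when several inputs share the threshold value.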
Figure 7:

The attractor network performing a walk between sparse attractor states, where the neurons have a top-k binary activation function to enforce the desired sparsity (see equations 4.4 and 4.5), and the weights matrix is constructed as discussed in section 3.3. The values used here are $N = 10{,}000$ neurons with coding level $f = 0.1$, such that in any sparse hypervector, 1000 components are +1 while the rest are 0. (a) The input stimuli to the network, consisting of dense bipolar hypervectors applied as multiplicative masks. (b, c) The overlap between the network state $z_{t,\text{sp}}$ and stored node attractor states $x_{\text{sp}}$ and stored edge attractor states $e_{\text{sp}}$, respectively, computed via the inner product (see equation 2.3). Note that since the network and attractor states are now sparse binary, the maximum possible overlap value is $f = 0.1$, while independently generated states have an expected overlap of $f^2 = 0.01$.


We now apply the same procedure as in the dense case for determining the memory capacity of the sparse-activity attractor network. For direct comparison with the dense case, we define the memory capacity $C(N)$ to be the largest FSM with $N_E = N_Z$ for which walk success and failure are equiprobable. For every tested $(N, f, N_Z)$ tuple, we generate a corresponding set of hypervectors and weights matrix as discussed in section 3.3 and then randomly choose a walk between six node attractor states to be completed. The chosen walk then determines the sequence of stimuli to be input, and each stimulus is then applied for 10 time steps. Each $(N, f, N_Z)$ tuple was then determined to have passed or failed, with a success criterion that $d(z_{t,\text{sp}}, x_{\text{sp}}^\nu) > \frac{1}{2}(f + f^2)$ in the middle of all intervals when the network should be in a certain node attractor state. This criterion was chosen as it is the sparse analogue of that used in the dense case: at most, only one attractor state may satisfy it at any time.
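The sparse success criterion can be made concrete with a short sketch. It assumes that the overlap is the inner product normalized by N, so that a state has overlap f with itself, which matches the maximum value quoted earlier; the function names `overlap` and `passes` are hypothetical.

```python
import numpy as np

def overlap(z, x):
    """Inner product normalized by N; equals f when z == x for f-sparse x."""
    # Cast to a wide integer type so that small dtypes cannot overflow.
    return float(np.dot(z.astype(np.int64), x.astype(np.int64))) / z.shape[0]

def passes(z, x, coding_level):
    """Sparse criterion: overlap above the midpoint between f**2 and f."""
    threshold = 0.5 * (coding_level + coding_level**2)
    return overlap(z, x) > threshold

def random_sparse(n, k, rng):
    """A binary vector with exactly k ones at random positions."""
    v = np.zeros(n, dtype=np.int8)
    v[rng.choice(n, size=k, replace=False)] = 1
    return v

rng = np.random.default_rng(0)
n, f = 10_000, 0.1
x = random_sparse(n, 1000, rng)
y = random_sparse(n, 1000, rng)   # independent of x
same_state = passes(x, x, f)      # overlap 0.1 exceeds the threshold 0.055
unrelated = passes(y, x, f)       # overlap near 0.01 does not
```

The threshold sits halfway between the expected overlap of unrelated states and the maximum possible overlap, mirroring the 0.5 criterion used for dense bipolar states.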

The results are shown in Figure 8. We see that for a fixed number of neurons $N$, the size of FSM that may be stored initially increases as $f$ is decreased, but below a certain $f$, drops off rapidly. To estimate the optimal coding level $f$ and maximum FSM size $N_Z$ for an attractor network of size $N$, we apply a 2D gaussian convolutional filter with standard deviation 3 over the grid of successes or failures for each $N$ value separately, in order to obtain a kernel density estimate (KDE) $p_{\text{KDE}}$ of the walk success probability. The capacity $C(N)$ was then obtained by taking the maximum $N_Z$ value for which $p_{\text{KDE}} \geq 0.5$. This procedure was chosen in order to be comparable to that performed in the dense bipolar case (see Figure 6), where a linear separation boundary between success and failure was used instead. Plotting capacity $C$ against $N$ and applying a linear fit in the log-log domain reveals a scaling relation of $C \propto N^{1.90}$. This approximately quadratic scaling in the sparse case is a vast improvement over the linear scaling shown in the dense case (see Figure 6) and is in keeping with the theoretical scaling estimates of $P_{\max} \sim N^2/(\log N)^2$ for sparsely coded binary attractor networks (Amari, 1989). The optimal coding level $f$ is also shown, and a linear fit in the log-log domain implies a scaling relation of the form $f \propto N^{-0.949}$. Again, this is similar to the theoretically optimal $f(N)$ scaling relation for sparse binary attractor networks, where the coding level scales like $f \sim (\log N)/N$ (Amari, 1989).
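The log-log fit used to extract a scaling exponent can be sketched as below. The data here are synthetic and invented purely to demonstrate the procedure; they are not the measured capacities, and the function name `power_law_exponent` and the prefactor are hypothetical.

```python
import numpy as np

def power_law_exponent(n_values, c_values):
    """Slope and prefactor of a straight-line fit in the log-log domain."""
    slope, intercept = np.polyfit(np.log(n_values), np.log(c_values), deg=1)
    return slope, np.exp(intercept)

# Synthetic, made-up data following C = 1e-5 * N**1.9 exactly (not paper data).
n_values = np.array([5_000, 10_000, 20_000, 40_000], dtype=float)
c_values = 1e-5 * n_values**1.9
exponent, prefactor = power_law_exponent(n_values, c_values)
```

A power law C = aN^b becomes the straight line log C = log a + b log N, so the fitted slope is the scaling exponent and the exponentiated intercept is the prefactor.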

Figure 8:

The capacity of the attractor network with sparse binary activity and attractor states, for varying coding level $f$. (a) Each colored square is a successful walk, with no unique $(N, f, N_Z)$ tuple being tested more than once, and lower-$N$ squares occlude higher-$N$ squares for visual clarity. To comply with the definition of the memory capacity $C$ in the dense case, each FSM was generated with an equal number of states as edges, $N_Z = N_E$. The capacity $C$ is taken as the maximum $N_Z$ value for an $N$ at which the walk success probability $p_{\text{KDE}} \geq 50\%$, estimated via a gaussian KDE and indicated by the black crosses. (b) The capacities $C$ obtained by this procedure for varying attractor network sizes $N$, up to $N = 40{,}000$, and (c) the coding levels $f$ at these points. Linear fits are overlaid for each, implying an approximately quadratic scaling relation for the memory capacity $C \propto N^{1.90}$ and an approximately inverse scaling relation for the coding level $f \propto N^{-0.949}$.


5.1  FSM Emulation

While there is a large body of work concerning the equivalence between RNNs and FSMs, their implementations broadly fall into a few categories. There are those that require iterative gradient descent methods to mimic an FSM (Das & Mozer, 1994; Lee Giles et al., 1995; Pollack, 1991; Zeng et al., 1993), which makes them difficult to train for large FSMs and improbable for use in biology. There are those that require creating a new FSM with an explicitly expanded state set, $Z' := Z \times S$, such that there is a new state for every old state-stimulus pair (Alquézar & Sanfeliu, 1995; Minsky, 1967), which is unfavorable due to the explosion of (usually one-hot) states needing to be represented, as well as the difficulty of adding new states or stimuli iteratively. There are those that require higher-order weight tensors in order to explicitly provide a weight entry for every unique state-stimulus pair (Forcada & Carrasco, 2001; Mali et al., 2020; Omlin et al., 1998) which, as well as being nondistributed, may be more difficult to implement, for example, requiring the use of sigma-pi units (Groschner et al., 2022; Koch, 1998) or a large number of hidden neurons with two-body synaptic interactions only (Krotov & Hopfield, 2021).

In Recanatesi et al. (2017), transitions are triggered by adiabatically modulating a global inhibition parameter, such that the network may transition between similar stored patterns. Lacking, however, is a method to construct a network to perform arbitrary, controllable transitions between states. In Chen & Miller (2020), an in-depth analysis of small populations of rate-based neurons is conducted, wherein synapses with short-term synaptic depression enable a rich behavior of itinerancy between attractor states, but the approach does not scale to large systems and arbitrary stored memories.

Most closely resembling our approach, however, are earlier works concerned with the related task of creating a sequence of transitions between attractor states in Hopfield-like neural networks. The majority of these efforts rely on the use of synaptic delays, such that the postsynaptic sum on a time step $t$ depends, for example, also on the network state at time $t-10$ rather than just $t-1$. These delay synapses thus allow attractor cross-terms of the form $x^{\nu+1} x^{\nu\top}$ to become influential only after the network has inhabited an attractor state for a certain amount of time, triggering a walk between attractor states (Kleinfeld, 1986; Sompolinsky & Kanter, 1986). This then also allowed for the construction of networks with state-dependent input-triggered transitions (Amit, 1988; Drossaers, 1992; Gutfreund & Mezard, 1988). Similar networks were shown to function without the need for synaptic delays, but require fine tuning of network parameters and suffer from extremely low storage capacity (Amit, 1989; Buhmann & Schulten, 1987). In any case, the need for synaptic delay elements places a large requirement on any substrate that might implement such a network, and such elements are indeed problematic to implement in neuromorphic systems (Nielsen et al., 2017).

State-dependent computation in spiking neural networks was realized in Neftci et al. (2013) and Liang et al. (2019), where they used population attractor dynamics to achieve robust state representations via sustained spiking activity. Additionally, these works highlight the need for robust yet flexible neural state machine primitives if one is to succeed in designing intelligent end-to-end neuromorphic cognitive systems. These approaches differ from this work, however, in that the state representations are still fundamentally population-based rather than distributed, and so pose difficulties such as the requirement of finding a new population of neurons to represent any new state (Rutishauser & Douglas, 2009).

Rigotti et al. (2010) discuss the need for a mechanism to induce flips in the neuron state (an operation akin to a Hadamard product) in order to directly implement nontrivial switching between different attractor states but disqualify such a mechanism from plausibly existing using synaptic currents alone. We also reject such a mechanism as a biologically plausible solution, but on the grounds that it would not robustly function in an asynchronous neural system (see section A.3). They instead show the necessity of a population of neurons with mixed selectivity, connected to both the input and attractor neurons, in order to achieve the desired attractor itinerancy dynamics. This requirement arose by demanding that the network state switch to resembling the target state immediately upon receiving a stimulus. We instead show that similar results can be achieved without this extra population if we relax this requirement and demand only that the network soon evolve to the target state.

The main contribution of this article is thus to introduce a method by which attractor networks may be endowed with state-dependent, attractor-switching capabilities, without requiring biologically implausible elements or components that are expensive to implement (e.g., precise synaptic delays), and in a way that can be scaled up efficiently. The extension to arbitrary FSM emulation shows the generality of the method and that its limitations can be overcome by appropriate modifications, such as introducing the edge state attractors (see section A.7).

5.2  VSA Embeddings

This work also differs from more conventional methods to implement graphs and FSMs in VSAs (Kleyko et al., 2022; Osipov et al., 2017; Poduval et al., 2022; Teeters et al., 2023; Yerxa et al., 2018) in that the network state does not need to be read by an outsider in order to implement the state transition dynamics. That is, where in previous works a graph is encoded by a hypervector (or an associative memory composed of hypervectors) such that the desired dynamics and outputs may be reliably decoded by external circuitry, we instead encode the graph’s connectivity within the attractor network’s weights matrix, such that its recurrent neural dynamics realize the desired state machine behavior.

The use of a Hopfield network as an autoassociative cleanup memory in conjunction with VSAs has been explored in previous works, including theoretical analyses of their capacity to store bundled hypervectors with different representations (Clarkson et al., 2023), and using single attractor states to retrieve knowledge structures from partial cues (Steinberg & Sompolinsky, 2022). Further links between VSAs and attractor networks have also been demonstrated with the use of complex phasor hypervectors, rather than binary or bipolar hypervectors, being stored as attractors within phasor neural networks (Frady & Sommer, 2019; Kleyko et al., 2022; Noest, 1987; Plate, 2003). Complex phasor hypervectors are of particular interest in neuromorphic computing, since they may be very naturally implemented with spike-timing phasor codes, wherein the value represented by a neuron is encoded by the precise timing of its spikes with respect to other neurons or a global oscillatory reference signal, and hypervector binding may be implemented by phase addition (Auge et al., 2021; Orchard & Jarvis, 2023).

Osipov et al. (2017) show the usefulness of VSA representations for synthesizing state machines from observable data, which might be combined with this work to realize a neural system that can synthesize appropriate attractor itinerancy dynamics to best fit observed data. Similarly, if equally robust attractor-based neural implementations of other primitive computational blocks could be created, such as a stack, then they might be combined to create more complex VSA-driven cognitive computational structures, such as neural Turing machines (Graves et al., 2014; Grefenstette et al., 2015; Yerxa et al., 2018). Looking further, this combined with the end-to-end trainability of VSA models could pave the way for neural systems that have the explainability, compositionality, and robustness thereof, but the flexibility and performance of deep neural networks (Hersche et al., 2023; Schlag et al., 2020).

Transitions between discrete neural attractor states are thought to be a crucial mechanism for performing context-dependent decision making in biological neural systems (Daelli & Treves, 2010; Mante et al., 2013; Miller, 2016; Tajima et al., 2017). Attractor dynamics enable a temporary retention of received information and ensure that irrelevant inputs do not produce stable deviations in the neural state. Such networks are widely theorized to exist in the brain—for example, in the hippocampus for its pattern completion and working memory capabilities (Khona & Fiete, 2022; Rolls, 2013). As such, we showed that a Hopfield attractor network and its sparse variant can be modified such that they can perform stimulus-triggered, state-dependent attractor transitions without resorting to additional biologically implausible mechanisms and while abiding by the principles of distributed representation. The changes we introduced are (1) an altered weights matrix construction with additional asymmetric cross-terms (which does not incur any considerable extra complexity) and (2) the ability for a stimulus to mask a subset of neurons within the attractor population. As long as such a mechanism exists, the network proposed here could thus map onto brain areas theorized to support attractor dynamics. The masking mechanism could, for example, feasibly be achieved by a population of inhibitory neurons representing the stimuli, which selectively project to neurons within the attractor population.

6.1  Robustness

The robust functioning of the network despite noisy and unreliable weights is a crucial prerequisite for the model to plausibly be able to exist in biological systems. As we have shown, the network weights may be considerably degraded without affecting the behavior of the network, and indeed beyond this, the network exhibits a so-called graceful degradation in performance. Furthermore, biological synapses are expected to have only a few bits of precision (Baldassi et al., 2016; Bartol et al., 2015; O’Connor et al., 2005), and the network has been shown to function even in the worst case of binary weights. These properties stem from the massive redundancy arising from storing the attractor states across the entire synaptic matrix in a distributed manner, a technique that the brain is expected to use (Crawford et al., 2016; Rumelhart & McClelland, 1986). Of course, we expect there to be a trade-off between the amount of each nonideality that the network can withstand before failure. That is, an attractor network with dense noisy weights may withstand a greater degree of synaptic noise than if the weights matrix were also made sparse. Likewise, larger networks storing the same-sized FSM should be able to withstand greater nonidealities than smaller networks, as is the case for attractor networks in general (Amit, 1989; Sompolinsky, 1987).
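The tolerance to binary weights can be illustrated with a plain Hopfield network containing attractor terms only, without the transition cross-terms of this article. The following is a minimal sketch under assumed values (N = 1000, 11 stored patterns, 10% of cue components flipped, an arbitrary seed); it is not the simulation behind the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 1000, 11        # assumed sizes; P is odd so that no summed weight is exactly zero

# Dense bipolar patterns and the standard Hebbian outer-product weights.
X = rng.choice([-1, 1], size=(P, N))
W = (X.T @ X) / N

# Worst-case quantization: keep only the sign of each weight (one bit per synapse).
W_bin = np.sign(W)

def recall(weights, z, steps=5):
    """Run a few synchronous sign updates starting from state z."""
    for _ in range(steps):
        z = np.where(weights @ z >= 0, 1, -1)
    return z

# Corrupt the first pattern by flipping 10% of its components.
cue = X[0].copy()
flipped = rng.choice(N, size=N // 10, replace=False)
cue[flipped] *= -1

# Normalized overlap with the stored pattern after recall (1.0 means perfect recall).
overlap_full = recall(W, cue) @ X[0] / N
overlap_bin = recall(W_bin, cue) @ X[0] / N
print(overlap_full, overlap_bin)
```

At this low memory load, both the full-precision and the sign-only weights are expected to restore the stored pattern from the corrupted cue.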

Since the network is still an attractor network, it retains all of the properties that make attractor networks suitable for modeling cognitive function: it can perform robust pattern completion and correction, that is, the recovery of a stored prototypical memory given a damaged, incomplete, or noisy version, and can thereafter function as a stable working memory (Amit, 1989; Hopfield, 1982).

The robustness of the network to weight nonidealities also makes it a prime candidate for implementation with novel memristive crossbar technologies, which would allow an efficient and high-density implementation of the matrix-vector multiplication required in the neural state update rule (see equation 3.2) to be performed in one operation (Ielmini & Wong, 2018; Verleysen & Jespers, 1989; Xia & Yang, 2019). Akin to the biological synapses they emulate, such devices also often have only a few bits of precision and suffer from considerable per-device mismatch in the programmed conductance states. The network proposed in this article is thus highly suitable for implementation with such architectures, as we have shown that robust performance is retained even when the network is subjected to a very high degree of such nonidealities.

The continued functionality of the network when its dynamics are asynchronous is another important factor when considering its biological plausibility. In a biological neural system, neurons will produce action potentials whenever their membrane potential happens to exceed the neuron’s spiking threshold, rather than all updating synchronously at fixed time intervals. We tested the regime where the timescale of the neuron dynamics is much slower than the timescale of the input by replacing the synchronous neuron update rule with a stochastic asynchronous variant thereof, and showed that the network is robust to this asynchrony. Similarly, we tested the regime where neuron dynamics are much faster than the input by considering input that is applied stochastically and asynchronously instead (see section A.3). The continued robustness of the model in these two extreme asynchronous regimes implies that the network is not dependent on the exact timing of inputs to the network or on the neuron updates within the network, and so would function robustly in both biological neural systems and asynchronous neuromorphic systems where the exact timing of events cannot be guaranteed (Davies et al., 2018; Liu et al., 2014).
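One way to picture a stochastic asynchronous update is to let each neuron update with some probability on every time step and otherwise hold its state. The sketch below does this for a plain Hopfield network; the update probability `p_update`, the sizes, and the seed are assumptions made for illustration, and the exact asynchronous variant used in our simulations may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 1000, 10                       # assumed network size and number of patterns
X = rng.choice([-1, 1], size=(P, N))
W = (X.T @ X) / N                     # attractor terms only

def async_step(z, p_update=0.2):
    """Each neuron updates independently with probability p_update; the rest hold their state."""
    h = W @ z
    update = rng.random(N) < p_update
    return np.where(update, np.where(h >= 0, 1, -1), z)

# Start from a cue with 15% of the components of pattern 3 flipped.
z = X[3].copy()
z[rng.choice(N, size=150, replace=False)] *= -1

for _ in range(60):
    z = async_step(z)

final_overlap = z @ X[3] / N
print(final_overlap)
```

Because every update moves a neuron toward the stored pattern regardless of when it happens, recall here does not rely on the neurons updating in lockstep.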

6.2  Learning

Owing to its simplicity, the procedure for generating the weights matrix W makes the proposed network more biologically plausible than more complex approaches (e.g., those using gradient descent methods). The matrix can be learned in one shot in a fully online fashion, since adding a new node or edge involves only an additive contribution to the weights matrix, which does not require knowledge of irrelevant edges, nodes, their hypervectors, or the weight values themselves. Furthermore, as a result of the entirely distributed representation of states and transitions, new behaviors may be added to the weights matrix at a later date without having to allocate new hardware and without having to recalculate W with all previous data. Both of these factors are critical for continual online learning.
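The additive, one-shot nature of the construction can be checked directly for the attractor terms. The sketch below (assumed sizes and seed) compares a batch construction against one that adds a single outer product per newly presented pattern; the transition terms are omitted here but would enter through the same kind of additive update.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 8                                   # assumed sizes for illustration
X = rng.choice([-1, 1], size=(P, N)).astype(float)

# Batch construction: all patterns are known in advance.
W_batch = (X.T @ X) / N

# Online construction: each new pattern contributes one outer product,
# computed from that pattern alone, without revisiting earlier patterns.
W_online = np.zeros((N, N))
for x in X:
    W_online += np.outer(x, x) / N

same = np.allclose(W_batch, W_online)
print(same)
```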

Evaluating the local learnability of W is also necessary to evaluate the biological plausibility of the model. In the original paper by Hopfield, the weights could be learned using the simple Hebbian rule,
$$\delta w_{ij} \propto x_i^{\nu} x_j^{\nu} \tag{6.1}$$
where $x_i^{\nu}$ and $x_j^{\nu}$ are the activities of the post- and presynaptic neurons, respectively, and $\delta w_{ij}$ the online synaptic efficacy update (Hebb, 1949; Hopfield, 1982). While the attractor terms within the network can be learned in this manner, the transition cross-terms that we have introduced require an altered version of the learning rule. If we simplify our network construction by removing the edge state attractors, then the local weight update required to learn a transition between states is given by
(6.2)
where y, x, and s are as previously defined. In removing the edge states, we disallow FSMs with consecutive edges with the same stimulus (e.g., “father_is, father_is”), but this is not a problem if completely general FSM construction is not the goal per se (see section 7, Figure 12). This state-transition learning rule is just as local as the original Hopfield learning rule, as the weight update from presynaptic neuron j to postsynaptic neuron i is dependent only on information that may be made directly accessible in the pre- and postsynaptic neurons and does not depend on information in other neurons to which the synapse is not connected (Khacef et al., 2022; Zenke & Neftci, 2021).
Figure 9:

An attractor network constructed via the simpler weights construction method specified in section A.3, with input to the network modeled as Hadamard product binding rather than component-wise masking. (a) The similarity of the network state $z_t$ to stored node hypervectors, when the stimulus hypervector s is applied on one time step for all neurons simultaneously. (b) A subset of the stimulus hypervector s at each time step in this synchronous case. (c) The attractor overlaps in the asynchronous case, where the stimulus s is applied over multiple time steps randomly. (d) A subset of the stimulus hypervector s at each time step in this asynchronous case. For visual clarity, the two stimulus hypervectors shown were manually chosen rather than randomly generated. In the synchronous case, the network performs the correct walk between attractor states as intended. In the asynchronous case, however, the stimuli fail to effect the desired transitions, since any changes in the network state caused by the input stimuli are short-lived: they are quickly reversed on the next time step by the attractor network’s pattern-correcting dynamics.

Figure 10:

The attractor network performing a walk as masking input is applied asynchronously over multiple time steps with random delays. (a) The similarities between the network state $z_t$ and stored node hypervectors $x \in X_{\mathrm{AN}}$. (b) A subset of the stimulus hypervector s being applied to the network as a mask at each time step. Indices that are black on any time step have $[s]_i = -1$ and so are being masked by the stimulus. For visual clarity, the two stimulus hypervectors shown were manually chosen rather than randomly generated. The attractor transition dynamics are thus robust to input asynchrony when the input is modeled as a component-wise masking of the network state.

Figure 11:

An attractor network receiving a sequence of stimuli to trigger a certain walk constructed (a) without edge states and (b,c) with edge states, with edge state overlaps shown in panel c. Due to the consecutive edges in the FSM (see Figure 1) with the same stimulus “father_is,” the edge-state-less network overshoots and skips the “Kronos” state, stopping instead at the “Uranus” state. Similarly, there is an unwanted oscillation between the states “Gaia” and “Uranus” due to the bidirectional edge with stimulus “consort_is.” The addition of the edge state attractors resolves these issues and allows the network to function robustly when input stimuli are applied for an arbitrary number of time steps.

Figure 12:

Embedding an FSM that does not require edge states since it does not have consecutive edges with the same stimulus. (a) The FSM to be embedded, representing a simple decision tree. (b, c) An attractor network constructed to store this FSM, without any edge states, as a sequence of stimuli is input. The network performs the correct walks between attractor states as desired. Note that the second stimulus (“is_orange”) and its transition are state dependent, as the target state (“carrot” or “tangerine”) is dependent on the stimulus given 20 time steps before (“is_round” or “is_pointy”). This illustrates that the edge states are not strictly necessary to implement state-dependent transitions between attractor states.


From the hardware perspective, the locality of the learning rule means that if the matrix-vector multiplication step in the neuron state update rule is implemented using novel memristive crossbar circuits (Ielmini & Wong, 2018; Xia & Yang, 2019; Zidan & Lu, 2020), then the weights matrix could be learned online and in-memory via a sequence of parallel conductance updates rather than by computing the weights matrix offline and then writing the summed values to the devices’ conductances. As long as the updates in the memristors’ conductances are sufficiently linear and symmetric, then attractors and transitions could be sequentially learned in one shot and in parallel by specifying the two hypervectors in the outer product weight update at the crossbar’s inputs and outputs by appropriately shaped voltage pulses (Alibart et al., 2013; Li et al., 2021).

6.3  Scaling

When the FSM states are represented by dense bipolar hypervectors within the attractor network, we found a linear scaling between the size of the network N and the capacity C in terms of the size of FSM that could be embedded without errors. Although this is in keeping with the results in the Hopfield paper, this is not a favorable result when considering the biological plausibility of the system for large N (Hopfield, 1982). Since the attractor network is fully connected, the capacity actually scales sublinearly, $C \sim \sqrt{N_{\mathrm{syn}}}$, with the number of synapses $N_{\mathrm{syn}}$, meaning that an increasing number of synapses are required per attractor and transition to be stored for large N, and so the network becomes increasingly inefficient. Additionally, the fact that every neuron is active at any time (or half of them, depending on the interpretation of the -1 state) represents an unnecessarily large energy burden for any system using this model. This is in contrast to data from neural recordings, where a low per neuron mean activity is ensured by the sparse coding of information (Barth & Poulet, 2012; Olshausen & Field, 2004; Rolls & Treves, 2011).
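The sublinear scaling in the number of synapses follows directly from the full connectivity. As a short worked step, writing $k$ for an assumed proportionality constant of the linear fit:

$$N_{\mathrm{syn}} = N^2, \qquad C \approx kN \;\Longrightarrow\; C \approx k\sqrt{N_{\mathrm{syn}}}, \qquad \frac{N_{\mathrm{syn}}}{C} \approx \frac{N}{k}.$$

The number of synapses spent per stored attractor or transition therefore grows in proportion to N.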

We thus tested how the capacity of the network scales with N when the FSM states are instead represented by sparse binary hypervectors with coding level f, since it is well known that the number of sparse binary vectors that can be stored in an attractor network scales much more favorably, $P \sim N^2/(\log N)^2$ (Amari, 1989). We found indeed that the sparse coding of the FSM states vastly improved the capacity of the network, scaling approximately quadratically with $C \sim N^{1.90}$, and so approximately linearly in the number of synapses. This linear scaling with the number of synapses not only ensures the efficient use of available synaptic resources in biological systems, but is especially important when one considers a possible implementation in neuromorphic hardware, where the number of synapses usually represents the main size constraint rather than the number of neurons (Davies et al., 2018; Manohar, 2022).

The coding level f was found to have an approximately inverse relationship with the attractor network size, $f \sim N^{-0.949}$, which would imply that the number of active neurons $Nf$ in any attractor state grows very slowly, $Nf \sim N^{0.051}$. This is in agreement with the theoretically optimal case, where the coding level for a sparse binary attractor network should scale like $f \sim (\log N)/N$, and so the number of active neurons in any pattern scales like $Nf \sim \log N$ (Amari, 1989).
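As a short worked step connecting these exponents, using only the fitted values quoted above and $N_{\mathrm{syn}} = N^2$ for a fully connected network:

$$C \sim N^{1.90} = \left(N_{\mathrm{syn}}\right)^{0.95}, \qquad Nf \sim N \cdot N^{-0.949} = N^{0.051}.$$

Under this fit, for instance, a tenfold increase in N would raise the number of active neurons per attractor state by a factor of only $10^{0.051} \approx 1.12$.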

Sparsity in the stored hypervectors is especially important when one considers how the weights matrix W could be learned in an online fashion if the synapses are restricted to have only a few bits of precision. So far we have considered quantization of the weights only after the summed values have been determined, whereas including weight quantization while new patterns are being iteratively learned is a much harder problem and implies attractor capacity relations as poor as $P \sim \log N$. One solution is for the states to be increasingly sparse, in which case the optimal scaling of $P \sim N^2/(\log N)^2$ can be recovered (Amit & Fusi, 1994; Brunel et al., 1998).

In short, by letting the FSM states be represented by sparse binary hypervectors rather than dense bipolar hypervectors, we not only move closer to a more biologically realistic model of neural activity, but also benefit from the superior scaling properties of sparse binary attractor networks, which lets the maximum size of FSM that can be embedded scale approximately quadratically with the attractor network size rather than linearly.

Attractor neural networks are robust abstract models of human memory, but previous attempts to endow them with complex and controllable attractor-switching capabilities have mostly suffered from being nondistributed, not scalable, or not robust. We have here introduced a simple procedure by which any arbitrary FSM may be embedded into a large-enough Hopfield-like attractor network, where states and stimuli are represented by high-dimensional random hypervectors and all information pertaining to FSM transitions is stored in the network’s weights matrix in a fully distributed manner. Our method of modeling input to the network as a masking of the network state allows cross-terms between attractors to be stored in the weights matrix such that they are effectively obfuscated until the correct state-stimulus pair is present, much like the standard binding-unbinding operation in more conventional VSAs.

We showed that the network retains many of the features of attractor networks that make them suitable for biology, namely, that the network is not reliant on synchronous dynamics and is robust to unreliable and imprecise weights, thus also making it highly suitable for implementation with high-density but noisy devices. We presented numerical results showing that the network capacity in terms of implementable FSM size scales linearly with the size of the attractor network for dense bipolar hypervectors and approximately quadratically for sparse binary hypervectors.

In summary, we introduced an attractor-based neural state machine that overcomes many of the shortcomings that made previous models unsuitable for use in biology and propose that attractor-based FSMs represent a plausible path by which FSMs may exist as a distributed computational primitive in biological neural networks.

A.1  Dynamics without Masking

For the following calculations we assume that the coding level of the output states $f_r$ is low enough that their effect can be ignored. With this in mind, if we ignore the semantic differences between attractors for node states and attractors for edge states, the two summations over states can be absorbed into one summation over both types of attractor, here both denoted $x^{\nu}$. Similarly, there is then no difference between the two transition cross-terms within each E term, and they too can be absorbed into one summation. Our simplified expression for W is now given by
(A.1)
where $\chi(\lambda)$ and $\upsilon(\lambda)$ are functions $\{1,\dots,2N_E\} \to \{1,\dots,N_Z+N_E\}$ determining the indices of the source and target states for transition $\lambda$, and $\pi(\lambda): \{1,\dots,2N_E\} \to \{1,\dots,N_{\mathrm{stimuli}}\}$ determines the index of the associated stimulus. We then wish to calculate the statistics of the postsynaptic sum $Wz$ while the attractor network is currently in an attractor state. When in an attractor state $x^{\mu}$, the postsynaptic sum is given by
(A.2)
where we have used the notation $\mathcal{N}(\mu,\sigma^2)$ to denote a normally distributed random variable (RV) with mean $\mu$ and variance $\sigma^2$. In the third line, we have made the approximation in the transition summation that the linear sum of attractor hypervectors, each multiplied by a gaussian RV, is itself a separate gaussian RV in each dimension. This holds as long as there are “many” attractor terms appearing on the left-hand side of the transition summation. Said otherwise, if the summation over transition terms has only very few unique attractor terms on the left-hand side (NENZ), then the noise will be a random linear sum of the same few (masked) hypervectors, each with approximate magnitude $1/\sqrt{N}$, and so will be highly correlated between dimensions. Nonetheless, we assume we are far away from this regime and let the effect of the sum of these unwanted terms be approximated by a normally distributed random vector, and so we have
(A.3)

where $\sigma = \sqrt{(N_Z + 3N_E)/N}$ is the strength of cross-talk noise and $n$ a vector composed of IID standard normally distributed RVs. This procedure of quantifying the signal-to-noise ratio (SNR) is adapted from that in the original Hopfield paper (Amit, 1989; Hopfield, 1982).
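The gaussian cross-talk approximation can be checked numerically in the simplest setting, a network with attractor terms only and no transitions, where the classical Hopfield estimate gives a noise standard deviation of roughly $\sqrt{P/N}$ for P stored patterns. The sketch below uses assumed sizes and an arbitrary seed, and it does not include the masked transition terms analyzed above.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 2000, 50                        # assumed network size and number of attractors
X = rng.choice([-1, 1], size=(P, N)).astype(float)
W = (X.T @ X) / N                      # attractor terms only, no transition terms

# Postsynaptic sums for every stored pattern at once; column mu is W @ x^mu.
H = W @ X.T

# Removing the signal term x^mu leaves the cross-talk from the other stored patterns.
noise = H - X.T

measured = noise.std()
predicted = np.sqrt(P / N)
print(measured, predicted)
```

The two values are expected to agree to within a few percent, which is the sense in which the sum of unwanted terms behaves like a normally distributed random vector.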

A.2  Dynamics with Masking

We can similarly calculate the postsynaptic sum when in an attractor state $x^{\mu}$, while the network is being masked by a stimulus $s^{\kappa}$, with this (state, stimulus) tuple corresponding to a certain valid transition $\lambda'$, with source, target, and stimulus hypervectors $x^{\mu}$, $x^{\varphi}$, and $s^{\kappa}$, respectively:
(A.4)
where in the third line, we have made the same approximations as previously discussed. The postsynaptic sum is thus approximately $x^{\varphi}$ in all indices that are not currently being masked, which drives the network toward that (target) attractor. In vector form, the above is written as
(A.5)
where it is assumed that there exists a stored transition from state $x^{\mu}$ to $x^{\varphi}$ with stimulus $s^{\kappa}$, and the relation symbol in equation A.5 denotes approximate proportionality. A similar calculation can be performed in the case that a stimulus is imposed that does not correspond to a valid transition for the current state. In this case, no terms of significant magnitude emerge from the transition summation, and we are left with
(A.6)
that is, the attractor dynamics are largely unaffected. Since we have not distinguished between node attractors and edge attractors among the attractor terms above, nor between sa and sb stimuli, the above results can be applied to all relevant situations mutatis mutandis.
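The qualitative behavior derived in this section can be reproduced with a small simulation. The sketch below uses a simplified cross-term without edge states, `transition_term`, which is an assumption chosen to match the description above (negligible overlap unless the network is in the source state and masked by the stimulus); it is not necessarily identical to the edge terms defined in the main text. The state names A to D, the sizes, and the seed are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1024
A, B, C, D = rng.choice([-1, 1], size=(4, N)).astype(float)   # four node attractors
s = rng.choice([-1, 1], size=N).astype(float)                 # one stimulus hypervector

def heaviside(v):
    """1 where the stimulus is +1 (neuron visible), 0 where it is -1 (neuron masked)."""
    return (v > 0).astype(float)

def transition_term(x, y, s):
    """Assumed cross-term: silent unless the state is x and the mask H(s) is applied."""
    return np.outer(heaviside(s) * (y - x), x * s) / N

# Attractor terms for all four states, plus two transitions sharing the same stimulus.
W = sum(np.outer(v, v) for v in (A, B, C, D)) / N
W += transition_term(A, B, s) + transition_term(C, D, s)

def run(z, mask, steps):
    """Synchronous updates; mask is H(s) while a stimulus is present, all ones otherwise."""
    for _ in range(steps):
        z = np.where(W @ (z * mask) >= 0, 1.0, -1.0)
    return z

def walk(start):
    z = run(start, heaviside(s), steps=5)    # stimulus held for several time steps
    return run(z, np.ones(N), steps=5)       # stimulus removed

end_from_A, end_from_C, end_from_B = walk(A), walk(C), walk(B)
print(end_from_A @ B / N, end_from_C @ D / N, end_from_B @ B / N)
```

The same stimulus sends A to B and C to D, while B, which has no outgoing edge for this stimulus, is left unchanged. This is the state dependence that the masked cross-terms are meant to provide.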

A.3  Why Model Input as Masking?

One immediate question might be why we have chosen to model input to the network as a masking of the neural state vector (see equation 3.2), rather than simply modeling input as a Hadamard product, with a state update rule given by
(A.7)
such that a component for which the input stimulus $s_i = -1$ triggers a “flip” in the neuron state, $+1 \leftrightarrow -1$. As we will show, the problem with this construction is that it relies on the synchrony of input to the network and does not allow for the input to arrive asynchronously and with arbitrary delays. While this would not be a problem for a digital synchronous system, such timing constraints cannot be expected to be met in a network of asynchronously firing biological neurons. In the synchronous case, however, the edge terms Eη in the weights matrix construction could be simplified to
(A.8)
where as per previous notation, x and y are the source and target attractor states, respectively, and s the stimulus to cause the transition. Superficially, this construction would then satisfy our main requirements for achieving the desired attractor itinerancy dynamics during input and rest scenarios, namely,
(A.9)
which ensures that while there is no input to the network, the states x are stable attractors of the network dynamics, and
(A.10)
which ensures that inputting the stimulus s triggers the desired transition. The resulting dynamics for this network, when input is entirely synchronous, are shown in Figure 9a, and indeed the network performs the desired walk.

We then test the functionality of the attractor network with Hadamard input when the exact simultaneous arrival of input stimuli cannot be guaranteed (i.e., the input to the network is asynchronous). To model this, we consider that the arrival time of the stimulus is component-wise randomly and uniformly spread over five time steps rather than just one. The same attractor network receiving the same sequence of Hadamard-product stimuli, but now asynchronously, is shown in Figure 9c. The network does not perform the correct walk between attractor states and instead remains localized near the initial attractor state across all time steps. Although the network begins to move away from the initial attractor state when input is applied, these changes are immediately undone by the network’s inherent attractor dynamics, since the neural state is still within the initial attractor’s basin of attraction. Only when the timescale of the input is far faster than the timescale of the attractor dynamics (e.g., input is synchronous) may the input accumulate fast enough to escape the initial basin of attraction.
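The failure of Hadamard input under asynchrony can be sketched with a single stored transition. The cross-term used below, an outer product mapping the flipped source state onto the target, is an assumption chosen to satisfy the two requirements stated above (stable attractors at rest, and a transition when the full stimulus is applied); the sizes and seed are also illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1024
x, y = rng.choice([-1, 1], size=(2, N)).astype(float)   # source and target attractors
s = rng.choice([-1, 1], size=N).astype(float)           # stimulus hypervector

# Attractor terms plus an assumed Hadamard-style cross-term mapping (x * s) onto y.
W = (np.outer(x, x) + np.outer(y, y) + np.outer(y, x * s)) / N

def step(z, stim):
    """One synchronous update with the input entering as a Hadamard product."""
    return np.where(W @ (z * stim) >= 0, 1.0, -1.0)

no_input = np.ones(N)

# Synchronous input: the whole stimulus arrives on a single time step.
z_sync = step(x, s)
for _ in range(5):
    z_sync = step(z_sync, no_input)

# Asynchronous input: each component of s arrives on one of five time steps.
arrival = rng.integers(0, 5, size=N)
z_async = x.copy()
for t in range(5):
    partial = np.where(arrival == t, s, 1.0)   # only the components arriving now
    z_async = step(z_async, partial)
for _ in range(5):
    z_async = step(z_async, no_input)

sync_overlap_with_target = z_sync @ y / N
async_overlap_with_source = z_async @ x / N
print(sync_overlap_with_target, async_overlap_with_source)
```

With synchronous input the state lands on the target, whereas with the stimulus spread over five time steps each partial input overlaps too weakly with the stored cross-term, and the state remains at the source attractor.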

When input to the network is treated as a masking operation, however (see equation 3.2), the attractor itinerancy dynamics are robust to input asynchrony. To model this, the input stimulus is stochastically applied, with each component being delayed randomly and uniformly by up to 20 time steps. The stimulus is then held for 10 time steps and stochastically removed over 20 time steps in the same manner. The attractor network with asynchronous masking input is shown in Figure 10 and functions as desired, performing the correct walk between attractor states. Modeling input to the network as a masking operation thus allows the network to operate robustly in asynchronous regimes, while modeling input to the network as a Hadamard product does not.

A.4  The Need for Edge States

The need for the edge state attractors arises when one wants to emulate an FSM where there are consecutive edges with the same stimulus. For example, in the FSM implemented throughout this article (see Figure 1), there is an incoming edge from “Zeus” to “Kronos” with stimulus “father_is” and then immediately an outgoing edge from “Kronos” to “Uranus” with stimulus “father_is” also. More generally, consider that we wish to embed the transitions
(A.11)

In the fully synchronous case, when input is applied for one time step only, there is no need for edge states. When the stimulus s is applied, the network will make one transition only. In the asynchronous case, however, one cannot ensure that the stimulus is applied for one time step only. Thus, starting from x1, when the stimulus is applied “once” for an arbitrary number of time steps, the network may have the unwanted behavior of transitioning to x2 on the first time step and then to x3 on the second, effectively overshooting and skipping x2. In Figure 11 we see the dynamics of the attractor network constructed without any edge states, with inputs that are applied for 10 time steps each, and we indeed see the undesirable skipping behavior. Similarly, bidirectional edges with the same stimulus (e.g., “consort_is”) cause an unwanted oscillation between attractor states. The edge states offer a solution to this problem: by adding an intermediate attractor state for every edge and splitting each edge into two transitions with stimuli sa and sb, we ensure that there are no consecutive edges with the same stimulus.

If we don’t necessarily need to be able to embed FSMs with consecutive edges with the same stimulus, then we can get rid of the edge states and construct our weights matrix with simpler transition terms as in equation 6.2. An attractor network constructed in this way is shown in Figure 12 for a chosen FSM that does not require edge states but still contains state-dependent transitions. The network performs the correct walk between attractor states as intended and does not suffer from any of the unwanted skipping or oscillatory phenomena as in Figure 11. Thus, while the edge states are required to ensure that any FSM can be implemented in a “large enough” attractor network, they are not strictly necessary to achieve state-dependent, stimulus-triggered attractor transition dynamics.

A.5  Sparse Stimuli

One shortcoming of the model might be that we used dense bipolar hypervectors s to represent the stimuli, meaning that when s is being input to the network, masking all neurons for which $s_j = -1$, approximately half of all neurons within the network are silenced. This was initially chosen because unbiased bipolar hypervectors are arguably the simplest and most common choice of VSA representation, and highlights the fact that VSA-based methods can be applied to the design of attractor networks with very little required tweaking (Gayler, 1998; Kleyko et al., 2022).

From the biological perspective, however, it could be seen as somewhat implausible that the number of active neurons should change so drastically (halving) while a stimulus is present. Furthermore, if implemented with spiking neurons, the large changes in the total spiking activity could cause unwanted effects in the spike rate of the nonmasked neurons. Also, this means that while the network is being masked, the size of the network (and so its capacity) is reduced to N/2, and so the network is especially prone to instability during the transition periods if the network is nearing its memory capacity limits.

For these reasons, it is worth exploring whether the network could be constructed such that during a masking operation, fewer than half of all neurons are masked; that is, s is biased to contain more positive than negative entries.³ To keep the notation consistent with the notation used for sparse binary hypervectors, we denote the coding level of the attractor states as fz (where previously it was simply f) and the coding level of the stimulus hypervectors as fs. We define fs to be the fraction of components for which sj>0. A stimulus hypervector with fs>0.5 thus silences fewer neurons from the network during a masking operation. This is not the only change we need to make, however. If we turn to our (sparse) edge terms (see equation 3.15), they were previously constructed such that they would produce a nonnegligible overlap with the network state zsp if and only if the network is in the correct attractor state and is being masked by the correct stimulus. The important condition to be fulfilled is then
(A.12)
that is, the overlap should be negligible if the network is in the correct attractor state but the stimulus is not present. This condition is satisfied if the components of s are generated according to

$$ s_j = \begin{cases} +1 & \text{with probability } f_s, \\ -\dfrac{f_s}{1-f_s} & \text{with probability } 1-f_s, \end{cases} $$

where sj is the jth component of s. This implies that for a stimulus hypervector biased toward having more positive entries (fewer neurons are masked), the negative entries must increase in magnitude to compensate for their infrequency. For the case that only a quarter of neurons are masked by the stimulus (fs=0.75), the negative 25% of components must have the value -3, while for fs=0.5, this of course collapses to the balanced bipolar hypervectors used throughout this article with P(sj=1)=P(sj=-1)=0.5 (see equation 2.2). We are forced to increase the magnitude of the negative terms rather than reduce the positive terms, since the magnitude of the positive terms must remain identical to that of the stored attractor terms in order to ensure that the correct target state is projected out during a transition. We can then construct our weights matrix in the same way as before but using these biased stimulus hypervectors s. An attractor network was generated with coding levels fz=0.1 (10% of neurons are active in any attractor hypervector) and fs=0.9 (10% of neurons are masked by stimulus hypervectors), and the results are shown in Figure 13, with the neural state performing the correct walk between attractor states as desired.
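The effect of the biased stimulus components can be checked numerically. The sketch below assumes a two-valued rule consistent with the values quoted above (+1 with probability fs, and -fs/(1-fs) otherwise, which gives -3 at fs=0.75 and ±1 at fs=0.5). The function name `biased_stimulus` is hypothetical, and the averages computed here are a simplified proxy for the overlap in the edge terms, not the exact expression of equation 3.15.

```python
import random


def biased_stimulus(n, f_s, rng):
    """Sample n components: +1 with probability f_s, else -f_s / (1 - f_s) (assumed rule)."""
    negative = -f_s / (1.0 - f_s)
    return [1.0 if rng.random() < f_s else negative for _ in range(n)]


rng = random.Random(0)
N, f_z, f_s = 10_000, 0.1, 0.9

s = biased_stimulus(N, f_s, rng)
x = [1 if rng.random() < f_z else 0 for _ in range(N)]  # sparse binary attractor state
n_active = sum(x)

# Stimulus absent: all active neurons contribute, and the zero-mean s averages out.
unmasked = sum(sj for sj, xj in zip(s, x) if xj) / n_active

# Stimulus present: neurons with sj < 0 are silenced, so only +1 terms remain.
masked = sum(sj for sj, xj in zip(s, x) if xj and sj > 0) / n_active

frac_masked = sum(sj < 0 for sj in s) / N

print(f"unmasked={unmasked:.3f} masked={masked:.3f} frac_masked={frac_masked:.3f}")
```

Under these assumptions the unmasked average fluctuates around zero, while the masked average sits near fs, which is the distinction the edge terms rely on.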
Figure 13: An attractor network with both sparse states and sparse stimuli, constructed as described in section A.5. The values used here are N=10,000, fz=0.1 (meaning only 10% of neurons are active at any time), and fs=0.9 (meaning that only 10% of neurons are masked by the stimulus). (a) The input hypervectors to the network, each masking a random 10% of neurons within the network. (b) The overlap of the sparse network state zsp with stored attractor hypervectors. (c) A subset of the neurons within the network, showing the active neurons (zj=1) in red, as well as which neurons are currently being masked by the input (sj<0). The network performs the correct walk between attractor states. The balanced bipolar stimulus hypervectors used throughout this article may thus also be generalized to be sparse.

Note that as fs → 1, the stimuli become less and less distributed, with the limiting case fs=1-1/N implying that only one component of s is negative, and so by masking only one neuron, the network will switch between attractor states. This case is obviously a stark departure from the robustness that the more distributed representations afford us, since if that single neuron is faulty or dies, it would be catastrophic for the functioning of the network. Similarly, if another independent stimulus were to, by chance, choose the same component to be negative, this would cause similarly unwanted dynamics. Less catastrophic but still worth considering is that the noise added per edge term, as a result of the negative terms becoming very large, has variance that scales like Var[sj] ∼ 1/(1-fs), and so for fs → 1 contributes an increasing amount of unwanted noise to the system, destabilizing the attractor dynamics. Nevertheless, this represents yet another trade-off in the attractor network’s design, as needing to mask fewer neurons might be worth the increased noise within the system and the resulting decrease in memory capacity.
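The variance scaling can be worked out explicitly under the assumption that each component takes the value +1 with probability fs and -fs/(1-fs) otherwise (a two-valued rule consistent with the values -3 at fs=0.75 and ±1 at fs=0.5 quoted in this section):

```latex
\begin{align*}
\mathbb{E}[s_j] &= f_s \cdot 1 + (1-f_s)\left(-\frac{f_s}{1-f_s}\right) = 0, \\
\operatorname{Var}[s_j] &= \mathbb{E}[s_j^2]
  = f_s + (1-f_s)\,\frac{f_s^2}{(1-f_s)^2}
  = f_s + \frac{f_s^2}{1-f_s}
  = \frac{f_s}{1-f_s}.
\end{align*}
```

Under this assumption the variance is 1 for fs=0.5, 3 for fs=0.75, 9 for fs=0.9, and N-1 in the limiting case fs=1-1/N, in line with the 1/(1-fs) scaling.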

A.6  Symbols and Definitions

Table 2: Notation and Frequently Used Symbols.

Symbol                   Definition
N                        Number of neurons within the attractor network
N_Z                      Number of FSM states
N_E                      Number of FSM edges
a, b, c, ...             Dense bipolar hypervectors
a^sp, b^sp, c^sp, ...    Sparse binary hypervectors
f                        Coding level of a hypervector (fraction of nonzero components)
z_t                      Neuron state vector at time step t
x, y                     Node hypervectors representing an FSM state
e                        Edge-state hypervectors
s, s_a, s_b              Stimulus hypervectors
r                        Ternary output hypervectors
1                        A hypervector of all ones
W                        Recurrent weights matrix
w_ij                     Synaptic weight from neuron j to i
E                        Matrices added to W to implement transitions
∘                        Hadamard product (component-wise multiplication)
x^⊺                      Transpose of x
H(·)                     Component-wise Heaviside function
sgn(·)                   Component-wise sign function

We thank Dr. Federico Corradi, Dr. Nicoletta Risi, and Prof. Matthew Cook for their invaluable input and suggestions, as well as their help with proofreading this article.

The work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; Project MemTDE, project number 441959088) as part of the DFG priority program SPP 2262 MemrisTec (project number 422738993) and Project NMVAC (project number 432009531). We acknowledge the financial support of the CogniGron research center and the Ubbo Emmius Funds (University of Groningen).

1. Though this arbitrary choice may seem to incur a bias to a particular state, in practice, the postsynaptic sum very rarely equals 0.

2. We have here ignored that the diagonal of W is set to zero (no self-connections), but this does not significantly affect the following results.

3. We could also use binary s hypervectors, rather than positive/negative, and then alter the transition terms E_η to include f and 1/(1-f) terms to achieve the same result. We believe it is more intuitive not to make this change for this section, however.

Alibart, F., Zamanidoost, E., & Strukov, D. B. (2013). Pattern classification by memristive crossbar circuits using ex situ and in situ training. Nature Communications, 4(1), 2072.
Alquézar, R., & Sanfeliu, A. (1995). An algebraic framework to represent finite state machines in single-layer recurrent neural networks. Neural Computation, 7.
Amari, S.-I. (1989). Characteristics of sparsely encoded associative memory. Neural Networks, 2(6), 451–457.
Amit, D. J. (1988). Neural networks counting chimes. Proceedings of the National Academy of Sciences, 85(7), 2141–2145.
Amit, D. J. (1989). Modeling brain function: The world of attractor neural networks. Cambridge University Press.
Amit, D. J., & Fusi, S. (1994). Learning in neural networks with material synapses. Neural Computation, 6(5), 957–982.
Auge, D., Hille, J., Mueller, E., & Knoll, A. (2021). A survey of encoding techniques for signal processing in spiking neural networks. Neural Processing Letters, 53(6), 4693–4710.
Backus, J. (1978). Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM, 21(8), 613–641.
Baldassi, C., Gerace, F., Lucibello, C., Saglietti, L., & Zecchina, R. (2016). Learning may need only a few bits of synaptic precision. Physical Review E, 93.
Barth, A. L., & Poulet, J. F. A. (2012). Experimental evidence for sparse firing in the neocortex. Trends in Neurosciences, 35(6), 345–355.
Bartol, T. M., Jr., Bromer, C., Kinney, J., Chirillo, M. A., Bourne, J. N., Harris, K. M., & Sejnowski, T. J. (2015). Nanoconnectomic upper bound on the variability of synaptic plasticity. eLife, 4, e10778.
Brinkman, B. A. W., Yan, H., Maffei, A., Park, I. M., Fontanini, A., Wang, J., & La Camera, G. (2022). Metastable dynamics of neural circuits and networks. Applied Physics Reviews, 9(1), 011313.
Brunel, N., Carusi, F., & Fusi, S. (1998). Slow stochastic Hebbian learning of classes of stimuli in a recurrent neural network. Network: Computation in Neural Systems, 9(1), 123–152.
Buhmann, J., & Schulten, K. (1987). Noise-driven temporal association in neural networks. Europhysics Letters, 4(10), 1205–1209.
Buonomano, D. V., & Maass, W. (2009). State-dependent computations: Spatiotemporal processing in cortical networks. Nature Reviews Neuroscience, 10(2), 113–125.
Chaudhuri, R., & Fiete, I. (2016). Computational principles of memory. Nature Neuroscience, 19(3), 394–403.
Chen, B., & Miller, P. (2020). Attractor-state itinerancy in neural circuits with synaptic depression. Journal of Mathematical Neuroscience, 10(1), 15.
Clarkson, K. L., Ubaru, S., & Yang, E. (2023). Capacity analysis of vector symbolic architectures.
Crawford, E., Gingerich, M., & Eliasmith, C. (2016). Biologically plausible, human-scale knowledge representation. Cognitive Science, 40(4), 782–821.
Daelli, V., & Treves, A. (2010). Neural attractor dynamics in object recognition. Experimental Brain Research, 203(2), 241–248.
Das, S., & Mozer, M. C. (1994). A unified gradient-descent/clustering architecture for finite state machine induction. In J. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6. MIT Press.
Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., . . . Wang, H. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1), 82–99.
Dayan, P. (2008). Simple substrates for complex cognition. Frontiers in Neuroscience, 2, 31.
Drossaers, M. F. J. (1992). Hopfield models as nondeterministic finite-state machines. In Proceedings of the 14th Conference on Computational Linguistics (vol. 1, pp. 113–119).
Eliasmith, C. (2005). A unified approach to building and controlling spiking attractor networks. Neural Computation, 17(6), 1276–1314.
Forcada, M., & Carrasco, R. C. (2001). Finite-state computation in analog neural networks: Steps towards biologically plausible models? In M. Forcada & R. C. Carrasco, Emergent neural computational architectures based on neuroscience. Springer.
Frady, E. P., & Sommer, F. T. (2019). Robust computation with rhythmic spike patterns. Proceedings of the National Academy of Sciences, 116(36), 18050–18059.
Furber, S. B., Bainbridge, W. J., Cumpstey, J. M., & Temple, S. (2004). Sparse distributed memory using N-of-M codes. Neural Networks, 17(10), 1437–1451.
Gayler, R. W. (1998). Multiplicative binding, representation operators and analogy. Advances in analogy research: Integration of theory and data from the cognitive, computational, and neural sciences.
Granger, R. (2020). Toward the quantification of cognition.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines.
Grefenstette, E., Hermann, K. M., Suleyman, M., & Blunsom, P. (2015). Learning to transduce with unbounded memory. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28. Curran.
Gritsenko, V. I., Rachkovskij, D. A., Frolov, A. A., Gayler, R., Kleyko, D., & Osipov, E. (2017). Neural distributed autoassociative memories: A survey. Kibernetika i vyčislitel’naâ tehnika, 2(188), 5–35.
Groschner, L. N., Malis, J. G., Zuidinga, B., & Borst, A. (2022). A biophysical account of multiplication by a single neuron. Nature, 603(7899), 119–123.
Gutfreund, H., & Mezard, M. (1988). Processing of temporal sequences in neural networks. Physical Review Letters, 61(2), 235–238.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. American Psychological Association.
Hersche, M., Zeqiri, M., Benini, L., Sebastian, A., & Rahimi, A. (2023). A neuro-vector-symbolic architecture for solving Raven’s progressive matrices. Nature Machine Intelligence, 5(4), 363–375.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
Ielmini, D., & Wong, H.-S. P. (2018). In-memory computing with resistive switching devices. Nature Electronics, 1(6), 333–343.
Indiveri, G., & Liu, S.-C. (2015). Memory and information processing in neuromorphic systems. Proceedings of the IEEE, 103(8), 1379–1397.
Kanerva, P. (1997). Fully distributed representation. Proceedings of the 1997 Real World Computing Symposium (pp. 358–365). ACM.
Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2), 139–159.
Khacef, L., Klein, P., Cartiglia, M., Rubino, A., Indiveri, G., & Chicca, E. (2022). Spike-based local synaptic plasticity: A survey of computational models and neuromorphic circuits.
Khona, M., & Fiete, I. R. (2022). Attractor and integrator networks in the brain. Nature Reviews Neuroscience, 23(12), 744–766.
Kleinfeld, D. (1986). Sequential state generation by model neural networks. Proceedings of the National Academy of Sciences, 83(24), 9469–9473.
Kleyko, D., Davies, M., Frady, E. P., Kanerva, P., Kent, S. J., Olshausen, B. A., . . . Sommer, F. T. (2021). Vector symbolic architectures as a computing framework for nanoscale hardware. Proceedings of the IEEE, 110(10).
Kleyko, D., Rachkovskij, D. A., Osipov, E., & Rahimi, A. (2022). A survey on hyperdimensional computing aka vector symbolic architectures, Part I: Models and data transformations. ACM Computing Surveys.
Koch, C. (1998). Biophysics of computation: Information processing in single neurons. Oxford University Press.
Krotov, D., & Hopfield, J. (2021). Large associative memory problem in neurobiology and machine learning. In Proceedings of the International Conference on Learning Representations.
Lee Giles, C., Horne, B. G., & Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359–1365.
Li, Y., Xiao, T. P., Bennett, C. H., Isele, E., Melianas, A., Tao, H., . . . Talin, A. A. (2021). In situ parallel training of analog neural network using electrochemical random-access memory. Frontiers in Neuroscience, 15, 636127.
Liang, D., Kreiser, R., Nielsen, C., Qiao, N., Sandamirskaya, Y., & Indiveri, G. (2019). Neural state machines for robust learning and control of neuromorphic agents. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9(4), 679–689.
Lin, A. C., Bygrave, A. M., de Calignon, A., Lee, T., & Miesenböck, G. (2014). Sparse, decorrelated odor coding in the mushroom body enhances learned odor discrimination. Nature Neuroscience, 17(4), 559–568.
Little, W. A. (1974). The existence of persistent states in the brain. Mathematical Biosciences, 19(1), 101–120.
Liu, S.-C., Delbruck, T., Indiveri, G., Whatley, A., & Douglas, R. (2014). Event-based neuromorphic systems. Wiley.
Mali, A. A., Ororbia II, A. G., & Giles, C. L. (2020). A neural state pushdown automata. IEEE Transactions on Artificial Intelligence, 1(3), 193–205.
Manohar, R. (2022). Hardware/software co-design for neuromorphic systems. In Proceedings of the IEEE Custom Integrated Circuits Conference.
Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78–84.
Miller, P. (2016). Itinerancy between attractor states in neural systems. Current Opinion in Neurobiology, 40, 14–22.
Minsky, M. L. (1967). Computation: Finite and infinite machines. Prentice Hall.
Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., & Douglas, R. J. (2013). Synthesizing cognition in neuromorphic electronic systems. Proceedings of the National Academy of Sciences, 110(37), E3468–E3476.
Nielsen, C., Qiao, N., & Indiveri, G. (2017). A compact ultra low-power pulse delay and extension circuit for neuromorphic processors. 2017 IEEE Biomedical Circuits and Systems Conference (pp. 1–4).
Noest, A. J. (1987). Phasor neural networks. In D. Anderson (Ed.), Neural information processing systems (pp. 584–591). MIT Press.
O’Connor, D. H., Wittenberg, G. M., & Wang, S. S.-H. (2005). Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proceedings of the National Academy of Sciences, 102(27), 9679–9684.
Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487.
Omlin, C., Thornber, K., & Giles, C. (1998). Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks. IEEE Transactions on Fuzzy Systems, 6(1), 76–89.
Orchard, J., & Jarvis, R. (2023). Hyperdimensional computing with spiking-phasor neurons. In Proceedings of the 2023 International Conference on Neuromorphic Systems (pp. 1–7).
Osipov, E., Kleyko, D., & Legalov, A. (2017). Associative synthesis of finite state automata model of a controlled object with hyperdimensional computing. In Proceedings of the 43rd Annual Conference of the IEEE Industrial Electronics Society (pp. 3276–3281).
Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3), 623–641.
Plate, T. A. (2003). Holographic reduced representation: Distributed representation for cognitive structures. University of Chicago Press.
Poduval, P., Alimohamadi, H., Zakeri, A., Imani, F., Najafi, M. H., Givargis, T., & Imani, M. (2022). GrapHD: Graph-based hyperdimensional memorization for brain-like cognitive learning. Frontiers in Neuroscience, 16.
Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7(2), 227–252.
Recanatesi, S., Katkov, M., & Tsodyks, M. (2017). Memory states and transitions between them in attractor neural networks. Neural Computation, 29(10), 2684–2711.
Rigotti, M., Ben Dayan Rubin, D., Wang, X.-J., & Fusi, S. (2010). Internal representation of task rules by recurrent dynamics: The importance of the diversity of neural responses. Frontiers in Computational Neuroscience, 4.
Rolls, E. (2013). The mechanisms for pattern completion and pattern separation in the hippocampus. Frontiers in Systems Neuroscience, 7, 74.
Rolls, E. T., & Treves, A. (2011). The neuronal encoding of information in the brain. Progress in Neurobiology, 95(3), 448–490.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Foundations. MIT Press.
Rutishauser, U., & Douglas, R. (2009). State-dependent computation using coupled recurrent networks. Neural Computation, 21, 478–509.
Schlag, I., Smolensky, P., Fernandez, R., Jojic, N., Schmidhuber, J., & Gao, J. (2020). Enhancing the transformer with explicit relational encoding for math problem solving.
Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007–1012.
Sompolinsky, H. (1987). The theory of neural networks: The Hebb rule and beyond. In J. L. van Hemmen & I. Morgenstern (Eds.), Heidelberg Colloquium on Glassy Dynamics (pp. 485–527). Springer.
Sompolinsky, H., & Kanter, I. (1986). Temporal association in asymmetric neural networks. Physical Review Letters, 57(22), 2861–2864.
Steinberg, J., & Sompolinsky, H. (2022). Associative memory of structured knowledge. Scientific Reports, 12(1), 21808.
Tajima, S., Koida, K., Tajima, C. I., Suzuki, H., Aihara, K., & Komatsu, H. (2017). Task-dependent recurrent dynamics in visual cortex. eLife, 6, e26868.
Teeters, J. L., Kleyko, D., Kanerva, P., & Olshausen, B. A. (2023). On separating long- and short-term memories in hyperdimensional computing. Frontiers in Neuroscience, 16.
Tsodyks, M. V., & Feigel’man, M. V. (1988). The enhanced storage capacity in neural networks with low activity level. Europhysics Letters, 6(2), 101–105.
Verleysen, M., & Jespers, P. (1989). An analog VLSI implementation of Hopfield’s neural network. IEEE Micro, 9(6), 46–55.
Xia, Q., & Yang, J. J. (2019). Memristive crossbar arrays for brain-inspired computing. Nature Materials, 18(4), 309–323.
Yerxa, T., Anderson, A., & Weiss, E. (2018). The hyperdimensional stack machine. Cognitive Computing, 1–2.
Zeng, Z., Goodman, R. M., & Smyth, P. (1993). Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5(6), 976–990.
Zenke, F., & Neftci, E. O. (2021). Brain-inspired learning on neuromorphic substrates. Proceedings of the IEEE, 109(5), 935–950.
Zidan, M. A., & Lu, W. D. (2020). Vector multiplications using memristive devices and applications thereof. In S. Spiga, A. Sebastian, D. Querlioz, & B. Rajendran (Eds.), Memristive devices for brain-inspired computing (pp. 221–254). Elsevier.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode