## Abstract

Recurrent neural networks (RNNs) have been widely used to model sequential neural dynamics (“neural sequences”) of cortical circuits in cognitive and motor tasks. Efforts to incorporate biological constraints and Dale's principle will help elucidate the neural representations and mechanisms of underlying circuits. We trained an excitatory-inhibitory RNN to learn neural sequences in a supervised manner and studied the representations and dynamic attractors of the trained network. The trained RNN robustly triggered the sequence in response to a wide range of input signals and interpolated time-warped inputs for sequence representation. Interestingly, a learned sequence can repeat periodically when the RNN evolves beyond the duration of a single sequence. The eigenspectrum of the learned recurrent connectivity matrix, with growing or damping modes, together with the RNN's nonlinearity, was adequate to generate a limit cycle attractor. We further examined the stability of dynamic attractors while training the RNN to learn two sequences. Together, our results provide a general framework for understanding neural sequence representation in excitatory-inhibitory RNNs.

## 1 Introduction

Sequentially activated neuronal activities (“neural sequences”) are universal neural dynamics that have been widely observed in neural assemblies of cortical and subcortical circuits (Fujisawa, Amarasingham, Harrison, & Buzsaki, 2008; Long et al., 2010; Harvey, Coen, & Tank, 2012; Buzsaki & Tingley, 2018; Adler et al., 2019; Hemberger, Shein-Idelson, Pammer, & Laurent, 2019). Sequence-based neural dynamics have been proposed as a common framework for circuit function during spatial navigation, motor learning, memory, and decision-making tasks (Sussillo, 2014). Neural sequential activity can be driven by sensory or motor input, such as in the rodent hippocampus, rodent primary motor cortex, and the premotor nucleus HVC of songbirds; alternatively, neural sequences can be driven by intrinsic dynamics during the task period in the absence of sensory or motor input. Such sequential dynamics could potentially be implemented using feedforward or recurrent architectures (Goldman, 2009; Cannon, Kopell, Gardner, & Markowitz, 2015).

Biological networks are strongly recurrent. Therefore, nonlinear recurrent neural networks (RNNs) have been developed for modeling a wide range of neural circuits in various cognitive and motor tasks (Mante, Sussillo, Shenoy, & Newsome, 2013; Sussillo, Churchland, Kaufman, & Shenoy, 2015; Rajan, Harvey, & Tank, 2016; Barak, 2017; Goudar & Buonomano, 2018; Yang, Joglekar, Song, Newsome, & Wang, 2019; Kao, 2019; Mackwood, Naumann, & Sprekeler, 2021; Bi & Zhou, 2020; Zhang, Liu, & Chen, 2021). However, biological constraints were only recently considered in RNN models (Song, Yang, & Wang, 2016; Ingrosso & Abbott, 2019; Xue, Halassa, & Chen, 2021), whereas in other work (Murphy & Miller, 2009), biological constraints were considered only in a linear recurrent system. In general, RNNs have been treated as black boxes, and their mechanisms and high-dimensional computations are difficult to analyze (Sussillo & Barak, 2013; Ceni, Ashwin, & Livi, 2019).

In this article, we used an excitatory-inhibitory nonlinear RNN to model neural sequences. Extending previous modeling efforts (Rajan et al., 2016; Hardy & Buonomano, 2018; Orhan & Ma, 2019), we explicitly incorporated Dale's principle and excitatory-inhibitory (E/I) balance into the RNN and studied its stimulus-driven and spontaneous dynamics. Neural sequences, as a special form of dynamical attractors, can be viewed as an alternative to fixed points for storing memories (Rajan et al., 2016). Generalized Hopfield-type neural networks can store patterns in limit cycles (Deshpande & Dasgupta, 1991). However, to our best knowledge, limit cycle dynamics have not been well studied in the context of excitatory-inhibitory RNN.

Neural dynamics during online task behavior is referred to as a transient state, whereas the neural dynamics during the offline state (in the absence of stimulus input) is often described as the steady or stationary state, which can also play important roles in brain functions (such as memory replay). Based on extensive computer simulations for learning one or two neural sequences, we discovered and studied the stability of limit cycles encoded by the excitatory-inhibitory RNN while evolving in the stimulus-free spontaneous state (“steady state”). Our combined theoretical and computational analyses provide a framework to study sequence representation in neural circuits in terms of excitatory-inhibitory balance, network connectivity, and synaptic plasticity.

## 2 Excitatory-Inhibitory RNN

Biological neuronal networks consist of excitatory and inhibitory neurons. Local cortical circuits have varying degrees of connectivity and sparsity depending on the nature of the cell types: excitatory-to-excitatory (EE), inhibitory-to-excitatory (IE), excitatory-to-inhibitory (EI), and inhibitory-to-inhibitory (II) connections (Murphy & Miller, 2009). We imposed the following biological constraints on our excitatory-inhibitory RNN:

- The ratio of the number of excitatory to inhibitory neurons is around 4:1 according to Dale's principle: $N_{\text{exc}}/N_{\text{inh}} = 4/1$, where $N = N_{\text{exc}} + N_{\text{inh}}$.

- The dimensionality of the input signal is 1, $N_{\text{in}} = 1$; for ease of interpretation, we assumed that $W^{\text{in}} = \{w_i^{\text{in}}\}_{N \times 1}$ has only positive elements.

- $W^{\text{rec}} = \{w_{ij}^{\text{rec}}\}_{N \times N}$ has EE, EI, IE, and II connections arranged in blocks as follows: $\begin{pmatrix} \text{EE} & \text{IE} \\ \text{EI} & \text{II} \end{pmatrix}$. Both EE and EI connections were from presynaptic excitatory neurons, and therefore their weights were positive, whereas IE and II connections were from presynaptic inhibitory neurons and had negative weights. No self-connection was assumed for $W^{\text{rec}}$; namely, all diagonal elements $\{w_{ii}^{\text{rec}}\}$ are zeros.

- The output sequences are direct (scaled) readouts of $N_{\text{out}}$ neurons from the excitatory neuron population. Therefore, $W^{\text{out}} = \{w_{\ell i}^{\text{out}}\}_{N_{\text{out}} \times N}$ is a partial identity matrix. Here, $N_{\text{out}} = 8$, and we constrained $W^{\text{out}}$ such that its first block contained an $N_{\text{out}} \times N_{\text{out}}$ diagonal matrix, namely, $w_{\ell i}^{\text{out}} = 0, \forall \ell \neq i$.

### 2.1 Computer Simulation Setup

For a relatively simple computational task, we assumed that the input was a one-dimensional signal (i.e., $N_{\text{in}} = 1$) and set $N = 100$. According to Dale's principle, we assumed that the EE, EI, IE, and II connections were represented as $80 \times 80$, $20 \times 80$, $80 \times 20$, and $20 \times 20$ submatrices, respectively.
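For illustration, the block and sign structure described above can be sketched as follows. This is our own minimal construction (not the authors' code); the weight magnitudes are placeholders, and only the Dale sign pattern and zero diagonal are constrained:

```python
import numpy as np

# Hypothetical sketch: assemble a recurrent matrix obeying Dale's principle
# for N = 100 units (80 excitatory, 20 inhibitory), as in the text.
N_exc, N_inh = 80, 20
N = N_exc + N_inh

# Column j is the presynaptic neuron: +1 for excitatory, -1 for inhibitory.
sign_mask = np.ones((N, N))
sign_mask[:, N_exc:] = -1.0

# Magnitudes are placeholders; only the sign pattern is constrained here.
rng = np.random.default_rng(0)
W_rec = sign_mask * rng.gamma(2.0, 0.0495, size=(N, N))
np.fill_diagonal(W_rec, 0.0)          # no self-connections

# The four blocks: EE (80x80) and EI (20x80) are nonnegative,
# IE (80x20) and II (20x20) are nonpositive.
assert (W_rec[:, :N_exc] >= 0).all() and (W_rec[:, N_exc:] <= 0).all()
```

Note that the constraint acts columnwise (on the presynaptic side), so each column of $W^{\text{rec}}$ is single-signed.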

We considered different forms of input signals $u(t)$ lasting 900 ms within the 1 s task period (see Figures 1b and 1c), which could be either a sinusoid modulated by an exponential carrier (single-sequence case) or a sawtooth function on a linearly increasing/decreasing carrier (two-sequence case). The continuous-time model was numerically implemented in discrete time by Euler's method. For each input signal, we set $dt = 10$ ms and $N_{\text{time}} = 100$, resulting in a simulation trial duration of $\sim$1 s. We also tried smaller $dt$ (e.g., 2 ms and 5 ms) and obtained similar results; however, the simulation sample size became much larger and significantly increased the computational cost. Therefore, our default setup was $dt = 10$ ms.

The output has the same duration as the input signal. The desired readout of the RNN was a neural sequence (see Figure 1d); each neuronal activation was sparse in time and had 5% overlap with its neighboring activations. In the simulations here, we used the firing rates of the first eight excitatory neurons as the readout for the sequence output (i.e., $N_{\text{out}} = 8$). The number of trials was $N_{\text{trials}} = 20$, and the input signals were generated with independent additive gaussian noise (zero mean, variance 0.01).
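The Euler discretization described above can be sketched as follows. This is a minimal sketch in our own notation, assuming the standard rate equation $\tau \dot{x} = -x + W^{\text{rec}}\varphi(x) + W^{\text{in}}u + \text{noise}$ with ReLU $\varphi$; the noise scaling and other details may differ from the authors' pycog-based implementation:

```python
import numpy as np

def simulate(W_rec, W_in, u, dt=0.01, tau=0.05, sigma=0.01, rng=None):
    """Euler integration of tau*dx/dt = -x + W_rec*phi(x) + W_in*u + noise.
    A sketch of the assumed rate model, not the authors' exact code."""
    rng = rng or np.random.default_rng(0)
    N, T = W_rec.shape[0], u.shape[0]
    x = np.zeros((T, N))
    phi = lambda v: np.maximum(v, 0.0)          # ReLU rate nonlinearity
    for t in range(1, T):
        noise = sigma * np.sqrt(2 * tau / dt) * rng.normal(size=N)
        dx = (-x[t-1] + W_rec @ phi(x[t-1])
              + W_in.ravel() * u[t-1] + noise) / tau
        x[t] = x[t-1] + dt * dx
    return x, phi(x)                            # states and firing rates
```

With $dt = 10$ ms and $N_{\text{time}} = 100$ steps, one call produces a $\sim$1 s trial, matching the setup in the text.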

### 2.2 RNN Training with Gradient Descent

We initialized the RNN weights as follows. The input weights $W^{\text{in}}$ were drawn from a uniform distribution on $[-0.1, 0.1]$; the recurrent weights $W^{\text{rec}}$ were first initialized from a gamma distribution $\text{Gamma}(2, 0.0495)$ (mean 0.099, variance 0.0049), while ensuring that all diagonal values were zeros. Furthermore, we rescaled the recurrent weight matrix to obtain a spectral radius of around 0.99 and ensured that the sum of the excitatory elements equaled the sum of the inhibitory elements to maintain E/I balance. Finally, the output weights $W^{\text{out}}$ had a diagonal structure, with diagonal values drawn from a uniform distribution on $[-0.1, 0.1]$ while keeping the nontarget neuron connections at zero. We did not impose additional constraints on the connection weights.
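The initialization just described can be sketched as follows; the ordering of the balance, sign, and rescaling steps is our assumption (rescaling by a single scalar preserves the E/I balance):

```python
import numpy as np

# Sketch of the described initialization: 80 excitatory, 20 inhibitory units.
rng = np.random.default_rng(1)
N_exc, N_inh = 80, 20
N = N_exc + N_inh

# Magnitudes ~ Gamma(shape=2, scale=0.0495): mean 0.099, variance ~0.0049.
W = rng.gamma(shape=2.0, scale=0.0495, size=(N, N))
np.fill_diagonal(W, 0.0)                               # zero diagonal

# E/I balance: make total inhibitory magnitude equal total excitatory.
W[:, N_exc:] *= W[:, :N_exc].sum() / W[:, N_exc:].sum()
W[:, N_exc:] *= -1.0                                   # Dale's signs

# Rescale to spectral radius ~0.99 (preserves the E/I balance).
W_rec = 0.99 * W / np.max(np.abs(np.linalg.eigvals(W)))

W_in = rng.uniform(-0.1, 0.1, size=(N, 1))             # input weights
```

After these steps, the excitatory and inhibitory entries of `W_rec` sum to approximately zero, reflecting the balance constraint.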

In addition, we imposed an $L_2$-norm regularization on the firing rates and added the sum of the $L_1$-norms of the connection weights to the cost function $E$. We used the stochastic gradient descent (SGD) algorithm with the default learning rate parameter. The gradient was computed efficiently by backpropagation through time (BPTT). The vanishing gradient problem was alleviated using the regularization techniques discussed in Song et al. (2016). The RNN implementation was adapted from the Python package pycog (https://github.com/frsong/pycog), with modifications for task design, network architecture, and optimization. To avoid overfitting, in addition to the training trials, we also monitored the cost function on an independent validation data set. We concluded that the trained network had converged when the mean-squared error between the network output and the target output was sufficiently small or their correlation coefficient was sufficiently high (e.g., $R^2 > 0.95$). The standard choice of hyperparameters is shown in Table 1. However, the results reported here are not sensitive to the hyperparameter configuration.

| Parameter | Setup |
|---|---|
| $L_1$ weight regularization | 2 |
| driven noise variance | $(0.01)^2$ |
| input noise variance $\sigma_{\text{in}}^2$ | $(0.01)^2$ |
| gradient mini-batch size | 20 |
| validation mini-batch size | 1000 |
| unit time constant $\tau$ | 50 ms |
| learning rate | 0.01 |
| max gradient norm | 1 |


### 2.3 Eigenspectrum Analysis

For a linear dynamical system (e.g., equation 2.8), the real part of the eigenvalue is responsible for the decay or growth of oscillation. Therefore, the dynamical system becomes unstable in the presence of even a single eigenvalue with a positive real component. However, for an RNN with ReLU activation, the analysis of dynamics is more complex (see appendixes A and B).

The stability of the dynamical system described by the excitatory-inhibitory RNN is strongly influenced by the recurrent connection matrix $W^{\text{rec}}$ (Brunel, 2000; Rajan & Abbott, 2006). Since $W^{\text{rec}}$ is generally asymmetric, and each column of the matrix has either all positive or all negative elements, it will produce a set of complex conjugate pairs of eigenvalues (possibly including purely real eigenvalues) from the eigenvalue decomposition $W^{\text{rec}} = U \Lambda U^{-1}$, where the column vectors of the matrix $U$ are the eigenvectors and the diagonal elements of $\Lambda$ denote the eigenvalues. Generally, $W^{\text{rec}}$ is nonnormal (unless all submatrices are symmetric and the EI and IE connections are identical); as a result, its eigenvectors are *not* orthogonal (Goldman, 2009; Murphy & Miller, 2009).
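These two spectral properties (conjugate-pair eigenvalues and nonnormality) can be verified numerically. The sketch below uses a random Dale-constrained matrix of our own construction, not the trained matrix from the paper:

```python
import numpy as np

# Illustrative check: a real, column-sign-structured W yields
# complex-conjugate eigenvalue pairs and is generally nonnormal.
rng = np.random.default_rng(2)
N_exc = 80
W = rng.gamma(2.0, 0.0495, size=(100, 100))
W[:, N_exc:] *= -1.0                  # Dale's signs on inhibitory columns
np.fill_diagonal(W, 0.0)

eigvals = np.linalg.eigvals(W)
# Real matrix -> the spectrum is closed under complex conjugation.
assert np.allclose(np.sort_complex(eigvals), np.sort_complex(eigvals.conj()))

# Nonnormal: W W^T != W^T W, hence the eigenvectors are not orthogonal.
assert not np.allclose(W @ W.T, W.T @ W)
```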

An alternative way to examine the network connectivity matrix $W^{\text{rec}}$ is through the Schur decomposition, $W^{\text{rec}} = QTQ^{-1}$, where $Q$ is a unitary matrix whose columns contain the orthonormal Schur modes and $T$ is an upper triangular matrix with the eigenvalues along its diagonal. The triangular structure of $T$ can be interpreted as transforming an RNN into a feedforward neural network: the recurrent matrix $W^{\text{rec}}$ corresponds to a rotated version of the effective feedforward matrix $T$, which defines the self-connections and functionally feedforward connections (FFC) of the network (Goldman, 2009). For a normal matrix, the Schur basis is equivalent to the eigenvector basis. However, unlike eigenvalue decomposition, the Schur decomposition produces the simplest (yet nonunique) orthonormal basis for a nonnormal matrix.
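The Schur decomposition is available off the shelf; a minimal sketch on a random stand-in matrix (not the trained $W^{\text{rec}}$):

```python
import numpy as np
from scipy.linalg import schur

# Complex Schur form W = Q T Q^H: T's diagonal carries the eigenvalues,
# and its strictly upper part the functionally feedforward connections.
rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(100, 100))
np.fill_diagonal(W, 0.0)

T, Q = schur(W, output='complex')
assert np.allclose(Q @ T @ Q.conj().T, W)        # reconstruction
assert np.allclose(np.tril(T, k=-1), 0.0)        # T is upper triangular
assert np.allclose(Q @ Q.conj().T, np.eye(100))  # Q is unitary
```

The strictly upper triangular entries $T_{ij}$ ($j > i$) are what section 3.1 plots against the mode index difference $(j - i)$.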

## 3 Results

### 3.1 Sequences Are Generated by Superposition of Driven Harmonic Oscillators

To avoid local minima, we trained multiple excitatory-inhibitory RNNs with different random seeds, hyperparameters, and initial conditions in parallel. In each condition, we trained at least five networks and selected the solution with the smallest cost function. Depending on the learning rate parameter and the input-output setup, the excitatory-inhibitory RNN converged relatively slowly (typically $>10^5$ epochs) to a good solution.

Upon training the excitatory-inhibitory RNN, we examined the learned representation in the RNN dynamics. By examining the temporal trajectories of $x$ and $\varphi(x)$ that were associated with the readout neurons, we found that the RNN behaved approximately as a collection of driven (growing or damping) harmonic oscillators with time-varying frequencies (see Figure 2c). In the trained RNN, some readout neurons had large negative amplitudes in $x_i(t)$ because of the lack of constraint; this could be seen in the net excitatory and net inhibitory input to each target neuron (see equation 2.9). By examining the eigenspectrum plane of $(W^{\text{rec}} - I)/\tau$ (see Figure 2b), the maximum imaginary frequency was around 0.125 rad/s (equivalently $0.125/(2\pi) \approx 0.02$ Hz), which produced one oscillator at the same timescale as the time constant $\tau = 50$ ms. Each generalized harmonic oscillator was associated with an eigenmode that has nonzero imaginary parts, regardless of whether the real part is positive, negative, or zero. The maximum peaks of the harmonic oscillators were temporally aligned to form a neural sequence. The peaks and troughs of the generalized harmonic oscillators were controlled by the time-varying excitation and inhibition levels in the RNN, and the spacing between neighboring peaks determined the span of sequential activation.

In the case of RNN dynamics, the neural state will become unstable if all components of $x_i(t)$ are positive simultaneously (i.e., all neurons fire together). Since this phenomenon rarely happens, the RNN could reach stable dynamics when only a few of the $x_i(t)$ components were positive (i.e., neurons fire sparsely). In general, it is difficult to derive mathematical solutions to high-dimensional time-varying nonlinear dynamical systems (see equation 2.1). In the case of an RNN with threshold-linear (ReLU) units, the system is piecewise linear, and we can derive the analytic solutions of two- and three-dimensional systems (see appendixes A and B, respectively). In these low-dimensional systems, the phase space contains regions where neuronal activation is either linear or zero. Importantly, the system is time-varying, and the stability of RNN dynamics may change from moment to moment in the phase space. However, it may be possible to construct dynamic trajectories, such as fixed points or periodic orbits, in higher-dimensional systems by patching the eigenmodes across the “switch boundaries” (we did not implement the patching procedure; only fixed points, but not periodic orbits, were found in our simulations of the two- and three-dimensional systems; see appendixes A and B).

Using principal component analysis (PCA), we found that the 100-dimensional neural state trajectory $x(t)$ was usually embedded in a lower-dimensional space (see Figure 2d), with five to six components explaining more than 90% of the variance. Each principal component (PC) behaved like an oscillator, which was intrinsically related to the presence of complex eigenvalues in $W^{\text{rec}}$. To further examine the oscillator structure in the simulated neural dynamics, we also applied the polar decomposition (see appendix C) to the high-dimensional $x(t)$ (similar to the rotation jPCA method of Churchland et al., 2012) and visualized the two-dimensional projection in the rotation plane (see Figure 2e). Interestingly, we also found rotation dynamics on a single-trial basis when projecting the data onto the eigenvectors associated with jPCA.
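The variance-explained computation can be sketched with plain SVD-based PCA. The trajectory below is a synthetic stand-in (a few oscillatory modes mixed into 100 dimensions), not the trained network's $x(t)$:

```python
import numpy as np

# Toy trajectory: 3 oscillatory modes mixed into N = 100 dimensions,
# standing in for the trained network's state x(t).
rng = np.random.default_rng(4)
T, N = 100, 100
t = np.linspace(0.0, 1.0, T)[:, None]
modes = np.concatenate([np.sin(2 * np.pi * f * t) for f in (1, 2, 3)], axis=1)
x = modes @ rng.normal(size=(3, N)) + 0.01 * rng.normal(size=(T, N))

# PCA via SVD on the centered data; count PCs needed for >90% variance.
xc = x - x.mean(axis=0)
_, s, _ = np.linalg.svd(xc, full_matrices=False)
var_explained = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(var_explained, 0.90) + 1)
print(k)  # a handful of PCs suffice for this low-dimensional toy signal
```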

In general, it is difficult to infer any structure from the eigenspectrum of $W^{\text{rec}}$ alone. We therefore applied the Schur decomposition to $W^{\text{rec}}$ and computed the effective feedforward matrix $T$, which is an upper triangular matrix (see section 2.3). To examine the relative feedforward connection strengths between neurons, we plotted $T_{ij}$ with respect to the neuron index difference $(j - i)$. Interestingly, we found relatively large strengths between the first eight Schur modes (see Figure 2f). Furthermore, on average, each mode excited its nearby modes but inhibited modes that were farther away. Because of Dale's principle, $W^{\text{rec}}$ is nonnormal; the nonnormal dynamics in RNNs may offer benefits such as extensive information propagation and expressive transients (Ganguli, Hug, & Sompolinsky, 2008; Kerg et al., 2019; Orhan & Pitkow, 2020).

### 3.2 Robustness to Tested Input and Time Warping

We further tested the generalization, or invariance, of the RNN's sequence representation with respect to temporal scaling of the inputs. Specifically, we trained the RNN on a task of mapping five different inputs to five differently scaled versions of output sequences. Each sequence was represented by the sequential activation of five excitatory neurons. The five input signals were temporally scaled versions of each other, and similarly for the five output sequences (see Figure 3b). We then tested whether the trained RNN could interpolate or extrapolate in response to unseen scaled versions of the input signals. In each test condition, we created five random realizations and computed the error bar for each output sequence duration. The results are shown in Figures 3b and 3c. Specifically, the RNN generalized very well by an approximately linear interpolation but extrapolated poorly in the case of short-scaled inputs (e.g., 0.1 and 0.2). Our finding here is also consistent with recent results from RNNs without biological constraints, in which temporal scaling of complex motor sequences was learned based on pulse inputs (Hardy, Goudar, Romero-Sosa, & Buonomano, 2018) or on time-varying sensory inputs (Goudar & Buonomano, 2018).

### 3.3 Limit Cycle in the Steady State

Notably, the limit cycle was embedded not only in the eight sequence readout neurons (i.e., $z_1(t)$ through $z_8(t)$) but also in the remaining 92 neurons. To visualize the low-dimensional representations, we applied PCA to the 100-dimensional $x(t)$ during the training trial period ([0, 1] s) and then projected the steady-state activations onto the dominant four-dimensional PC subspace. Again, we observed stable limit cycles in the PC subspace.

Since the limit cycle was embedded in the high-dimensional neural state dynamics, it is difficult to derive theoretical results based on existing mathematical tools (e.g., the Hopf bifurcation theorem). The dynamic attractors in the steady state are fully characterized by equation 2.1 and the $N \times N$ matrix $W^{\text{rec}}$. To date, only limited analysis of periodic behavior has been obtained for RNNs, mostly in two-dimensional cases (Jouffroy, 2007; Bay, Lepsoy, & Magli, 2016). To understand the conditions for limit cycles in the excitatory-inhibitory RNN, we decomposed $W^{\text{rec}}$ into a symmetric and a skew-symmetric component and made a further low-dimensional approximation (see appendix D). Specifically, the symmetric matrix produces a spectrum with only real eigenvalues, whereas the skew-symmetric matrix produces a spectrum with only imaginary eigenvalues. Furthermore, we projected the $N$-dimensional $x(t)$ onto a two-dimensional eigenvector space. Under some approximations and assumptions, we could numerically simulate limit cycles based on the learned weight connection matrix $W^{\text{rec}}$ (Susman, Brenner, & Barak, 2019).
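The symmetric/skew-symmetric split and its spectral properties can be checked in a few lines (random stand-in matrix, our own sketch):

```python
import numpy as np

# Split W into symmetric and skew-symmetric parts, whose spectra are
# purely real and purely imaginary, respectively.
rng = np.random.default_rng(5)
W = rng.normal(scale=0.1, size=(100, 100))

W_sym = 0.5 * (W + W.T)    # symmetric part   -> real eigenvalues
W_skew = 0.5 * (W - W.T)   # skew-symmetric   -> imaginary eigenvalues
assert np.allclose(W_sym + W_skew, W)
assert np.allclose(np.linalg.eigvals(W_sym).imag, 0.0)
assert np.allclose(np.linalg.eigvals(W_skew).real, 0.0)
```

Intuitively, the skew-symmetric part supplies the rotational (oscillatory) component of the dynamics, while the symmetric part supplies growth or decay along each mode.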

Limit cycle attractors were frequently observed in our various testing conditions. During training, we varied the sequence duration, the input waveform, and the number of neurons engaged in the output sequence; we frequently found a limit cycle attractor in the steady state. From equation 2.9, note that a relative E/I balance is required for the reactivation of the sequence output: too much inhibition would suppress neuronal firing because of below-threshold activity. For each $x_i(t)$, if the level of net inhibition was greater than the level of net excitation, the $i$th neuron would not be able to fire even though the memory trace was preserved in the high-dimensional attractor.

### 3.4 Impact of $Wrec$ on Dynamic Attractors

Next, we imposed a sparsity constraint on the learned $W^{\text{rec}}$. We randomly selected a small percentage of connection weights, set them to zero, and examined the impact on the dynamic attractor. Since $W_{EE}^{\text{rec}}$ was most sensitive to the dynamics, we gradually increased the percentage from 5% to 10%, 15%, and 20%, setting randomly chosen entries of $W_{EE}^{\text{rec}}$ to zero. As a result, the neural sequential activation was reduced during both in-task and steady-state periods (see Figure 5d). Notably, the limit cycle was still present even when 5% to 15% of the $W_{EE}^{\text{rec}}$ elements were zeros, suggesting that the dynamic attractor was robust to sparse coding. However, the amplitude of $x(t)$ gradually decreased with increasing sparsity level. When 20% of the $W_{EE}^{\text{rec}}$ elements were set to zero, the limit cycle converged to a fixed point.
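The sparsification perturbation can be sketched as follows; the helper `sparsify_EE` and the block layout (excitatory units first) are our assumptions, and the matrix below is a random stand-in for the trained $W^{\text{rec}}$:

```python
import numpy as np

def sparsify_EE(W_rec, frac, N_exc=80, rng=None):
    """Zero out a random fraction of the off-diagonal EE block of W_rec
    (sketch of the perturbation described in the text)."""
    rng = rng or np.random.default_rng(6)
    W = W_rec.copy()
    ee = W[:N_exc, :N_exc]                            # view into the copy
    idx = np.flatnonzero(~np.eye(N_exc, dtype=bool))  # off-diagonal entries
    kill = rng.choice(idx, size=int(frac * idx.size), replace=False)
    ee.flat[kill] = 0.0
    return W

W0 = np.random.default_rng(7).gamma(2.0, 0.0495, size=(100, 100))
np.fill_diagonal(W0, 0.0)
W5 = sparsify_EE(W0, 0.05)        # 5% of EE entries removed
# Only the EE block changes; all other blocks are untouched.
assert np.allclose(W5[:, 80:], W0[:, 80:])
assert np.allclose(W5[80:, :], W0[80:, :])
```

Repeating with `frac` set to 0.10, 0.15, and 0.20 reproduces the sweep described above.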

### 3.5 Extension to Learning Two Sequences

In both setups 1 and 2, the outputs of the readout neurons (units #1–8) were orthogonal to each other for representing the two sequences. Next, we considered a different setup (setup 3), in which we set $N_{\text{out}} = 16$, with units #1–8 representing the output of sequence 1 and units #9–16 representing the output of sequence 2 (see Figure 8e). Therefore, the readout neurons had orthogonal representations for the two sequences. Upon learning the two sequences, we again examined the phase portraits of $x(t)$ during the steady state. Interestingly, in this new setup, the RNN dynamics also exhibited a stable limit cycle attractor (see Figure 8f). This stable limit cycle attractor simultaneously represented the partial forward- and reverse-ordered sequences, with opposite rotation directions in the phase portrait (see Figures 8f and 8g). Therefore, the two orthogonal output sequences learned independently can be coactivated in the steady state, and any two-dimensional combination from $\{x_1, \ldots, x_{16}\}$ would also form a periodic oscillator in the two-dimensional phase space. However, the coactivation of the forward and reverse sequences still appeared as a single limit cycle (with a shorter period compared to Figures 8a and 8b).

## 4 Discussion

### 4.1 Relation to Other Work

RNNs are capable of generating a rich class of dynamical systems and attractors (Trischler & D'Eleuterio, 2016; Pollock & Jazayeri, 2020). The dynamics of excitatory-inhibitory RNNs have been studied in the literature (Brunel, 2000; Murphy & Miller, 2009). Murphy and Miller (2009) used mean field theory to study the linear recurrent dynamics of excitatory-inhibitory neuronal populations and studied the steady-state response with balanced amplification mechanisms. However, since their neural network is linear, no limit cycle attractor can arise from the assumed linear dynamical system. To model neural sequences, Gillett, Pereira, and Brunel (2020) used mean field theory to derive a low-dimensional description of a sparsely connected linear excitatory-inhibitory rate (and spiking) network. They found that a nonlinear temporally asymmetric Hebbian learning rule can produce sparse sequences.

Most RNN modeling work on brain functions has focused on representations and dynamics during the task period (or transient state) and has ignored the stimulus-absent steady state. It remains unclear how the steady-state dynamics are related to the task-relevant dynamics and how they further contribute to the task. In the literature, the analysis of limit cycles for nonlinear RNN dynamics has been limited to two-dimensional systems (Jouffroy, 2007; Bay et al., 2016; Trischler & D'Eleuterio, 2016). To our best knowledge, no result was available on the existence of limit cycles in excitatory-inhibitory RNNs in a high-dimensional setting. Our computer simulation results have shown the emergence of limit cycles in the trained excitatory-inhibitory RNN while learning the sequence tasks. However, no theoretical results have been derived regarding sufficient or necessary conditions. Our computer simulations were built on supervised learning and gradient descent for learning neural sequences, which is close in spirit to prior work (Namikawa & Tani, 2009) in which an Elman-type RNN structure was used to learn multiple periodic attractors and the trained RNN could embed many attractors given random initial conditions.

It is possible to show the existence of periodic orbits or limit cycles in a content-addressable memory (CAM) network with an asymmetric connection weight matrix, where the neural firing patterns are binary (Folli, Gosti, Leonetti, & Ruocco, 2018; Susman et al., 2019). However, the CAM networks in previous studies did not consider temporal dynamics.

Upon learning neural sequences, we found that the excitatory-inhibitory RNN exhibited rotation dynamics in the population response (by jPCA). These rotation dynamics could be robustly recovered from single-trial data. Our finding is consistent with a recent independent study that reexamined the neuronal ensemble activity of the monkey motor cortex (Lebedev et al., 2019). In our case, the rotation dynamics analysis was applied to the complete population, whereas the analysis in Lebedev et al. was applied only to the sequence readout neurons. Notably, it was shown that rotational patterns can be easily obtained if there are temporal sequences in the population activity (Lebedev et al., 2019).

### 4.2 Dynamic Attractors and Memories

Synaptic efficacy is fundamental to memory storage and retrieval. The classical RNN used for associative memory is the Hopfield network, which stores patterns via a set of fixed points. The weight connection matrix in the Hopfield network is symmetric, thereby yielding only real eigenvalues. However, memories stored as time-varying attractors of neural dynamics are more resilient to noise than fixed points (Susman et al., 2019). The dynamic attractor is appealing for storing memory information as it is tolerant to interference; therefore, the limit cycle mechanism can be useful for preserving time-varying persistent neuronal activity during context-dependent working memory (Mante, Sussillo, Shenoy, & Newsome, 2013; Schmitt et al., 2017).

In the excitatory-inhibitory RNN, the stability of the dynamical system is determined by the recurrent connectivity matrix, which can further affect intrinsically generated fluctuating activity (Mastrogiuseppe & Ostojic, 2017). According to dynamical systems theory, stability about a set-point is related to the spectrum of the Jacobian matrix (i.e., the local linear approximation of the nonlinear dynamics) (Sussillo & Barak, 2013). For a general Jacobian matrix, the spectrum consists of a set of complex eigenvalues. The real part of this spectrum defines the system's stability: a system is stable only if all eigenvalues have negative real parts. The imaginary part of the spectrum determines the timescales of small-amplitude dynamics around the set-point, but not stability (Susman et al., 2019).
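The stability criterion above reduces to an eigenvalue test on the local Jacobian. A minimal sketch, using our own notation and a random stand-in for the recurrent matrix:

```python
import numpy as np

def is_locally_stable(jacobian):
    """Linear stability test: a set-point is stable iff all eigenvalues
    of the local Jacobian have negative real parts."""
    return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

# For tau*dx/dt = -x + W*phi(x) with ReLU phi, the Jacobian on a linear
# region is (W D - I)/tau, where D is a 0/1 diagonal marking active units
# (our notation; W here is a random stand-in, not the trained matrix).
tau, N = 0.05, 10
rng = np.random.default_rng(8)
W = rng.normal(scale=0.05, size=(N, N))
D = np.diag((rng.random(N) > 0.5).astype(float))
J = (W @ D - np.eye(N)) / tau
print(is_locally_stable(J))
```

With weak coupling (spectral radius of $WD$ well below 1), all eigenvalues of $J$ have negative real parts, so the region is locally stable.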

In the special case, the unconstrained RNN (i.e., without imposing Dale's principle) can be approximated and operated as a line attractor (Seung, 1996), although the approximation of line attractor dynamics by a nonlinear network may be qualitatively different from the approximation by a linear network. Additionally, strong nonlinearity can be used to make memory networks robust to perturbations of state or dynamics. Linearization around candidate points that minimize an auxiliary scalar function $|G(x)|^2$ (kinetic energy) may help reveal the fixed and slow points of high-dimensional RNN dynamics (Sussillo & Barak, 2013; Kao, 2019; Zhang et al., 2021). However, efficient numerical methods to analyze limit cycle attractors of RNNs remain unexplored.

We found that the trained excitatory-inhibitory RNN can trigger the neural sequence based on a wide range of input signals, representing one possible mechanism of memory retrieval. This is analogous to remembering the sequence of a phone number triggered by a name, where the form of the name presentation may vary. The sequence readout is also robust to noise and system perturbation. During the steady state, the RNN dynamics displayed either forward or reverse sequences, reminiscent of rodent hippocampal memory replay during the offline state (Foster & Wilson, 2006; Diba & Buzsaki, 2007).

Multistability is an important property of RNNs (Cheng, Lin, & Shih, 2006; Ceni, Ashwin, Livi, & Postlethwaite, 2020). For a general class of $N$-dimensional RNNs with ($k$-stair) piecewise linear activation functions, an RNN can have $(4k-1)^N$ equilibrium points, of which $(2k)^N$ are locally exponentially stable (Zeng, Huang, & Zheng, 2010). Recently, it has been shown that a two-dimensional gated recurrent unit (GRU) network can exhibit a rich repertoire of dynamical features, including stable limit cycles, multistable dynamics with various topologies, Andronov-Hopf bifurcations, and homoclinic orbits (Jordan, Sokol, & Park, 2019). However, their RNN allows self-connections and does not impose excitatory-inhibitory constraints. While training the excitatory-inhibitory $N$-dimensional RNN to learn two sequences, we found that memory retrieval was dominated by a single memory pattern within one stable limit cycle attractor (see Figures 8a and 8b). However, reactivation of two sequence patterns can be obtained by using orthogonal neuronal representations (see Figures 8e to 8g). The coexistence of bi- or multistability, and the mechanism of bistable switching between two limit cycles or between a limit cycle and fixed points, would require further theoretical and simulation studies.

### 4.3 Hebbian Plasticity versus Backpropagation

Supervised learning algorithms have been widely used to train RNNs (Werbos, 1988; Jaeger, 2001; Maass, Natschläger, & Markram, 2002; Sussillo & Abbott, 2009; Song et al., 2016). We used the established BPTT algorithm to train the excitatory-inhibitory RNN, where synaptic modification was applied to all synaptic connections based on gradient descent. In addition, it is possible to introduce a partial synaptic weight training mechanism on the recurrent weight matrix (Rajan et al., 2016; Song et al., 2016).

### 4.4 Limitation of Our Work

## 5 Conclusion

In summary, we trained an excitatory-inhibitory RNN to learn neural sequences and studied its stimulus-driven and spontaneous network dynamics. Dale's principle imposes a constraint on the nonnormal recurrent weight matrix and provides a new perspective for analyzing the impact of excitation and inhibition on RNN dynamics. Upon learning the output sequences, we found that dynamic attractors emerged in the trained RNN, and the limit cycle attractors in the steady state showed varying degrees of resilience to noise, interference, and changes in recurrent connectivity. The eigenspectrum of the learned recurrent connectivity matrix, with growing or damping modes, together with the RNN's nonlinearity, was adequate to generate a limit cycle attractor. Through theoretical analyses of the approximate linear system equation, we also found conceptual links between the recurrent weight matrix and neural dynamics. Despite the task simplicity, our computer simulations provide a framework to study stimulus-driven and spontaneous dynamics of biological neural networks and further motivate future theoretical work.

## Appendix A: Solving Equation for the Two-Dimensional Dynamical System

For each of the four quadrants of $(x_1, x_2)$, the eigenvalues can be calculated as follows:

- When $x_1>0, x_2>0$, we have the determinant
  $$\begin{vmatrix} \zeta - w_{11}^{rec}+1 & -w_{12}^{rec} \\ -w_{21}^{rec} & \zeta - w_{22}^{rec}+1 \end{vmatrix} = 0, \qquad (\mathrm{A.16})$$
  which expands to
  $$\zeta^2 + (2 - w_{11}^{rec} - w_{22}^{rec})\zeta + (1 + w_{11}^{rec}w_{22}^{rec} - w_{12}^{rec}w_{21}^{rec} - w_{11}^{rec} - w_{22}^{rec}) = 0,$$
  and further reduces, because $w_{11}^{rec} = w_{22}^{rec} = 0$, to
  $$\zeta^2 + 2\zeta + (1 - w_{12}^{rec}w_{21}^{rec}) = 0. \qquad (\mathrm{A.17})$$
  Solving equation A.17 yields two complex roots:
  $$\zeta_{1,2}^{++} = -1 \pm \sqrt{w_{12}^{rec}w_{21}^{rec}}, \qquad (\mathrm{A.18})$$
  where $w_{12}^{rec}$ has the opposite sign from $w_{21}^{rec}$, so that $\sqrt{w_{12}^{rec}w_{21}^{rec}}$ is purely imaginary. Letting $\omega = \sqrt{-w_{12}^{rec}w_{21}^{rec}}$ and $i = \sqrt{-1}$, we have $\zeta_{1,2}^{++} = -1 \pm i\omega$.
- For all three other cases, that is, when $x_1>0, x_2\le 0$, when $x_1\le 0, x_2>0$, or when $x_1\le 0, x_2\le 0$, in light of equation A.17 we have
  $$\zeta^2 + 2\zeta + 1 = 0, \qquad (\mathrm{A.19})$$
  because either $w_{11}^{rec} = w_{21}^{rec} = 0$ and/or $w_{12}^{rec} = w_{22}^{rec} = 0$. We further derive the solutions
  $$\zeta_{1,2}^{+-} = \zeta_{1,2}^{-+} = \zeta_{1,2}^{--} = -1.$$

Since $\{x_1(t), x_2(t)\}$ potentially change quadrant location over time, they influence the percentage of neuron activation (100%, 50%, or 0%) in a time-varying manner. Therefore, the superposition of the eigenmodes changes in both stimulus-driven and spontaneous network dynamics.
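The quadrant-wise eigenvalues derived above can be checked numerically. The sketch below uses assumed coupling weights $w_{12}^{rec}=-0.8$ and $w_{21}^{rec}=0.8$ (opposite signs, per the excitatory-inhibitory constraint) and verifies the roots in equations A.18 and A.19:

```python
import numpy as np

# Illustrative 2-unit recurrent weights: zero diagonal, opposite-sign coupling
w12, w21 = -0.8, 0.8                        # w12 * w21 < 0 => imaginary sqrt
W = np.array([[0.0, w12],
              [w21, 0.0]])

# Quadrant x1>0, x2>0: both units active; Jacobian of dx/dt = -x + W f(x)
J_pp = W - np.eye(2)
eig_pp = np.linalg.eigvals(J_pp)            # expect -1 +/- i*omega (eq. A.18)
omega = np.sqrt(-w12 * w21)

# Quadrant x1>0, x2<=0: unit 2 inactive, so its column of W drops out
W_po = W.copy()
W_po[:, 1] = 0.0
eig_po = np.linalg.eigvals(W_po - np.eye(2))  # expect repeated -1 (eq. A.19)
```

With these weights, the all-active quadrant yields a rotational (spiral) mode at $-1 \pm 0.8i$, while the partially active quadrants decay monotonically.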

## Appendix B: Solving Equation for the Three-Dimensional Dynamical System

Depending on the values of $\{x_1, x_2, x_3\}$, we can compute the eigenvalues as follows:

- When $x_1>0, x_2>0, x_3>0$, we have the following $3\times 3$ determinant:
  $$\begin{vmatrix} \zeta - w_{11}+1 & -w_{12} & -w_{13} \\ -w_{21} & \zeta - w_{22}+1 & -w_{23} \\ -w_{31} & -w_{32} & \zeta - w_{33}+1 \end{vmatrix} = 0. \qquad (\mathrm{B.16})$$
  From $w_{11} = w_{22} = w_{33} = 0$, we further derive
  $$\zeta^3 + 3\zeta^2 + \zeta(3 - w_{12}w_{21} - w_{13}w_{31} - w_{23}w_{32}) + (1 - w_{12}w_{23}w_{31} - w_{13}w_{21}w_{32} - w_{23}w_{32} - w_{12}w_{21} - w_{13}w_{31}) = 0. \qquad (\mathrm{B.17})$$
  The analytic solution of cubic equation B.17 is available, but unlike the remaining cases discussed, this equation does not have $\zeta = -1$ as one of its roots. The eigenvalues for the other cases, shown in equations B.2 to B.9, are discussed in the sequel.
- When $x_1>0, x_2>0, x_3\le 0$, we have $w_{13} = w_{23} = 0$, which is assumed in this case, so equation B.17 reduces to
  $$\zeta^3 + 3\zeta^2 + \zeta(3 - w_{12}w_{21}) + (1 - w_{12}w_{21}) = 0. \qquad (\mathrm{B.18})$$
  It is noted that $\zeta = -1$ is one of the roots satisfying equation B.18. Rewriting equation B.18 in the factorized form $(\zeta+1)\big(\zeta^2 + 2\zeta + (1 - w_{12}w_{21})\big) = 0$, we obtain the solutions
  $$\zeta = -1 \quad \text{or} \quad -1 \pm \sqrt{w_{12}w_{21}}. \qquad (\mathrm{B.19})$$
- When $x_1>0, x_2\le 0, x_3>0$, we have $w_{12} = w_{32} = 0$, and the same factorization yields
  $$\zeta = -1 \quad \text{or} \quad -1 \pm \sqrt{w_{13}w_{31}}. \qquad (\mathrm{B.20})$$
- When $x_1\le 0, x_2>0, x_3>0$, we have $w_{21} = w_{31} = 0$, and similarly
  $$\zeta = -1 \quad \text{or} \quad -1 \pm \sqrt{w_{23}w_{32}}. \qquad (\mathrm{B.21})$$
- For all other remaining cases, namely, when $x_1>0, x_2\le 0, x_3\le 0$; when $x_1\le 0, x_2>0, x_3\le 0$; when $x_1\le 0, x_2\le 0, x_3>0$; or when $x_1\le 0, x_2\le 0, x_3\le 0$, we have the solution
  $$\zeta = -1. \qquad (\mathrm{B.22})$$

Therefore, depending on the region, the solutions can contain one real root and a pair of complex-conjugate values.
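This case analysis can be verified numerically. The weights below are arbitrary illustrative values with zero diagonal; zeroing the third column (unit 3 inactive) should yield $\zeta = -1$ together with $-1 \pm \sqrt{w_{12}w_{21}}$, as in equation B.19:

```python
import numpy as np

# Illustrative 3-unit recurrent weights with zero diagonal
W = np.array([[ 0.0,  0.5, -0.3],
              [-0.7,  0.0,  0.4],
              [ 0.2, -0.6,  0.0]])

# Case x1>0, x2>0, x3<=0: unit 3 inactive, so column 3 of W vanishes
W_case = W.copy()
W_case[:, 2] = 0.0
eigs = np.linalg.eigvals(W_case - np.eye(3))

# Predicted roots from the factorized characteristic polynomial (eq. B.19)
w12, w21 = W[0, 1], W[1, 0]
expected = np.array([-1.0 + 0j,
                     -1.0 + np.sqrt(complex(w12 * w21)),
                     -1.0 - np.sqrt(complex(w12 * w21))])
```

Because $w_{12}w_{21} = -0.35 < 0$ here, the nontrivial pair is complex, consistent with a rotational eigenmode.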

## Appendix C: jPCA via Polar Decomposition

jPCA fits a linear dynamical system $\dot{x}(t) = Mx(t)$, where the linear transformation matrix $M$ is constrained to be skew-symmetric (i.e., $M^\top = -M$). The jPCA algorithm projects high-dimensional data $x(t)$ onto the eigenvectors of the $M$ matrix, and these eigenvectors arise in complex conjugate pairs. Given a pair of eigenvectors $\{v_k, \bar{v}_k\}$, the $k$th jPCA projection plane axes are defined as $u_{k,1} = v_k + \bar{v}_k$ and $u_{k,2} = j(v_k - \bar{v}_k)$ (where $j = \sqrt{-1}$).

From polar decomposition, if $\lambda_k$ and $v_k$ correspond to the eigenvalue and eigenvector of matrix $M$, respectively, then $1+\lambda_k$ and $v_k$ will correspond to the approximate eigenvalue and eigenvector of matrix $Q$. That is, the jPCA eigenvectors span the same space as the eigenvectors of the polar decomposition factor $Q$.
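The conjugate-pair structure is easy to verify numerically: for any real skew-symmetric $M$, the eigenvalues are purely imaginary, and the plane axes $u_{k,1} = v_k + \bar{v}_k$ and $u_{k,2} = j(v_k - \bar{v}_k)$ are real vectors. A minimal sketch with an assumed random skew-symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A - A.T                          # skew-symmetric: M.T == -M

lam, V = np.linalg.eig(M)            # eigenvalues are purely imaginary
k = np.argmax(lam.imag)              # pick the fastest-rotating pair
v = V[:, k]
u1 = v + np.conj(v)                  # jPCA plane axis u_{k,1} (real-valued)
u2 = 1j * (v - np.conj(v))           # jPCA plane axis u_{k,2} (real-valued)
```

Projecting $x(t)$ onto the real vectors `u1` and `u2` then gives the rotational plane associated with eigenfrequency $|\mathrm{Im}\,\lambda_k|$.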

## Appendix D: Low-Dimensional Approximation of Limit Cycle Attractor

## Acknowledgments

We thank Vishwa Goudar for valuable feedback. This work was partially supported by NIH grants R01-MH118928 (Z.S.C.) and R01-NS100065 (Z.S.C.) and NSF grant CBET-1835000 (Z.S.C.).

## References
