## Abstract

A large body of work has suggested that neural populations exhibit low-dimensional dynamics during behavior. However, there are a variety of different approaches for modeling low-dimensional neural population activity. One approach involves latent linear dynamical system (LDS) models, in which population activity is described by a projection of low-dimensional latent variables with linear dynamics. A second approach involves low-rank recurrent neural networks (RNNs), in which population activity arises directly from a low-dimensional projection of past activity. Although these two modeling approaches have strong similarities, they arise in different contexts and tend to have different domains of application. Here we examine the precise relationship between latent LDS models and linear low-rank RNNs. When can one model class be converted to the other, and vice versa? We show that latent LDS models can only be converted to RNNs in specific limit cases, due to the non-Markovian property of latent LDS models. Conversely, we show that linear RNNs can be mapped onto LDS models, with latent dimensionality at most twice the rank of the RNN. A surprising consequence of our results is that a partially observed RNN is better represented by an LDS model than by an RNN consisting of only observed units.

## 1 Introduction

Recent work on large-scale neural population recordings has suggested that neural activity is often confined to a low-dimensional space, with fewer dimensions than the number of neurons in a population (Churchland, Byron, Sahani, & Shenoy, 2007; Gao & Ganguli, 2015; Gallego, Perich, Miller, & Solla, 2017; Saxena & Cunningham, 2019; Jazayeri & Ostojic, 2021). To describe this activity, modelers have at their disposal a wide array of tools that give rise to different forms of low-dimensional activity (Cunningham & Yu, 2014). Two classes of modeling approaches that have generated a large following in the literature are descriptive statistical models and mechanistic models. Broadly speaking, descriptive statistical models aim to identify a probability distribution that captures the statistical properties of an observed neural dataset, while remaining agnostic about the mechanisms that gave rise to it. Mechanistic models, by contrast, aim to reproduce certain characteristics of observed data using biologically inspired mechanisms, but often with less attention to a full statistical description. Although these two classes of models often have similar mathematical underpinnings, there remain a variety of important gaps between them. Here we focus on reconciling the gaps between two simple but powerful models of low-dimensional neural activity: latent linear dynamical systems (LDS) and linear low-rank recurrent neural networks (RNNs).

The latent LDS model with gaussian noise is a popular statistical model for low-dimensional neural activity in both systems neuroscience (Smith & Brown, 2003; Semedo, Zandvakili, Kohn, Machens, & Byron, 2014) and brain-machine interface settings (Kim, Simeral, Hochberg, Donoghue, & Black, 2008). This model has a long history in electrical engineering, where the problem of inferring latents from past observations has an analytical solution known as the Kalman filter (Kalman, 1960). In neuroscience settings, this model has been used to describe high-dimensional neural population activity in terms of linear projections of low-dimensional latent variables. Although the basic form of the model includes only linear dynamics, recent extensions have produced state-of-the-art models for high-dimensional spike train data (Yu et al., 2005; Petreska et al., 2011; Macke et al., 2011; Pachitariu, Petreska, & Sahani, 2013; Archer, Koster, Pillow, & Macke, 2014; Duncker, Bohner, Boussard, & Sahani, 2019; Zoltowski, Pillow, & Linderman, 2020; Glaser, Whiteway, Cunningham, Paninski, & Linderman, 2020; Kim et al., 2008).

Recurrent neural networks, by contrast, have emerged as a powerful framework for building mechanistic models of neural computations underlying cognitive tasks (Sussillo, 2014; Barak, 2017; Mante, Sussillo, Shenoy, & Newsome, 2013) and have more recently been used to reproduce recorded neural data (Rajan, Harvey, & Tank, 2016; Cohen, DePasquale, Aoi, & Pillow, 2020; Finkelstein et al., 2021; Perich et al., 2021). While randomly connected RNN models typically have high-dimensional activity (Sompolinsky, Crisanti, & Sommers, 1988; Laje & Buonomano, 2013), recent work has shown that RNNs with low-rank connectivity provide a rich theoretical framework for modeling low-dimensional neural dynamics and the resulting computations (Mastrogiuseppe & Ostojic, 2018; Landau & Sompolinsky, 2018; Pereira & Brunel, 2018; Schuessler, Dubreuil, Mastrogiuseppe, Ostojic, & Barak, 2020; Beiran, Dubreuil, Valente, Mastrogiuseppe, & Ostojic, 2021; Dubreuil, Valente, Beiran, Mastrogiuseppe, & Ostojic, 2022; Bondanelli, Deneux, Bathellier, & Ostojic, 2021; Landau & Sompolinsky, 2021). In these low-rank RNNs, the structure of low-dimensional dynamics bears direct commonalities with latent LDS models, yet the precise relationship between the two classes of models remains to be clarified. Understanding this relationship would open the door to applying to low-rank RNNs probabilistic inference techniques developed for LDS models and conversely could provide mechanistic interpretations of latent LDS models fitted to data.

In this letter, we examine the mathematical relationship between latent LDS and low-rank RNN models. We focus on linear RNNs, which are less expressive but simpler to analyze than their nonlinear counterparts while still leading to rich dynamics (Hennequin, Vogels, & Gerstner, 2014; Kao, Sadabadi, & Hennequin, 2021; Bondanelli et al., 2021). We show that even if both LDS models and linear low-rank RNNs produce gaussian distributed activity patterns with low-dimensional linear dynamics, the two model classes have different statistical structures and are therefore not in general equivalent. More specifically, in latent LDS models, the output sequence has non-Markovian statistics, meaning that the activity in a single time step is not independent of its history given the activity on the previous time step. This stands in contrast to linear RNNs, which are Markovian regardless of the rank of their connectivity. A linear low-rank RNN can nevertheless provide a first-order approximation to the distribution over neural activity generated by a latent LDS model, and we show that this approximation becomes exact in several cases of interest, and in particular, in the limit where the number of neurons is large compared to the latent dimensionality. Conversely, we show that any linear low-rank RNN can be converted to a latent LDS, although the dimensionality of the latent space depends on the overlap between the subspaces spanned by left and right singular vectors of the RNN connectivity matrix and may be as high as twice the rank of this matrix. The two model classes are thus closely related, with linear low-rank RNNs comprising a subset of the broader class of latent LDS models. An interesting implication of our analyses is that the activity of an RNN in which only a subset of neurons are observed is better fit by a latent LDS model than by an RNN consisting only of observed units.

## 2 Modeling Frameworks

We start with a formal description of the two model classes in question, both of which describe the time-varying activity of a population of $n$ neurons.

### 2.1 Latent LDS Model

### 2.2 Low-Rank Linear RNN

Note that this factorization is not unique, but a particular factorization can be obtained from a low-rank $J$ matrix using the truncated singular value decomposition: $J = U S V^\top$, where $U$ and $V$ are semiorthogonal $n \times r$ matrices of left and right singular vectors, respectively, and $S$ is an $r \times r$ diagonal matrix containing the largest singular values. We can then set $M = U$ and $N = S V^\top$.
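The truncated SVD factorization described above can be sketched numerically. This is a minimal illustration, not the authors' code: the sizes `n` and `r`, the random matrix `J`, and the change-of-basis matrix `G` are hypothetical. Whether the second factor enters the connectivity as $N$ or $N^\top$ depends on the convention of the factorization equation, so the sketch only checks that the product of the two factors recovers $J$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3  # hypothetical population size and connectivity rank

# Build an arbitrary rank-r connectivity matrix (illustrative only).
J = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / n

# Truncated SVD: keep the r largest singular values and their vectors.
U_full, s_full, Vt_full = np.linalg.svd(J, full_matrices=False)
U = U_full[:, :r]        # n x r, semiorthogonal
S = np.diag(s_full[:r])  # r x r, diagonal
Vt = Vt_full[:r, :]      # r x n, rows are right singular vectors

M = U                    # first factor, as in the text
SVt = S @ Vt             # second factor S V^T, as in the text
reconstruction_error = np.linalg.norm(M @ SVt - J)

# Non-uniqueness: any invertible r x r matrix G gives another valid factorization.
G = np.triu(np.ones((r, r))) + np.eye(r)  # upper triangular, nonzero diagonal
M_alt = M @ G
SVt_alt = np.linalg.solve(G, SVt)         # G^{-1} S V^T
alt_error = np.linalg.norm(M_alt @ SVt_alt - J)
```

Both `reconstruction_error` and `alt_error` should be at the level of floating-point round-off, which illustrates that the SVD picks out one factorization among many.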

The model parameters of the low-rank linear RNN are therefore given by $\theta_{\mathrm{RNN}} = \{M, N, P, V_0^{y}\}$.

### 2.3 Comparing the Two Models

Both models described above exhibit low-dimensional dynamics embedded in a high-dimensional observation space. In the following, we examine the probability distributions $P(y_1, \ldots, y_T)$ over time series $(y_1, \ldots, y_T)$ generated by the two models. We show that in general, the two models give rise to different distributions, such that the family of probability distributions generated by the LDS model cannot all be captured with low-rank linear RNNs. Specifically, RNN models are constrained to purely Markovian distributions, which is not the case for LDS models. However, the two model classes can be shown to be equivalent when the observations $y_t$ contain exact information about the latent state $x_t$, which is in particular the case if the observation noise is orthogonal to the latent subspace or in the limit of a large number of neurons $n \gg d$. Conversely, a low-rank linear RNN can in general be mapped to a latent LDS with a dimensionality of the latent state at most twice the rank of the RNN.

## 3 Mapping from LDS Models to Linear Low-Rank RNNs

### 3.1 Nonequivalence in the General Case

Iterating equation 3.4 over multiple time steps, one can see that $\hat{x}_{t+1}$ depends not only on the last observation $y_t$ but on the full history of observations $(y_0, \ldots, y_t)$, which therefore affects the distribution at any given time step. The process $(y_0, \ldots, y_t)$ generated by the LDS model is hence non-Markovian.
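The history dependence can be checked numerically with a minimal sketch. It assumes the standard Kalman prediction recursion with latent dynamics `A`, observation matrix `C`, latent noise covariance `Q`, and observation noise covariance `R`. The parameter values, the name `Q`, and the function `kalman_predict` are illustrative assumptions. Two observation histories that share the same last observation $y_t$ but differ earlier yield different predicted states.

```python
import numpy as np

def kalman_predict(ys, A, C, Q, R, x0, V0):
    """One-step-ahead predicted latent mean after observing every row of ys."""
    x_hat, V = x0.copy(), V0.copy()
    for y in ys:
        # Kalman gain: depends on the covariances only, not on the observations.
        K = V @ C.T @ np.linalg.inv(C @ V @ C.T + R)
        # The new prediction mixes the previous prediction with the new observation.
        x_hat = A @ (x_hat + K @ (y - C @ x_hat))
        V = A @ (V - K @ C @ V) @ A.T + Q
    return x_hat

rng = np.random.default_rng(1)
d, n = 2, 4  # hypothetical latent and observation dimensions
A = np.array([[0.9, 0.2], [0.0, 0.8]])
C = rng.standard_normal((n, d))
Q = 0.1 * np.eye(d)
R = np.eye(n)
x0, V0 = np.zeros(d), np.eye(d)

# Two histories that differ in the past but share the last observation y_t.
history_a = rng.standard_normal((5, n))
history_b = rng.standard_normal((5, n))
history_b[-1] = history_a[-1]

pred_a = kalman_predict(history_a, A, C, Q, R, x0, V0)
pred_b = kalman_predict(history_b, A, C, Q, R, x0, V0)
history_gap = np.linalg.norm(pred_a - pred_b)
```

A nonzero `history_gap` shows that knowing $y_t$ alone does not determine the prediction, so the observation process is not first-order Markov for these parameters.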

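In contrast, the observations generated by a linear RNN *do* form a Markov process, meaning that observations are conditionally independent of their history given the activity from the previous time step:

$$P(y_{t+1} \mid y_t, y_{t-1}, \ldots, y_0) = P(y_{t+1} \mid y_t).$$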

### 3.2 Matching the First-Order Marginals of an LDS Model

We can obtain a Markovian approximation of the LDS-generated sequence of observations $(y_0, \ldots, y_t)$ by deriving the conditional distribution $P(y_{t+1} \mid y_t)$ under the LDS model and matching it with a low-rank RNN (Pachitariu et al., 2013). This type of first-order approximation will preserve exactly the one-time-step-difference marginal distributions $P(y_{t+1}, y_t)$, although structure across longer timescales might not be captured correctly.
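One way to make this matching concrete in the stationary regime is gaussian conditioning on the lag-0 and lag-1 covariances of the observations. The sketch below is an illustration under assumed parameters (`A`, `C`, `Q`, `R` and their values are hypothetical, and the stationary-regime simplification is an assumption), not the derivation used in the paper. It builds a connectivity `J` and noise covariance `P` such that $y_{t+1} \mid y_t \sim \mathcal{N}(J y_t, P)$, then compares lag-2 covariances.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 6  # hypothetical latent and observation dimensions
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # stable latent dynamics
C = rng.standard_normal((n, d))
Q = 0.1 * np.eye(d)
R = 0.5 * np.eye(n)

# Stationary latent covariance: fixed point of Sigma_x = A Sigma_x A^T + Q.
Sigma_x = np.zeros((d, d))
for _ in range(2000):
    Sigma_x = A @ Sigma_x @ A.T + Q

Sigma_0 = C @ Sigma_x @ C.T + R   # E[y_t y_t^T]
Sigma_1 = C @ A @ Sigma_x @ C.T   # E[y_{t+1} y_t^T]

# Gaussian conditioning gives the matched first-order model.
J = Sigma_1 @ np.linalg.inv(Sigma_0)  # connectivity, rank at most d
P = Sigma_0 - J @ Sigma_0 @ J.T       # conditional (noise) covariance
rank_J = np.linalg.matrix_rank(J)

# Lag-2 covariance: exact LDS value versus the Markov approximation.
lag2_lds = C @ A @ A @ Sigma_x @ C.T
lag2_rnn = J @ J @ Sigma_0
lag2_gap = np.linalg.norm(lag2_lds - lag2_rnn)
```

By construction the lag-0 and lag-1 covariances agree, while `lag2_gap` is generally nonzero when `R` is not negligible.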

### 3.3 Cases of Equivalence between LDS and RNN Models

Although latent LDS and low-rank linear RNN models are not equivalent in general, we can show that the first-order Markovian approximation introduced above becomes exact in two limit cases of interest: for observation noise orthogonal to the latent subspace and in the limit $n \gg d$, with coefficients of the observation matrix generated randomly and independently.

Our key observation is that if $K_t C = I$ in equation 3.4, with $I$ the identity matrix, we have $\hat{x}_{t+1} = A K_t y_t$, so that the dependence on the observations before time step $t$ disappears and the LDS therefore becomes Markovian. Interestingly, this condition $K_t C = I$ also implies that the latent state can be inferred from the current observation $y_t$ alone (see equation A.7 in appendix A) and that this inference is exact, since the variance of the distribution $p(x_t \mid y_t)$ is then equal to 0, as seen from equation A.8. We next examine two cases where this condition is satisfied.
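Written out, the step is short. The block below assumes that equation 3.4 takes the standard Kalman prediction form, with $\hat{x}_t$ the predicted latent mean and $K_t$ the Kalman gain; it is a sketch of the argument under that assumption.

```latex
\begin{aligned}
\hat{x}_{t+1} &= A\bigl(\hat{x}_t + K_t\,(y_t - C\hat{x}_t)\bigr) \\
              &= A\,(I - K_t C)\,\hat{x}_t + A K_t\, y_t \\
              &= A K_t\, y_t \qquad \text{when } K_t C = I .
\end{aligned}
```

The term $A(I - K_t C)\hat{x}_t$ is the only route by which earlier observations enter the prediction, so it is exactly this term that must vanish for the process to be Markovian.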

A second case in which we can obtain $K_t C \approx I$ is in the limit of many neurons, $n \gg d$, assuming that coefficients of the observation matrix are generated randomly and independently. Indeed, under these hypotheses, the Kalman gain given by equation 3.5 is dominated by the term $C V_t C^\top$, so that the observation covariance $R$ becomes negligible, as shown formally in appendix B. Intuitively this means that the information about the latent state $\hat{x}_t$ is distributed over a large enough population of neurons for the Kalman filter to average out the observation noise and estimate it optimally without making use of previous observations. Ultimately this makes the LDS asymptotically Markovian in the case where we have an arbitrarily large neural population relative to the number of latent dimensions.
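The large-population argument can be probed numerically. The following sketch assumes the standard Kalman gain $K = V C^\top (C V C^\top + R)^{-1}$ and iterates the covariance recursion to an approximate steady state. The function name `steady_state_gain_error`, the parameter values, and the population sizes are hypothetical choices for illustration.

```python
import numpy as np

def steady_state_gain_error(n, d=2, n_iter=60, seed=0):
    """Spectral norm of K C - I at an approximate steady state of the Kalman filter."""
    rng = np.random.default_rng(seed)
    A = 0.9 * np.eye(d)
    Q = 0.1 * np.eye(d)
    R = np.eye(n)
    C = rng.standard_normal((n, d))  # i.i.d. zero-mean, unit-variance coefficients
    V = np.eye(d)
    for _ in range(n_iter):
        # K = V C^T (C V C^T + R)^{-1}, computed with a linear solve.
        K = np.linalg.solve(C @ V @ C.T + R, C @ V).T
        V = A @ (V - K @ C @ V) @ A.T + Q
    K = np.linalg.solve(C @ V @ C.T + R, C @ V).T
    return np.linalg.norm(K @ C - np.eye(d), ord=2)

sizes = (5, 50, 500)
errors = [steady_state_gain_error(n) for n in sizes]
```

With these settings the entries of `errors` should shrink as `n` grows, consistent with $K_t C$ approaching the identity when $n \gg d$.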

To illustrate the convergence of the low-rank RNN approximation to the target latent LDS in the large $n$ limit, in Figure 2, we consider a simple example with a one-dimensional latent space and observation spaces of increasing dimensionality. To visualize the difference between the LDS and its low-rank RNN approximation, we plot the trace of the autocorrelation matrix of observations $y_t$ in the stationary regime, $\rho(\delta) = \mathrm{Tr}\bigl(\mathbb{E}[y_t y_{t+\delta}^\top]\bigr)$. Since the RNNs are constructed to capture the marginal distributions of observations separated by at most one time step, the two curves match exactly for a lag $\delta \in \{-1, 0, 1\}$, but dependencies at longer timescales cannot be accurately captured by an RNN due to its Markov property (see Figure 2b). However, these differences vanish as the dimensionality of the observation space becomes much larger than that of the latent space (see Figures 2b and 2c), which illustrates that the latent LDS converges to a process equivalent to a low-rank RNN.
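A comparison of this kind can be sketched in closed form for a one-dimensional latent, without simulation. The code below is an illustration under assumed parameters and is not the code behind Figure 2: the function `autocorr_traces`, the scalar dynamics `a`, the noise levels `q` and `sigma2`, and the population sizes are all hypothetical. The RNN connectivity is obtained by matching lag-0 and lag-1 covariances in the stationary regime.

```python
import numpy as np

def autocorr_traces(n, lags, a=0.95, q=1.0, sigma2=4.0, seed=0):
    """Traces of lagged covariances for a 1-D latent LDS and for its
    first-order (Markov) RNN approximation, in the stationary regime."""
    rng = np.random.default_rng(seed)
    c = rng.standard_normal((n, 1))                    # observation vector
    var_x = q / (1.0 - a**2)                           # stationary latent variance
    Sigma_0 = var_x * (c @ c.T) + sigma2 * np.eye(n)   # E[y_t y_t^T]
    Sigma_1 = a * var_x * (c @ c.T)                    # E[y_{t+1} y_t^T]
    J = Sigma_1 @ np.linalg.inv(Sigma_0)               # matched RNN connectivity
    rho_lds, rho_rnn = [], []
    for lag in lags:
        lag = int(lag)
        lds_cov = Sigma_0 if lag == 0 else a**lag * var_x * (c @ c.T)
        rnn_cov = np.linalg.matrix_power(J, lag) @ Sigma_0
        rho_lds.append(np.trace(lds_cov))
        rho_rnn.append(np.trace(rnn_cov))
    return np.array(rho_lds), np.array(rho_rnn)

lags = range(0, 11)
gaps = []  # relative mismatch at the largest lag, one entry per population size
for n in (2, 20, 200):
    rho_lds, rho_rnn = autocorr_traces(n, lags)
    gaps.append(abs(rho_lds[-1] - rho_rnn[-1]) / rho_lds[-1])
```

The two traces agree at lags 0 and 1 by construction, and the relative mismatch at lag 10 stored in `gaps` should decrease as the number of observed units grows.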

## 4 Mapping Low-Rank Linear RNNs onto Latent LDS Models

We now turn to the reverse question: Under what conditions can a low-rank linear RNN be expressed as a latent LDS model? We start with an intuitive mapping for the deterministic case (when noise covariance $P=0$) and then extend it to a more general mapping valid in the presence of noise.
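For the deterministic case, one natural mapping can be checked numerically. This sketch is an assumption-laden illustration rather than the paper's construction: it assumes the connectivity convention $J = M N^\top$ with $n \times r$ factors, takes the latent state to be $x_{t+1} = N^\top y_t$, and uses hypothetical sizes and random factors. Under these assumptions the latent dynamics matrix is $N^\top M$ and the observation matrix is $M$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, T = 30, 2, 10  # hypothetical sizes
M = rng.standard_normal((n, r))
N = rng.standard_normal((n, r)) / np.sqrt(n)
J = M @ N.T          # rank-r connectivity (assumed convention J = M N^T)

# Noise-free RNN: y_{t+1} = J y_t.
y = np.zeros((T, n))
y[0] = rng.standard_normal(n)
for t in range(T - 1):
    y[t + 1] = J @ y[t]

# Candidate latent LDS: x_{t+1} = A x_t and y_t = C x_t, with r latent dimensions.
A = N.T @ M          # r x r latent dynamics
C = M                # n x r observation matrix
x = np.zeros((T, r))
x[1] = N.T @ y[0]    # the latent state is defined from the first update onward
for t in range(1, T - 1):
    x[t + 1] = A @ x[t]

# Relative mismatch between RNN activity and LDS observations for t >= 1.
mismatch = np.max(np.abs(y[1:] - x[1:] @ C.T)) / np.max(np.abs(y[1:]))
```

The arbitrary initial condition `y[0]` need not lie in the column space of `M`, which is why the comparison starts at the first update.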

In the presence of noise $\epsilon_t$, $y_t$ is no longer confined to the column space of $M$. Part of this noise is integrated into the recurrent dynamics and can contribute to the activity across many time steps. This integration of noise can occur in an LDS at the level of latent dynamics through $w_t$, but not at the level of observation noise $v_t$, which is independent across time steps. As noted above, recurrent dynamics only integrate the activity present in the column space of $N$. In the presence of noise, this part of state space therefore needs to be included into the latent variables. More important, a similar observation can be made about external inputs when they are added to the RNN dynamics (see appendix D).
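The role of the two column spaces can be illustrated with a pathwise check. The sketch below is not the paper's derivation: it assumes the convention $J = M N^\top$, isotropic RNN noise with standard deviation `sigma`, and a latent state defined as the projection of $y_t$ onto an orthonormal basis `B` of the span of the columns of $M$ and $N$. All names and sizes are hypothetical. Under the isotropic assumption, the projected noise `w` and the orthogonal residual `v` are uncorrelated, so they can play the roles of latent and observation noise.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, T, sigma = 20, 2, 50, 0.1  # hypothetical sizes and noise level
M = rng.standard_normal((n, r)) / np.sqrt(n)
N = rng.standard_normal((n, r)) / np.sqrt(n)
J = M @ N.T

# Orthonormal basis B of span([M, N]); its dimension k is at most 2r.
U, s, _ = np.linalg.svd(np.hstack([M, N]), full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))
B = U[:, :k]

# When the two column spaces coincide, the latent dimension drops to r.
k_overlap = np.linalg.matrix_rank(np.hstack([M, M @ rng.standard_normal((r, r))]))

# Simulate the noisy RNN y_{t+1} = J y_t + eps_t.
eps = sigma * rng.standard_normal((T, n))
y = np.zeros((T, n))
y[0] = rng.standard_normal(n)
for t in range(T - 1):
    y[t + 1] = J @ y[t] + eps[t]

# Candidate LDS: x_t = B^T y_t, latent dynamics A, observation matrix C = B.
A = B.T @ J @ B
x = y @ B          # rows are x_t = B^T y_t
w = eps @ B        # latent noise w_t = B^T eps_t
v = y - x @ B.T    # residual acting as observation noise

latent_residual = np.max(np.abs(x[1:] - (x[:-1] @ A.T + w[:-1])))
obs_residual = np.max(np.abs(v[1:] - (eps[:-1] - eps[:-1] @ B @ B.T)))
```

Both residuals should vanish up to round-off: the projected activity follows linear latent dynamics driven by `w`, and the remainder of the activity is the part of the previous noise sample orthogonal to `B`, which carries no memory across time steps.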

### 4.1 Subsampled RNNs

## 5 Discussion

In this letter, we have examined the relationship between two simple yet powerful classes of models of low-dimensional activity: latent linear dynamical systems (LDS) and low-rank linear recurrent neural networks (RNN). We have focused on these tractable linear models with additive gaussian noise to highlight their mathematical similarities and differences. Although both models induce a jointly gaussian distribution over neural population activity, generic latent LDS models can exhibit long-range, non-Markovian temporal dependencies that cannot be captured by low-rank linear RNNs, which describe neural population activity with a first-order Markov process. Conversely, we showed that generic low-rank linear RNNs can be captured by an equivalent latent LDS model. However, we have shown that the two classes of models are effectively equivalent in limit cases of practical interest for neuroscience, in particular when the number of sampled neurons is much higher than the latent dimensionality.

Although these two model classes can generate similar sets of neural trajectories, different approaches are typically used for fitting them to neural data: parameters of LDS models are in general inferred by variants of the expectation-maximization algorithm (Yu et al., 2005; Pachitariu et al., 2013; Nonnenmacher, Turaga, & Macke, 2017; Durstewitz, 2017), which include the Kalman smoothing equations (Roweis & Ghahramani, 1999), while RNNs are often fitted with variants of linear regression (Rajan et al., 2016; Eliasmith & Anderson, 2003; Pollock & Jazayeri, 2020; Bondanelli et al., 2021) or backpropagation through time (Dubreuil et al., 2022). The relationship uncovered here therefore opens the door to comparing different fitting approaches more directly, and in particular to developing probabilistic methods for inferring RNN parameters from data.

We have considered here only linear RNN and latent LDS models. Nonlinear low-rank RNNs without noise can be directly reduced to nonlinear latent dynamics with linear observations following the same mapping as in section 4 (Mastrogiuseppe & Ostojic, 2018; Schuessler et al., 2020; Beiran et al., 2021; Dubreuil et al., 2022) and therefore define a natural class of nonlinear LDS models. A variety of other nonlinear generalizations of LDS models have been considered in the literature. One line of work has examined linear latent dynamics with a nonlinear observation model (Yu et al., 2005) or nonlinear latent dynamics (Yu et al., 2005; Durstewitz, 2017; Duncker et al., 2019; Pandarinath et al., 2018; Kim et al., 2008). Another line of work has focused on switching LDS models (Linderman et al., 2017; Glaser et al., 2020) for which the system undergoes different linear dynamics depending on a hidden discrete state, thus combining elements of latent LDS and hidden Markov models. Both nonlinear low-rank RNNs and switching LDS models are universal approximators of low-dimensional dynamical systems (Funahashi & Nakamura, 1993; Chow & Li, 2000; Beiran et al., 2021). Relating switching LDS models to local linear approximations of nonlinear low-rank RNNs (Beiran et al., 2021; Dubreuil et al., 2022) is therefore an interesting avenue for future investigations.

## Appendix A: Kalman Filtering Equations

We reproduce in this appendix the recurrence equations followed by the conditional distributions in equation 3.1 for both the latent LDS and the linear RNN models.

From equation A.10, we see that the predicted state at time $t+1$, and thus the predicted observation, depends on observations at time steps $\tau \le t-1$ through the term $\hat{x}_t$, making the system non-Markovian. Also note that equations for the variances do not involve any of the observations $y_t$, showing these are exact values and not estimations.

## Appendix B: Equivalence in the Large Network Limit

Here we make the assumption that the coefficients of the observation matrix are generated randomly and independently. We show that in the limit of large $n$ with $d$ fixed, one obtains $K_t C \to I$ so that the LDS is asymptotically Markovian and can therefore be exactly mapped to an RNN.

We start by considering a latent LDS whose conditional distributions obey equations A.1 to A.11, with the Kalman gain obeying equation A.9. To simplify equation A.9, we focus on the steady state where variance $V_t$ has reached its stationary limit $V$ in equation A.11.

Assuming the coefficients of the observation matrix are independent and identically distributed (i.i.d.) with zero mean and unit variance, for $n$ large, we obtain $C^\top C = nI + O(\sqrt{n})$ from the central limit theorem so that $(C^\top C)^{-1} = O(1/n)$ (which can again be proven with a Taylor expansion). This finally leads to $K_t C = I + O(1/n)$.
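The scaling of the Gram matrix can be checked with a short numerical sketch. The gaussian choice for the i.i.d. coefficients, the latent dimension `d`, and the population sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3                      # hypothetical latent dimensionality
sizes = [100, 100_000]     # hypothetical population sizes
deviations = []
for n in sizes:
    C = rng.standard_normal((n, d))  # i.i.d. entries, zero mean, unit variance
    gram = C.T @ C                   # d x d Gram matrix, close to n * I
    deviations.append(np.linalg.norm(gram / n - np.eye(d), ord=2))

# Rescaled deviations: expected to stay of order one if the error is O(sqrt(n)).
rescaled = [dev * np.sqrt(n) for dev, n in zip(deviations, sizes)]
```

The entries of `deviations` measure how far $C^\top C / n$ is from the identity. They should shrink as `n` grows, while the entries of `rescaled` remain of comparable magnitude.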

## Appendix C: Derivation of the RNN to LDS Mapping

## Appendix D: Addition of Input Terms

## Acknowledgments

We thank both reviewers for constructive suggestions that have significantly improved this letter. In particular, we thank Scott Linderman for the alternative proof in appendix B. A.V. and S.O. were supported by the program Ecoles Universitaires de Recherche (ANR-17-EURE-0017), the CRCNS program through French Agence Nationale de la Recherche (ANR-19-NEUC-0001-01), and the NIH BRAIN initiative (U01NS122123). J.W.P. was supported by grants from the Simons Collaboration on the Global Brain (SCGB AWD543027), the NIH BRAIN initiative (R01EB026946), and a visiting professorship grant from the Ecole Normale Superieure.

## References

*Advances in neural information processing systems, 27*

*Current Opinion in Neurobiology*

*Neural Computation*

*eLife*

*IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*

*Current Opinion in Neurobiology*

*Recurrent dynamics of prefrontal cortex during context-dependent decision-making*

*Nature Neuroscience*

*Nature Neuroscience*

*Proceedings of the International Conference on Machine Learning*

*PLOS Computational Biology*

*Neural engineering: Computation, representation, and dynamics in neurobiological systems*

*Nature Neuroscience*

*Neural Networks*

*Neuron*

*Current Opinion in Neurobiology*

*Advances in neural information processing systems*

*Neuron*

*Current Opinion in Neurobiology*

*Journal of Basic Engineering*

*Neuron*

*Journal of Neural Engineering*

*Nature Neuroscience*

*PLOS Computational Biology*

*Phys. Rev. Research.*

*Proceedings of the 20th International Conference on Artificial Intelligence and Statistics*

*Advances in neural information processing systems*

*Nature*

*Neuron*

*Advances in neural information processing systems, 30*

*Advances in neural information processing systems*

*Nature Methods*

*Neuron*

*Inferring brain-wide interactions using data-constrained recurrent neural network models.*

*Advances in neural information processing systems, 24*

*PLOS Computational Biology*

*Neuron*

*Neural Computation*

*Current Opinion in Neurobiology*

*Physical Review Research*

*Advances in neural information processing systems, 27*

*Neural Computation*

*Phys. Rev. Lett.*

*Current Opinion in Neurobiology*

*High-dimensional statistics: A non-asymptotic viewpoint.*

*The Kalman filter*

*Advances in neural information processing systems, 18*

*Derivation of Kalman filtering and smoothing equations*

*Proceedings of the International Conference on Machine Learning*