Self-Organization of Nonlinearly Coupled Neural Fluctuations Into Synergistic Population Codes

Abstract
Neural activity in the brain exhibits correlated fluctuations that may strongly influence the properties of neural population coding. However, how such correlated neural fluctuations arise from intrinsic neural circuit dynamics and subsequently affect the computational properties of neural population activity remains poorly understood. The main difficulty lies in resolving the nonlinear coupling between the correlated fluctuations and the overall dynamics of the system. In this study, we investigate the emergence of synergistic neural population codes from the intrinsic dynamics of correlated neural fluctuations in a neural circuit model that captures the realistic nonlinear noise coupling of spiking neurons. We show that a rich repertoire of spatial correlation patterns naturally emerges in a bump attractor network, and we reveal the dynamical regime under which the interplay between differential and noise correlations leads to synergistic codes. Moreover, we find that negative correlations may induce stable bound states between two bumps, a phenomenon previously unobserved in firing rate models. These noise-induced effects of bump attractors lead to a number of computational advantages, including enhanced working memory capacity and efficient spatiotemporal multiplexing, and can account for a range of cognitive and behavioral phenomena related to working memory. This study offers a dynamical approach to investigating realistic correlated neural fluctuations and insights into their roles in cortical computations.


Introduction
The firing activity of cortical neurons in the brain exhibits large fluctuations over time and across trials that are characterized by rich spatiotemporal correlation structures, as revealed by large-scale simultaneous cortical recordings [52,63]. It has been suggested that correlated neural variability may play synergistic or destructive roles in neural probabilistic representations such as neural population coding [30,44] and working memory [11,46]. However, existing theoretical analyses of the stochastic dynamics of neural systems primarily rely on firing rate models with additive gaussian noise or on simplified Poisson neuron models without correlations [11,31,46,50,68]. As a result, how correlated neural fluctuations may arise from the intrinsic neural circuit dynamics and subsequently affect the computational properties of neural population activity remains less well understood [18,28]. The main difficulty lies in resolving the nonlinear coupling between the correlated fluctuations and the overall dynamics of the system.
It is well known that simple structures in a recurrent circuitry can give rise to many emergent properties of neural population activity, including symmetry breaking and spontaneous pattern formation [14,25]. One such example is found in bump attractors, a type of self-sustained, spatially localized pattern arising from neural circuit models with short-range excitation and long-range inhibition [3,14]. Due to their translation symmetry, bump attractors can be used to encode and store continuous features such as spatial location and orientation [73], and they have been suggested as a possible neural mechanism for a range of perceptual and cognitive processes, including spatial navigation [40,42,74], spatial working memory [13,69,71,75], sensory evidence accumulation [22], and other cortical functions [8,56]. Modeling studies have shown that multiple bump states can undergo complex interactions, including repulsion, annihilation, merging, and splitting [33,48]. It has been proposed that neural fluctuations may influence the coding properties of bump attractors in two ways. At short timescales, noisy spike counts directly contribute to the coding error of a bump attractor, whereas at long timescales, noise further induces a random drift of the bump attractor along the attractor manifold, resulting in degradation of working memory over time [11]. Thus, it is critical to understand how correlated fluctuations may arise intrinsically from recurrent neural circuitry and how they affect the dynamics of bump attractors and, subsequently, the relevant cognitive functions.
In this study, we investigate how synergistic neural population codes originate from the intrinsic dynamics of correlated neural fluctuations by using a biologically realistic model known as the moment neural network (MNN) [23,38], which faithfully captures the nonlinear noise coupling of spiking neurons up to second-order statistical moments [51]. As suggested by experimental and theoretical studies [18,53], second-order statistical moments can provide a minimalistic yet adequate way of capturing the statistical properties of fluctuating neural activity. We construct a bump attractor network based on this model and show that complex spatial correlations naturally emerge from the intrinsic dynamics of recurrent circuits. Analysis of the model reveals the interplay between differential and noise correlations as a key factor determining the computational properties of neural population codes in bump attractors. We also find that negatively correlated neural fluctuations can induce stable bound states between adjacent bumps, a phenomenon previously unobserved in firing rate models. These novel dynamical properties of bump attractors can account for a diversity of experimental observations of cognitive and behavioral responses related to working memory [60,65], and they provide a circuit mechanism for efficient neural coding based on spatiotemporal multiplexing [1,12] as well as enhanced working memory capacity.
This study offers new insights into how nonlinear noise coupling shapes correlated neural fluctuations into synergistic neural population codes and opens up a dynamical approach to investigating correlated neural fluctuations and their computational properties in neural systems.

A dynamical model for correlated neural variability
A major challenge for firing rate models is the difficulty of capturing realistic neural fluctuations that are nonlinearly coupled to the recurrent circuit dynamics. To illustrate this point, consider the schematic diagram shown in Figure 1a. In simplified neural circuit models, neural fluctuation is often modeled as either an external additive input or independent Poisson variability. In a more realistic setting, correlated neural fluctuations may arise intrinsically through the recurrent neural circuit dynamics, as represented by the strong nonlinear coupling between the spike count mean and the spike count covariance [18,28,38]. Therefore, to understand the coding and computational properties of bump attractors, it is imperative to develop a model that can accurately describe the statistics and nonlinear coupling of fluctuating neural activity. To overcome this challenge, we employ a model known as the moment neural network (MNN), which faithfully captures the nonlinear coupling of correlated fluctuations in a recurrent population of spiking neurons [38]. The model is governed by the following closed system of equations describing the dynamics of the second-order statistical moments of neural activity:

τ dµ_i/dt = −µ_i + ϕ_µ(μ̄_i, σ̄_i),   (1)
τ dC_ij/dt = −C_ij + ϕ_σ(μ̄_i, σ̄_i)² δ_ij + ψ(μ̄_i, σ̄_i) ψ(μ̄_j, σ̄_j) C̄_ij (1 − δ_ij),   (2)

where the mean firing rate µ_i and the firing covariability C_ij represent the mean and covariance of the spike count per unit time, respectively, and τ is the membrane time constant. The functions ϕ_µ, ϕ_σ, and ψ are pointwise activation functions describing the relationship between the input current statistics and the output spike train statistics. For the leaky integrate-and-fire (LIF) spiking neuron model, the functional forms of the activation functions are given by equations (8) and (9) in Methods. Unlike previous models with heuristic activation functions [3,14,15,73], these activations are derived through a mathematical technique known as the diffusion approximation [4,5], which faithfully captures the nonlinear coupling of mean firing rate and firing variability across populations of neurons.

Figure 1: Nonlinear noise coupling in a bump attractor network. (a) Comparison of neural circuit models with and without nonlinear noise coupling. In models without nonlinear noise coupling (left panel), the recurrent dynamics affect the spike count mean (solid arrow), with the spike count covariance injected additively into the system (dashed arrow). In models with nonlinear noise coupling (right panel), both the spike count mean and the spike count covariance are coupled recurrently with each other through nonlinear interactions (purple and green arrows). Here, n represents the spike count. (b) A ring network with short-range excitation (red curves) and long-range inhibition (blue curves) supporting a bump activity pattern. (c) Synaptic strength w as a function of the distance ∆x between two neurons. (d) The mean firing rate µ_i of a bump attractor forms a ring manifold (black line) parameterized by its center of mass, additionally equipped with a firing covariability C_ij (orange ellipsoid; scaled for visibility) at each point on the attractor manifold. The plot shows the attractor manifold in the space spanned by the activity (spike count) of three neurons.

The quantities μ̄_i and C̄_ij correspond to the mean and the covariance of the total synaptic current, respectively, and are defined as

μ̄_i = Σ_j w_ij µ_j + µ_ext,   (3)
C̄_ij = Σ_k Σ_l w_ik C_kl w_jl + σ²_ext δ_ij,   (4)

where w_ij is the synaptic weight from the jth neuron to the ith neuron, σ̄²_i = C̄_ii denotes the current variability, and µ_ext and σ²_ext are the mean and variance of a spatially uniform external input, respectively. We fix σ²_ext throughout this study and systematically vary µ_ext. See Methods for details of the parameter settings. We note a couple of important features of this model. First, unlike models with linearly coupled covariance as considered in [10,62], equations (1) and (4) capture the nonlinear interactions between mean firing rate and firing covariability derived from realistic spiking neuron models. Second, any covariance structure that emerges from the model is due to the intrinsic dynamics of the recurrent circuit; it is not determined by the external input, which is uncorrelated.
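To make this recurrent moment coupling concrete, the following sketch propagates the spike count mean and covariance through one Euler step. The tanh-based activations and all parameter values are hypothetical stand-ins for the LIF moment mappings ϕ_µ, ϕ_σ, and ψ given in Methods, so only the coupling structure, not the quantitative behavior, is meaningful here.

```python
import numpy as np

def moment_step(mu, C, W, mu_ext, sig2_ext, dt=0.1, tau=1.0):
    """One Euler step of a moment-based mean/covariance update.

    Toy activations (tanh-based) stand in for the LIF moment mappings;
    the point is the coupling structure: the input covariance
    C_bar = W C W^T feeds back into both mu and C.
    """
    mu_bar = W @ mu + mu_ext                           # cf. eq. (3): input current mean
    C_bar = W @ C @ W.T + sig2_ext * np.eye(len(mu))   # cf. eq. (4): input current covariance
    sig_bar = np.sqrt(np.clip(np.diag(C_bar), 1e-12, None))

    phi_mu = np.tanh(mu_bar + 0.5 * sig_bar)  # hypothetical mean activation
    psi = 1.0 / (1.0 + sig_bar)               # hypothetical correlation susceptibility
    C_target = np.outer(psi, psi) * C_bar     # nonlinear noise coupling

    mu_new = mu + dt / tau * (-mu + phi_mu)
    C_new = C + dt / tau * (-C + C_target)
    return mu_new, C_new

rng = np.random.default_rng(0)
N = 8
W = 0.1 * rng.standard_normal((N, N))
mu, C = np.zeros(N), np.eye(N)
for _ in range(200):
    mu, C = moment_step(mu, C, W, mu_ext=0.5, sig2_ext=0.2)
print(np.allclose(C, C.T))  # covariance stays symmetric under the update
```

Note that, as in the full model, any off-diagonal structure in C here emerges through the recurrent weights, since the external drive is uncorrelated.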
To implement the bump attractor, each neuron i is assigned a spatial coordinate x_i ∈ [−π, π) over a feature space with periodic boundary conditions, and neurons are connected to one another via synaptic connections with short-range excitation and long-range inhibition according to equation (10) in Methods. A schematic diagram of this neural circuit structure is shown in Figure 1b, and the synaptic connection profile is shown in Figure 1c.
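A connectivity profile of this kind can be sketched as a difference of Gaussians on the ring; the functional form and parameter values below are illustrative and need not match equation (10) in Methods.

```python
import numpy as np

def ring_weights(N, a_e=1.0, a_i=0.6, d_e=0.3, d_i=1.0):
    """Difference-of-Gaussians weights on a ring of N neurons:
    short-range excitation minus broader, long-range inhibition.
    Parameter values are illustrative."""
    x = np.linspace(-np.pi, np.pi, N, endpoint=False)
    dx = x[:, None] - x[None, :]
    dx = np.angle(np.exp(1j * dx))  # wrap distances to (-pi, pi]
    w = a_e * np.exp(-dx**2 / (2 * d_e**2)) - a_i * np.exp(-dx**2 / (2 * d_i**2))
    return x, w

x, w = ring_weights(64)
row = w[32]  # weights onto the neuron at x = 0
print(row[32] > 0, row[0] < 0)  # excitation nearby, inhibition at long range
```

With a_e > a_i and d_e < d_i, nearby pairs are net excitatory while distant pairs are net inhibitory, which is the Mexican-hat structure that supports bump formation.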

Correlated variability in bump attractors
In firing rate-based models without correlation, distance-dependent synaptic coupling such as that shown in Figure 1b can give rise to stable activity states known as bump attractors [3]. Due to the translation invariance of the synaptic coupling, the mean firing rate µ_i of a bump attractor forms a ring manifold parameterized by its center of mass. Experimentally, bump or ring attractors have been found in various biological neural systems, such as the fruit fly ellipsoid body [29], primate V1 [52], and primate prefrontal cortex [71]. However, the firing rate-based model is inadequate, since it omits the fact that neurons in ring attractors are correlated (e.g., Figure 7 in [71]). The MNN overcomes this shortcoming by equipping the ring manifold with the intrinsic covariance structure of neural activity, which is nonlinearly coupled with the mean firing rate. As a result, the attractor manifold in our model is no longer represented only by points on a ring embedded in the Euclidean space R^n spanned by the mean firing rates µ, but also extends into the positive semidefinite cone S^n_+(R) spanned by the firing covariability C. This structure can be visualized as a family of ellipsoids with their centers located on the ring manifold and with their sizes and orientations representing the covariance structure of neural activity (Figure 1d).
In the following, we reveal the emergence of intrinsic spatial correlations in the bump attractor network through numerical simulations of the system defined by equations (1) to (4). Similar to conventional firing rate-based models [3,14], when the external input mean µ_ext is weak, the system has only one stable solution, the resting state (see state I in Figure 2a). As µ_ext increases, stable bump states emerge (see state II in Figure 2a). Consistent with previous findings [3,14], the mean firing rate µ_i = µ(x_i|s) of the bump state exhibits a unimodal spatial profile peaked at its center s. In contrast, the firing variability σ²_i = σ²(x_i|s) is bimodal, with two peaks near the edges of the bump and a local minimum at the center. This is in marked contrast to the commonly assumed independent Poisson activity, in which the firing variability is proportional to the mean firing rate, and to uncorrelated gaussian noise with constant magnitude [11,39]. We find that neural activity is positively correlated within the bump and uncorrelated outside it. Note that the stable bump state coexists with the stable resting state, allowing the system to switch between them when subjected to appropriate external stimuli. Furthermore, unlike in firing rate-based models, additional phase transitions occur in the correlation structure of the bump attractor. As µ_ext further increases, negative correlations emerge between neurons on opposite sides of the bump, while the shape of the mean firing rate remains qualitatively the same (see state III in Figure 2a). Note that both the mean and variance of the external input are spatially uniform, so the spatial correlation structure is an emergent property of the system. Similar to firing rate-based models, further increasing µ_ext eventually leads to the saturation of the mean firing rate into a spatially uniform activity. However, the same is not true for the spatial correlations, which transform into a periodic wave pattern (see state IV in Figure 2a). Such a distance-dependent correlation structure may arise from synchronized firing with the same or opposite phases.
To quantify the transitions between these different activity patterns, we calculate the height h and the full width at half maximum b of the bump solution while varying the external input mean µ_ext. As shown in Figure 2b, the phase diagrams constructed from these measurements reveal four distinct regimes occupied by the neural activity patterns I to IV. We find that both h and b vary smoothly as functions of µ_ext within each regime and exhibit discontinuous jumps at the phase transitions. These discontinuous jumps are qualitatively consistent with the saddle-node bifurcations found in firing rate bump attractor models [3,14]. It is worth noting that both the mean firing rates and the correlation coefficients exhibited by our model are within a biologically plausible range, consistent with experimental observations [71].
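As a concrete illustration, the height h and full width at half maximum b can be extracted from a sampled bump profile as follows; the Gaussian test profile is purely illustrative.

```python
import numpy as np

def bump_stats(x, mu):
    """Height h and full width at half maximum b of a bump profile mu(x).

    A minimal sketch: h is the peak above baseline, and b is measured by
    linear interpolation of the half-maximum crossings, assuming a single
    contiguous bump away from the domain boundary.
    """
    base = mu.min()
    h = mu.max() - base
    half = base + h / 2.0
    idx = np.flatnonzero(mu >= half)
    left, right = idx[0], idx[-1]

    def cross(i, j):
        # interpolate the half-maximum crossing between grid points i and j
        t = (half - mu[i]) / (mu[j] - mu[i])
        return x[i] + t * (x[j] - x[i])

    return h, cross(right, right + 1) - cross(left - 1, left)

x = np.linspace(-np.pi, np.pi, 400, endpoint=False)
mu = np.exp(-x**2 / (2 * 0.5**2))  # Gaussian bump with std 0.5
h, b = bump_stats(x, mu)
# FWHM of a Gaussian is 2*sqrt(2*ln 2)*std, about 1.177 here
print(round(h, 2), round(b, 2))
```

The same routine can be applied to the mean firing rate profile at each value of µ_ext to trace out the phase diagram.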
To further illustrate the internal correlation structure of the bump attractor, we focus on two neurons and inspect how their mean firing rates and correlation depend on the bump center s. Specifically, given two neurons located at x_1 = 0.2π and x_2 = 0.4π, their mean firing rates follow bell-shaped tuning functions peaked at x_1 and x_2, respectively. This holds for both states II and III, as shown in the top panels of Figure 2c. For each state, a distinct pattern is found in the correlation coefficient between these two neurons. For state II, the correlation coefficient ρ(x_1, x_2) reaches its maximum when s is located midway between x_1 and x_2 and monotonically decays to zero as s moves away from the midpoint in either direction. For state III, when x_1 < s < x_2 (in-flank cases), we have ρ(x_1, x_2) < 0; when s = x_1 or s = x_2 (peak cases), we have ρ(x_1, x_2) = 0; and when s < x_1 or s > x_2 (out-flank cases), we have ρ(x_1, x_2) > 0. We find that the correlation pattern predicted by state III of our model is consistent with experimentally observed pairwise correlation structures in the primate prefrontal cortex [71]. That study analyzed neuronal recordings in monkeys during a visuospatial delayed response task, in which the animals were required to memorize the spatial locations of transient stimuli, and found that the tuning curves and pairwise correlation structures in the prefrontal cortex corroborated the bump attractor hypothesis; that is, the neural population held a bump activity that encoded the cue location s ∈ [−π, π) by the bump center during the delay period. As shown in Figure 7f of their paper, in the out-flank, in-flank, and peak cases, the signs of the correlations of two neurons with preferred stimuli x_1 and x_2 are the same as our prediction in state III of Figure 2c. Additionally, our model predicts the existence of a pairwise correlation structure as in state II of Figure 2c, which has not been experimentally observed before.
Importantly, in both [71] and our model, the correlations of each neuronal pair depend on the location of the bump center s; this is more general than previous models, in which the correlations are assumed to be translation invariant [6,20,58]. Although both the rate-based model [71] and the MNN display a bell-shaped firing rate profile during the delay period of working memory (compare Figure 8a in their paper with state III in Figure 2), the MNN offers a number of unique advantages. First, in the rate-based model, the spiking activity of each neuron is modeled as an inhomogeneous Poisson process whose instantaneous firing rate is determined by the dynamics of the rate-based model. In contrast, the MNN directly captures the dynamics of neural spiking activity in terms of the nonlinear coupling between mean firing rate and firing covariability, without imposing such simplifying abstractions. Moreover, in the rate-based model, the diffusion of the bump state along the attractor manifold due to external noise is required to explain several experimental observations, such as negative noise correlations and stimulus-dependent Fano factors. By considering the intrinsic covariance structure in the MNN, many of these experimental observations can be explained without invoking the diffusion of the bump state. See the Supplementary Information for detailed comparisons between the stochastic rate-based model and the MNN. These analyses demonstrate that by considering the nonlinear noise coupling in recurrent neural circuits, the MNN can account for the correlation structures of neural fluctuations in experimental data, laying the theoretical foundation for exploring the role of correlated neural fluctuations in brain functions such as neural coding and working memory.

Noise covariance influences coding accuracy
Due to the translation symmetry of the bump attractor, its center s can be used to encode continuous variables. We analyze how the coding accuracy of the bump attractor is influenced by the correlated activity state. The coding accuracy can be quantified using the linear Fisher information I_linear(∆t) = u(s)^T C(s)^{-1} u(s) ∆t, where u_i(s) = dµ_i(s)/ds is the tangent vector along the attractor manifold and ∆t is the spike count time window. Due to translation invariance, I_linear(∆t) is approximately independent of s for large population sizes. The linear Fisher information bounds the accuracy with which the bump's location can be recovered by an optimal linear decoder from random spike counts [9,24,26]. Figure 3a shows the linear Fisher information rate I := I_linear(∆t)/∆t of a bump attractor encoding s under varying external input mean µ_ext. We observe that the phase transition from state II to state III caused by increasing µ_ext coincides with an abrupt decrease in I.
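For a given tuning curve and noise covariance, the linear Fisher information rate can be computed directly; the bell-shaped tuning curves and parameter values below are illustrative, not those of the MNN.

```python
import numpy as np

def linear_fisher_rate(mu_of_s, C, s, ds=1e-4):
    """Linear Fisher information rate I = u^T C^{-1} u, with the tangent
    vector u = dmu/ds estimated by central finite differences."""
    u = (mu_of_s(s + ds) - mu_of_s(s - ds)) / (2 * ds)
    return u @ np.linalg.solve(C, u)

# toy ring population with bell-shaped tuning (illustrative parameters)
N = 60
prefs = np.linspace(-np.pi, np.pi, N, endpoint=False)
mu_of_s = lambda s: 10.0 * np.exp(2.0 * (np.cos(s - prefs) - 1.0))

C_poisson = np.diag(mu_of_s(0.0))  # independent, Poisson-like spike counts
I = linear_fisher_rate(mu_of_s, C_poisson, s=0.0)
print(I > 0.0)
```

Replacing the diagonal covariance with the full noise covariance of a given state yields the curves of the kind shown in Figure 3a.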
Since cognitive performance at the behavioral level is limited by the coding accuracy of upstream neural populations [35], these two dynamical regimes of the bump activity state may correspond to different levels of cognitive performance. In particular, our model predicts that variations in global circuit parameters, such as the task-independent external input mean µ_ext, are sufficient to induce shifts in cognitive state, which may manifest as a sudden deterioration of task performance. To understand what causes the drastic difference in coding accuracy between bump states II and III, we analyze the contribution of correlated neural variability to the linear Fisher information. We find that the linear Fisher information is jointly influenced by two types of firing covariability, differential covariance and noise covariance [6]. The differential covariance u(s)u(s)^T measures the covariance due to changes in the mean firing rate µ under infinitesimal shifts in s [30], whereas the noise covariance C(s) in our model measures the spike count covariance per unit time between two neurons for fixed s.
Figure 3b compares the spatial structures of the differential covariance and the noise covariance in states II and III. Both states exhibit similarly shaped differential covariance, with positive entries for neurons on the same side of the bump and negative entries for neurons on opposite sides, consistent with previous firing rate-based models of bump attractors [73]. However, our model reveals distinct shapes in the noise covariance that could not be captured by previous models. For state II, the noise covariance is nonnegative, whereas for state III, the noise covariance contains both positive and negative values. As a result, unlike in state II, the noise and differential covariances share the same sign structure in state III, which can cause the sudden decrease in linear Fisher information. This result is consistent with previous theoretical analyses of neural population coding [41], which attribute the information decrease to the overlap between these two kinds of covariances.
The interplay between the differential covariance and the noise covariance can be further understood geometrically by considering two-dimensional projections of the bump attractor state onto the space spanned by the spike counts of two neurons. As shown in Figure 3c, the black lines represent the mean firing rate of the bump attractor in the space spanned by the activity of two neurons whose preferred s differ by 0.2π. Orange ellipses represent the firing covariability (scaled for visibility) when the mean firing rates take specific values on the ring manifold, as marked by the red dots. In this figure, the differential covariance corresponds to the tangent space of the attractor manifold represented by the mean firing rate, whereas the noise covariance corresponds to the orange ellipses. Remarkably, the major axis (blue line) of the noise covariance is roughly orthogonal to the attractor manifold in state II but roughly parallel to it in state III. As a result, the coding error, which is dominated by the projection of the noise covariance onto the attractor manifold, is smaller in state II than in state III.
Previous theoretical studies of neural population coding suggest that Fisher information may saturate with increasing neural population size due to information-limiting correlations [6,41,44]. It has been shown analytically that in a neural population with homogeneous tuning functions and distance-dependent noise correlations ρ(∆x), the Fisher information grows linearly or superlinearly with population size when the noise correlation is zero or negative, respectively, but saturates to a finite value when the noise correlation is positive [59]. However, this saturation may disappear when the neural population has heterogeneous tuning [20]. It has been pointed out that such information-limiting correlations can be largely attributed to the overlap between the noise correlation C and the differential correlation uu^T [41]. We run simulations of the neural circuit model with varying numbers of neurons to investigate the effect of population size on the linear Fisher information rate I for states II and III (see Figure 3d). We find that in both cases, the linear Fisher information grows approximately linearly with the population size, though this growth is severely reduced in state III. This outcome can be explained by observing that the noise covariance significantly overlaps with the differential covariance in state III but not in state II (see Figure 3b). While information is reduced in state III, it does not saturate to a finite limit, because the noise covariance is similar in sign structure but not completely parallel to the differential covariance. This is confirmed by calculating the ratio between the information and the population size, which converges to a nonvanishing value in the limit of large population size (Supplementary Information, including Figure S1), and it is further elucidated by theoretical analysis using simplified covariance structures (see equations S1 and S30 in the Supplementary Information).
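The effect of this overlap on scaling can be sketched with a toy population: with independent noise the information grows linearly with N, while adding a small differential-correlation component ε uu^T caps it at 1/ε (by the Sherman-Morrison formula, I becomes I0/(1 + ε I0)). All values below are illustrative.

```python
import numpy as np

def fisher(u, C):
    # linear Fisher information u^T C^{-1} u
    return u @ np.linalg.solve(C, u)

eps = 0.01
results = []
for N in (100, 400, 1600):
    prefs = np.linspace(-np.pi, np.pi, N, endpoint=False)
    gain = np.exp(2.0 * (np.cos(prefs) - 1.0))
    mu = 10.0 * gain                 # bell-shaped tuning evaluated at s = 0
    u = 20.0 * np.sin(prefs) * gain  # analytic dmu/ds at s = 0
    I0 = fisher(u, np.diag(mu))                            # independent noise
    I_lim = fisher(u, np.diag(mu) + eps * np.outer(u, u))  # add eps * u u^T
    results.append((N, I0, I_lim))

for N, I0, I_lim in results:
    print(N, round(I0), round(I_lim))  # I0 grows with N; I_lim stays below 1/eps
```

This is the limiting mechanism in its purest form; in the MNN, state III adds a noise-covariance component that overlaps the differential covariance without being exactly parallel to it, so growth is slowed rather than capped.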
These results are thus consistent with the theory that differential correlations serve as information-limiting correlations [41]. Furthermore, our model captures the emergence of the mean firing rate µ(x) and the noise covariance C(x, x′) through the intrinsic dynamics of the recurrent circuit, thereby offering an explanation of the dynamical origin of information-limiting correlations. Importantly, our model demonstrates that such information-limiting correlations can be switched on and off by changing a global parameter (here, the external input mean µ_ext), which determines the dynamical regime of the bump attractor.
It has previously been found that, with independent Poisson activity, the information content of a bump attractor is concentrated around the edges of the bump state [57]. We also investigate the contribution to the linear Fisher information from each neuron, as shown in Supplementary Information, Figure S2. We find that in state II, the neurons located at the boundaries of the bump are more important for coding than other neurons, consistent with previous models [57]. Unexpectedly, in state III, the neurons near the center of the bump contribute negatively to the linear Fisher information, suggesting that some neurons may be harmful for coding. Since the correlated fluctuations are coupled with the neural dynamics in our model, our findings suggest a way for a neural population to directly regulate its coding efficiency by manipulating its intrinsic correlation structure.
The analysis above primarily focuses on the coding properties of stationary bump activity. In the context of working memory, this can be thought of as representing the neural activity state during the delay epoch. In addition to such a time-independent mnemonic coding subspace, [43] also identified a dynamic coding subspace by analyzing neural spike data from the primate prefrontal cortex during working memory tasks. Following [43], we investigate dynamic coding in our MNN by analyzing the temporal dynamics of the bump state during and after cue presentation for each of states II and III (see Figures S5 and S6 in the Supplementary Information). We observe that the population correlation (equation S1 of [43]) between the population activity states and a reference population activity at the cue epoch reaches its peak during the cue epoch and gradually decays during the delay epoch (see Figure S5c, Supplementary Information). This observation suggests the existence of a dynamic subspace that decays during the delay epoch. Furthermore, we find that in state II (see Figure 2) the linear Fisher information of the dynamic coding is significantly higher than that of the mnemonic coding during the cue epoch, and the linear Fisher information of the two types of coding becomes similar during the delay epoch (see Figure S5d, Supplementary Information). This is consistent with that study, which demonstrated that the decoding accuracy of dynamic coding is substantially higher than that of mnemonic coding during the cue epoch, but that both accuracies become similar during the delay epoch. Interestingly, we find the opposite result in state III (see Figure 2), where the linear Fisher information of mnemonic coding is significantly higher than that of dynamic coding during the cue period. This suggests that the brain might adopt a correlation structure like that of state II during the cue epoch for better coding efficiency. (See the Supplementary Information for details.)
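A minimal sketch of such a population correlation measure, assuming a simple Pearson correlation between each time bin's population vector and a cue-epoch reference (a simplified reading of equation S1 of [43]), applied here to synthetic data:

```python
import numpy as np

def population_correlation(A, ref):
    """Pearson correlation between each population state A[t] (one row per
    time bin) and a reference population vector."""
    Ac = A - A.mean(axis=1, keepdims=True)
    rc = ref - ref.mean()
    num = Ac @ rc
    den = np.linalg.norm(Ac, axis=1) * np.linalg.norm(rc)
    return num / den

# toy data: a transient cue pattern decaying into a sustained mnemonic pattern
T, N = 50, 30
rng = np.random.default_rng(1)
cue, mem = rng.standard_normal(N), rng.standard_normal(N)
t = np.arange(T) / T
A = np.exp(-5 * t)[:, None] * cue + (1 - np.exp(-5 * t))[:, None] * mem
r = population_correlation(A, ref=cue)
print(r[0] > 0.9, r[-1] < r[0])  # peaks at the cue epoch, decays afterward
```

The qualitative signature in Figure S5c, a peak at the cue epoch followed by decay during the delay, is exactly the behavior this measure exposes.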

Stable bound states of bump attractors
In the previous section, we revealed the influence of correlated neural variability on neural coding by a single bump attractor state. In this section, we further investigate its impact on the dynamical interaction of multiple bump attractor states. In bump attractor networks, multiple bump states elicited by different stimuli may coexist in the same neural circuit and interact through distance-dependent synaptic connections. Conventional firing rate-based models predict that such interactions typically lead either to repulsion, due to long-range synaptic inhibition, or to merging, due to short-range excitation [33,72].
In our model, however, we find that bump attractors may become dynamically coupled through negatively correlated fluctuations, forming stable bound states previously unobserved in firing rate-based models. Figure 4a shows two bumps (cf. state II in Figure 2a) forming such a stable bound state. Both the mean firing rate µ and the firing variability σ² of each bump largely retain shapes similar to those of a single isolated bump. However, the system now also exhibits negative correlations between the two bumps, in addition to the positive correlations within each bump.
Remarkably, the interbump distance d of such a two-bump state forms a stable equilibrium; that is, if the system is initialized with an interbump distance d smaller or larger than the equilibrium value, the bumps repel or attract each other until they reach the stable distance, as shown in Figure 4b. Similar stable bound states have been found in other noise-coupled nonlinear systems, such as interacting solitons in mode-locked lasers [27,70] and the lattice-vibration-mediated electron-electron attraction in superconductors [7].
To quantify the interaction between two bumps, we employ a projection method to calculate the effective interaction energy.In this method, we first clamp one of the two bumps in Figure 4b and consider its influence on the other bump (the free-moving bump) as if it acts like an external input.We next assume that the shape of the free-moving bump does not vary significantly as it is shifted around the stable equilibrium s f by z.Under these simplifications, we can project the entire system state on to the neutrally stable left eigenspace along the attractor manifold represented by the mean firing rate of the free-moving bump to arrive at an approximate equation of motion for the bump's spatial deviation z [11].Due to the complexity of our model, explicit analytical solution is prohibitive, and we instead resort to numerical analysis as follows.We approximate the mean firing rate and firing covariability of the free bump as , respectively, where µ f (x|s f ) and C ff (x, x ′ |s f ) represent the shape of the free bump's mean and covariance at the stable equilibrium s f .Following [62], we denote ⟨f, g⟩ = 2π N i f (x i )g(x i ) as the inner product, and calculate the left eigenvector of the system for the free bump near the equilibrium as v = a dμ f dz , where μf is the synaptic current received by the free bump (equation S37), and a = 1/⟨ dμ f dz , dµ f dz ⟩ is a normalization coefficient.Projecting both sides of equation ( 1) for the full system to the left-eigenspace spanned by v, we obtain where µ(z) and C(z) denote the system state after shifting the free bump by z, d = s c − s f − z is the interbump distance, and s c is the peak location of the clamped bump.Based on this equation of motion, given the distance d between two bumps, we can define the corresponding effective interaction energy as E (See equations S32 to S46 for details of the derivations ofb(d).)As shown in Figure 4c, the stable bound state of bump attractors corresponds to the local minimum of the 
effective interaction energy with an interbump distance of d* = s_c − s_f = 0.65π. Interestingly, when the negative correlation between the two bumps is clamped to zero, the bound state loses its stability, as indicated by the disappearance of the local minimum in the effective interaction energy (see Figure 4c), resulting in a purely repulsive interaction. We find that as we increase the strength of the external input mean µ_ext, the stable distance d* between the two bump attractors increases, as shown by the black curve in Figure 4d. Interestingly, the stable distance d* exhibits a steep increase near µ_ext = 0.93, which coincides with the transition point between states II and III of a single bump in Figure 2. However, the internal covariance structures of these two bumps do not change until µ_ext is much higher, presumably due to the interaction between the bumps (see Figure S3). The stable distance curve in Figure 4d is overlaid on the effective interaction energy, whose local minimum for each fixed µ_ext coincides with d*. These results indicate that negative interbump correlation is responsible for generating the stable bound states of bump attractors in our model, preventing the strong repulsion between two bumps typically observed in firing rate-based models.
To gain further theoretical understanding of the origin of negative interbump correlations, we consider the overall interactions between the neural populations of the two bumps, which can be roughly characterized by two neural masses with self-excitation and mutual inhibition. Denote the mean firing rates and firing covariabilities of these two neural masses at the steady state as (µ_1, c_11) and (µ_2, c_22), with cross-covariance c_12 = c_21, and the average synaptic weights between the two populations as w = (w_11, w_12; w_21, w_22), such that w_11, w_22 > 0 and w_12, w_21 < 0, as required by the self-excitation and the mutual inhibition. Under mild conditions on the neural activation, which are satisfied in our model, the activity of these two populations of neurons becomes negatively correlated: c_12 < 0 (see equations S47 and S55 for the proof). This implies that the negative correlation is a direct consequence of the inhibition between the two neural populations. This analysis leads to the surprising conclusion that mutual inhibition between bumps increases working memory capacity by reducing the minimal distance between adjacent bumps, contrary to the conventional wisdom that inhibition should limit working memory capacity through repulsive interactions between bumps [21,33]. We return to this point in the next section.
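The sign argument above can be checked numerically. The sketch below linearizes two mutually inhibiting neural masses around a steady state and solves the resulting Lyapunov equation for the stationary covariance; the dynamics, parameter values, and additive-noise approximation are illustrative stand-ins for the full moment equations (S47 to S55), not the paper's exact derivation.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Two-mass sketch of the negative-correlation argument: linearized dynamics
# tau * dr/dt = (W - I) r + noise, with self-excitation on the diagonal and
# mutual inhibition off-diagonal. All parameters are illustrative stand-ins.
tau = 1.0
W = np.array([[0.5, -0.4],
              [-0.4, 0.5]])                # w11, w22 > 0 and w12, w21 < 0
A = (W - np.eye(2)) / tau                  # stable Jacobian of the two masses
Q = 0.1 * np.eye(2)                        # independent noise into each mass

# Stationary covariance C solves the Lyapunov equation A C + C A^T + Q = 0;
# mutual inhibition makes the cross-covariance C[0, 1] negative.
C = solve_continuous_lyapunov(A, -Q)
```

For this symmetric parameter choice the cross-covariance C[0, 1] comes out negative even though each mass receives independent noise, mirroring the c_12 < 0 conclusion.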

Intrinsic negative correlations regulate dynamical interactions between memorized items
Previous studies using a firing rate-based model found that the interactions between two bumps are attractive when they are close and become repulsive past an unstable equilibrium [72], similar to the energy landscape indicated by the dashed line in Figure 4c. However, such attractive or repulsive interactions will either cause the two bumps to merge, preventing the system from discriminating two distinct items [2], or cause them to drift away from each other, thereby limiting the coding resolution. First, our model demonstrates that through negative correlations, two bumps can maintain a closer distance than in the case without negative correlations. Moreover, when two bumps are too close, the repulsive interaction will restore their distance to the stable equilibrium. As a result, our findings provide a mechanism enabling discrimination of similar items encoded by bump states [36]. Second, our results suggest that nonlinearly coupled, correlated noise can have the unexpected effect of stabilizing the spatial location of bump states, in contrast to previous findings that bump states should drift along the attractor manifold under the influence of additive uncorrelated noise [11]. In particular, when the position of one bump is maintained through external mechanisms such as an external stimulus or top-down attention, an adjacent bump state can remain stabilized through the attraction toward the equilibrium distance. Our model thus provides a potential way to improve the coding accuracy of a bump attractor against this drifting effect.
The preceding analysis of bump interactions can be applied to understand a range of experimental observations of cognitive and behavioral processes involved in working memory [71,73] and eye movement, presumably reflecting higher cognitive states [60,65,72]. It has been widely observed that the trajectory of a saccadic eye movement (typically from an initial fixation to a target location) can deviate either toward [66] or away from [61] an element other than the target in the visual field. In one study [66], human participants were instructed to perform two saccades from fixation toward different targets in consecutive trials. When the delay between two trials was short and the distance between two targets was large, the trajectory of the second saccade deviated toward the first target instead of going directly to the second target in a straight line. In another study [61], on the contrary, the trajectory of eye movement deviated away from a nearby visual stimulus whose location was required to be memorized. Existing theoretical proposals for explaining these phenomena involve top-down modulation, whose ability to control rapid eye movements, however, appears to be limited both temporally and spatially [64], and an exact mechanism is still unclear [60].
The stable bound states found in our model provide an alternative neural mechanism for the above phenomena without resorting to top-down modulation. We first establish the connection between our model and the above two experiments. In our bump attractor model, the ring attractor manifold represents the angular direction in visual space, whereas the previously memorized and the current targets of eye movement are encoded by the centers of bump activities. The clamped bump in our model, as in Figure 4, corresponds to the direction of the previously shown target in [66] or that of the stimulus stored in working memory in [61]. In both cases, the clamped bump induces an energy landscape like that in Figure 4c, which interacts with the free-moving bump encoding the angular direction of the affected eye movement. It turns out that the influence of the energy landscape induced by the clamped bump is consistent with these experiments: the interaction of the two bumps is attractive when their distance is large, which can explain the deviation of eye movement toward the previously shown target in [66], and is repulsive when their distance is small, which can explain the deviation of the eye movement away from the stimulus location in [61]. The nontrivial dependence of bump interaction on distance, as predicted by our model, can be tested by future experiments that systematically investigate how behavioral errors during working memory tasks depend on the difference between memorized items, or between one memorized item and a distractor stimulus. These results demonstrate that negatively correlated neural fluctuations modulate the interaction between bump states, which in turn influences cognitive functions and behavior, highlighting the importance of considering correlated neural fluctuations in modeling neural circuit dynamics. In the next section, we analyze how these noise-mediated dynamical effects can facilitate neural representation in working memory.
Intrinsic negative correlations lead to efficient coding strategies for working memory
Due to their translation invariance and bistability, bump attractors have been proposed as the neural substrate for spatial working memory [21,46,71,73,75]. The number of bumps the system can simultaneously maintain is interpreted as working memory capacity. To investigate how working memory capacity is influenced by correlated neural fluctuations, we elicit multiple bump states in our model with bell-shaped transient inputs.
Neural activity states holding multiple (three and four) bumps (see state II, µ_ext = 0.935) are shown in Figure 5a. The shapes of the mean firing rate µ and firing variability σ² of individual bumps are qualitatively consistent with those of an isolated bump (Figure 2), but with slight deformations, lower mean firing rate, and higher firing variability. When the number of bumps is three, correlations between different bumps are mainly negative, similar to the two-bump case (see Figure 4). When the number of bumps is four, adjacent bumps (spaced by π/2) are negatively correlated, while every other bump (spaced by π) is positively correlated.
Figure 5: Multiple bump attractors with intricate correlation structures. (a) The mean firing rate, firing variability, and correlation coefficient of the neural circuit with three (left panels) or four (right panels) bumps. In the three-bump case, correlations between different bumps are mainly negative. In the four-bump case, adjacent bumps are negatively correlated, while every other bump is positively correlated. (b) The circuit fails to support more than two bumps when the negative correlation is clamped to zero.
As we have shown in the two-bump case, negative correlation can reverse the strong repulsion between adjacent bumps, suggesting that negative correlation could be a key factor enabling the multibump configurations shown above. To test this idea, we clamp the negative correlations in the four-bump case in Figure 5a to zero and find that the activity of some bumps is quenched by the mutual inhibition from adjacent bumps, so that the system fails to hold more than two bumps, as shown in Figure 5b. This observation is consistent with the finding that removing the negative correlation turns the interaction between two bumps from a balance of attraction and repulsion into pure repulsion, as described in Figure 4. These results suggest that negative correlations enable multiple bumps to be more closely packed, thereby enhancing the memory capacity of the system.
Experimental evidence suggests that working memory capacity is limited [16,17]. There are two theoretical mechanisms through which such a limit may occur. The first is a spatial mechanism, in which mutual inhibition limits the number of stable bump states that can be simultaneously maintained. The second is a temporal mechanism based on the theta-gamma coupling of neural oscillations, in which only a limited number of gamma cycles, each representing a distinct memorized item, can fit within each theta cycle [37]. Interestingly, although our model does not explicitly represent temporal oscillations, it is compatible with the interpretation that the observed correlations may reflect the synchrony or antisynchrony between different bump states. Thus, by taking into account correlated neural variability, our model hints toward a unifying perspective on the limit of working memory capacity based on spatiotemporal working memory representations.
These results also suggest an efficient computational strategy for memory storage by fluctuating neural activity, in which limited resources (i.e., neural spikes) are distributed across both space and time. Spatially, negative correlations increase memory capacity by reducing the distance between adjacent bumps (see Figure 4). Temporally, negative correlations can be interpreted as a form of antisynchrony that corresponds to temporally less concentrated bump activity states for each stored item. Conceptually, this is similar to the time-division multiplexing recently proposed in a spiking neural network model [1,12] and can also be understood as a form of sparse coding.
We also find that negative correlations can improve the robustness of working memory by preventing the forgetting of a previously stored bump when a new stimulus arrives. To demonstrate this, we sequentially inject transient stimuli with durations of T = 0.1 s and various spatial separations and investigate the resulting dynamics of the bump attractors. As shown in Figure 6, if the initial distance between the two bumps is sufficiently large (0.6π), both remain persistent after the removal of the transient stimuli and settle at a stable distance under either attractive (top panel) or repulsive (middle panel) interactions. If the initial distance between the two bumps is sufficiently small (0.5π), the two bumps merge into one bump (bottom panel). Next, we clamp the negative correlations to zero and observe the effect on the dynamics of sequentially memorized bump states (see the right column in Figure 6). If the initial distance between the two bumps is sufficiently large (d = 0.8π, top panel), the second bump is maintained, while the previously stored bump is quenched.
If the initial distance between the two bumps is sufficiently small (d = 0.6π, middle panel), the two bumps annihilate each other. If the initial distance is even smaller (d = 0.5π, bottom panel), the two bumps merge into one bump, the same outcome as in the case with negative correlations. These results suggest that negative correlations help to prevent forgetting caused by competition from similar memorized items.
Figure 6: Left column: if the initial distance between two bumps is sufficiently small (0.5π), the two bumps merge into one bump (bottom panel). The right column shows the cases in which we clamp the negative correlations to zero: at d = 0.8π, the second bump is maintained while the previously stored bump is quenched (top panel); at d = 0.6π, the two bumps annihilate each other (middle panel); at d = 0.5π, the two bumps merge into one bump (bottom panel). The system is initialized with zero mean and covariance, and two transient stimuli of duration T = 0.1 s (marked by the red rectangles) are exerted at two locations sequentially. The grayscale indicates mean firing rate (sp/s).

Discussion
In this work, we have revealed the emergence of synergistic neural population codes and their mechanistic origin from nonlinearly coupled, correlated neural fluctuations in a bump attractor network. Our model demonstrates that complex spatial correlation structures can emerge from a bump attractor network model, as shown in Figure 2, suggesting the need for closer experimental investigations of the pairwise correlations of neural population activity in the brain. As shown in Figure 3a, the model additionally shows that global neural circuit parameters (such as the external input mean) can drastically change the dynamical regimes of bump attractor states with distinct noise correlation patterns, which can be interpreted as corresponding to different cognitive states. Specifically, our model predicts that behavioral error should be larger when the noise covariance is parallel to the tangent space of the ring manifold than when it is orthogonal to the tangent space. Future experimental studies could investigate this relationship between neural population codes and behavioral performance using simultaneous recordings of cortical neurons during cognitive tasks. Furthermore, we reveal correlation-induced dynamical interactions between two bump states, which lead to a stable bound state of two bumps, characterized by a local minimum in the effective interaction energy between them, as shown in Figure 4. As a result, the interaction between two bumps, which represent the memorized items, can be attractive or repulsive depending on their distance (i.e., the difference in the encoded features). To test our theory, future experimental studies could systematically investigate how behavioral errors during working memory tasks depend on the difference between memorized items or between one memorized item and a distractor stimulus. We also show that the negative correlation between the neural activity of two bumps originates from mutual
inhibition in the recurrent network, a property often overlooked in previous firing rate models of bump attractors. These results thus highlight the importance of incorporating correlated neural variability in modeling studies of cortical dynamics.
The validity of analyzing stochastic neuronal dynamics through approximations up to second-order moments is supported by both empirical evidence and theoretical analysis. By examining multielectrode array recordings in the vertebrate retina [53], it was found that the strong collective behavior exhibited by a neural population is well accounted for by a model that captures the observed pairwise correlations but assumes no higher-order interactions, even when the pairwise correlations are weak. A theoretical study [18] investigated the effect of higher-order cumulants on the dynamics of a stochastic binary neural network model and found that in a weakly correlated state, the contributions from higher-order cumulants to the network dynamics are significantly smaller than those of pairwise interactions. This suggests that the first two moments are fairly accurate at capturing the stochastic dynamics of the system. In summary, our MNN offers a minimalistic yet effective representation of fluctuating neural activity and is suitable for the purpose of modeling working memory in this work.
The MNN model presented in this study has a couple of limitations. First, the current version of the MNN lacks a description of the temporal statistics of neural activity, which may encode computationally useful information such as phase and time-delayed synchrony. Future studies are required to develop moment mappings that account for temporal statistics such as auto- and cross-covariances of neural spike trains [18]. Second, in order to simulate the dynamics of the MNN, it is necessary to store the covariance matrix of the system, which entails quadratic space complexity. Furthermore, multiplying the covariance matrix with the connection weights results in cubic time complexity. As a consequence, simulating large-scale neural systems could require significant computational resources. However, this trade-off is worthwhile considering that the high-dimensional joint probability distribution of neural activity is computationally intractable.
In the context of ring attractor networks, the MNN based on the gaussian field approximation can be generalized in a number of ways. First, we can consider a two-dimensional spatial structure instead of a one-dimensional one to incorporate more complex two-dimensional patterns [25,47]. A straightforward 2D extension of the model considered in this paper is included in the Supplementary Information, where the firing covariability manifests as more complex patterns compared to the one-dimensional case. Second, this study considers stationary bump activity with correlated variability by using a model derived from current-based leaky integrate-and-fire neurons without adaptation. Previous theoretical studies of bump attractors using firing rate models suggest that neural adaptation can lead to the formation of complex dynamical patterns such as propagating and rotating waves [14,25,48,49,55] and that short-term synaptic facilitation and depression can have a significant impact on the dynamics of bump diffusion [49,54]. It is thus of great theoretical interest to further extend our model to incorporate neural adaptation, such as spike frequency adaptation or synaptic depression, and investigate the influence of correlated neural fluctuations on the dynamics of the resulting wave patterns as well as their functional implications. Third, neural fluctuations may also cause bump states to drift along the attractor manifold over time, resulting in further degradation of working memory. It has been shown that uncorrelated noise imposes a bound on the coding capability of bump attractors [11]. The model presented in this paper may be extended to analyze the influence of correlated noise on this diffusive effect and its impact on working memory performance.
Although we have primarily focused on bump attractor networks in this paper, the general approach to modeling correlated neural variability using the MNN can be easily extended to other types of models. For instance, [43] proposed a parsimonious linear model to interpret the coexistence of dynamical coding and mnemonic coding in working memory.
The network structure of this model can be applied to our MNN to gain a deeper understanding of the specific role played by firing covariability in dynamical coding and mnemonic coding in working memory. Another extension is to divide the neural population into excitatory and inhibitory populations, consistent with Dale's principle [19], for analyzing the dynamics and computational roles of firing covariability in balanced networks [67] and in neural oscillations [45]. Moreover, the MNN considered in this paper is derived from the simple leaky integrate-and-fire neuron model. The general framework of the MNN can further incorporate more complex spiking neuron models with slow and fast synaptic dynamics, which are important in generating complex dynamics such as bursting activity [32]. Finally, the modeling framework based on the MNN also opens up new opportunities for investigating the role of neural correlations in synaptic plasticity, learning, and attention [34].

Appendix
The moment neural network
To capture the nonlinear coupling of correlated neural activity, we employ a model known as the moment neural network (MNN), derived from the following leaky integrate-and-fire neuron model [23,38]: dV_i/dt = −L V_i(t) + I_i(t), with synaptic current I_i(t) = Σ_j w_ij S_j(t) + I_i^ext(t), where V_i is the membrane potential of the ith neuron, L is the leak conductance, I_i(t) is the total synaptic current, w_ij is the synaptic weight from neuron j to neuron i, and S_j(t) is the spike train from presynaptic neuron j. When V_i reaches the firing threshold V_th, the neuron releases a spike, which is transmitted to other connected neurons; V_i is then reset to the resting potential V_res and enters a refractory period of duration T_ref.
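As a point of reference, the spiking model above can be simulated directly. The sketch below is a minimal Euler integration of a single leaky integrate-and-fire neuron with the paper's membrane parameters; the white-noise input statistics mu_in and sig_in are assumed values for illustration, not taken from the text.

```python
import numpy as np

# Minimal Euler simulation of one leaky integrate-and-fire neuron driven by
# gaussian white-noise current; membrane parameters follow the paper, while
# the input statistics mu_in and sig_in are assumed for illustration.
rng = np.random.default_rng(0)
V_th, V_res, T_ref, tau = 20.0, 0.0, 5.0, 20.0   # mV, mV, ms, ms
dt, T = 0.1, 10000.0                             # time step and duration (ms)
mu_in, sig_in = 1.5, 1.0                         # assumed input mean / noise

V, t_last, spikes = V_res, -np.inf, 0
for step in range(int(T / dt)):
    t = step * dt
    if t - t_last < T_ref:                       # refractory: hold at reset
        continue
    V += dt * (-V / tau + mu_in) + sig_in * np.sqrt(dt) * rng.standard_normal()
    if V >= V_th:                                # threshold crossing: spike
        spikes += 1
        V, t_last = V_res, t
rate = 1000.0 * spikes / T                       # mean firing rate (sp/s)
```

With this suprathreshold drive the neuron fires a few tens of spikes per second, giving a concrete target that the moment mapping below should reproduce on average.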
The functions ϕ_µ and ϕ_σ together map the mean μ̄ and variance σ̄² of the input current I_i(t) to those of the output spikes according to [23,38] ϕ_µ: (μ̄, σ̄²) → µ, with µ = 1/(T_ref + (2/L) ∫_{I_lb}^{I_ub} g(x) dx), where T_ref is the refractory period and the integration bounds are I_ub = (V_th L − μ̄)/(σ̄√L) and I_lb = (V_res L − μ̄)/(σ̄√L). The constant parameters L, V_res, and V_th are identical to those in equation (7). The pair of Dawson-like functions g(x) and h(x) appearing in equations (8) and (9) are g(x) = e^{x²} ∫_{−∞}^{x} e^{−u²} du and its second-order counterpart h(x), which enters the analogous mapping ϕ_σ for the spike variability (see [23,38]). The mapping ψ, which we refer to as the linear response coefficient, is equal to ψ(μ̄, σ̄²) = ∂µ/∂μ̄ and is derived using a linear perturbation analysis around the correlation coefficient ρ_ij = 0 [38,51]. This approximation is justified as pairwise correlations between neurons in the brain are typically weak. Throughout this paper, we set the neuron parameters to V_th = 20 mV, V_res = 0 mV, T_ref = 5 ms, and τ = 1/L = 20 ms. An efficient numerical algorithm is used for implementing the moment mappings [47].
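The mean mapping ϕ_µ can be implemented numerically. The sketch below follows the standard Siegert-type first-passage-time formula with the paper's membrane parameters; the integration bounds and the stable evaluation of g through the scaled complementary error function are our reconstruction, since equations (8) and (9) are not reproduced in the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfcx

# Moment mapping phi_mu of the LIF neuron: mean and variance of the input
# current -> mean output firing rate, via a Siegert-type first-passage-time
# formula. Integration bounds and normalization are our reconstruction.
V_th, V_res, T_ref, L = 20.0, 0.0, 5.0, 1.0 / 20.0  # mV, mV, ms, 1/ms

def g(x):
    # Dawson-like g(x) = exp(x^2) * int_{-inf}^{x} exp(-u^2) du, evaluated
    # stably through the scaled complementary error function erfcx.
    return 0.5 * np.sqrt(np.pi) * erfcx(-x)

def phi_mu(mu_bar, sig2_bar):
    sig = np.sqrt(sig2_bar)
    lb = (V_res * L - mu_bar) / (sig * np.sqrt(L))   # lower integration bound
    ub = (V_th * L - mu_bar) / (sig * np.sqrt(L))    # upper integration bound
    integral, _ = quad(g, lb, ub)
    return 1.0 / (T_ref + 2.0 / L * integral)        # mean rate in sp/ms
```

For the same input statistics as the spiking simulation (μ̄ = 1.5, σ̄² = 1), this mapping predicts a rate of a few tens of sp/s and increases monotonically with μ̄, as expected.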

Network parameters
We construct a ring network by setting distance-dependent synaptic weights with short-range excitation and long-range inhibition, w_ij = w_E κ(x_i − x_j; d_E) − w_I κ(x_i − x_j; d_I), where κ(x; d) = exp[(cos x − 1)/d²], w_E and w_I control the strengths of the excitatory and inhibitory synapses, and d_E and d_I control their respective spatial ranges. Throughout this paper, we set the synaptic weight parameters to w_E = 15, w_I = 6, d_E = 0.5, d_I = 1, and N = 400. We fix σ²_ext = 0.01 throughout this study and systematically vary µ_ext. We use Euler's method to simulate equations (1) and (2) with a time step of ∆t = 0.5τ.
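A direct construction of this connectivity is sketched below, assuming weights depend only on the angular distance x_i − x_j and carry a 2π/N normalization (our assumption, so that total input stays finite as N grows; the paper's exact normalization is not reproduced in the text).

```python
import numpy as np

# Ring-network weight matrix with short-range excitation and long-range
# inhibition, using the kernel kappa(x; d) = exp[(cos x - 1) / d^2].
N = 400
w_E, w_I, d_E, d_I = 15.0, 6.0, 0.5, 1.0
x = np.linspace(-np.pi, np.pi, N, endpoint=False)   # neuron positions on ring

def kappa(x, d):
    return np.exp((np.cos(x) - 1.0) / d ** 2)

# Pairwise distance-dependent weights; the 2*pi/N prefactor is an assumed
# normalization keeping the summed input O(1) as N grows.
diff = x[:, None] - x[None, :]
W = 2.0 * np.pi / N * (w_E * kappa(diff, d_E) - w_I * kappa(diff, d_I))
```

The resulting matrix is symmetric and circulant, positive at short angular distances and negative at long ones, which is the Mexican-hat profile the bump attractor requires.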

Linear Fisher information analysis
To quantify the amount of information content encoded by a bump attractor, we calculate the linear Fisher information I_linear as follows [30]. Given a bump state with mean firing rate µ(x|s) and firing covariability C(x, x′|s) centered at s, the linear Fisher information can be calculated as I_linear(∆t) = ∆t u(s)ᵀ C(s)⁻¹ u(s), where u_i(s) = dµ_i(s)/ds and ∆t is the spike count time window. Note that I_linear(∆t) becomes approximately independent of s due to translation invariance when the population size is large. For calculating I_linear(∆t), we apply a centered finite difference to approximate u(s) and use the Moore-Penrose pseudo-inverse for calculating C(s)⁻¹. We observe that the differential covariance uuᵀ has an exact reflection symmetry [Eq. (S1)]: dividing u into two blocks u = (u_1, u_2) according to the two halves of the ring, the symmetry of the bump implies that u_2 is the mirror image of −u_1. Note that to simplify notation, we have assumed that the population size N = 2n is even and that the bump center is s = 0, though this does not affect the conclusion of the analysis in the limit of large population size. This symmetry suggests that we can analyze the interaction between noise covariance and differential covariance by applying the same spatial division to the noise covariance, with blocks C_11, C_12, C_21, C_22 ∈ R^{n×n}. We then impose idealized symmetry conditions on the noise covariance blocks for state II [Eq. (S4)] and for state III [Eq. (S5)].
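The computation described above can be sketched directly, using a hypothetical bell-shaped tuning profile and an independent (Poisson-like) covariance purely for illustration:

```python
import numpy as np

# Linear Fisher information I(dt) = dt * u^T C^{-1} u, with u(s) = d(mu)/ds
# from a centered finite difference and a Moore-Penrose pseudo-inverse for C,
# as described in the text. Tuning curve and covariance are illustrative.
N, dt, ds = 200, 0.1, 1e-4
x = np.linspace(-np.pi, np.pi, N, endpoint=False)

def mu(s):
    # Hypothetical bell-shaped population rate profile centered at s.
    return 10.0 * np.exp(2.0 * (np.cos(x - s) - 1.0))

u = (mu(ds) - mu(-ds)) / (2.0 * ds)   # centered finite difference at s = 0
C = np.diag(mu(0.0))                  # independent, Poisson-like variability
I_linear = dt * u @ np.linalg.pinv(C) @ u
```

For a diagonal covariance the pseudo-inverse route reduces to the familiar sum ∆t Σ_i u_i²/C_ii, which provides a convenient consistency check.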
We first prove the non-saturation of the linear Fisher information for state II. Instead of directly calculating the linear Fisher information using Eq. (S31), we construct a local linear decoder that estimates the bump location near s and show that the variance σ² of this estimate diminishes as N goes to infinity. We construct the linear decoder as ŝ(r) = s + ⟨u, r − µ⟩/⟨u, u⟩, where µ is the mean firing rate when the bump center is located at s and r is the firing rate of the system. The goal is to derive a lower bound on the Fisher information from this local linear decoder. The variance of the estimate ŝ(r) is var[ŝ(r)] = (uᵀ C u)/(uᵀ u)². Using the division in Eq. (S3) and the symmetry condition in Eq. (S4), most of the cross terms in uᵀ C u cancel. For the synaptic weight defined by Eq. (10), the bump attractor converges to a finite spatial profile µ*(x; s), C*(x, x′; s) in the limit of spatial continuum as N → ∞, which implies that (2π/N) uᵀ u → ∥u*∥²₂, where ∥·∥₂ denotes the L²-norm and u*(x; s) = ∂µ*(x; s)/∂x, and that max_{i,j} |C_ij| converges to the finite limit C*_max = max_{x,x′} |C*(x, x′; s)|. Note that ∥u*∥₂ is a constant independent of N. Therefore, var[ŝ(r)] = O(1/N), which imposes a lower bound on the Fisher information according to the Cramér-Rao inequality, I_linear ≥ 1/var[ŝ(r)]. Therefore, the linear Fisher information I_linear increases at least linearly as the population size N grows. In particular, the linear Fisher information does not saturate in state II.
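The decoder's 1/N variance scaling can be illustrated numerically. The sketch below uses toy gaussian-bump tuning and independent neuron noise (a simplification standing in for the mirror-symmetry cancellation argued above):

```python
import numpy as np

# Monte Carlo check of the local linear decoder s_hat(r) = s + <u, r-mu>/<u, u>:
# with independent neuron noise its variance scales as 1/N, so the information
# lower bound 1/var grows linearly. Toy gaussian-bump tuning, not the MNN.
rng = np.random.default_rng(2)

def decoder_variance(N, trials=4000, sig2=1.0):
    x = np.linspace(-np.pi, np.pi, N, endpoint=False)
    mu = 10.0 * np.exp(2.0 * (np.cos(x) - 1.0))    # bump centered at s = 0
    u = -np.gradient(mu, x)                         # u = d(mu)/ds at s = 0
    r = mu + np.sqrt(sig2) * rng.standard_normal((trials, N))
    s_hat = (r - mu) @ u / (u @ u)                  # decoded deviation from s
    return s_hat.var()

v100, v400 = decoder_variance(100), decoder_variance(400)
```

Quadrupling the population size cuts the estimate variance by roughly a factor of four, consistent with var[ŝ(r)] = O(1/N).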
In fact, the above conclusion, that the information does not saturate, holds for a wide class of bump attractor states, as long as the mean firing rate µ and the firing covariability C satisfy Eq. (S1) and Eq. (S4). The condition on the differential covariance [Eq. (S1)] is general and is automatically satisfied if the bump is symmetric. The condition on the noise covariance [Eq. (S4)] requires a mirror symmetry of the off-diagonal elements. As we have shown in our model, a noise covariance structure qualitatively consistent with such mirror symmetry can naturally emerge from the recurrent dynamics of the bump attractor network. Intuitively, the mirror symmetry makes the N² − 4N terms in the summation Σ_ij u_i C_ij u_j cancel out, following the reflection antisymmetry condition of the differential covariance [Eqs. (S11)-(S12)]. The remaining terms lead to a 1/N scaling of the variance var[ŝ(r)] when N is large enough.
For state III, we apply the same decoder as that used in state II, together with the spatial division [Eq. (S3)]. Then, according to the symmetry condition in Eq. (S5), the term uᵀ C u in the estimate variance var[ŝ(r)] in Eq. (S7) reduces to a form containing 4u₁ᵀ C_11 u₁, which does not clearly indicate whether the variance diminishes as N → +∞. Unfortunately, it is infeasible to make a general statement about the scaling behavior of the Fisher information under the condition in Eq. (S5). Nevertheless, we can prove the non-saturation of the linear Fisher information in a special case where the noise covariance has constant diagonal entries a and constant off-diagonal entries c, with a > |c| ≥ 0. Note that when c > 0, this noise covariance matrix C has the same sign structure as the differential covariance uuᵀ. The linear Fisher information for this case can be calculated by expressing C in terms of the all-ones matrix O_n ∈ R^{n×n} and inverting it explicitly. According to Eq. (S1), Eq. (S16), and by noting that (2π/N) Σ_i |u_i| converges to ∥u*∥₁ := ∫_{−π}^{π} |u*(x; s)| dx, the L¹-norm of u*(x; s), we obtain a lower bound on the linear Fisher information. Finally, by invoking Hölder's inequality, we have ∥u*∥²₁ ≤ 2π ∥u*∥²₂, where the equality holds if and only if |u*(x; s)| is constant. Provided that µ(x; s) is even and twice differentiable, the equality case leads to a spatially uniform solution of the mean firing rate, which is trivial. For general cases, the inequality in Eq. (S30) is strict, so the linear Fisher information increases linearly as the population size N grows.
In particular, if c > 0, this is an instance where the noise covariance C has the same sign as the differential covariance uuᵀ without saturation of information. Thus, it is not surprising that the linear Fisher information in state III does not saturate, as shown in Fig. S1, in which case both covariances have the same sign. Here, the bump center is s = 0.

Contribution to Fisher information by individual neurons
Another property of interest regarding neural coding is the contribution to the Fisher information by individual neurons in the network. We calculate the amount of linear Fisher information rate contributed by the neuron located at x_i as I_i = u_i [C⁻¹ u]_i, so that Σ_i I_i recovers the total linear Fisher information rate. As shown in Fig. S2, we find that in state II, the neurons located at the boundary of the bump are more important for coding than other neurons, consistent with previous models (Seung and Sompolinsky, 1993). However, in state III, the neurons near the center of the bump provide negative contributions to the linear Fisher information, a previously unpredicted effect, suggesting that some neurons may be detrimental to coding.
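A per-neuron decomposition consistent with this description is I_i ∝ u_i (C⁻¹ u)_i, which sums to the total linear Fisher information while allowing individual terms to be negative; the sketch below verifies this identity on synthetic data (the exact form used in the paper may differ in normalization).

```python
import numpy as np

# Per-neuron decomposition of the linear Fisher information (assumed form):
# I_i = dt * u_i * (C^{-1} u)_i. By construction the terms sum to the total
# I = dt * u^T C^{-1} u, while individual terms may be negative.
rng = np.random.default_rng(1)
N, dt = 50, 0.1
u = rng.standard_normal(N)               # synthetic differential tuning u_i
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)              # generic positive-definite covariance
w = np.linalg.solve(C, u)                # w = C^{-1} u
I_per_neuron = dt * u * w                # per-neuron contributions
I_total = dt * u @ w                     # total linear Fisher information
```

Because C is positive definite, the total is strictly positive even when some per-neuron terms are negative, mirroring the state III observation above.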

Bump interaction analysis
To investigate the mechanism of the interaction between two bumps near the stable bound state, we focus on the dynamics of one of the bumps (the free-moving bump) and freeze those of the other (the clamped bump), which acts on the free bump as if it were an external feedforward input. To facilitate the analysis, we set the origin at the center between the two bumps at the equilibrium distance. Due to the translation symmetry of the model, the choice of origin does not affect the generality of the analysis. Denote the mean firing rate and the firing covariability of the two-bump configuration in Fig. 4(b) as µ = (µ_i) and C = (C_ij). We decompose the neural activity state based on the neurons' spatial positions x_i as µ = µ^f + µ^c, where µ^f_i = µ_i H(−x_i) and µ^c_i = µ_i H(x_i), with H denoting the Heaviside step function: H(x) = 1 if x ≥ 0 and zero otherwise. We also decompose C accordingly into the four blocks C^ff, C^fc, C^cf, and C^cc, defined by the corresponding products of Heaviside factors. This leads to a modified equation for the mean firing rate of the free bump, in which the synaptic current is μ̄_i = Σ_j w_ij (µ^f_j + µ^c_j) + µ_ext and the synaptic current covariability is C̄_ij = Σ_{k,l} w_ik (C^ff_kl + C^fc_kl + C^cf_kl + C^cc_kl) w_jl. We refer to C^fc and C^cf as the interbump covariances and to C^ff and C^cc as the intrabump covariances. In our model, we find that the interbump correlations are negative. We apply the following projection method, as inspired by (Burak and Fiete, 2012). Denote the activity of the free bump in the stable bound state as µ^f(x|s_f), where s_f is the peak location of the free bump. Assume that its shape does not change significantly under small spatial deviations z in the vicinity of the stable location s_f, such that µ^f(x|s_f + z) ≈ µ^f(x − z|s_f). The spatial deviation z also shifts the intrabump covariance of the free bump and the interbump covariances while leaving the intrabump covariance of the clamped bump unchanged. We next calculate the neutrally stable left eigenspace along the attractor manifold represented by the mean firing rate of the free-moving bump as v = a dμ̄^f/dz, parameterized by the
bump's spatial deviation z, where a = 1/⟨dμ̄^f/dz, dµ^f/dz⟩ is the normalization coefficient such that ⟨v, dµ^f/dz⟩ = 1 (dµ^f/dz is the right eigenvector of the system). Substituting the above definition of µ^f_i into Eq. (1), we get τ ∂µ_i/∂t = −τ u′_i dz/dt on the left-hand side and ϕ_µ(μ̄_i, C̄_i) on the right-hand side. We then numerically project the system state onto the space spanned by v to obtain the equation of motion for the spatial deviation, τ dz/dt = b(d), where the distance between the two bumps is d = s_c − s_f − z. The projection method is valid near the stable equilibrium at z = 0. We interpret b(d) as the influence of the clamped bump on the motion of the free-moving bump along the attractor manifold when their distance is d. Next, we define the energy landscape induced by this effective interaction as E(d) = −∫ b(d′) dd′. The effective interaction energy is calculated in two cases, one with all covariances included and the other with the interbump covariances C^fc and C^cf fixed at zero. All derivatives are computed numerically with centered finite differences.
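The normalization step of this projection can be checked numerically. The sketch below builds hypothetical bump profiles (a gaussian-like rate bump and a recurrently smoothed synaptic current, both our assumptions) and verifies ⟨v, dµ^f/dz⟩ = 1 under the inner product defined in the text.

```python
import numpy as np

# Numerical sketch of the projection step: left projector v = a * d(mubar_f)/dz
# with a = 1 / <d(mubar_f)/dz, d(mu_f)/dz>, normalized so <v, d(mu_f)/dz> = 1.
# Both bump profiles are hypothetical; mubar_f is a recurrently smoothed mu_f.
N = 400
x = np.linspace(-np.pi, np.pi, N, endpoint=False)

def inner(f, g):
    # Inner product <f, g> = (2*pi/N) * sum_i f(x_i) g(x_i) from the text.
    return 2.0 * np.pi / N * np.sum(f * g)

mu_f = 10.0 * np.exp(4.0 * (np.cos(x) - 1.0))              # free-bump rate
W = np.exp(2.0 * (np.cos(x[:, None] - x[None, :]) - 1.0))  # smoothing kernel
mubar_f = 2.0 * np.pi / N * W @ mu_f                       # synaptic current

# A shift by z moves both profiles rigidly, so d/dz = -d/dx for each; the
# common sign flip cancels inside the inner products below.
dx = x[1] - x[0]
dmu_dz = -np.gradient(mu_f, dx)
dmubar_dz = -np.gradient(mubar_f, dx)
a = 1.0 / inner(dmubar_dz, dmu_dz)        # normalization coefficient
v = a * dmubar_dz                         # left projector along the manifold
```

With v normalized this way, projecting the right-hand side of Eq. (1) at shifted states yields the scalar drift b(d) used to build the interaction energy.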

The origin of the negative correlations
To gain further theoretical understanding of the origin of the negative correlations between the two bumps, we consider the overall interactions between their neural populations, which can be roughly characterized by two neural masses. The quantities (µ_1, c_11) and (µ_2, c_22) represent the overall population activity of the two bumps, and c_12 = c_21 is the covariance between them. We can prove that, under a mild condition on the weight matrix, the two populations must be negatively correlated, i.e., c_12 < 0. Based on Eq. (2), the steady-state covariance should satisfy the self-consistent fixed-point equation obtained by setting the time derivative to zero.
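The intuition behind c_12 < 0 can be illustrated with a linear-noise toy model (an assumption made for illustration, not the paper's Eq. (2)): two neural masses coupled by mutual inhibition and driven by independent noise have a steady-state covariance solving a Lyapunov equation, whose off-diagonal element is negative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Two neural masses with leak a > 0 and mutual inhibition b > 0:
# dm/dt = A m + noise, with independent noise of strength q on each mass.
# The parameters are illustrative, not fitted to the MNN.
a, b, q = 1.0, 0.4, 0.5
A = np.array([[-a, -b],
              [-b, -a]])   # mutual inhibition couples the two masses
Q = q * np.eye(2)          # independent noise sources

# Steady-state covariance solves the Lyapunov equation A C + C A^T = -Q
C = solve_continuous_lyapunov(A, -Q)
print(C[0, 1] < 0)   # mutual inhibition alone anti-correlates the two masses
```

For this symmetric system the solution gives c_12 = −(b/a) c_11 < 0 whenever the inhibition b is positive, mirroring the mechanism argued for in the text.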

Figure S4 :
Figure S4: The dynamics of two bumps when the intra-bump correlations are fixed to zero. The plots show the mean firing rate µ(x) (top panels), the firing variability σ²(x) (middle panels) and the correlation coefficients ρ(x, x′) (bottom panels) for varying external input mean µ_ext. In all cases, the distance between the bumps is d = π, the maximal distance permitted by the model.

The state C coexists with the state B when µ_ext takes moderately large values. In this case, when the initial positions of the two bumps are close enough, the system evolves into the state C, and remains in the state B otherwise.
Figure S3(b) shows the height h of, and the distance d between, the two bumps for varying µ_ext. It is observed that both h and d increase as µ_ext increases in the two-bump case (black line), while the variation of h is non-monotonic in the merged-bump case (blue line). Note that when µ_ext ∈ [0.930, 0.938], the equilibrium distance between the two bumps takes two stable values, depending on the initial distance.
As shown in Fig. S4, when we clamp the intra-bump correlations to zero, the resulting distance between the two bumps is always d = π, the maximal distance permitted by the model, leading to reduced memory capacity.
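The block decomposition and clamping operations used in these experiments can be sketched with Heaviside masks over neuron positions. The covariance matrix below is a toy stand-in; for a clean partition, the boundary neuron at x = 0 is assigned to the clamped side.

```python
import numpy as np

# Sketch of the covariance decomposition C = C_ff + C_fc + C_cf + C_cc by
# spatial position; "clamping" a block means zeroing it before feeding C
# back into the dynamics. Illustrative matrices only.
N = 8
x = np.linspace(-np.pi, np.pi, N, endpoint=False)
C = np.eye(N) - 0.1        # toy covariance with negative off-diagonals

Hf = (x < 0).astype(float)  # free-bump side (x_i < 0)
Hc = 1.0 - Hf               # clamped-bump side (x_i >= 0)

C_ff = C * np.outer(Hf, Hf)  # intra-bump block, free bump
C_fc = C * np.outer(Hf, Hc)  # inter-bump block
C_cf = C * np.outer(Hc, Hf)  # inter-bump block
C_cc = C * np.outer(Hc, Hc)  # intra-bump block, clamped bump

# Inter-bump covariances fixed at zero, as in the clamping experiments
C_clamped = C_ff + C_cc
```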
Mnemonic and dynamic coding during working memory
Murray et al. (2017) analyzed the temporal dynamics of neural activity during visual working memory tasks and found that during the cue epoch (the presentation of the transient stimulus) a mnemonic coding space coexists with a dynamic coding space, and that during the delay epoch the dynamic coding space decays while the mnemonic coding space persists. To test whether our model is consistent with these results, we investigate the dynamics of the mean firing rate and firing covariability of the MNN during the cue and delay epochs, following Murray et al. (2017). We record the bump dynamics during and after the presentation of a transient stimulus, as shown in Fig. S5(a)-(b). Note that the stimulus is presented in the same fashion as the first stimulus in Fig. 6.
To analyze the dynamics, we first calculate the population correlation between the bump activity states at different timepoints and those at two reference timepoints: a "sensory" state at the cue epoch and a "late memory" state at the end of the delay epoch. The time evolution of the population correlation is shown in Fig. S5(c). Given the mean firing rates of two states, µ^a = (µ^a_i)_{i=1}^N and µ^b = (µ^b_i)_{i=1}^N, their population correlation is calculated as cov(µ^a, µ^b)/√(var(µ^a) var(µ^b)) (following Eq. S1 in the paper by Murray et al. (2017)), where cov and var are taken over neurons in the population. In other words, the population correlation measures the similarity between the mean firing rates at two different timepoints. We find that the correlation with the sensory state peaks at the cue epoch and then decays during the delay epoch, while the correlation with the late memory state increases over time and peaks during the delay epoch, largely consistent with the finding by Murray et al. (2017) (see Fig. 1C in their paper). One difference is that the correlation with the late memory state peaks at the start of the delay period in the MNN, whereas in the experimental data it keeps growing and peaks at the end of the delay period. The cause of this discrepancy is as follows. In this paper, we focus on the internal properties of a stationary bump state without considering its diffusion along the ring manifold due to fluctuating neural activity. This diffusion would decrease the population correlation between two timepoints as their separation increases. If we took the diffusion into account, the population correlation between a given timepoint and the late memory state would continue to rise and reach its highest point at the end of the delay epoch, in agreement with the experimental observations. Murray et al.
(2017) have also shown that the decoding accuracy based on the dynamic coding is significantly higher than that based on the mnemonic coding, and that the accuracy of both types of coding becomes similar during the delay epoch. The MNN provides a convenient way of analyzing the mnemonic and dynamic coding of neural population activity by calculating the linear Fisher information, which provides an upper bound on the accuracy attainable by a linear decoder without requiring explicit construction of such a decoder. At a timepoint t, the linear Fisher information of the dynamic coding is calculated as I_dyn(t) = u_t(s)ᵀ C_t(s)⁻¹ u_t(s), where u_t(s) = dµ_t(s)/ds represents the dynamic coding subspace and µ_t(s) is the mean firing rate at timepoint t. Similarly, the linear Fisher information of the mnemonic coding is calculated as I_mne(t) = u_T(s)ᵀ C_t(s)⁻¹ u_T(s), where u_T(s) = dµ_T(s)/ds represents the mnemonic coding subspace and µ_T is the mean firing rate at the end of the delay epoch. In both cases, C_t(s) is the firing covariability at timepoint t. As shown in Fig. S5(d), the linear Fisher information of the dynamic coding subspace is significantly higher than that of the mnemonic coding subspace during the cue epoch, and they converge to a similar value toward the end of the delay epoch. These results are qualitatively consistent with the analysis of decoding accuracy based on experimental data (Murray et al., 2017).
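Both quantities used in this analysis, the population correlation (Eq. S1 of Murray et al., 2017) and the linear Fisher information u^T C^{-1} u, are straightforward to compute. The ring tuning curve and diagonal covariability below are illustrative stand-ins for the MNN's actual moments.

```python
import numpy as np

def population_correlation(mu_a, mu_b):
    """Pearson correlation taken over neurons between two mean-rate vectors."""
    mu_a, mu_b = np.asarray(mu_a, float), np.asarray(mu_b, float)
    cov = np.mean((mu_a - mu_a.mean()) * (mu_b - mu_b.mean()))
    return cov / np.sqrt(mu_a.var() * mu_b.var())

def linear_fisher_information(mu_fun, C, s, ds=1e-4):
    """u^T C^{-1} u with u = dmu/ds from a centered finite difference."""
    u = (mu_fun(s + ds) - mu_fun(s - ds)) / (2 * ds)
    return u @ np.linalg.solve(C, u)

# Illustrative ring tuning and covariability (not the MNN's actual profiles)
x = np.linspace(-np.pi, np.pi, 64, endpoint=False)
mu = lambda s: 5.0 + 10.0 * np.exp(np.cos(x - s) - 1.0)
C = np.diag(0.5 + 0.1 * np.abs(np.sin(x)))

r = population_correlation(mu(0.0), mu(0.3))   # similarity of two bump states
I = linear_fisher_information(mu, C, s=0.0)    # information about s at s = 0
```

The mnemonic variant only changes which mean-rate curve supplies the derivative u (the end-of-delay profile µ_T), while C is still taken at the current timepoint.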
Interestingly, the opposite result is found when we repeat the same analyses on State III, in which the stationary bump state exhibits both positive and negative correlations. As shown in Fig. S6, the Fisher information based on the mnemonic coding subspace now has a higher peak than that based on the dynamic coding subspace. The above analyses based on the MNN thus indicate that the structure of neural correlations can have a significant impact on the properties of dynamic coding during working memory. Moreover, the consistency in dynamic coding between the experimental observations and State II of our model predicts that the brain might adopt the type of correlation structure in State II as a way to optimize coding efficiency during the cue period.

Two-dimensional case
We generalize our model with correlated neural variability to two spatial dimensions. Consider N = N_x × N_y neurons on a square lattice with periodic boundary conditions, where neuron i is assigned two-dimensional spatial coordinates (x_i, y_i) over [−π, π)². The synaptic weight between neurons i and j is a translation-invariant function of their displacement on the lattice, defined analogously to the one-dimensional case. The two-dimensional neural circuit supports multiple localized bump states, as shown in the left panel of Fig. S7(a). Similar to State II of the one-dimensional case in Fig. 2, the spatial profile of the mean firing rate µ of each bump is unimodal, whereas that of the firing variability σ² shows a dent in the center and is higher around the rim of the bump. We find that two-dimensional bumps exhibit more intricate spatial correlation structures, as shown in the right panels of Fig. S7(a). Each plot represents the two-dimensional map of correlation coefficients between all neurons in the circuit and a selected neuron (marked with a black dot) located at a distance of 0.1π from the center of the bump at an angle of 0°, 45° or 180°. For the example shown, two of the bumps exhibit positive intra-bump correlations and negative inter-bump correlations, analogous to State II in the one-dimensional case, whereas one bump exhibits polarized correlations analogous to State III of the one-dimensional case.

Detailed comparisons with stochastic rate-based neural network model
Wimmer et al.
(2014) analyzed single-neuron recordings from the prefrontal cortex of the monkey brain and reported a series of experimental observations during visual working memory tasks. They then used a rate-based bump attractor model driven by uncorrelated additive white noise for the theoretical interpretation of these observations. In the following, we present a comparison between their model and the MNN model presented in this paper, and highlight how the MNN offers alternative explanations for some of these experimental observations. One of the experimental observations reported by Wimmer et al. (2014) is a significant positive correlation between neuronal activity and behavioral response deviation during the delay period of working memory (see Fig. 4 in their paper). This rate-behavior correlation can be calculated in the MNN as follows. Let n be the random spike count vector with mean µ∆t and covariance C∆t, and consider the population-vector decoder θ̂ = Arg Σ_j n_j exp(iθ_j). Note that in the stochastic rate-based model considered by Wimmer et al. (2014) the rate-behavior correlation is primarily caused by the diffusion of the bump attractor, whereas in our case the correlation is due to the intrinsic fluctuations inside the bump (though the two mechanisms are not mutually exclusive). Wimmer et al. (2014) have also shown that the spike count Fano factor (FF) of the neuronal response shows systematic variation across different cue locations, and that the FF for accurate and inaccurate trials differs significantly at the end of the delay period (see Fig. 6d in their paper). In the rate-based model, it is assumed that each neuron fires independent Poisson spikes with its instantaneous rate specified by the model. Under this assumption, the FF should always be equal to one unless the bump moves randomly, causing the instantaneous firing rate to change over time. In the MNN, on the other hand, variable FF is produced by the intrinsic recurrent dynamics of the model, which is present even when the bump remains stationary. As shown in Fig.
S8(b), the FF for both State II and State III in the MNN exhibits a U-shaped tuning profile with values ranging between zero and one. This stimulus-dependent variability reduction is qualitatively consistent with that found in cortical neurons (Ponce-Alvarez et al., 2013). Note that the range of FF shown in Fig. S8(b) is smaller than in experiments (Wimmer et al., 2014). This discrepancy could potentially be resolved by considering the combined effect of the intrinsic variability of the bump state and its diffusive movement. Wimmer et al. (2014) have further found that the noise correlation between pairs of neurons with dissimilar tuning is negative when responding to a middle flank stimulus. This finding can be captured by State III of our MNN (left of Fig. 2c). For more convenient comparison, we plot this result in the same format used by Wimmer et al. (2014) in Fig. S8(c).
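Under the Gaussian spike-count description used here, a sub-Poissonian FF below one follows directly from the stationary moments: FF = σ²/µ. A minimal sketch, with an illustrative mean and variance standing in for the MNN's values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian spike-count description for one neuron: n ~ N(mu*dt, sigma2*dt).
# mu and sigma2 are illustrative stand-ins for the MNN's stationary moments.
mu, sigma2, dt, trials = 8.0, 4.0, 0.5, 100_000
n = rng.normal(mu * dt, np.sqrt(sigma2 * dt), size=trials)

# Spike count Fano factor; its expected value is sigma2/mu = 0.5 < 1
ff = n.var() / n.mean()
```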

Figure 2 :
Figure 2: Correlated variability emerging from a bump attractor network. (a) Mean firing rate µ(x), firing variability σ²(x) and correlation coefficients ρ(x, x′) of neural activity states for varying external input mean µ_ext, including the spatially uniform resting state (state I, µ_ext = 0.920), the bump state with positive correlations (state II, µ_ext = 0.929), the bump state with both positive and negative correlations (state III, µ_ext = 0.948), and the globally active state with spatially periodic correlations (state IV, µ_ext = 0.986). (b) The height h (top panel) and width b (lower panel) for varying external input mean µ_ext, revealing distinct phase transitions between the activity states shown in panel a. (c) The correlation coefficient ρ between two neurons with preferred features x_1 and x_2 as the bump location s varies in state II and state III.

Figure 3 :
Figure 3: Noise and differential covariance together influence coding accuracy. (a) Linear Fisher information rate I under varying external input mean µ_ext. (b) Differential covariance and noise covariance of state II (µ_ext = 0.925) and state III (µ_ext = 0.931). (c) Projecting the system state onto the space spanned by the activity of two neurons whose preferred s differs by 0.2π. The black line shows the ring manifold represented by the mean firing rates of the two neurons. Orange ellipses represent the firing covariability when the bump center is located at the values marked by the red dots. State II (left panel) and state III (right panel) show similar structures in the mean firing rate, but not in the firing covariability. The major axis (blue lines) of the firing covariability in state II is roughly orthogonal to the ring manifold, whereas that in state III is roughly parallel to it. (d) Linear Fisher information rate I grows linearly with the neural population size N.

Figure 4 :
Figure 4: Interactions of bump attractors mediated by negatively correlated fluctuations. (a) Mean firing rate µ, firing variability σ² and correlation coefficient ρ of two bumps in the stable bound state. (b) Two bumps separated by a distance closer to or farther from the stable equilibrium at d* ≈ 0.63π experience repulsion or attraction, respectively. (c) The effective interaction energy E as a function of the inter-bump distance d (solid line) exhibits a local minimum at d ≈ 0.65π (dot), which disappears when the negative correlation is removed (dashed line). We set µ_ext = 0.922 in panels a-c. (d) Stable equilibrium distance d* between two bumps for varying external input mean µ_ext (black lines), overlaid on the effective interaction energy E (heat map).

Figure 6 :
Figure 6: Negative correlations prevent forgetting of sequentially memorized items. The left column shows the cases in which we keep the negative correlations. When the initial distance between the two bumps is 0.8π (top panel) or 0.6π (middle panel), both bumps remain persistent after the removal of the transient stimuli and settle at a stable distance. If the initial distance is sufficiently small (0.5π), the two bumps merge into one bump (bottom panel). The right column shows the cases in which we clamp the negative correlations to zero. If the initial distance is d = 0.8π, the second bump is maintained while the previously stored bump is quenched (top panel). If the initial distance is d = 0.6π, the two bumps annihilate each other (middle panel). If the initial distance is d = 0.5π, the two bumps merge into one bump (bottom panel). The system is initialized with zero mean and covariance, and two transient stimuli of duration T = 0.1 s (marked by the red rectangles) are applied at two locations sequentially. The grayscale indicates the mean firing rate (sp/s).

Figure S2 :
Figure S2: The contribution to the linear Fisher information by individual neurons. The mean firing rate (top panels) and the spatial distribution of the linear Fisher information rate I(x; s) (bottom panels) for states II (µ_ext = 0.925) and III (µ_ext = 0.932). Here, the bump center is s = 0.

Figure S3 :
Figure S3: Additional phase transitions for two-bump solutions. (a) Mean firing rate µ(x), firing variability σ²(x) and correlation coefficients ρ(x, x′) of neural activity states for varying external input mean µ_ext, including when the inter-bump correlations are negative (state A, µ_ext = 0.940), when the inter-bump correlations contain both positive and negative values (state B, µ_ext = 0.992), and when the two bumps partially overlap (state C, µ_ext = 1.000). (b) The height h (top panel) and distance d (bottom panel) of the two bumps for varying external input mean µ_ext, revealing distinct phase transitions between the activity states shown in (a).

Figure S5 :
Figure S5: Dynamic coding of working memory in state II of Fig. 2 for the transient stimulus. (a) The dynamics of the mean firing rate µ(x) during the cue presentation (red rectangle) and delay epoch. (b) The dynamics of the correlation coefficients ρ(x, x′) during the cue and delay epochs. (c) Population correlation between states and two reference states: the sensory state (purple) and the late memory state (orange). The sensory state is defined at the start of the cue epoch and the late memory state at the end of the delay epoch. The gray area represents the cue epoch. (d) Fisher information over time for the mnemonic (blue) and dynamic (red) coding subspaces.

Figure S6 :
Figure S6: Dynamic coding of working memory in state III of Fig. 2 for the transient stimulus. (a) The dynamics of the mean firing rate µ(x) during the cue presentation (red rectangle) and delay epoch. (b) The dynamics of the correlation coefficients ρ(x, x′) during the cue and delay epochs. (c) Population correlation between states and two reference states: the sensory state (purple) and the late memory state (orange). The sensory state is defined at the start of the cue epoch and the late memory state at the end of the delay epoch. The gray area represents the cue epoch. (d) Fisher information over time for the mnemonic (blue) and dynamic (red) coding subspaces.
Increasing µ_ext leads to almost the same pattern in the mean firing rate µ and firing variability σ², but symmetry breaking in the correlation patterns, similar to the transition from State II to State III in Fig. 2(a). Both the intra-bump and inter-bump correlations may contain positive and negative values, depending on the relative locations of the neurons.
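The two-dimensional generalization (a periodic square lattice with translation-invariant weights) can be sketched as follows. The difference-of-Gaussians kernel is an illustrative stand-in for the model's actual weight profile; the check at the end verifies that periodicity makes shifted kernels circular shifts of each other.

```python
import numpy as np

# Sketch of a translation-invariant weight kernel on a 2D periodic lattice.
# The actual MNN kernel is specified in the paper; the difference-of-Gaussians
# form below is a common stand-in for local excitation / broad inhibition.
Nx = Ny = 32
xs = np.linspace(-np.pi, np.pi, Nx, endpoint=False)
ys = np.linspace(-np.pi, np.pi, Ny, endpoint=False)
X, Y = np.meshgrid(xs, ys, indexing="ij")

def wrapped(d):
    """Shortest signed distance on the periodic domain [-pi, pi)."""
    return np.angle(np.exp(1j * d))

def weight_from(x0, y0, Je=1.0, Ji=0.5, se=0.4, si=1.0):
    """Weights from all lattice neurons to a neuron at (x0, y0)."""
    dx, dy = wrapped(X - x0), wrapped(Y - y0)
    r2 = dx**2 + dy**2
    return Je * np.exp(-r2 / (2 * se**2)) - Ji * np.exp(-r2 / (2 * si**2))

W0 = weight_from(0.0, 0.0)
# Periodicity: the kernel centered at (-pi, -pi) is a circular shift of W0
W1 = weight_from(xs[0], ys[0])
print(np.allclose(np.roll(W0, (-Nx // 2, -Ny // 2), axis=(0, 1)), W1))
```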

Figure S7 :
Figure S7: Bump attractors with correlated variability in two spatial dimensions. The displayed items are two-dimensional maps of the mean firing rate µ, the firing variability σ² and the correlation coefficients between all neurons and the neurons at the locations marked by the black dots. The external input mean is µ_ext = 0.945 in (a) and µ_ext = 0.950 in (b).

The decoder reads out the bump position as θ̂ = Arg Σ_j n_j exp(iθ_j), where θ_j ∈ [−π, π) is the j-th neuron's preferred direction. The behavioral deviation, following the definition by Wimmer et al. (2014), can then be modeled as D_j = α|θ̂ − θ*|, where θ* denotes the cue location and α = sgn(|θ* − θ_j| − |θ̂ − θ_j|) indicates whether the behavioral response θ̂ is closer to, or farther away from, the neuron's preferred location θ_j. Without loss of generality, we fix the cue location at θ* = 0. Note that the absolute value denotes distance on the circle. The rate-behavior correlation is then defined as c_j = Corr[D_j, n_j] (Wimmer et al., 2014), which is calculated by sampling n from the Gaussian distribution N(µ∆t, C∆t). As shown in Fig. S8(a), the bump activity in the MNN exhibits positive rate-behavior correlations, consistent with the analysis of the experimental data (Fig. 4c in their paper). Moreover, distinct patterns are found for State II and State III in our model. For State II, the rate-behavior correlation shows a single peak centered at a neuron's preferred cue location. In contrast, the rate-behavior correlation for State III shows two peaks flanking a neuron's preferred cue location, where the correlation is zero. The latter is consistent with the experimental findings of Wimmer et al. (2014) (see Fig. 4e in their paper).
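The calculation just described can be sketched end to end. The bump-shaped mean and the Poisson-like diagonal covariability below are illustrative stand-ins for the MNN's moments; the decoder, deviation D_j and correlation c_j follow the definitions in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the MNN's moments; cue location theta* = 0
N, dt, trials = 64, 0.5, 10_000
theta = np.linspace(-np.pi, np.pi, N, endpoint=False)  # preferred directions
mu = 2.0 + 8.0 * np.exp(np.cos(theta) - 1.0)           # bump centered at cue
C = np.diag(mu)                                        # toy Poisson-like covariability

# Spike counts sampled from the Gaussian N(mu*dt, C*dt), as in the text
n = rng.multivariate_normal(mu * dt, C * dt, size=trials)

# Population-vector decoder: theta_hat = Arg sum_j n_j exp(i theta_j)
theta_hat = np.angle(n @ np.exp(1j * theta))

def circ_dist(a, b):
    """Absolute distance on the circle."""
    return np.abs(np.angle(np.exp(1j * (a - b))))

# alpha = sgn(|theta* - theta_j| - |theta_hat - theta_j|): positive when the
# response moved toward neuron j, negative when it moved away
alpha = np.sign(circ_dist(0.0, theta[None, :])
                - circ_dist(theta_hat[:, None], theta[None, :]))
D = alpha * circ_dist(theta_hat, 0.0)[:, None]

# c_j: correlation over trials between D_j and the spike count of neuron j
c = np.array([np.corrcoef(D[:, j], n[:, j])[0, 1] for j in range(N)])
```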
No negative correlation is found in State II. Note that the noise correlation in the MNN is quantitatively closer to the experimental data (Fig. 7f in their paper) than that in the rate-based model used by Wimmer et al. (2014) (Fig. S6g of their paper).

Figure S8 :
Figure S8: Rate-behavior correlation, spike count Fano factor and neural pairwise correlation in the MNN model. (a) Rate-behavior correlation c_j as a function of cue location. Top row: State II; bottom row: State III. (b) Stimulus-dependent reduction of the spike count Fano factor. Magenta lines represent the case in which the behavioral response precisely matches the cue location, and green lines the case in which the behavioral response is shifted from the cue location by 0.1π. (c) Noise correlation of neuronal pairs with dissimilar preferred cue locations. We select the bump center at 0.45π to represent the in-flank case, 0.5π the peak case, and 0.56π the out-flank case.