## Abstract

The way grid cells represent space in the rodent brain has been a striking discovery, with theoretical implications still unclear. Unlike hippocampal place cells, which are known to encode multiple, environment-dependent spatial maps, grid cells have been widely believed to encode space through a single low-dimensional manifold, in which coactivity relations between different neurons are preserved when the environment is changed. Does it have to be so? Here, we compute, using two alternative mathematical models, the storage capacity of a population of grid-like units, embedded in a continuous attractor neural network, for multiple spatial maps. We show that distinct representations of multiple environments can coexist, as existing models for grid cells have the potential to express several sets of hexagonal grid patterns, challenging the view of a universal grid map. This suggests that a population of grid cells can encode multiple noncongruent metric relationships, a feature that could in principle allow a grid-like code to represent environments with a variety of different geometries and possibly conceptual and cognitive spaces, which may be expected to entail such context-dependent metric relationships.

## 1 Introduction

Grid cells appear to comprise an essential component of the cognitive representation of space in rodents (Hafting, Fyhn, Molden, Moser, & Moser, 2005) and other species (e.g., bats; Yartsev, Witter, & Ulanovsky, 2011). Located in the medial entorhinal cortex, these neurons are selectively active when the animal is in certain positions of the environment, the so-called fields, at the vertices of a remarkably regular hexagonal lattice. A study of the activity of grid cells in multiple environments (Fyhn, Hafting, Treves, Moser, & Moser, 2007) has shown that while the grid expressed by each neuron varies across environments in its spatial phase and orientation, between neurons the coactivity relations are largely preserved, at least for those recorded nearby in the tissue, with the same tetrode. In other words, the grids of different cells undergo a coherent rigid movement when a new environment is explored, as illustrated schematically in Figures 1A and 1B (bottom row). The subsequent discovery of quasi-discrete “modules” (Stensola et al., 2012) indicates that these relations are maintained at the local network level, presumably by recurrent collateral connections among grid cells. This finding has led to the hypothesis that local ensembles of grid cells each comprise a single continuous attractor network, expressing a universal, two-dimensional map, which encodes the metric of space independent of the environmental context. There is a crucial difference with the context-dependent spatial representations provided by hippocampal place cells, which display “global remapping” (Leutgeb et al., 2005) even between very similar rooms, in particular in the CA3 field (Alme et al., 2014): cells that were silent acquire one or more place fields, others lose theirs, and the fields that seem to have been maintained typically are in a different location (see Figure 1, top row). Global remapping has motivated the conceptual model of multiple charts (Samsonovich & McNaughton, 1997), in contrast with early and later models of continuous attractor grid cell networks, which envisage a single chart (Fuhs & Touretzky, 2006; Burak & Fiete, 2009; Couey et al., 2013). The dominant overall view, then, holds that the hippocampus encodes multiple, uncorrelated, context-dependent cognitive maps, while the grid system provides metric information that is independent of the environment. Recent evidence of context-dependent distortions in the grid pattern has begun to question the view that the collective map expressed by a grid module is universal, that is, it applies to any environment. Stensola, Stensola, Moser, and Moser (2015) have shown that when rats explore large environments, a single grid can exhibit multiple orientations, likely due to anchoring effects to the closest wall, which in any case amount to distortions of the hexagonal pattern. These effects have been analyzed extensively in a more recent study (Hägglund, Mørreaunet, Moser, & Moser, 2019). Krupic et al. (Krupic, Bauza, Burton, Barry, & O'Keefe, 2015; Krupic, Bauza, Burton, & O'Keefe, 2018) have shown that the grid pattern deviates from perfect hexagonality, with both global and local distortions, in response to environmental features such as the geometry of the walls. Finally, a couple of recent studies (Boccara, Nardin, Stella, ONeill, & Csicsvari, 2019; Butler, Hardcastle, & Giocomo, 2019) have shown that the presence of salient features such as goals or rewards affects the entorhinal map, changing field locations and inducing remapping in other space-selective cells. These observations, moreover, refer solely to the position of the peaks of activity, that is, the place fields of each cell, and do not take into account the fact that they vary reliably in height, independently across peaks, from one environment to the other (Dunn, Wennberg, Huang, & Roudi, 2017). Should we still regard grid cells as a sort of stack of millimeter paper, providing a universal metric for space?

Recent studies conducted in both rodents and humans, moreover, suggest that regular grids may not measure only physical space. Aronov, Nevers, and Tank (2017) find that both place cells and grid cells, in rats, are involved in the representation of a nonspatial but continuous, one-dimensional variable, such as the frequency of a sound. An fMRI study by Constantinescu, O'Reilly, and Behrens (2016) shows a hexagonal modulation of the BOLD signal in human entorhinal cortex, and elsewhere, in a task that requires subjects to navigate the 2D space spanned by the varying leg and neck lengths of a drawing of a bird. The representation of abstract or conceptual spaces, which could in principle be topologically and geometrically complex, would require of the grid cell system a flexibility that can hardly be reconciled with the universal grid hypothesis.

In a most interesting study (Kanter et al., 2017), a subset of grid units were depolarized in transgenic mice, leading to what appears to be global remapping in the hippocampus. What is so striking is that the manipulation induces extensive changes, up and down, in the peak firing rates of the different fields of individual grid units but not in their position. This elaborates the observation in Fyhn et al. (2007) and suggests that what might be universal in the grid representation expressed by an ensemble of units, if anything, are the relative positions of the fields, whereas their peak firing rates are variable (see Figure 1C, bottom row). On the other hand, a strict hexagonal periodicity of the field positions of individual units is possible only in flat 2D environments. The adaptation model of grid formation (Kropff & Treves, 2008) predicts instead, on surfaces with constant positive or negative gaussian curvature and appropriate radius, the emergence of grids with, for example, pentagonal (Stella, Si, Kropff, & Treves, 2013) or heptagonal (Urdapilleta, Troiani, Stella, & Treves, 2015) symmetry. In all other cases, including ecologically plausible natural environments, nonflat surfaces have varying curvature, making strictly periodic grids dubious and rigid phase coherence most unlikely. But what happens to the universality of the grid in natural environments?

To address these issues, the aim of the work we present here is to answer a first fundamental question: Is it at all possible to conceive of multiple, hence nonuniversal, ideal grid representations expressed in the same local network when the animal is placed in distinct, even if flat, environments? In other words, would the storage capacity of a recurrent network of grid cells be above unity, so that multiple continuous attractors can coexist, encoded in the same synaptic efficacies? We pose this question within two alternative mathematical models, both accepting the idealized assumptions that underlie the universal map hypothesis, that is, of strict periodicity and equal peak rates, depicted in Figure 1D (bottom row), but allowing for several uncorrelated grid representations. Under these assumptions, we analyze an ensemble of grid cells as a continuous attractor neural network, extending the frameworks developed in Battaglia and Treves (1998) and Monasson and Rosay (2013, 2014) for the description of place cells. We emphasize that the storage capacity we are interested in quantifies the number of different, independent charts (or collective maps) that the network can store, and not the spatial resolution (which may be referred to as information capacity, i.e., the number of different positions that can be decoded from the ensemble activity), as considered, for example, in Hedrick and Zhang (2016) and Fiete, Burak, and Brookings (2008).

## 2 Complementary Network Models

### 2.1 Binary Units

### 2.2 Threshold-Linear Units

An energy function would look similar to the one in equation 2.3 but now expressed in terms of the continuous variables ${V}$. When $C<N-1$ and connections are not symmetric, the energy formalism does not apply, but we can still analyze the model (see below and in appendix B), and again we take global inhibition, which can now also act through a modulation of the common gain $g$ to depend on the average activity of the network and to be such as to optimize storage capacity.

## 3 Storage Capacity

Both models can store a single population map, as in the bottom panels of Figure 1, A and B, and the equations for such a map admit periodic bump solutions that reproduce the shape of the tile or kernel (as well as potentially other solutions, such as stripes, discussed later). We are interested, however, in their capacity to store several distinct maps, as in panels A and D of Figure 1, and in the possibility of calculating such storage capacity analytically, in the mean field approximation. The general strategy involves formulating and resolving a set of self-consistent equations relating the activity of the units in the network. When the model admits an energy function, these are the saddle point equations derived from the computation of the free energy of the system with the replica trick, which allows taking into account the statistics of the field centers in each environment. Without an energy function (e.g., when the connections are sparse and not symmetric), equivalent equations can be derived through so-called self-consistent signal-to-noise analysis (Roudi & Treves, 2004). The solutions to these equations, which describe the activity in one map, disappear sharply at a critical value $\alpha c$ of the storage load $\alpha =(p/C)$, which measures the ratio of the number of maps to the number of connections to each unit. $\alpha c$ then gives the maximum number of maps that the network can store and retrieve or express, normalized by the connectivity. Crucially, we have developed a novel method to assess whether below $\alpha c$, these solutions are indeed stable and prevail on others (see Figures 6 and 7). The details of these methods, which build on Monasson and Rosay (2013, 2014) for the binary model and on Treves (1990a) and Battaglia and Treves (1998) for the rate model, can be found in appendix B. We focus, in the calculation of the storage capacity, on so-called bump states, in which activity is localized along each of the two dimensions of the elementary tile (anywhere on the tile, given the translation invariance of the interaction kernel). Other solutions, however, do exist, as discussed in section 4.

### 3.1 Binary Units

We calculate, from the saddle point equations, the storage capacity $\alpha c(w,f)$ as the maximal value of the load $\alpha $ for which a bump-like solution to equation 3.1 exists. Then, for a given value of $w$, we find the coding level $f*(w)$ that maximizes the storage capacity. Over a broad range $0\u2264w\u22640.5$, the optimal $f*$ turns out to be approximately half the value of $w$ (see Figure 3). That the optimal value for the coding level is proportional to $w$ can be understood intuitively by considering the spatial profile of the signal $\mu (x\u2192)$: if too few cells are allowed to be active, the connections to the cells that are forced to be silent, within the connectivity range of the active cells, will be frustrated. But if too many cells are active, those outside the connectivity range will contribute more to the noise than to the signal. This optimal storage capacity is plotted in Figure 4 for the square and hexagonal grids as a function of $w$. At low $w$, the two values are similar, but when $w$ increases, their trends diverge, leading to a substantially higher-capacity value in the hexagonal case, of order $10-2$ for $w\u22430.5$. This value would definitely allow, in a real cortical network with order thousands (or tens of thousands) of neurons, the storage and retrieval of multiple independent grid maps. Again, considering the spatial profiles of the signal $\mu (x\u2192)$ allows gaining intuition about this divergence. At very low $w$, that is, short-range interactions, what happens in other tiles can be neglected, and the two grids behave similarly. When the range is wider, the location of the fields in the immediately neighboring tiles starts to be relevant. In the square case, there are four first neighbors that contribute to exciting silent neurons in between the fields. For a hexagonal arrangement of the fields, there are six neighboring tiles that each contribute relatively less excitation in between fields. Intuitively, this last geometrical arrangement makes the structure more rigid and reduces the influence of the noise due to the storage of other charts.

### 3.2 Threshold-Linear Units

### 3.3 Sparsity and Noise Reverberation

The binary model shows that the difference in capacity between hexagonal and square grids results from the effective interactions among the fields in different tiles, as it emerges only with wide kernels and dense coding. When both are sparse, hexagonal and square grids are roughly equivalent. The $w\u21920$ limit can be worked out analytically and $\alpha c\u21920$ in both cases, but only after having reached a maximum around $\alpha c\u22430.02$ for quite sparse codes, $w\u22430.03$ and $f\u22430.015$. Sparse coding is known to suppress noise reverberation (leading to small $\psi $), but remarkably, this relatively large capacity is approximately preserved for hexagonal grids with dense coding, $w\u22430.5$ and $f\u22430.25$, illustrating the efficiency with which this compact arrangement minimizes interference.

## 4 Band Solutions

In the previous analysis, we focused on bump states, in which activity is localized in a grid pattern. Another possibility are partially localized solutions: band states, where activity is localized along a single direction in the elementary tile and extends along a stripe in the orthogonal direction.

In the binary model, these band states can be oriented along an edge of the tile (see Figures 6b and 6f) or along the diagonal of the tile (see Figures 6c and 6g) or in a discrete multiplicity of other orientations. Individual units fire along stripes of the same orientation, with relative offsets. We can study the properties of some of these band states in the $w-f$ parameter space, to find that they are particularly favored in regions of high coding level. Given the connectivity range set by $w$, bump states are the global minima of the free energy for low $f$, and one of the band states (which one depends on $\theta $) becomes the minimum for higher $f$. For example, for both square and hexagonal grids, at small connectivity range $w=0.1$, band states have lower free energy than the bump state for coding levels beyond 0.35, while for the larger connectivity range $w=0.5$, this happens for coding levels beyond 0.4. This is intuitive, since for sufficiently large $f$, a band state has a shorter boundary between active and quiescent units than a bump, and it is the length of the boundary that raises the free energy above its minimum value. Moreover, we can study how these different states are separated by computing the size of the free energy barrier to cross to go from one state to another. The method to compute this barrier is sketched in Figure 7c and explained in more detail in appendix D. In Figure 7d, we show the size of the barriers to cross to go from a bump state to band states. On the range of coding levels where these two kinds of states coexist, the bump state is always more robust for a hexagonal grid compared to a square grid, as shown by the higher barrier size in a hexagonal grid (blue curve, from bump to band edge or band diagonal state) compared to square grid (full red curve, from bump to band edge state).

Different behavior is observed in the threshold-linear network. In this case, the rigid symmetry imposed by the three-cosine interaction kernel makes the bump pattern a global minimum. In the two-cosine case instead, band state are stable solutions, corresponding to a macroscopic overlap with only one of the two cosines. We can describe bands also with a 1D interaction kernel, with a single cosine, and compare the storage capacity for band patterns with the one for square and hexagonal grids. In Figure 5, the green line shows the capacity for band patterns as a function of the connectivity. For a densely connected network, it is above that for square grids, and the barrier method indicates that these are only marginally stable to collapsing into stripes. This is in line with the reduction of the capacity from one to two dimensions shown in Battaglia and Treves (1998). Interestingly, as soon as the emergence of a third cosine is allowed, the capacity is instead enhanced, surpassing the 1D kernel except for very low values of connectivity density.

## 5 Discussion

Our results indicate that given appropriate conditions, a neural population with recurrent connectivity can effectively store and retrieve many hexagonally periodic continuous attractors. This possibility suggests that a regular grid code may not be restricted to represent only physical space; it could also express continuous abstract relations between arbitrary features, at least if they can be mapped to a two-dimensional space. This would, however, require a system flexible enough to store and retrieve uncorrelated grid representations. Our results show that this flexibility does not need, in principle, separate neural populations for separate representations, but can be achieved by a single local ensemble, provided it can effectively learn orthogonal representations.

Given the recent observation of nonspatial coding—a consistently tuned response to the position along a 1D nonspatial variable, sound frequency, during a sound manipulation task—by neurons that qualify as grid cells in a 2D spatial exploration task (Aronov et al., 2017), it would be interesting to know whether a similar selectivity can be observed for a 2D nonspatial variable, as suggested by indirect observations of hexagonal modulation (Constantinescu et al., 2016).

Several important questions are left open for future investigation. First, if global remapping is possible within a grid cell population, why has it not been observed experimentally? Possibly a remapping capacity of grid cells may have been hidden by the fact that multiple mappings were studied only in simple, empty, flat environments—and then they turned out to be the same, modulo translations (Fyhn et al., 2007). The hypothesis of a universal grid, which shifts without deformation across an environment and from one environment to the other, faces severe difficulties as soon as curvature is taken into consideration. In curved environments, rigid translations are not possible, and the geodesic transformations that partially substitute for them do not leave field-to-field relations unchanged, making a universal grid a priori impossible. Nevertheless, natural environments show a wide range of both positive and negative curvature, which does not seem to pose any problem to the navigational skills of rodents or other species. It is then conceivable that the apparent universality of the grid pattern comes from the experimental restriction to flat environments, which all belong to the same, rather special, class of 2D spaces with zero curvature, and that a richer grid behavior is required in order to code for position in more general spaces.

The emergence of grid representations in curved environments has been investigated with a model based on single cell adaptation (Stella et al., 2013; Urdapilleta et al., 2015), which illustrates the emergence of different regular patterns for distinct ranges of curvature. Estimating the storage capacity of recurrent networks expressing curved grids, however, poses some challenges. Since shifting the grid pattern along a curved surface moves individual fields by a different amount, the relationships between grid units cannot be reduced to relationships between a single pair of their fields. Long-range translational coherence becomes impossible. Curved grids can be only partially coherent, and whether this partial coherence is sufficient to build stable attractors is an open problem (Stella et al., 2019).

A second open problem is the ability of a network encoding multiple charts to support path integration, since the noise introduced by other charts is likely to introduce discontinuities in the dynamics shifting the activity bump, affecting the accuracy of the integrator. It has recently been suggested (Mosheiff & Burak, 2019) that interactions between different grid modules (each encoding a single chart or coherent ensemble of maps) can enhance the robustness to noise during path integration. The possibility that this result generalizes to modules encoding multiple charts, and the analysis of the capacity deriving from interactions between modules, is beyond the scope of this letter but deserves future investigation.

Finally, a third issue concerns the learning dynamics that sculpts the grid attractors. What is the mechanism that leads to the attractors of the recurrent network? Does a single grid dominate it, in the case of flat environments? Can self-organization be unleashed by the interplay between the neural populations of medical entorhinal cortex (mEC), including nongrid units, and hippocampal place cells, aided by the dentate gyrus (Treves, Tashiro, Witter, & Moser, 2008)? Including the hippocampus may be needed also to understand the distortion of the grid pattern, reported in several experimental studies (Stensola et al., 2012; Stensola et al., 2015; Krupic et al., 2015), that disrupting long-range order also weakens coherence. At the system level, a finite storage capacity for the grid cell network implies the possibility that mEC, or any other area in the brain (Constantinescu et al., 2016) that is observed to include grid-like units, can serve context memory. This would turn upside down the widely shared notion that memory for the specific spatial features of each environment is available only downstream, in the hippocampus, and conceptually reunites mEC with other regions of the mammalian temporal lobe, known to be dedicated to their own flavor of memory function (Eichenbaum, 2000). Moreover, the possibility of multiple uncorrelated continuous attractors in flat environments, combined with the discovery of transitions between (highly correlated) states in which the grid is the same but the peak firing rate of each field is different (Kanter et al., 2017), and with a new understanding of the disorder and frustration inherently associated with the grid representation of the curved environment (Stella et al., 2019), puts to rest the rigid order that appeared as the most salient characteristic of the newly discovered grid cells. It suggests instead a sort of spin glass at intermediate temperature—that in order to code densely and efficiently for position on (many) continuous manifolds, grid cells have to be equipped with the flexibility and the ability to compromise characteristic of a self-organized disordered system.

## Appendix A: Mean Field Equations: Binary Model

In the text, we focus on the limit of vanishing stochastic noise $\beta \u2192\u221e$, and the term $\beta (q-f)$, which remains finite in such limit, can be identified with the parameter $\psi $ of the threshold-linear model, quantifying the reverberation through the loops of the network of the quenched noise, which is due to the interference of the other maps.

## Appendix B: Mean Field Equations: Threshold-Linear Model

As the storage load $\alpha =p/C$ increases, this closed curve shrinks and eventually disappears. The value $\alpha =\alpha c$ at which the curve vanishes marks a phase transition: for $\alpha >\alpha c$, retrieval solutions do not exist. The storage capacity $\alpha c$ can therefore be calculated by finding the vanishing point of $E1=0$, and in this way, one automatically selects the optimal value of the gain $g$ as the one for which the curve $E2=0$ intersects the single point $E1=0$.

## Appendix C: Finite Connectivity and Noise Reverberation

## Appendix D: Free-Energy Barriers in the Binary Model

Free-energy values for the different metastable states are calculated using equation A.4 after order parameters have been computed by solving the saddle-point equations. These equations are solved iteratively, starting from an initial condition for order parameters and iterating the values of the order parameters until convergence to fixed values. The free-energy values of the different metastable states are obtained by initializing $\rho (x\u2192)$ as $\rho Bp(x\u2192)$ for bump states (see Figures 6a and 6e) and $\rho BaE(x\u2192)$ for band edge states (see Figures 6b and 6f) or $\rho BaD(x\u2192)$ for band diagonal states (see Figures 6g and 6f). In order to estimate the size of the barrier that must be jumped over in order to go from one state $X$ to another state $Y$, we proceed as follows. The activity profile is initialized as $\rho k=0,z(x\u2192)=z\rho X(x\u2192)+(1-z)\rho X(x\u2192)$, with $z$ chosen such that $\rho k\u2192+\u221e,z(x\u2192)=\rho X(x\u2192)$ and $\rho k\u2192+\u221e,z-\epsilon (x\u2192)=\rho Y(x\u2192)$ for $\epsilon \u226az$. When solving equations from such an initial condition, the network state goes close to a saddle point lying at the boundary between the two basins of attraction associated with states $X$ and $Y$, before sliding into state $X$, as shown in Figure 7c. The size of the barrier is then given by the difference between the free energy of the saddle point and that of the metastable state $X$.

## Acknowledgments

This work was supported by Human Frontier grant RGP0057/2016.

## References

*Grid cells show field-to-field variability and this explains the aperiodic response of inhibitory interneurons*

*Partial coherence and frustration in self-organizing spherical grids*

## Author notes

D.S. and A.D. contributed equally to this work.