## Abstract

Place cells in the hippocampus (HC) are active when an animal visits a certain location (referred to as a place field) within an environment. Grid cells in the medial entorhinal cortex (MEC) respond at multiple locations, with firing fields that form a periodic and hexagonal tiling of the environment. The joint activity of grid and place cell populations, as a function of location, forms a neural code for space. In this article, we develop an understanding of the relationships between coding theoretically relevant properties of the combined activity of these populations and how these properties limit the robustness of this representation to noise-induced interference. These relationships are revisited by measuring the performances of biologically realizable algorithms implemented by networks of place and grid cell populations, as well as constraint neurons, which perform denoising operations. Contributions of this work include the investigation of coding theoretic limitations of the mammalian neural code for location and how communication between grid and place cell networks may improve the accuracy of each population's representation. Simulations demonstrate that denoising mechanisms analyzed here can significantly improve the fidelity of this neural representation of space. Furthermore, patterns observed in connectivity of each population of simulated cells predict that anti-Hebbian learning drives decreases in inter-HC-MEC connectivity along the dorsoventral axis.

## 1 Introduction

Place cells are a class of spatially modulated neurons with an approximately bivariate gaussian tuning curve centered on a particular location in the environment and have been identified in the hippocampus (O'Keefe & Dostrovsky, 1971; O'Keefe, 1976; Ekstrom et al., 2003). Grid cells are spatially modulated neurons with firing fields that form a periodic and hexagonal tiling of the environment and are found in the entorhinal cortex (EC) of rats, mice, bats, and humans (Hafting, Fyhn, Molden, Moser, & Moser, 2005; Fyhn, Hafting, Witter, Moser, & Moser, 2008; Yartsev, Witter, & Ulanovsky, 2011; Doeller, Barry, & Burgess, 2010; Jacobs et al., 2013). Grid cells are clustered in discrete modules wherein cells share grid scale (Stensola et al., 2012). Anatomically, both cell types share a dorsoventral organization, with cells possessing wider receptive fields distributed toward the ventral end (Strange, Witter, Lein, & Moser, 2014; Stensola et al., 2012). It is known that the rat grid cell network requires communication from the hippocampus to maintain grid-like activity (Bonnevie et al., 2013) and that a significant improvement in accuracy of the rodent place cell representation is tightly correlated with the emergence of the grid cell network (Muessig, Hauser, Wills, & Cacucci, 2015). However, the mechanisms by which these networks communicate and how each may bolster the other's accuracy are unknown. Objectives of this work include the investigation of coding theoretic limitations of the mammalian neural code for location and how communication between grid and place cell networks may improve the accuracy of each population's representation.

Associative memories are a class of biologically implementable content-addressable memory consisting of networks of neurons, a learning rule, and, in some instances, a separate recall process (Hopfield, 1982; Amit & Treves, 1989). This means that they can be exploited to stabilize the states of their constituent neurons to match a previously memorized network state if enough of the network already lies in this state. The information capacity of the simplest of these constructions is quite limited: $\frac{n}{2\log n}$ bits for a network of $n$ binary neurons (McEliece, Posner, Rodemich, & Venkatesh, 1987). However, recent advances by Salavati et al. take advantage of sparse neural coding and nonbinary neurons to design an associative memory with information storage capacity exponential in the number of neurons (Salavati, Kumar, & Shokrollahi, 2014). Sparse connectivity confers other performance improvements on the memory network: infrequent spiking implies reduced energy costs and faster convergence to a stable state.

In communications, this principle is leveraged by low-density parity check (LDPC) codes, a class of linear block code whose power (in coding and decoding complexity) depends on the sparsity of the code's parity check matrix. Commonly, denoising an LDPC code involves iteratively passing messages along edges of a bipartite graph in which a collection of nodes that stores and updates an estimate of the originally transmitted word is connected to a collection of nodes that computes the code's parity check equations (Chen & Fossorier, 2002; Declercq & Fossorier, 2007). Recent developments in the intersection of coding theory and machine learning demonstrate that neural networks can learn an approximation of an LDPC code's parity structure, and by executing belief propagation algorithms, they can recover memorized patterns in the presence of noise (Salavati et al., 2014).

Nature provides myriad circumstances in which many neural computations (e.g., object recognition, acoustic source localization, and self-localization) must be executed robustly in the presence of neural noise if the organism is to survive. We propose a denoising mechanism for populations of grid and place cells in the form of the associative memories described in Salavati et al. (2014), Karbasi, Salavati, Shokrollahi, & Varshney (2014), and Karbasi, Salavati, & Shokrollahi (2013), which takes advantage of coding theoretic properties of these populations to ameliorate the negative impacts of noise. We observe that after learning, average connectivity between place cells and grid modules decreases with increasing place field size for each module. We demonstrate that the effectiveness of the proposed denoising algorithm relies on the biological organization of grid cells into discrete modules. Additional contributions of this work include the coding model and denoising systems themselves as a framework in which to characterize limits on the fidelity of cooperating neural codes subject to noise, for physical position or other variables such as the auditory code studied in Aronov, Nevers, and Tank (2017), and improved clarity about how parameterization of grid and place cell populations affects these fundamental information and coding theoretic limits.

Redundancy in receptive field (RF) population codes is known to confer improvements in decoding accuracy when a small tolerance to error is introduced (expressed in this case in the stimulus space to which we decode; Curto, Itskov, Morrison, Roth, & Walker, 2013). To our knowledge, we are the first to investigate coding theoretic impacts of redundancy in grid cell populations. We study the impact of this redundancy on decoding accuracy by comparing denoising and decoding performance across codes of varying redundancies. We demonstrate that after denoising, a maximum likelihood (ML) estimator reliably decodes position from population activity with small position estimation error in the presence of bounded noise. Overall, our work shows that the biological organization of grid cells into modules may be necessary for optimal self-localization.

This article is organized as follows. In section 1, we introduce a few key concepts and present the main results. Section 2 introduces the theoretical framework on which our model is built, describing code construction, denoising network, learning algorithms, and denoising algorithms in sections 2.1, 2.2, 2.3, and 2.4, respectively. Section 3.1 presents results of all coding theoretic analysis and experimentation. Section 3.2 annotates results of the learning algorithms. Section 3.3 describes outcomes of performance tests of the denoising algorithms. Section 4 consists of a discussion of these results, their implications and limitations, and a physiologically testable hypothesis they inform.

## 2 Theoretical Framework

### 2.1 A Hybrid Code

#### 2.1.1 A Hybrid Codebook

$C$ code words, of length $N=P+\sum_{m=1}^{M}J_m$, are generated by choosing locations from the vertices of a square lattice imposed on the plane, with unit area equal to $(\Delta L)^2$ and total area equal to $L^2$. $C$ is assembled by placing these code words in its rows and represents the states of the grid and place cells when stimulated with these positions. The mapping that forms this code is illustrated in Figure 1.

### 2.2 Denoising Network

Two high-capacity associative memory designs are considered to test the hybrid code's resilience to noise. In each case, the memory network is a bipartite graph consisting of $N$ pattern neurons (i.e., grid and place cells) and $n_c$ constraint neurons. In the unclustered design, all constraint neurons are connected to a random set of pattern neurons. In the clustered configuration, the constraint neurons are split into $M$ distinct clusters of $n$ constraint neurons per cluster, with each cluster connected to a distinct grid module. Each cluster's constraint neurons are connected randomly to pattern neurons, chosen from a set consisting of every grid cell in the corresponding module and every place cell.

We also consider a foil to this systematic clustering architecture organized by grid modules: grid and place cells are randomly assigned to clusters. Figures 2a and 2b depict the general connectivity structure of the unclustered and clustered designs, respectively. In both the clustered and unclustered configurations, a neurally plausible modified version of Oja's subspace learning rule was applied to learn the code, that is, a sparse connectivity matrix is found such that the weights of connections from constraint neurons to pattern neurons lie orthogonal to the code space (the space spanned by $C$; Oja & Kohonen, 1988). This way, constraint neuron connectivity converges to the parity structure of the code and may be used in denoising operations.

### 2.3 Code Construction via Subspace Learning

Before we can use the denoising system to correct a corrupted code word, it must learn (i.e., adapt its weights for) the hybrid code. This process is complete when the constraint neurons may be read to determine if the states of the pattern neurons map to a valid code word. Formally, this amounts to finding a connectivity matrix, $W$ ($W_{i,j}$ is the synaptic weight between constraint neuron $i$ and pattern neuron $j$), whose rows are approximately perpendicular to the code space. A procedure to procure such a matrix is outlined in Oja and Kohonen (1988) and improved in Salavati et al. (2014). Note here that this learning process is not a model for the development of either grid or place cells' apparent receptive fields nor their remapping, as in Monaco & Abbott (2011). These algorithms begin with a random set of vectors, and for each, they seek a nearby vector orthogonal to $C$ (i.e., a vector onto which each element of $C$ has minimal projection). We implement this in algorithm 1 (a derivation of this algorithm is in appendix B). In the clustered design, algorithm 1 is applied to each cluster's local connectivity matrix. Note that here, all arithmetic on the synaptic weights, $W_{i,j}$, is performed in $\mathbb{R}$, while arithmetic on states of neurons (i.e., their firing rates) is quantized to the nearest integer in $[0,Q-1]$. The maximum firing rate, $f_{\max}=Q-1$, is identical for all neurons. With each update, $w\leftarrow w-\alpha_t\left(y\left(x-\frac{yw}{\|w\|^2}\right)+\eta\,\Gamma(w,\theta)\right)$, where $\theta$ is a sparsity threshold, $\eta$ is a penalty coefficient, $y=x^Tw$ is the scalar projection of $x$ onto $w$, and $\alpha_t$ is the learning rate at iteration $t$. $\Gamma$ is a sparsity-enforcing function, approximating the gradient of a penalty function, $g(w)=\sum_{k=1}^{m}\tanh(\sigma w_k^2)$, which, for appropriate choices of $\sigma$, penalizes nonsparse solutions early in the learning procedure (Salavati et al., 2014).
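The update rule above can be illustrated with a minimal sketch. This is not algorithm 1 itself: it assumes a constant learning rate in place of $\alpha_t$, takes $\Gamma$ to be the exact gradient of $g(w)$ and omits the threshold $\theta$, and uses hypothetical names (`learn_constraint`, `sparsity_gradient`) and parameter values chosen only for illustration.

```python
import numpy as np

def sparsity_gradient(w, sigma):
    # Gradient of g(w) = sum_k tanh(sigma * w_k^2) with respect to w.
    return 2.0 * sigma * w * (1.0 - np.tanh(sigma * w**2) ** 2)

def learn_constraint(patterns, alpha=0.05, eta=0.0, sigma=1.0,
                     eps=1e-6, max_epochs=500, seed=0):
    """Seek one weight vector approximately orthogonal to every pattern.

    Assumptions: constant learning rate, no threshold theta, and
    eta=0 by default (set eta > 0 to enable the sparsity penalty).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(patterns.shape[1])
    for _ in range(max_epochs):
        for x in patterns:
            y = x @ w  # scalar projection of x onto w
            w = w - alpha * (y * (x - y * w / (w @ w))
                             + eta * sparsity_gradient(w, sigma))
        # Stop once w is approximately orthogonal to the code space.
        if np.sum(np.abs(patterns @ w)) <= eps:
            break
    return w

# Toy code: 20 unit-norm patterns confined to a 3-dimensional subspace of R^8.
rng = np.random.default_rng(1)
patterns = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 8))
patterns /= np.linalg.norm(patterns, axis=1, keepdims=True)
w = learn_constraint(patterns)
print(np.max(np.abs(patterns @ w)))  # residual projection onto the code
```

The toy patterns are real-valued and normalized only to keep the step size stable in this sketch; in the article, neuron states are integers in $[0,Q-1]$.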

In algorithm 1, line 12 terminates learning of the current weight, $w$, if the sum of the projections of $w$ on each pattern is no more than $\epsilon $ away from zero, that is, if the current weight vector is approximately orthogonal to the code space. Lines 17 to 19 perform a thresholding operation that maps to zero any weight sufficiently small in magnitude. This is primarily to suppress numerical errors and promote consistency, as in line 11, we use $\epsilon $ as a small, positive constant. Note that since the weights processed on each iteration are independent of those in other iterations, this algorithm can be readily parallelized so that each constraint neuron learns its weights simultaneously.

### 2.4 Denoising and Decoding

We implemented a bit flipping style neural denoising process, which we applied to both the clustered and unclustered denoising networks. For all configurations (clustered and unclustered) and for a fixed maximum number of denoising iterations, the bit flipping algorithm performs no worse than winner-take-all. Moreover, since it requires only the additional implementation of parallel thresholding operations for each pattern neuron, a biological realization that includes them is no less plausible. The goal of this algorithm is to recover the correct activity pattern, $x$, which has been corrupted by noise and, as such, is currently (and errantly) represented by a noisy version, $x_n=x+n$, where $n$ is this noise pattern. Since each weight vector is nearly perpendicular to every pattern, for a matrix of weights, $W$, $x_nW'$ reveals inconsistencies in $x_n$, which the denoising algorithm seeks to correct in the feedback stage.^{1} In denoising, feedback weights from constraint neurons to pattern neurons are taken to be equal to the corresponding feedforward weight (i.e., synaptic connectivity is symmetric). The clustered denoising process begins with algorithm 3, in which each cluster attempts to detect errant pattern neurons. If no errors are detected, the process is complete. Otherwise, algorithm 2 is invoked for each cluster that detected errant neurons. This and other denoising processes are discussed in greater detail in Karbasi et al. (2013) and Salavati et al. (2014). Note that this denoising mechanism differs from error correction methods presented in Fiete, Burak, and Brookings (2008) and Stemmler, Mathis, and Herz (2015) in that information contributed by place cells reaches grid cells only through constraint neurons, and place information contributed by grid cells at module $i$ reaches other modules only through constraint neurons if connectivity allows.
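The error-detection step can be sketched as follows. This covers only the feedforward check, not algorithms 2 and 3. As a stand-in for learned weights, it assumes $W$ is an exact orthonormal basis of the orthogonal complement of the code space, computed by SVD; the function names are hypothetical.

```python
import numpy as np

def null_space_constraints(codebook, tol=1e-10):
    """Rows span the orthogonal complement of the code space.

    Stand-in for a learned W: an exact basis from the SVD (assumption).
    """
    _, s, vt = np.linalg.svd(codebook, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:]

def syndrome(W, state):
    """Constraint-neuron inputs; near zero when `state` is a valid code word."""
    return W @ state

rng = np.random.default_rng(0)
codebook = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 6))
W = null_space_constraints(codebook)

x = codebook[0]
noise = np.zeros(6)
noise[2] = 1.0  # a single errant pattern neuron
print(np.linalg.norm(syndrome(W, x)))          # approximately zero
print(np.linalg.norm(syndrome(W, x + noise)))  # clearly nonzero
```

A nonzero syndrome only signals that some pattern neuron is inconsistent with the code; locating and correcting it is the job of the feedback stage.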

In order to quantify the information content of the population, we estimated the location encoded by the population using a maximum likelihood decoder in four different schemes. Joint hybrid decoding utilizes information from all cells. Grid- (resp. place-) only decoding utilizes information from only grid (resp. place) cells. Grid decoding conditioned on place response performs decoding using only information provided by the grid cells; however, the only candidate locations considered for the estimate are those that are not impossible given the place cell activity.
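The four schemes can be sketched with a single decoder. This is a minimal sketch, not the decoder used in the experiments: it assumes isotropic gaussian noise, under which the maximum likelihood estimate is the nearest code word in Euclidean distance. The function name `ml_decode`, the `cols` argument selecting which cells are read, and the `candidates` argument listing the locations allowed by the place cell activity are all hypothetical.

```python
import numpy as np

def ml_decode(codebook, locations, observed, cols=None, candidates=None):
    """Nearest-code-word decoding restricted to chosen cells and locations."""
    cols = np.arange(codebook.shape[1]) if cols is None else np.asarray(cols)
    cand = (np.arange(codebook.shape[0]) if candidates is None
            else np.asarray(candidates))
    sub = codebook[np.ix_(cand, cols)]
    dists = np.linalg.norm(sub - observed[cols], axis=1)
    return locations[cand[np.argmin(dists)]]

# Toy code: columns 0-1 are grid cells, columns 2-3 are place cells.
codebook = np.array([[0, 0, 5, 5], [3, 3, 0, 0], [3, 3, 5, 5]], dtype=float)
locations = np.array([[0, 0], [10, 0], [0, 10]])
observed = np.array([3, 2, 5, 5], dtype=float)

joint = ml_decode(codebook, locations, observed)
grid_only = ml_decode(codebook, locations, observed, cols=[0, 1])
place_only = ml_decode(codebook, locations, observed, cols=[2, 3])
# Grid decoding conditioned on place response: only rows 0 and 2 are
# compatible with the observed place activity in this toy example.
conditioned = ml_decode(codebook, locations, observed, cols=[0, 1],
                        candidates=[0, 2])
```

In the toy example, the grid cells alone cannot distinguish the second and third locations, and restricting the candidates by place activity resolves the ambiguity.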

## 3 Results

### 3.1 Coding Theoretic Results

We now endeavor to disentangle and understand the links between grid and place cell parameter choices and coding theoretically relevant dependent variables. The results presented here motivate the questions answered in section 3.3, in which we investigate how the coding parameters studied here limit fidelity and the error correction capability of the corresponding representation of space. We begin our investigation of coding theoretic properties of the hybrid code by defining a measure of redundancy of grid cell population response: $\mu_p$. More precisely, we define $\mu_p$, a hybrid code's spatial phase multiplicity, as the number of grid cells with the same phase in the same module (e.g., if $\mu_p=5$, in a module with 20 grid cells, there must be four unique spatial phases). This replication of grid cell phases can be considered as a repetition code in the activity of the grid cell population. Wennberg (2015) revealed that there may be a highly nonuniform distribution of phases among grid cells. Considering replication of grid cells (i.e., modules consisting of multiple grid cells of the same phase) allows us to investigate coding theoretic repercussions of this phenomenon. Inspired by Mosheiff, Agmon, Moriel, and Burak (2017), for each of these regimes, we consider two distributions of grid cells to modules: uniform and nonuniform. Mosheiff et al. (2017) find that choosing $J_m\propto\frac{1}{\lambda^{m-1}}$ produces a more efficient representation of space. When modeling the nonuniform allocation of grid cells to modules, we chose $J_m=\left\lfloor\frac{J}{\lambda^{m-1}}\right\rfloor$, since the scale of module $m$ is defined as $\lambda_m=\lambda_1(\lambda)^{m-1}$. Neural recordings show that the smallest scale is $\lambda_1\approx 40$ cm (the value used here; Stensola et al., 2012).
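As a small worked example of these definitions, the sketch below computes module scales, the nonuniform allocation $J_m=\lfloor J/\lambda^{m-1}\rfloor$, and the number of unique phases per module. The scaling ratio of 1.4 is an assumed value chosen for illustration only, and the function names are hypothetical.

```python
import math

def module_scales(lambda_1, ratio, M):
    """Scale of module m: lambda_m = lambda_1 * ratio^(m-1), m = 1..M."""
    return [lambda_1 * ratio ** (m - 1) for m in range(1, M + 1)]

def nonuniform_allocation(J, ratio, M):
    """Grid cells per module: J_m = floor(J / ratio^(m-1)), m = 1..M."""
    return [math.floor(J / ratio ** (m - 1)) for m in range(1, M + 1)]

def unique_phases(J_m, mu_p):
    """Number of distinct spatial phases in a module with multiplicity mu_p."""
    return J_m // mu_p

print(nonuniform_allocation(20, 1.4, 4))  # [20, 14, 10, 7]
print(unique_phases(20, 5))               # 4
```

With $J=20$ and $M=4$, the uniform allocation uses 80 grid cells, whereas this nonuniform allocation uses 51, which is why the two schemes differ in the total number of pattern neurons.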

We construct a codebook matrix, $\underline{C}$, by placing elements of $C$ in its rows. We computed normalized rank of the code, $R=\frac{\operatorname{rank}(\underline{C})}{N}\in[0,1]$, as a function of the grid scaling ratio. Normalized rank is an indicator of a code's density, expressed as the fraction of possible dimensions of the code space occupied by a particular code. $R$ is an important feature to consider since a code's dimensionality determines the dimensionality of its null space, the object that is learned by the denoising network. As discussed in Salavati et al. (2014), if we suppose that $C\subset\mathbb{R}^n$ and $\dim(C)=k<n$, then there are $n-k$ mutually orthogonal vectors that are also orthogonal to our code space (e.g., any basis for the null space of the code), each representing one valid constraint equation. Thus, rank provides a fundamental limit on the number of unique effective constraint nodes the denoising network may learn.
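The relationship between normalized rank and the number of available constraints can be checked numerically. The sketch below uses a random toy codebook rather than a hybrid code, and the function names are hypothetical.

```python
import numpy as np

def normalized_rank(codebook):
    """R = rank(codebook) / N, where N is the code word length."""
    return np.linalg.matrix_rank(codebook) / codebook.shape[1]

def max_constraints(codebook):
    """n - k: the number of independent constraint equations available."""
    return codebook.shape[1] - np.linalg.matrix_rank(codebook)

# Toy codebook: 50 code words of length 10 confined to a 3-dimensional subspace.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 10))
print(normalized_rank(codebook), max_constraints(codebook))
```

A full-rank code leaves no null space, so no constraint neuron can learn a weight vector orthogonal to every code word.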

The grid cell code is known to be dense (Fiete et al., 2008). This is especially pronounced when all orientations and phases are chosen randomly (uniformly from $[0,2\pi]$ and $[0,L]\times[0,L]$, respectively), where for all choices of other parameters, the hybrid code achieves full rank at a low rate. That is, the experimentally observed properties of the grid cell code described in Stensola et al. (2012) produce a measurable decrease in rank compared to typical ranks observed when all orientations and phases are chosen randomly.

Place cell activity forms a relatively sparse code (for enough cells and a sufficiently large environment); thus, combining populations of grid and place cells realizes codes that are sparser than the grid cell component of the code. When $\mu_p=1$, a hybrid code with no place cells achieves the largest normalized rank. Since place cells communicate redundant information, their inclusion also reduces rank, which is precisely the trend observed in Figure 3. However, this appears to reverse when $\mu_p>1$ for a sufficiently small number of participating place cells. This occurs because rendering grid cells redundant by increasing phase multiplicity lowers the rank of the grid-only component of the code. Consequently, including place cells increases rank, until the information contributed by the place cells reaches its maximum, at which point the inclusion of additional place cells only lowers rank. Error bars (measuring SEM) are included due to the stochastic nature of instantiating certain parameters (e.g., $\xi$, which is always chosen uniformly randomly from the set of quantized locations).

We also computed rank, $R$, as a function of code rate, $r=\frac{C}{N}$ (number of locations represented per neuron), a measure of spatial resolution and efficiency of the encoding (i.e., for a fixed $L$, a higher code rate, $r$, is obtained by lowering $\Delta L$ or by decreasing $N$). It is their common denominator ($N$) that links the dependence on population size of both rank and rate. When phases are chosen randomly, low rank is difficult to obtain at all but the smallest of the code rates tested ($r\in[0,1]$, and $\mu_p>1$ may result in low ranks if enough place cells are included). In contrast, Figure 4 shows that codes spanning the spectrum of normalized ranks may be instantiated over a wide range of rates with appropriate choice of parameters. Further, this indicates that redundancy reduces dimensionality so low ranks are achievable even at rates much greater than those that are biologically relevant. Later, we show that this low dimensionality is important in constructing sparse and readily denoisable representations of space. Figure 4 demonstrates that without the redundancy introduced by increasing $\mu_p$ above 1, a hybrid code that encodes in 90 neurons more than 90 locations in a 9 m$^2$ environment has full rank. However, when $\mu_p>1$, there is a stark drop in the maximum rank achieved. As shown, when $\mu_p>1$, one may encode orders-of-magnitude more locations while maintaining low dimensionality. This trend is observed in each configuration shown and when grid cells are allocated to modules nonuniformly. Thus, both dense and sparse hybrid codes may be developed with proper choices of redundancy parameters.

A code's resilience to neural noise can be assessed by the minimum pairwise (Euclidean) distance between code words, $d$. Traditionally, Hamming distance is used as the operative metric for characterizing minimum distance of a code. However, in cases when soft information is used by the decoder, Euclidean distance can prove to be more useful. Higher $d$ (i.e., larger distances between code words) corresponds to a more noise-tolerant neural representation of space (Lin & Costello, 1983). In fact, ideally all errors induced by noise with amplitude less than $\left\lfloor\frac{d-1}{2}\right\rfloor$ are correctable (Lin & Costello, 1983; Sreenivasan & Fiete, 2011). (For an intuitive illustration of this, see appendix D.) We computed $d$ as a function of rate, $r$, for different phase multiplicities, $\mu_p$ (see Figure 5). For each configuration there is a trade-off between $d$ and $r$. Since rank tends to increase and saturate with rate, this is also a trade-off between $d$ and rank. When the rate is low, a low resolution of location is targeted: $d$ is larger, so more erroneous neurons may be corrected. Note that for a fixed value of $r$, the codes with $\mu_p=5$ have slightly smaller $d$, and this difference grows to saturation as $r$ increases. Interestingly, at high rates, the decrease in $d$ produced by increasing $\mu_p$ is much smaller for the population with grid cells distributed to modules nonuniformly. This observation applies for the highest rates for which computation of $d$ is tractable with modern high-performance computers: $r<10^6$. Thus, for a fixed $r$ and large enough $\mu_p$, the code with grid cells allocated to modules nonuniformly should exhibit measurably better denoising performance. We test this prediction by simulating the denoising process and collecting statistics presented in Figures 10 through 14.
Surprisingly, for small $r$, with a uniform allocation of grid cells to modules, increases in $\mu_p$ appear to effect small decreases in $d$, while when grid cells are allocated to modules nonuniformly, increases in $\mu_p$ produce small but discernible increases in $d$.
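The minimum distance and the associated noise tolerance can be computed directly for a small codebook. The brute-force sketch below compares all pairs of code words, so its memory cost grows quadratically with the number of code words; it is meant only for toy examples, and the function names are hypothetical.

```python
import numpy as np

def minimum_distance(codebook):
    """Smallest pairwise Euclidean distance between distinct code words."""
    C = np.asarray(codebook, dtype=float)
    diffs = C[:, None, :] - C[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(C), k=1)  # each unordered pair once
    return dists[upper].min()

def correctable_amplitude(d):
    """floor((d - 1) / 2), the bound quoted in the text."""
    return int(np.floor((d - 1) / 2))

toy = [[0, 0, 0], [3, 4, 0], [0, 0, 5]]
d = minimum_distance(toy)
print(d, correctable_amplitude(d))  # 5.0 2
```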

For environments of a fixed size, $x_{\max}^2$ cm$^2$, and a hybrid code with $N$ neurons, varying code rates implies quantizations of space with varying unit width ($\Delta L=\frac{x_{\max}}{C}$). Since the rate is $r=\frac{C}{N}$, $\Delta L=\frac{x_{\max}}{Nr}$. Thus, the spatial sampling period, $\Delta L$, is inversely proportional to $r$. In order to ensure we probed reasonable code rates, we estimate the typical perceivable spatial period of a rat (through its place cells) by considering its running speed (ranging from .1 to 100 cm/s) and an average interspike interval (ISI) of 150 ms (Gupta, Van Der Meer, Touretzky, & Redish, 2012), which bounds neural sampling periods for space, implying that $\Delta L$ should lie somewhere in $[0.15,15]$ cm. Code rates considered in this work assume $\Delta L<15$ cm. To satisfy curiosity and probe rate-dependent phenomena at even greater rates, the smallest $\Delta L$ considered is 0.0022 cm.

In order to investigate how the fundamental limits on denoisability of the code scale with the number of pattern neurons (i.e., grid and place cells), we compute $d$ as a function of $N$ (independently varying $P$, $M$, and $\{J_i\}_{i\in\{1,\ldots,M\}}$), fixing other parameters. As illustrated in Figure 6, minimum distance increases exponentially with increases in $N$ due to increases in the number of place cells, $P$, and number of grid cells per module, $J_i$. In contrast, increases of $M$ past a critical value cease to improve minimum distance because the spatial scale at which higher-order modules represent position fails to capture relevant differences in location encoded. Notably, when all other parameters are fixed, nonuniform allocations of grid cells to modules provide a code with inferior minimum distance. This is a consequence of the greater number of pattern neurons in the uniform case and can be considered the loss incurred in exchange for an increase in coding efficiency (measured by number of neurons used to encode position), as discussed in Mosheiff et al. (2017).

### 3.2 Code Learning Results

In order to study how algorithm 1, neural learning, affects the denoising network, we assess the changes in connectivity that it produces. Typical learned connectivity matrices and their associated normalized degree distributions (empirical distributions of the number of connections emanating from pattern neurons, normalized to the total number of pattern neurons, $N$) are found in Figures 7 and 8. These demonstrate that for a typical hybrid code, the clustered network has a sparser connectivity, with less variability in its sparsity compared to the unclustered network. This is because clustering enforces a tighter limit on the number of pattern neurons to which a constraint neuron may connect. We simulated an ensemble of 4 modules of 20 grid cells each, together with 20 place cells, which produced the following connectivity matrices and associated degree distributions. Interestingly, in both cases, there are place cells (i.e., pattern neurons with index exceeding 80) that are left unconnected to grid modules via constraint neurons. An illustration of the learned weights matrix corresponding to a randomly clustered denoising network was omitted, as it is sparser, but otherwise very similar to that of the unclustered weights image.

Figure 9 depicts the average connection strength between place cells and grid modules, where the connection strength between place cell $p$ and grid module $m$ is defined as $\frac{1}{n_c}\sum_{(i,j)}|w_{i,j}w_{i,p}|$, where $i$ indexes constraint neurons, and $j$ indexes grid cells in module $m$. Note here that connectivity implies not direct synaptic connection but effective connectivity through constraint neurons. Results were obtained from configurations with $M=4$, $J=20$, and $P=20$; connectivities depicted are averaged over 50 networks. Place cells are ordered by increasing receptive field size. This trend appears for any $\mu_p>1$ (i.e., whenever the responses of at least some grid cells are replicated by instantiating multiple grid cells with the same phase in the same module). In the modularly clustered case, average connectivity (between place cells and all grid modules) appears to decrease with increasing place field size, as compared to a random clustering that produces nearly the same connectivity for each place cell. This phenomenon was not observed when grid cell phases and orientation offsets were chosen randomly and does not appear in the unclustered configuration.
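The connection strength defined above can be computed from a weight matrix as in the following sketch. It assumes $W$ is stored with one row per constraint neuron and one column per pattern neuron; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def connection_strength(W, place_col, module_cols):
    """(1 / n_c) * sum over constraint neurons i and grid cells j in the
    module of |w_ij * w_ip|, where p is the place cell column."""
    W = np.asarray(W, dtype=float)
    n_c = W.shape[0]
    products = np.abs(W[:, module_cols] * W[:, [place_col]])
    return products.sum() / n_c

# Toy network: 2 constraint neurons, grid cells in columns 0-1, place cell in column 3.
W = [[1, 2, 0, 3],
     [0, -1, 4, 2]]
print(connection_strength(W, place_col=3, module_cols=[0, 1]))  # 5.5
```

A place cell contributes to this measure only through constraint neurons that connect to both it and at least one grid cell of the module, which matches the notion of effective connectivity used in the text.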

### 3.3 Denoising and Decoding Results

In order to study the relationship between coding theoretically relevant variables, population parameters, denoising network configuration, and fidelity of the hybrid code's representation of space, we empirically evaluate the denoising network's performance. To measure the effectiveness of the denoising network, we first perturb the states (i.e., firing rates) of the grid and place cells by incrementing or decrementing randomly and clipping to the boundaries of $[0,Q-1]$. A pattern error occurs if, after denoising, any entry of the denoised pattern differs from the corresponding component of the original pattern. A symbol error occurs each time any symbol of the denoised pattern differs from the corresponding symbol of the correct pattern. For identical populations of grid and place cells ($M=4$, $J=20$, and $P=10$), the clustered network dramatically outperforms the unclustered in pattern error rate (when the grid cells have sufficient redundancy), and the modular clustering scheme always outperforms the random clustering scheme. By fixing the size of the populations we compare, we ensure no improvement in $d$ results from a larger $N$. Figure 10 depicts pattern error rate ($P_{pe}$) for a clustered hybrid code, with varying phase multiplicity. The missing configuration (consisting of a randomly clustered network with a code with a nonuniform allocation of grid cells to modules) had a 100% pattern error rate for every nonzero number of initial errors. This shows that for a small number of initial errors, the full pattern of population activity corresponding to the correct location may be recovered, but in general, this is rarely possible. That only the modularly clustered denoising networks are able to achieve low $P_{pe}$ shows that the biological organization of grid cells into discrete modules is important for high-quality self-localization in the presence of noise.
Further, clustering is the only way to achieve such a small $P_{pe}$, since no unclustered denoising network consistently reduced $P_{pe}$ below 0.99. It is surprising that the modularly clustered denoising mechanism achieves a better $P_{pe}$ when denoising hybrid codes with uniform allocations of grid cells to modules (as compared to nonuniform allocations of grid cells to modules), as Figure 5 demonstrates that such codes tend to have a larger minimum distance at any rate probed. This result also demonstrates that whether grid cells are distributed uniformly to modules has a smaller impact on $P_{pe}$ than $\mu_p$. That the codes with larger $\mu_p$ tend to outperform those with $\mu_p=1$ is also surprising, since at high rates (in Figure 10, $r\approx 10^3$), codes with larger $\mu_p$ are restricted to smaller $d$.
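The noise model and the two error measures can be sketched as follows. The sketch assumes each selected neuron is perturbed by a single step of $\pm 1$, which the text does not specify, and the function names are hypothetical. Note that clipping can cancel a perturbation (e.g., decrementing a neuron already at 0), so the realized number of errors may be smaller than requested.

```python
import numpy as np

def perturb(pattern, num_errors, Q, rng):
    """Increment or decrement `num_errors` distinct neurons by one step
    (assumed step size), then clip to the valid range [0, Q-1]."""
    noisy = pattern.copy()
    idx = rng.choice(pattern.size, size=num_errors, replace=False)
    noisy[idx] += rng.choice([-1, 1], size=num_errors)
    return np.clip(noisy, 0, Q - 1)

def symbol_error_rate(original, denoised):
    """Fraction of neurons whose state differs from the original."""
    return float(np.mean(original != denoised))

def pattern_error(original, denoised):
    """True if any neuron differs from the original pattern."""
    return bool(np.any(original != denoised))

rng = np.random.default_rng(0)
x = np.array([0, 3, 7, 2, 5, 1])
x_noisy = perturb(x, num_errors=3, Q=8, rng=rng)
print(symbol_error_rate(x, x_noisy), pattern_error(x, x_noisy))
```

Averaging `pattern_error` and `symbol_error_rate` over many trials, with a denoiser applied to `x_noisy` first, gives empirical estimates of $P_{pe}$ and $P_{se}$.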

Figure 11 shows symbol error rates of hybrid codes for several configurations with deliberately chosen grid cell phases and orientations (i.e., so as to mirror those observed in Stensola et al. (2012). This demonstrates that generally, clustered denoising networks do not offer improved symbol error rate, $Pse$, compared to their unclustered counterparts. However, for a small initial number of errors, when the grid cells exhibit sufficient redundancy in their phases, a randomly clustered denoising network is outperformed only by a modularly clustered network. Figure 12 shows $Pse$ for a hybrid code with deliberately chosen phases and orientations, denoised by a modularly clustered network. Consistent with observations on pattern error rate, hybrid codes with grid cells uniformly allocated to modules achieve better $Pse$. This may result from the fact that $d$ is larger for such codes when $\mu p$ is small. However, this explanation is incomplete as when $\mu p=5$, a code generated by a nonuniform allocation of grid cells to modules, tends to achieve a larger minimum distance than those generated by uniform allocation of grid cells to modules. Plotted in both Figures 11 and 12 is a dotted red curve, $log10(initialnumberoferrorsN)$. This curve is a threshold between regions of desirable and unacceptable $Pse$ (i.e., $log10(Pse)$ for a network that performs no denoising). To see this, consider a denoising network that does not change the initial number of errors, $E$. For this network, $Pse=EN$, so $log10(Pse)=log10(E)-log10(N)$. Surprisingly, Figure 11 shows that for a small initial number of errors, configurations with $\mu p=1$ have $log10(Pse)$ above this threshold, that is, they increase the number of symbol errors! Figure 12 quantifies the loss incurred by the nonuniform allocation of grid cells to modules (i.e., $Jm\u221d1\lambda m-1$) for a modularly clustered denoising network. 
Note that both grid cell allocation schemes produce networks that introduce additional errors during denoising when $\mu_p=1$ and $E=1$, as these conditions result in $P_{se}>\frac{E}{N}$. Note that for $E>1$, no network introduces extraneous errors by denoising. Additionally, networks with $\mu_p=5$ dramatically outperform those with $\mu_p=1$ when $E$ is small.
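The no-denoising baseline described above can be checked numerically. The following is a minimal Python sketch rather than the code used for the figures; the function names are hypothetical, and the example value $N=90$ assumes the uniform allocation with $P=10$, $M=4$, and $J=20$ listed for Figures 11 and 12.

```python
import math


def no_denoising_log_pse(num_errors: int, num_neurons: int) -> float:
    """log10 of the symbol error rate when the E initial errors are left untouched.

    This is the threshold curve: P_se = E / N, so log10(P_se) = log10(E) - log10(N).
    """
    if not 0 < num_errors <= num_neurons:
        raise ValueError("need 0 < E <= N")
    return math.log10(num_errors) - math.log10(num_neurons)


def introduces_errors(pse: float, num_errors: int, num_neurons: int) -> bool:
    """True when a measured P_se lies above the no-denoising baseline E / N."""
    return pse > num_errors / num_neurons


# Assumed example: N = P + M * J = 10 + 4 * 20 = 90 pattern neurons, E = 1.
baseline = no_denoising_log_pse(1, 90)
```

Any measured $\log_{10}(P_{se})$ above `baseline` at $E=1$ corresponds to a network that adds symbol errors rather than removing them.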

Figure 13 shows MSE of different decoding processes after denoising for a hybrid code with $M=4$, $J=20$, $P=10$, and $\mu_p=5$, for deliberately chosen grid cell parameters (i.e., so as to mirror those observed in Stensola et al., 2012). This plot demonstrates that an ideal observer decoder that considers information from all cells outperforms all others for any initial number of errors. This disparity may in part be accounted for by the difference between the number of grid cells and the number of place cells. Figure 14 shows MSE of joint hybrid decoding after denoising for a hybrid code with $\mu_p=5$, for the configurations that achieved the best error correction performance in both $P_{pe}$ and $P_{se}$. This plot demonstrates that the code with grid cells distributed to modules uniformly with a modularly clustered denoising network achieves the best decoding performance, outperforming its nonuniformly arranged analogue. Since the code with a nonuniform allocation of grid cells to modules had a larger minimum distance (compared to the same code with a uniform allocation of grid cells to modules), this result challenges our earlier hypothesis that codes with nonuniform allocations of grid cells across modules may be denoised more effectively. This is especially remarkable since in section 3.1, we demonstrated that these codes achieve larger minimum distance for identical $N$ at large $r$, such as the rate considered in Figure 14. Further, this demonstrates (in a natural metric of the stimulus space) that in the most redundant hybrid code considered, a modularly clustered denoising network is far superior to a randomly clustered or unclustered one. Interestingly, for a small number of initially erroneous pattern neurons, the loss (in MSE) due to a lack of modular clustering is much greater than the loss due to nonuniformity.

## 4 Discussion

We demonstrated that both dense and sparse hybrid codes may be constructed by proper choice of grid and place cell parameters. We also showed that in the presence of neural noise, the activity of only configurations with sufficient redundancy in the grid cell component of the code may be consistently denoised. It is somewhat counterintuitive that populations with replicated grid cell responses (i.e., $\mu_p>1$) produce a more noise-resilient code (as shown in the denoising performance results). This is surprising because the populations with uniformly allocated grid cells and largest $d$ are those with unique spatial phases (i.e., $\mu_p=1$; see Figure 5). This result is counterintuitive (in the biological sense) as in Hafting et al. (2005), it is noted that the distribution of grid cell phases observed in experiment did not deviate significantly from uniformity. Wennberg (2015) revealed that the distribution of spatial phase offsets of grid cells may be significantly nonuniform. The data set from which this conclusion is drawn was obtained from rat 14147 in Stensola et al. (2012). Our results imply that this observed nonuniformity in distribution of grid cell phases provides value in denoisability and accuracy of decoding.

Our results reveal another surprise in Figure 5, in which, for $\mu_p>1$, codes with nonuniform allocations of grid cells to modules achieve demonstrably larger $d$. However, in Figure 12, the networks with $\mu_p=5$ and grid cells allocated to modules uniformly achieve the smallest $P_{se}$. Further, in Figures 10 and 14, for a small number of initially erroneously signaling neurons ($E$), these networks outperform those with grid cells allocated to modules nonuniformly. These observations demonstrate that the hybrid code for space may trade off improvements in denoising performance (in $d$) for efficiency of encoding ($r$) by distributing grid cells to modules nonuniformly, as suggested in Mosheiff et al. (2017).

Hybrid codes of widely varying rank, minimum distance, and code rate ($R$, $d$, and $r$, respectively) may be instantiated by choosing appropriate parameters for the populations of grid and place cells, a fact that showcases the code's adaptability. This means that grid and place cells may participate in neural computations that rely on assumptions other than those presented here, which insist on a low-dimensional code space and a sparse connectivity matrix. It is particularly difficult to characterize the trade-off between code rate and $d$, presented in Figure 5, as it indicates that for biologically reasonable values of $r$, increases in $\mu_p$ should reduce a code's minimum distance, $d$ (a fundamental limit of the code's denoisability). Surprisingly, the configuration with uniformly allocated grid cells and $\mu_p=5$ tends to outperform the others in $P_{pe}$, $P_{se}$, and MSE. It is possible that the denoising networks presented here are incapable of achieving the codes' error correction capacities in the cases considered. This would allow for characteristics endowed by a larger $\mu_p$ to effect the stark differences observed in denoising efficacy and decoding accuracy. Furthermore, this explanation seems likely, as coding theory suggests that the maximum number of correctable errors in a linear block code (as a function of $d$) can be computed as $t=\left\lfloor\frac{d-1}{2}\right\rfloor$ (Lin & Costello, 1983). For example, the strongest code (as measured by largest value of $d$ in Figure 5) achieves $d\approx 5$ for intermediate $r$, so $t\approx 2$. Figure 10 corroborates this in demonstrating that pattern error rate exceeds 0.5 (and quickly saturates at 1) for more than two errant pattern neurons.
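The error correction bound quoted above is simple to evaluate. The sketch below is illustrative only; the function name is hypothetical, and accepting a real-valued $d$ (rather than an integer Hamming distance) is an assumption made so that approximate values read off Figure 5 can be passed in.

```python
import math


def correctable_errors(min_distance: float) -> int:
    """Maximum number of correctable errors, t = floor((d - 1) / 2)."""
    if min_distance < 1:
        raise ValueError("minimum distance must be at least 1")
    return math.floor((min_distance - 1) / 2)


# The strongest code discussed above has d of roughly 5.
t_strongest = correctable_errors(5)
```

With $d\approx 5$ this gives $t=2$, so a third errant pattern neuron already exceeds what the bound guarantees.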

We demonstrate that the chosen denoising network architecture performs satisfactorily for hybrid codes that fit its requirements regarding rank and poorly for those that do not. Additionally, we assessed average connectivity between place cells of varying receptive field sizes and modules of grid cells by analyzing the learned connectivity matrix. This analysis demonstrates that our model place cells of smaller receptive field size are more strongly connected to grid modules and that they are most strongly connected to grid modules of the smallest scale. Moreover, this result presents a physiologically testable hypothesis. While difficult, two-photon microscopy has been successfully employed to accurately image the microscopic structure of nervous tissue (Svoboda & Yasuda, 2006). One way to estimate connection strength between real neurons is to count the number of boutons expressed on the presynaptic neurons, assuming that weight should be proportional to this number, though there may be simpler ways to estimate connection strength (Bi & Poo, 1998). Thus, if groups of place cells connected via constraint neurons to several distinct grid modules may be identified, this theoretical prediction—that connectivity between the hippocampus and MEC will decrease along the dorsoventral axis—can be confirmed or refuted. Another interesting experiment is made possible by advances in optogenetics, which enable single cell resolution of network activity for a population of inoculated cells (e.g., a collection of grid cells, as in Sun et al., 2015). While technically challenging due to the physical separation of each population in the brain, it should be possible to image simultaneous activity of grid and place cells at high temporal precision (Grewe, Langer, Kasper, Kampa, & Helmchen, 2010). From these measurements, for a set of quantized locations, simultaneous firing rates may be estimated (Theis et al., 2016). 
Then the rank, rate, and minimum distance of this empirical codebook may be computed to offer insight into limits of noise tolerance of real spatial navigation circuitry. Of particular interest is discovering the extent to which neural noise transiently varies such attributes for grid and place cells in real brains and how these coding theoretic properties adapt (if at all) to changes in speed, context, and other variables.
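As a rough illustration of the computation proposed above, the following Python sketch derives the normalized rank $R$, rate $r$, and minimum distance $d$ from a codebook given as rows of firing rates. It is a sketch under stated assumptions, not the analysis used in this article: the function names are hypothetical, distances are taken to be Euclidean, and rank is estimated by Gaussian elimination with a numerical tolerance.

```python
import itertools
import math


def matrix_rank(rows, tol=1e-9):
    """Rank of a real matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            for k in range(col, n_cols):
                m[i][k] -= factor * m[rank][k]
        rank += 1
    return rank


def codebook_properties(codebook):
    """Return (R, r, d) for a codebook with one code word per row."""
    num_words = len(codebook)      # C, the number of locations
    num_neurons = len(codebook[0])  # N, the number of pattern neurons
    normalized_rank = matrix_rank(codebook) / num_neurons
    rate = num_words / num_neurons
    # Assumption: Euclidean distance between code words.
    min_dist = min(math.dist(x, z) for x, z in itertools.combinations(codebook, 2))
    return normalized_rank, rate, min_dist


# Toy codebook (made-up values): three locations, three neurons.
R, r, d = codebook_properties([[1, 0, 0], [0, 1, 0], [1, 1, 0]])
```

The pairwise distance search is quadratic in the number of code words, so for an empirical codebook with many quantized locations a subsampled or spatially restricted search may be needed.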

In Figures 10, 12, 13, and 14, we demonstrate the differences in performance of each network structure and of the various decoding algorithms. The universal improvements from place-only decoding to joint-hybrid decoding show that highly accurate position estimation can be significantly more difficult without both populations of cells. The discrepancy between grid-only decoding and grid decoding conditioned on place response shows that even utilizing place cell information indirectly (by eliminating candidate locations deemed impossible given the state of the place cell population) yields a sizable improvement in decoding accuracy when there are many place cells or when place cells are less noisy than grid cells. That the modularly clustered networks tend to best the corresponding randomly clustered networks implies that the physiological organization of grid cells by their spatial scale may provide a computational advantage in denoising and decoding. This notion is further supported by the observation that a randomly clustered network sometimes introduces additional errors by attempting to denoise, as shown in Figure 11. This may be because the unclustered network is essentially a randomly clustered network that does not take advantage of synergistic cluster computing. In any cluster, both grid cells and place cells are able to correct each other's errant activity. However, under modular clustering, in order for a grid cell in module $i$ to correct the activity of a grid cell in a different module $j$, the activity of each neuron in module $i$ must be correct so that the activity of place cells (connected to both modules $i$ and $j$) will contradict and correct the erroneous activity.

It should be noted that the denoising constraint neurons are a hypothetical construct and need not reside in the hippocampus or MEC in order to execute the previously described computations. Our conception of these constraint nodes is as single units. However, these may represent larger networks of neurons performing identical computations. Furthermore, this work is not intended to convince readers of the necessity or existence of these cells, only to demonstrate tangible coding theoretic advantages conferred by constraint-neuron-moderated communication between grid and place cells. Additionally, some models of development of the grid and place cell networks demonstrate dependence between properties of each population's apparent receptive fields that our model is unable to capture (Monaco & Abbott, 2011). Thus, coding theoretic results presented here are confined to consideration of a more static code than what is often observed in recordings of real neuronal populations. While our model is limited in the sense that neurons are defined functionally (in contrast with biophysical models, where behavior emerges from the time evolution of the model's physics), the learning algorithms considered are analogous to Hebbian plasticity, and operations required for denoising can be feasibly implemented by networks of real neurons (if not by single units). Hence, the results discussed here have potential implications about neural codes for other continuously valued stimuli (e.g., pitch of an auditory signal, another variable encoded in the mammalian hippocampus; Aronov et al., 2017).

Contributions of this work include the coding model and denoising systems themselves, as a framework in which to characterize limits on fidelity of cooperating neural codes subject to noise (for physical position or other variables such as the auditory pitch code studied in Aronov et al., 2017), and improved clarity about how parameterization of grid and place cell populations affects these fundamental information and coding theoretic limits. Further development along these threads of investigation of neural codes for space includes studying coding theoretic properties of more complete navigational codes, including head direction cells, boundary vector cells, and time cells (Lever, Burton, Jeewajee, O'Keefe, & Burgess, 2009; Salz et al., 2016; Taube, Muller, & Ranck, 1990). It would be most interesting to probe coding and information-theoretic properties of place cells that encode 3D space as demonstrated to reside in the bat hippocampus (Yartsev & Ulanovsky, 2013). Even with these classes of neuron, the hybrid code might be unable to encode and denoise path information without supplementary structure to process its sequentiality. One strong candidate solution for this is to include so-called hippocampal time cells. Just as place cells code for distinct locations on paths through space, time cells encode ordered moments in a temporally ordered sequence of events, which is precisely the information that, when coupled with location, should allow for the encoding of paths (MacDonald, Lepage, Eden, & Eichenbaum, 2011).

## Appendix A: Network Size

$N$, the number of pattern neurons in a network, is the sum of the sizes of the constituent grid and place cell populations. When grid cells are allocated to modules uniformly, $N=P+M\cdot J$. Otherwise, $N=P+\sum_{m=1}^{M}\frac{J}{\lambda^{m-1}}$.

Since a code of normalized rank $R$ admits at most $N(1-R)$ unique constraint equations (i.e., linear combinations of pattern neuron activities that evaluate to zero only when this activity forms a code word, which are the functions computed by constraint neurons), we use $n_c=N(1-R)$.
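The sizes defined in this appendix can be computed as follows. This is a minimal sketch with hypothetical function names; it reads the nonuniform allocation as $J/\lambda^{m-1}$ cells in module $m$ and leaves fractional module sizes unrounded, since the rounding rule is not stated here.

```python
def network_size(num_place, num_modules, first_module_size, scale_ratio=None):
    """Number of pattern neurons N.

    Uniform allocation (scale_ratio is None): N = P + M * J.
    Nonuniform allocation: N = P + sum_{m=1..M} J / lambda**(m - 1).
    """
    if scale_ratio is None:
        return num_place + num_modules * first_module_size
    grid_cells = sum(
        first_module_size / scale_ratio ** (m - 1)
        for m in range(1, num_modules + 1)
    )
    # Assumption: fractional cell counts are returned as-is, not rounded.
    return num_place + grid_cells


def num_constraint_neurons(num_neurons, normalized_rank):
    """Number of constraint neurons, n_c = N * (1 - R)."""
    return num_neurons * (1 - normalized_rank)


# Uniform example with P = 10, M = 4, J = 20.
n_uniform = network_size(10, 4, 20)
```

For instance, with $P=10$, $M=4$, and $J=20$ the uniform allocation gives $N=90$, and a code with $R=0.5$ would then use $n_c=45$ constraint neurons.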

## Appendix B: Subspace Learning

## Appendix C: Structure of the Performance Testing Simulations

In order to evaluate the performance of the denoising mechanisms proposed here, we first generate codes from the parameters considered in appendix E. Then algorithm 1 is applied to the chosen denoising network. After learning is complete, in sequence, $C$ randomly chosen code words are corrupted and presented to the network to denoise using algorithms 2 and 3. After the denoising process is complete, the denoised pattern is assessed and performance is computed incrementally.
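The incremental bookkeeping in the last step might look like the sketch below. It is an assumption-laden illustration rather than the simulation code: the class name is hypothetical, a symbol is counted as errant when the denoised value differs from the original by more than a tolerance (a choice made here, not taken from the text), and a pattern is counted as errant when any of its symbols is.

```python
class ErrorTally:
    """Running pattern error rate (P_pe) and symbol error rate (P_se)."""

    def __init__(self):
        self.patterns = 0
        self.symbols = 0
        self.pattern_errors = 0
        self.symbol_errors = 0

    def update(self, original, denoised, tol=1e-6):
        """Record one denoised pattern against the code word it came from."""
        # Assumption: a symbol is wrong if it is off by more than tol.
        wrong = sum(1 for a, b in zip(original, denoised) if abs(a - b) > tol)
        self.patterns += 1
        self.symbols += len(original)
        self.symbol_errors += wrong
        self.pattern_errors += 1 if wrong else 0

    @property
    def p_pe(self):
        return self.pattern_errors / self.patterns

    @property
    def p_se(self):
        return self.symbol_errors / self.symbols


# Made-up example: one pattern recovered exactly, one with a single bad symbol.
tally = ErrorTally()
tally.update([1, 0, 1], [1, 0, 1])
tally.update([1, 0, 1], [1, 1, 1])
```

Keeping running counts rather than stored patterns means memory use does not grow with the number of corrupted code words presented.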

## Appendix D: How Minimum Distance Limits Ideal Decoding

Suppose $x$ and $z$ are two code words separated by their code's minimum distance, $d$, as shown in Figure 15, and that during transmission of $x$, our channel adds noise, $n$. If the magnitude of this noise ($\|n\|$) exceeds $\frac{d}{2}$, the received word ($y$) may lie a distance $t<\frac{d}{2}$ away from $z$. As a result, a minimum distance decoder (which outputs the code word nearest to the received word) incorrectly declares that $z$ was transmitted. If error events at the symbols of code words are independent and the probability of error does not depend on the position of the symbol in question, as long as this probability does not exceed $\frac{1}{2}$, minimum distance decoding is maximum likelihood decoding.
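The failure mode described above can be reproduced with a toy example. The sketch below is illustrative only; the function name and the two-dimensional code words are made up, and Euclidean distance is assumed, in keeping with the use of $\|n\|$.

```python
import math


def min_distance_decode(received, codebook):
    """Return the code word nearest (in Euclidean distance) to the received word."""
    return min(codebook, key=lambda word: math.dist(word, received))


# Two code words separated by d = 2 (made-up values).
x = (0.0, 0.0)
z = (2.0, 0.0)
codebook = [x, z]

# Noise of magnitude 0.8 < d / 2: the decoder still recovers x.
small_noise_result = min_distance_decode((0.8, 0.0), codebook)

# Noise of magnitude 1.3 > d / 2 directed toward z: the received word is
# now only 0.7 from z, so the decoder wrongly declares z.
large_noise_result = min_distance_decode((1.3, 0.0), codebook)
```

Noise larger than $\frac{d}{2}$ only causes an error when it points toward a neighboring code word, which is why the text says the received word may, rather than must, land closer to $z$.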

## Appendix E: Parameter and Variable Definitions

We present a table of definitions considered in this article.

| Parameter | Definition |
| --- | --- |
| $L$ | Length of simulated square arena |
| $C$ | Number of locations (code words) that comprise the code in question |
| $M$ | Number of modules of grid cells |
| $J$ | Number of neurons in the first module of grid cells |
| $P$ | Number of place cells |
| $\mu_p$ | Number of grid cells with the same phase in the same module |
| $\lambda$ | Scaling ratio between grid modules |
| $\lambda_i$ | Scale of the $i$th grid module |
| $\theta_{m,j}$ | Orientation offset of the $j$th grid cell of module $m$ |
| $\alpha_t$ | Learning rate at iteration $t$ |
| $\epsilon$ | Learning completion threshold |
| $\eta$ | Sparsity penalty coefficient |
| $\mathcal{C}$ | Codebook: A collection of code words formed by the simultaneous activity of pattern neurons |
| $\underline{\mathcal{C}}$ | Codebook matrix constructed by placing elements of $\mathcal{C}$ in rows |
| $R$ | Normalized rank of the code, $\frac{\operatorname{rank}(\underline{\mathcal{C}})}{N}$ |
| $r$ | Normalized code rate (i.e., number of locations represented per neuron): $\frac{C}{N}$ |
| $d$ | Minimum distance of a code (minimum among all distances between code words) |

## Appendix F: Choices of Parameters

In learning, normalized weights are initialized randomly with degree $\lceil 4\log_e(n)\rceil$, where $n$ is the length of the weight vector. We used $\theta_0=0.031$, $\eta=0.075$, and $\alpha_0=0.95$. In denoising, we set $\phi=0.95$. Unless otherwise noted, dependent variables measured and computed are mean values averaged over 100 networks. Error bars represent standard error of the mean.
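One way to realize the initialization described above is sketched below. This is a guess at the details rather than the procedure actually used: the function names are hypothetical, "degree" is read as the number of nonzero entries, the nonzero values are drawn from a standard normal distribution, and normalization is to unit Euclidean norm.

```python
import math
import random


def initial_degree(n: int) -> int:
    """Number of nonzero weights, ceil(4 * ln(n)), capped at the vector length."""
    if n < 2:
        raise ValueError("weight vector must have length at least 2")
    return min(n, math.ceil(4 * math.log(n)))


def init_weights(n: int, seed=None) -> list:
    """Sparse random weight vector of length n with unit Euclidean norm."""
    rng = random.Random(seed)
    weights = [0.0] * n
    # Assumption: nonzero positions are chosen uniformly without replacement
    # and their values are standard normal draws.
    for index in rng.sample(range(n), initial_degree(n)):
        weights[index] = rng.gauss(0.0, 1.0)
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


# Example: a weight vector over n = 100 pattern neurons.
w = init_weights(100, seed=0)
```

For $n=100$ the degree is $\lceil 4\log_e(100)\rceil=19$, so fewer than a fifth of the entries start out nonzero.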

Here we present a table of parameters indexed by figure in this article. NA means that the parameter either was varied or was not used.

| Figure | $L$ (cm) | $C$ | $M$ | $J$ | $P$ | $\lambda$ | $\lambda_1$ (cm) | $\mu_p$ | $\epsilon$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 300 | 1000 | 4 | 20 | NA | $\sqrt{2}$ | 40 | NA | NA |
| 4 | 300 | 1000 | NA | NA | NA | NA | 40 | NA | NA |
| 5 | 300 | NA | NA | NA | NA | $\sqrt{2}$ | 40 | NA | NA |
| 6 | 300 | NA | 4 | 20 | 10 | $\sqrt{2}$ | 40 | NA | NA |
| 7 | 300 | NA | 4 | 20 | NA | $\sqrt{2}$ | 40 | 5 | NA |
| 8 | 300 | $10^5$ | 4 | 20 | 20 | $\sqrt{2}$ | 40 | 5 | $C\cdot 10^{-3}$ |
| 9 | 300 | $10^5$ | 4 | 20 | 20 | $\sqrt{2}$ | 40 | 5 | $C\cdot 10^{-3}$ |
| 10 | 300 | $10^5$ | 4 | 20 | 20 | $\sqrt{2}$ | 40 | 5 | $C\cdot 10^{-3}$ |
| 11 | 300 | $10^5$ | 4 | 20 | 10 | $\sqrt{2}$ | 40 | NA | $C\cdot 10^{-3}$ |
| 12 | 300 | $10^5$ | 4 | 20 | 10 | $\sqrt{2}$ | 40 | NA | $C\cdot 10^{-3}$ |
| 13 | 300 | $10^5$ | 4 | 20 | 10 | $\sqrt{2}$ | 40 | NA | $C\cdot 10^{-3}$ |
| 14 | 300 | $10^5$ | 4 | 20 | 10 | $\sqrt{2}$ | 40 | 5 | $C\cdot 10^{-3}$ |
| 15 | 300 | $10^5$ | 4 | 20 | 10 | $\sqrt{2}$ | 40 | 5 | $C\cdot 10^{-3}$ |

## Note

1. To see this, consider that $x_nW'=(x+n)W'=xW'+nW'\approx 0+nW'$.

## Acknowledgments

This work is supported in part by National Science Foundation grants IIS-1464349 and CCF-1748585.