## Abstract

In this study, we integrated neural encoding and decoding into a unified framework for spatial information processing in the brain. Specifically, the neural representations of self-location in the hippocampus (HPC) and entorhinal cortex (EC) play crucial roles in spatial navigation. Intriguingly, the neural representations in these neighboring brain areas show stark differences. Whereas the place cells in the HPC fire as a unimodal function of spatial location, the grid cells in the EC show periodic tuning curves with different periods for different subpopulations (called modules). By combining an encoding model for this modular neural representation with a realistic decoding model based on belief propagation, we investigated the manner in which self-location is encoded by neurons in the EC and then decoded by downstream neurons in the HPC. Through numerical simulations, we first show the positive synergy effects of the modular structure in the EC. The modular structure introduces more coupling between heterogeneous modules with different periodicities, which provides increased error-correcting capabilities. This is also demonstrated through a comparison of the beliefs produced when decoding two- and four-module codes. Whereas the former resulted in a complete decoding failure, the latter correctly recovered the self-location even from the same inputs. Further analysis of belief propagation during decoding revealed complex dynamics in information updates due to interactions among multiple modules having diverse scales. Therefore, the proposed unified framework allows one to investigate the overall flow of spatial information, closing the loop of encoding and decoding self-location in the brain.

## 1 Introduction

### 1.1 Neural Representations Subserving Spatial Navigation

How does the brain represent information about external stimuli or internal states? This question has been an active research topic because the answers to it will aid our understanding of the information processing mechanisms in the brain. More specifically, knowledge of the neural representations at different stages of information processing helps researchers investigate perception and cognition step by step and provides details and rich hypotheses that can contribute to the depth and breadth of our understanding of the brain.

This study examines the question we posed at the start of this letter in the context of spatial navigation and the underlying neural systems. Spatial navigation involves a wide range of neural representations of inputs from different sensory systems, the map of the environment, self-location, and route plans. Since the notion of the cognitive map was proposed by Tolman (1948), the neural systems used in spatial navigation have been studied. Among these systems, the hippocampus (HPC) and the entorhinal cortex (EC) have attracted considerable attention because of their critical roles in spatial learning (McNaughton, Battaglia, Jensen, Moser, & Moser, 2006; Moser, Kropff, & Moser, 2008; Moser et al., 2014; Bush, Barry, Manson, & Burgess, 2015; Hasselmo, Alexander, Dannenberg, & Newman, 2020).

Neurons in the HPC and EC convey information about self-location, but the neural representations in the two areas show different structures. The place cells in the HPC of an animal represent self-location such that a place cell produces significantly more spikes when the animal is close to a particular spatial location, called its place field, and different place cells are active for different locations (O'Keefe & Nadel, 1979; Wilson & McNaughton, 1993). The EC is an interface between the neocortex and the HPC, but rather surprisingly, the representation of self-location in the EC shows a stark difference from that in the HPC (Fyhn, Molden, Witter, Moser, & Moser, 2004; Hafting, Fyhn, Molden, Moser, & Moser, 2005). The grid cells in the EC show a modular structure (Stensola et al., 2012) in which different modules are related to self-location by periodic functions having different periodicities. These considerably different neural representations of spatial location in neighboring brain areas have attracted significant interest from both experimental (Fyhn et al., 2004; Hafting et al., 2005; Stensola et al., 2012; Rowland, Roudi, Moser, & Moser, 2016; Doeller, Barry, & Burgess, 2010; Jacobs et al., 2013; Tocker, Barak, & Derdikman, 2015; Jafarpour & Spiers, 2017; Julian, Keinath, Frazzetta, & Epstein, 2018) and theoretical (Fuhs & Touretzky, 2006; Burgess, Barry, & O'Keefe, 2007; Fiete, Burak, & Brookings, 2008; Hasselmo, 2008; Burak & Fiete, 2009; Sreenivasan & Fiete, 2011; Giocomo, Moser, & Moser, 2011; Burak, 2014; Bush et al., 2015; Yoo & Kim, 2017; Hasselmo et al., 2020) perspectives.

This letter focuses on the interaction between the HPC and the EC for processing self-location information. A good understanding of the neural representation for spatial location in the HPC (O'Keefe & Nadel, 1979; Wilson & McNaughton, 1993) is provided by the framework of classical population codes (CPCs) (Seung & Sompolinsky, 1993; Deneve, Latham, & Pouget, 1999; Dayan & Abbott, 2001). A place cell in the HPC is more active when the animal's spatial location is close to the cell's preferred location, which is explained by a simple unimodal tuning curve. Thus, identification of the active place cells reveals information about the spatial location. In contrast, the tuning curves of grid cells in the EC are periodic functions of spatial location. Even more interesting is that the spatial periods of grid cells in different modules differ, and they are systematically arranged in the EC; the spatial periods increase from the dorsal to the ventral parts in the EC (Hafting et al., 2005; Stensola et al., 2012). Thus, this newly discovered multiscale code having a modular structure in the deep brain has attracted significant attention. This multiscale neural representation of self-location, which was found first in rodents and bats (Geva-Sagiv, Las, Yovel, & Ulanovsky, 2015), was indirectly observed in the human brain as well (Doeller et al., 2010; Jacobs et al., 2013; Epstein et al., 2017; Julian et al., 2018), implying that the multiscale neural representation is a fundamental representation of space in mammalian brains (McNaughton et al., 2006; Moser et al., 2008, 2014; Bush et al., 2015).

### 1.2 Neural Encoding and Decoding

From a neural coding perspective, accumulating evidence has shown that a neural representation of an encoded variable (such as self-location) involves a group of neurons. For example, the place cells in the HPC collectively represent the animal's spatial location according to the activities of place cells having an identical bump-like receptive field (or place field) but different centers. Thus, the identity of the place cells that are active conveys information about spatial location. The collective encoding is formulated as a population code, in which a group of neurons' responses to the encoded variable are identical up to some simple transformation (such as translation). The population code appears to be a fundamental neural representation in not only sensory but also deeper brain areas.

The activities of the grid cells in the EC also agree with the population coding paradigm. Neighboring grid cells in the same module share the same periodic tuning curve but with different translations. Thus, the grid cells in a module collectively encode the phase with respect to the spatial period of the module. These network effects have been investigated (Fiete et al., 2008; Burak & Fiete, 2009; Yoon et al., 2013; Chaudhuri, Gerçek, Pandey, Peyrache, & Fiete, 2019).

Conventionally, neural representations have been studied separately for each of the rather distinct stages of information processing: encoding and decoding. Neural encoding refers to the process by which an encoded variable (e.g., self-location) is mapped to the activities of a group of neurons (e.g., place cells in the HPC or grid cells in the EC). Neural decoding, in contrast, focuses on the extraction of information about the encoded variable from the activities of the group of neurons.

Most studies have focused on the neural encoding of a target system under the assumption of an ideal model for the decoding stage, called the *ideal observer model*. For instance, studies on CPCs investigated the amount of information that can be encoded by neural spike counts that fluctuate around the mean response, called the tuning curve (Seung & Sompolinsky, 1993; Dayan & Abbott, 2001). Pouget, Deneve, Ducom, and Latham (1999) studied the optimal parameters of the tuning curve for encoding under the assumptions of a sufficiently large number of neurons in a population for the encoding stage and an ideal observer for the decoding stage. The assumption of a sufficiently large number of neurons was relaxed by Berens, Ecker, Gerwinn, Tolias, and Bethge (2011), who drew a similar conclusion. Thus, the assumption of an ideal optimal decoder simplifies analyses and simulations, enabling researchers to focus on the information processing during encoding.

These studies based on the assumption of an ideal decoder, however, provided an incomplete understanding of neural information processing. In general, the decoding stage is considerably more complex and requires higher computational complexity than the encoding stage (Goldsmith, 2005; Proakis & Salehi, 2008). Therefore, the search for good encoding schemes with efficient decoding algorithms has been a long-standing research activity in the field of communication engineering (Cover & Thomas, 2006; Richardson & Urbanke, 2008). The same principle applies to neural codes. If a representation of an encoded variable by a biological information system is to be efficient, both the encoding and the decoding of the representation must be efficient.

However, only a few studies have addressed the amount of information that can be decoded from a neural representation by downstream neurons. Wong, Huk, Shadlen, and Wang (2007) showed that during perceptual decision making, downstream neurons use less information than would be extracted by the ideal decoder. Deneve et al. (1999) investigated a decoding model for CPCs by generating the neural responses to an encoded variable using a unimodal tuning curve, which is decoded using a linear summation followed by a nonlinear activation. These two steps in their decoding model naturally fit the biological mechanisms of the cortical network, that is, pooling neural responses by lateral connections and divisive normalization. Having numerically explored the range of parameters for linear pooling, Deneve et al. (1999) concluded that a near-optimal readout of a CPC is made possible by iterating these two biologically plausible computations. However, the manner in which the information about the spatial location encoded in the multiscale modular representation in the EC may be decoded remains elusive.

### 1.3 Unified Framework for Neural Encoding and Decoding

The objective of this study was to obtain a holistic understanding of the cognitive map in the HPC and EC, considering both the encoding and the decoding processes in a unified framework. The contributions of our work are summarized as follows:

- **A more realistic encoding model for the modular structure in the EC**: The multiscale modular structure of the grid cells in the EC allows the representations of self-location to have error-correcting capabilities (Fiete et al., 2008; Burak & Fiete, 2009; Sreenivasan & Fiete, 2011; Mathis, Herz, & Stemmler, 2012). This gain has been theoretically analyzed under the assumption of a sufficiently large number ($N$) of grid cell modules. However, experimental findings showed that the number of grid cell modules is finite and quite small (Stensola et al., 2012), which undermines the theoretical predictions of the error-correcting capacity of grid cell networks. Therefore, we addressed this gap in our understanding by investigating the effects of the number of modules on the error probabilities using only a few modules ($N = 2$ to $4$).
- **A realistic decoding model with neurally plausible computations over sparse connectivity**: Instead of the ideal observer model, we adopted a neurally plausible decoding model based on belief propagation on a sparse graph (Yoo & Vishwanath, 2015; Yoo & Kim, 2017). First, the beliefs of the spatial location are locally calculated from the grid cell activities of the neighboring module pairs. The local computation of belief, mathematically derived from the belief propagation rule (Yoo & Vishwanath, 2015), is approximated well by a simple linear transformation followed by divisive normalization (Yoo & Kim, 2017). Then, such beliefs are exchanged across modules, which updates the local beliefs. The belief update and propagation are iterated until convergence. This iterative decoding algorithm provides a more realistic decoding model that can be efficiently simulated for multiscale neural representations in the EC.
- **Neural dynamics of decoding self-location**: The neural representation in the EC encodes spatial location as a combination of multiple modules having different scales. This combinatorial structure provides the representation with strong error-correcting capabilities (Fiete et al., 2008; Burak & Fiete, 2009; Sreenivasan & Fiete, 2011; Mathis et al., 2012), which is accompanied by an increased complexity of decoding. The inclusion of both encoding and decoding models in a unified framework allowed us to investigate the manner in which spatial information from multiple modules is combined and used by downstream neurons. The trajectories of beliefs during the decoding process reveal the manner in which local information from neighboring modules is updated and propagated. The most interesting case was that in which different modules provided conflicting information. We explored the dynamics of decoding such “corner cases” using numerical simulations.

The remainder of this letter is organized as follows. In section 2, the multiscale neural codes for spatial location are formalized. In section 3, we discuss the biologically plausible decoding of such codes. Our numerical simulation results are presented in section 4, and the conclusions are given in section 5.

## 2 Multiscale Neural Codes for Spatial Location

### 2.1 The Modular Structure of the Entorhinal Cortex Induces Qualitatively Different Neural Codes

The modular structure in the EC yields a unique neural code for spatial location, called the grid-cell population code (GPC), the properties of which are very different from those of CPCs. CPCs are observed mainly in the sensory or motor areas of the brain, where a population of neurons with homogeneous tuning curves up to translation represents a particular feature of the stimulus, such as the orientation or motion of a bar, direction of the wind, or direction of an intended motion (Hubel & Wiesel, 1962; Miller, Jacobs, & Theunissen, 1991; Georgopoulos, Kalaska, Caminiti, & Massey, 1982; Georgopoulos, Caminiti, Kalaska, & Massey, 1983; Georgopoulos, Schwartz, & Kettner, 1986). In contrast, the GPC is observed in the EC, where tuning curves are periodic functions of the spatial location, and the shape of the tuning curves has two parameters: period and phase. Closely located grid cells in the EC share the same period but are sensitive to different phases (Hafting et al., 2005). Thus, these grid cells having the same period, called a *module*, collectively encode the phase of the spatial location. The spatial periods of grid cells are discrete and increase along the dorsal-ventral axis in the EC (Hafting et al., 2005; Stensola et al., 2012), indicating that multiple modules with different spatial periods represent spatial location using phases with respect to distinct periods.

To obtain intuition from a geometrical perspective, the encoding structure of a neural code can be understood as an embedding from a lower-dimensional space (for an encoded variable) to a higher-dimensional space (for neural representation). When an encoded variable of dimension $D$ is represented by $M$ neurons, such a neural code is defined by the (encoding) map from the encoded variable to a neural representation on a $D$-dimensional manifold embedded in an $M$-dimensional space. When $D<M$, this mapping from a low-dimensional variable to a higher-dimensional representation provides redundancy and tolerance to errors (Shannon, 1949; Fiete et al., 2008; Sreenivasan & Fiete, 2011).

In contrast, the embedding of the GPC results in a more complex manifold in a higher-dimensional space. In the GPC, the grid cells in different modules have periodic tuning curves with different spatial periods (see Figure 1C). Because of the different periodicities, the embedding curve revolves at different rates for different modules. Consequently, the embedding of the GPC forms a twisted curve in the neural representation space (see Figure 1D). Hence, similar neural representations (dots in Figure 1D) may in fact correspond to very distant points along the curve, thereby forming completely different encoded variables.

Thus, the key difference between CPCs and GPCs is that the neural representation of a GPC is not monotonically related to the encoded variable, unlike that of a CPC. In a CPC, the embedding curve is relatively simple, and the geodesic distance increases with the dissimilarity of the neural representations. However, this proportionality does not hold for a GPC. Because the embedding curve of a GPC twists and turns, similar neural representations may correspond to completely different encoded variables. This different encoding structure introduces a qualitatively different neural coding, the implications of which are discussed in the following section.

### 2.2 Different Coding Structures Result in Different Error Patterns of Classical and Grid-Cell Population Codes

Building on these geometrical insights, let us consider a perturbation of the neural representation and a simple decoder. For simplicity, the perturbation is assumed to be uniformly distributed along all directions in the neural representation space. The decoding process is defined by a projection of the noisy representation onto the closest point on the embedding curve, and the encoded variable corresponding to the projected neural representation is the estimate of the encoded variable.

The decoding error of a CPC is monotonically related to the magnitude of the perturbation along the embedding curve (see Figure 1B). The perturbation perpendicular to the embedding curve is completely removed by the projection. Conversely, the perturbation along the curve results in a difference between the true and the projected neural representations (the geodesic distance along the embedding curve indicated by the black dots in Figure 1B). Thus, the decoding error depends on the arc length of the embedding curve. In other words, a longer embedding curve reduces the decoding error. This effect is quantified by the length of the embedding curve corresponding to a unit interval of the coded variable, called the *stretch factor* ($L$) (Shannon, 1949). Typically, the embedding of a CPC is a dimension-expanding map, and $L \gg 1$. Thus, the simple decoding reduces the effect of perturbation in the neural representation by a factor of $1/L$.

In contrast, a perturbation in GPCs results in two distinct types of decoding error. When the perturbation is relatively small and the projected neural representation is close to the true representation along the embedding curve, the decoding error scales with $1/L$. This type of decoding error is basically similar to that of a CPC and is called a *local error*. However, when the perturbation is sufficiently large, the projected neural representation may ultimately be located closer to a remote segment of the manifold (black dots in Figure 1D). This results in large, nonlocal errors (Sreenivasan & Fiete, 2011; Berens et al., 2011), because the geodesic distance (along the embedding curve) between the true and projected neural representations is very large. Such an error is also known as a *threshold error* (Shannon, 1949).

More specifically, there exists a trade-off between local and threshold errors. Let us assume that the range of the encoded variable and the dimension of the neural representation, that is, the numbers of modules and grid cells, are held constant and different embedding curves are considered by varying the spatial periods of the GPCs. The larger the value of $L$, the smaller the local errors. However, an increase in $L$ inevitably increases the probability of threshold errors ($P_{th}$). This is because a more stretched coding manifold is packed into the same representational space, and therefore the minimum distance ($d_{min}$) between distinct neural representations decreases. Consequently, it is more likely that the same perturbation in the neural representation is projected to a remote segment, which results in a threshold error.

Therefore, the frequencies and sizes of local and threshold errors differ. Local errors in general occur more frequently, and their size is smaller than that of threshold errors. Threshold errors occur less frequently but are very large, resulting in complete decoding failure. This trade-off between local and threshold errors is due to the trade-off between the $L$ and $d_{min}$ values of the embedding curves.
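
The distinction between local and threshold errors can be reproduced with a toy simulation. The sketch below is a minimal illustration, not the encoding model of section 3: it embeds a one-dimensional location onto two circles with coprime cycle counts and decodes by nearest-neighbor projection. The periods, grid resolution, and noise levels are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(x, periods):
    """Embed a 1-D location into R^(2N): one unit circle per module phase."""
    parts = []
    for lam in periods:
        a = 2 * np.pi * x / lam
        parts.extend([np.cos(a), np.sin(a)])
    return np.array(parts)

# toy two-module code on [0, 1): periods 1/3 and 1/5 (coprime cycle counts)
periods = [1 / 3, 1 / 5]
grid = np.linspace(0.0, 1.0, 2000, endpoint=False)      # candidate locations
codebook = np.stack([embed(x, periods) for x in grid])

def decode(r):
    """Project the noisy representation onto the closest codebook point."""
    return grid[np.argmin(np.sum((codebook - r) ** 2, axis=1))]

x_true = 0.37
small = [abs(decode(embed(x_true, periods) + 0.01 * rng.standard_normal(4)) - x_true)
         for _ in range(200)]
large = [abs(decode(embed(x_true, periods) + 0.80 * rng.standard_normal(4)) - x_true)
         for _ in range(200)]
# small perturbations produce only local errors; large perturbations
# occasionally land closer to a remote segment, producing threshold errors
```

With the small perturbation, every decoded location stays within a tiny neighborhood of the true location, whereas the large perturbation scatters some estimates across the entire coding range.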

### 2.3 The Threshold Error Necessitates New Approaches for Analysis and Numerical Simulations

CPCs having relatively simple tuning curves produce only local errors, which existing frameworks can explain well. The classical examples include Fisher information (FI) analysis, where the encoding models for CPCs are examined by analyzing the effects of the encoding parameters on the FI (Pouget et al., 1999; Dayan & Abbott, 2001; Bethge, Rotermund, & Pawelzik, 2002; Berens et al., 2011). An important advantage of FI is that it is analytically calculated using simple and reasonable models such as gaussian or Poisson distributions for neural variabilities (Dayan & Abbott, 2001). In addition, because FI is inversely related to the theoretically achievable minimum decoding error (Cramér, 1946), FI-based analysis allows one to focus on the encoding process under the assumption of an ideal observer for the decoding model. This simplifying assumption is valid for CPCs because relatively simple neural networks effectively approximate the ideal observer model (Seung & Sompolinsky, 1993; Deneve et al., 1999).

In contrast, FI fails to capture the effect of threshold errors owing to their nonlocal nature. The inverse of FI is a lower bound, called the Cramér-Rao bound, on the variance of decoding errors (Cramér, 1946). This bound on the variance is useful only when the distribution of the decoding error is unimodal (as for local errors) but provides little information about the threshold errors because their distribution is multimodal and spreads over the entire range of the encoded variable (Sreenivasan & Fiete, 2011).

Therefore, to investigate the threshold errors of GPCs, one should resort to numerical simulations. The neural representation of a GPC is inherently based on the combinatorial effect of multiple modules with different scales, which is highly nonlinear. Consequently, simple readout mechanisms can account only for local errors and fail to address threshold errors (Yoo, 2014). Therefore, in this study, a neurally plausible decoding model that considers the combinatorial structure of multiple modules was incorporated.

Specifically, we used an iterative decoding algorithm proposed for GPCs (Yoo & Vishwanath, 2015) for our numerical simulations. This algorithm first calculates the intermediate local likelihoods, called beliefs, from several modules, and then combines these beliefs until convergence to produce an estimate of the encoded variable. This approach is called belief propagation or Pearl's algorithm, which was inspired by and resembles information processing in the brain (Pearl, 1988). The accuracy of a belief propagation-based decoder is very close to that of the ideal observer model based on the maximum likelihood decoding rule (Yoo & Vishwanath, 2015). An additional advantage of this approach is that one can include decoding errors in the downstream decoding neurons to render the decoding model even more realistic (Yoo & Kim, 2017).

Using this more realistic decoding model, we investigated the manner in which the interactions between grid cell modules give rise to correct or incorrect decoding. The existing studies on neural codes provided a static view of information processing based on the final estimates of an input in the asymptotic regimes (in the limit of a very large number of neurons or modules). In contrast, the belief propagation-based decoder reveals the dynamics during decoding of GPCs. In other words, the transient belief propagations before convergence were analyzed in this study to understand further the manner in which spatial location information in a multiscale neural representation in the EC is extracted.

In the following section, we provide more details of the encoding and decoding models with details of the procedures and parameters of the numerical simulations.

## 3 Unified Framework for Biologically Plausible Encoding and Decoding Models

Both the encoding and the decoding models are designed to capture the essential features of each stage by applying realistic parameters. More detailed descriptions of the encoding and the decoding models are provided in the following sections.

### 3.1 Encoding Model of Multiscale Population Codes

The encoding model used in this study is as follows. The spatial location $x \in [0, x_{max}]$ of an animal is encoded by $N$ grid cell modules having different spatial periods, where each module contains $M$ grid cells. Notationally, a modular neural representation $(r_1, r_2, \cdots, r_N)$ corresponds to spatial location $x$, where $r_n$ corresponds to the neural activities of grid cell module $n$ and comprises the spike counts of $M$ grid cells: $(r_{n1}, r_{n2}, \cdots, r_{nM})$.

Thus, equations 3.1 to 3.3 summarize the encoding model from the spatial location ($x$) to the phase ($\varphi_n$) and then to the grid cell responses ($r_{nm}$). In other words, the encoding map $x \to \varphi_n \to r_{nm}$, $n \in \{1, 2, \cdots, N\}$ and $m \in \{1, 2, \cdots, M\}$, defines the embedding from the one-dimensional spatial location to an $MN$-dimensional neural representation.
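
Because equations 3.1 to 3.3 are not reproduced here, the following sketch fills in plausible stand-ins: wrapped gaussian phase noise for equation 3.1, a von Mises-style circular tuning curve for equation 3.2, and Poisson spike counts for equation 3.3. The function and parameter names (`encode`, `sigma_phi`, `kappa`) are our own, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, x_max, periods, M=256, sigma_phi=0.01, rmax_dt=5.0, kappa=4.0):
    """Encode location x into noisy spike counts of N grid cell modules.

    Assumed forms standing in for equations 3.1-3.3: wrapped gaussian
    phase noise, a von Mises-style circular tuning curve, and Poisson
    variability of the spike counts.
    """
    responses = []
    for lam in periods:
        # eq. 3.1 (assumed): noisy phase of x with respect to the module period
        phi = (x / lam + sigma_phi * rng.standard_normal()) % 1.0
        # eq. 3.2 (assumed): circular tuning curves with preferred phases m/M
        pref = np.arange(M) / M
        rate = rmax_dt * np.exp(kappa * (np.cos(2 * np.pi * (phi - pref)) - 1))
        # eq. 3.3: Poisson spike counts given the mean rates
        responses.append(rng.poisson(rate))
    return responses

# four modules with x_max / lambda_n in {9, 13, 19, 29} (see section 3.3)
x_max = 1.0
periods = [x_max / k for k in (9, 13, 19, 29)]
r = encode(x_max / 2, x_max, periods)
```

Each call returns one spike-count vector per module; the maximum expected count `rmax_dt` matches the simulation setting $r_{max}\Delta t = 5$ in section 3.3.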

### 3.2 Biologically Plausible Decoding Model

A biologically plausible decoder (see Figure 2) produces an estimate of the spatial location denoted by $\hat{x}$ (Yoo & Vishwanath, 2015; Yoo & Kim, 2017). It is assumed that the downstream neurons in the HPC constitute the decoder. To provide consistency with the anatomical structure of the HPC (Amaral & Witter, 1989), a modular structure with sparse connectivity is assumed for the decoder. More specifically, the grid cell modules in the dorsal part of the EC are connected to the dorsal part of the HPC, whereas the grid cell modules in the ventral part of the EC are connected to the ventral part of the HPC. Consequently, an HPC neuron receives inputs from a limited range of EC outputs. For simplicity, two neighboring grid cell modules are connected to one module of the HPC in the feedforward layer (see Figure 2). Therefore, the HPC can access a set of pairwise likelihoods ($L_{12}, L_{23}, \ldots$), each of which contains information about the spatial location based on only two neighboring grid cell modules. This partial information is propagated and updated according to the belief propagation rule (Yoo & Vishwanath, 2015; Yoo & Kim, 2017) to produce an estimate of the spatial location based on all the grid cell modules in the recurrent layer (see Figure 2).

Each pairwise likelihood provides local information about the spatial location within a pair of neighboring modules, called an *intramodule belief*. Exchanging the intramodule beliefs between neighboring module pairs produces an *intermodule belief*. In turn, this intermodule belief is used to update the intramodule beliefs in another pair of modules according to equation 3.7.

The two iterative decoding steps (local competition and maximization, shown in equations 3.7 and 3.8, respectively) are repeated until all the beliefs converge to fixed points. After convergence, the quotients $(\hat{q}_1, \hat{q}_2, \cdots, \hat{q}_N)$ having the largest beliefs are chosen to produce the estimate $\hat{x}$ of the spatial location.
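
The overall iterative structure (local belief updates with divisive normalization, repeated until convergence, followed by a maximum) can be sketched as follows. This is only a schematic of the loop, not the derived update rules of equations 3.7 and 3.8; `pairwise_logL` stands for the pairwise log-likelihoods of candidate locations, one array per neighboring module pair.

```python
import numpy as np

def bp_decode(pairwise_logL, n_iter=20):
    """Schematic belief iteration over a chain of module-pair likelihoods.

    pairwise_logL: one array per neighboring module pair, giving the
    log-likelihood of every candidate location from that pair alone.
    Illustrative only; not the exact rules of equations 3.7-3.8.
    """
    beliefs = [np.exp(l - l.max()) for l in pairwise_logL]
    beliefs = [b / b.sum() for b in beliefs]
    for _ in range(n_iter):
        new = []
        for i, b in enumerate(beliefs):
            msg = np.ones_like(b)
            if i > 0:
                msg = msg * beliefs[i - 1]      # message from the left pair
            if i < len(beliefs) - 1:
                msg = msg * beliefs[i + 1]      # message from the right pair
            b = b * msg
            new.append(b / b.sum())             # divisive normalization
        beliefs = new
    # after convergence, pick the candidate with the largest combined belief
    return int(np.argmax(np.prod(beliefs, axis=0)))
```

For instance, three pairwise likelihoods that all peak at the same candidate location drive the beliefs to a sharp consensus at that candidate.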

### 3.3 Experimental Details of Numerical Simulations

For our numerical simulations, four spatial periods ($\lambda_n, n = 1, 2, 3, 4$) were chosen to provide consistency with previous experimental observations according to the following two criteria. First, the ratios between the coding range and the spatial periods, $x_{max}/\lambda_n$, should be relatively prime (Fiete et al., 2008; Yoo & Vishwanath, 2015). If this condition is not satisfied, the spatial location cannot be uniquely recovered from the neural representation (Fiete et al., 2008). Second, the ratios between consecutive periods ($\lambda_{n+1}/\lambda_n$) should be approximately $\sqrt{2}$, based on the experimental observation that the spatial periods scale by a factor close to $\sqrt{2}$ (Stensola et al., 2012). Thus, according to the first criterion, 5677 quadruples of relatively prime numbers less than 50 were found by using an exhaustive search. The second constraint on the ratios of the spatial periods, $0.92 < \lambda_{n+1}/(\sqrt{2}\,\lambda_n) < 1.12$, reduced the candidate parameter sets to 1287 quadruples of $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$.
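
The exhaustive search over quadruples can be sketched as follows. The pairwise-coprimality test and the upper limit of 50 come from the text; the $\sqrt{2}$ normalization of the ratio window is our reading of the second criterion, and the function names are our own.

```python
from itertools import combinations
from math import gcd, sqrt

def coprime_quadruples(limit=50):
    """All quadruples of pairwise relatively prime integers below `limit`.

    Each integer stands for x_max / lambda_n, so pairwise coprimality
    makes the spatial location uniquely recoverable within [0, x_max)."""
    return [q for q in combinations(range(2, limit), 4)
            if all(gcd(a, b) == 1 for a, b in combinations(q, 2))]

def ratio_ok(quad, lo=0.92, hi=1.12, scale=sqrt(2)):
    """Check that consecutive period ratios lie in a window around `scale`.

    With quad = x_max / lambda sorted in increasing order, the ratio of
    adjacent periods (larger over smaller) is b / a for adjacent entries."""
    return all(lo < (b / a) / scale < hi for a, b in zip(quad, quad[1:]))

quads = coprime_quadruples(50)
candidates = [q for q in quads if ratio_ok(q)]
```

The quadruple $(9, 13, 19, 29)$ used in the rest of section 3.3 passes both filters: its adjacent ratios (about 1.44 to 1.53) all fall inside the window around $\sqrt{2}$.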

For further investigation of the decoding dynamics, a GPC was randomly chosen from the 1287 GPCs satisfying the ratio constraint $\lambda_{n+1}/\lambda_n \approx \sqrt{2}$. Any randomly chosen GPC having a sufficiently large $d_{min}$ showed qualitatively similar results in terms of the trade-off between the local and threshold errors and belief propagation decoding. Thus, in the following, we fixed the parameters of the GPC to $x_{max}/\lambda_n = \{9, 13, 19, 29\}$ for equation 3.1, which gives $d_{min} = 0.21$. Using these spatial periods, we defined four grid cell modules, denoted by $G_n, n = 1, 2, 3, 4$.

Two groups of multiscale codes were studied: two-module codes ($N=2$) and a four-module code ($N=4$), which includes the two-module codes as its components. A two-module code, $C_{n(n+1)}, n = 1, 2, 3$, comprises a pair of modules $G_n$ and $G_{n+1}$ with consecutive periods $\lambda_n$ and $\lambda_{n+1}$. The four-module code $C_{1234}$ comprises all four modules with four distinct spatial periods. Consequently, the composite code $C_{1234}$ contains all the two-module codes as its components, with an additional coupling between modules $G_2$ and $G_3$, which connects $C_{12}$ and $C_{23}$, and another additional coupling between modules $G_3$ and $G_4$, which connects $C_{23}$ and $C_{34}$.

Two motivations underlie this simulation design. First, at least two modules are required to identify the spatial location uniquely from periodic neural responses. Therefore, the two-module code ($N=2$) corresponds to a minimal building block for multiscale neural codes with periodic tuning curves. Second, we compare the two-module codes ($C_{12}, C_{23}, C_{34}$) and the composite code ($C_{1234}$). If some characteristics of the latter are not explained by those of the former, they must be due to the additional coupling between modules in the composite code.

The numerical simulations for each code were performed in three steps. First, the average threshold errors of the two-module codes and the composite code were compared for a range of phase noise sizes $\sigma_\varphi$ by encoding and decoding 10,000 times for each value of $\sigma_\varphi$. Then the intra- and intermodule beliefs at the beginning (itr = 1) and end (itr = 20) of the iterations were investigated in a representative simulation with a fixed $\sigma_\varphi = 0.01$, at which the threshold error is not negligible but not excessively high. Finally, to investigate the decoding dynamics further, we traced the change in the intra- and intermodule beliefs as the iterations progressed.

For the encoding step in the numerical simulations, four grid cell module responses were generated for a fixed $x = x_{\max}/2$ with independent phase noises and Poisson variability according to equations 3.1 to 3.3. In each EC module, phase noise with a given $\sigma_\varphi$ was injected independently according to equation 3.1. The noisy phase was represented by $M = 256$ grid cells with the circular tuning curve defined in equation 3.2. The preferred phases of the grid cells, $0, \frac{1}{M}, \frac{2}{M}, \cdots, \frac{M-1}{M}$, were uniformly distributed in the unit interval. Given the mean firing rates, the spike counts were generated according to the Poisson distribution in equation 3.3, where the maximum firing rate was set such that $r_{\max}\Delta t = 5$.
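A minimal encoding sketch along these lines is shown below. Since equations 3.1 to 3.3 are not reproduced here, the von Mises-like tuning curve, its concentration $\kappa$, and $x_{\max}$ are illustrative assumptions; $M = 256$, $r_{\max}\Delta t = 5$, $\sigma_\varphi = 0.01$, and the period counts $\{9, 13, 19, 29\}$ follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

x_max = 100.0                                    # assumed range of x (illustrative)
lam = x_max / np.array([9.0, 13.0, 19.0, 29.0])  # spatial periods of G1..G4
M = 256                    # grid cells per module
r_max_dt = 5.0             # maximum expected spike count, r_max * dt
sigma_phi = 0.01           # phase-noise standard deviation
kappa = 20.0               # assumed tuning-curve concentration (illustrative)

centers = np.arange(M) / M  # preferred phases 0, 1/M, ..., (M-1)/M

def encode(x):
    """Generate noisy Poisson spike counts for all four modules."""
    counts = []
    for lam_n in lam:
        # Noisy phase of this module (cf. eq. 3.1): position modulo the period.
        phi = (x / lam_n + sigma_phi * rng.standard_normal()) % 1.0
        # Assumed circular (von Mises-like) tuning curve (cf. eq. 3.2).
        rate = r_max_dt * np.exp(kappa * (np.cos(2 * np.pi * (phi - centers)) - 1.0))
        # Poisson spike-count variability (cf. eq. 3.3).
        counts.append(rng.poisson(rate))
    return counts

spikes = encode(x_max / 2)   # encode the fixed location x = x_max / 2
```

Each call returns one spike-count vector per module, which serves as the input to the decoder.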

Given the noisy responses of the four EC modules, the estimated spatial location $\hat{x}$ was produced by belief propagation for either the two-module or the composite codes. Here, the only difference is in the EC modules used for the belief propagation. For example, when the beliefs are propagated between $G_1$ and $G_2$, the resulting estimate is that of the two-module code $C_{12}$. Similarly, the estimates of the other two-module codes were calculated. Using all the modules produces an estimate of the composite code $C_{1234}$.
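The belief propagation decoder itself passes messages between module pairs; as a simpler stand-in, the sketch below computes the estimate $\hat{x}$ by brute-force maximum likelihood over a grid of candidate locations, which the belief propagation fixed point is intended to approximate. The tuning-curve form and parameter values are illustrative assumptions, not the paper's exact ones:

```python
import numpy as np

rng = np.random.default_rng(0)
x_max = 100.0                                    # assumed range of x
lam = x_max / np.array([9.0, 13.0, 19.0, 29.0])  # module periods for G1..G4
M, r_max_dt, kappa = 256, 5.0, 20.0              # M and r_max*dt from the text
centers = np.arange(M) / M                       # preferred phases

def rates(x, lam_n):
    """Mean spike counts of one module (assumed circular tuning curve)."""
    phi = (np.asarray(x)[..., None] / lam_n) % 1.0
    return r_max_dt * np.exp(kappa * (np.cos(2 * np.pi * (phi - centers)) - 1.0))

def ml_decode(spikes, n_grid=4000):
    """Brute-force maximum-likelihood estimate of x from all module responses."""
    xs = np.linspace(0.0, x_max, n_grid, endpoint=False)
    log_like = np.zeros(n_grid)
    for counts, lam_n in zip(spikes, lam):
        r = rates(xs, lam_n)                              # shape (n_grid, M)
        log_like += (counts * np.log(r) - r).sum(axis=1)  # Poisson log-likelihood
    return xs[np.argmax(log_like)]

x_true = x_max / 2
spikes = [rng.poisson(rates(x_true, lam_n)) for lam_n in lam]
x_hat = ml_decode(spikes)
```

Restricting the modules passed to `ml_decode` would mimic decoding a two-module code instead of the composite code, at the cost of a higher chance of a threshold error.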

The type of a decoding error is determined by its size, as follows. If the absolute value of the decoding error, $|e| = |\hat{x} - x|$, is less than the average of the spatial periods, $\frac{1}{N}\sum_{n=1}^{N} \lambda_n$, the estimate is categorized as a local error; otherwise, it is categorized as a threshold error.
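This classification is a one-line criterion; a sketch (with the illustrative periods used above, assuming $x_{\max} = 100$):

```python
import numpy as np

def error_type(x_hat, x, lam):
    """Classify a decoding error as local or threshold by the criterion above."""
    return "local" if abs(x_hat - x) < np.mean(lam) else "threshold"

lam = 100.0 / np.array([9, 13, 19, 29])   # illustrative module periods
error_type(50.3, 50.0, lam)   # -> 'local' (|e| below the mean period, ~6.9)
error_type(12.0, 50.0, lam)   # -> 'threshold'
```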

## 4 Numerical Simulations

### 4.1 Coupling More Modules Reduces Threshold Error Probability

To investigate the coupling effects of multiple modules on the threshold error probability $P_{th}$, the $P_{th}$ values of the two- and four-module codes are compared as follows.

#### 4.1.1 Two-Module Codes

First, the threshold errors of the two-module codes ($C_{12}$, $C_{23}$, and $C_{34}$) were measured for different phase noise sizes. The solid lines in Figure 3B, whose colors correspond to different codes, show the $P_{th}$ values of the three two-module codes as a function of the phase noise size $\sigma_\varphi$. As $\sigma_\varphi$ increases from a small value ($10^{-3}$), $P_{th}$ initially remains close to zero and then rises rapidly once $\sigma_\varphi$ exceeds a certain value. The $P_{th}$ value of $C_{12}$ remains close to zero for a wide range of $\sigma_\varphi$ values and attains the smallest threshold error at each $\sigma_\varphi$. The $P_{th}$ value of $C_{23}$ starts to increase at a smaller $\sigma_\varphi$ and ultimately is greater than that of $C_{12}$. $C_{34}$ shows the largest $P_{th}$ among the three codes.

The different threshold errors of the codes with fixed $N=2$ can be interpreted in terms of the trade-off between $L$ and $d_{\min}$. Among these two-module codes, the value of $L$ of $C_{12}$, with the largest spatial periods, is the smallest, and that of $C_{34}$, with the smallest spatial periods, is the largest. Therefore, the local errors of $C_{34}$ are smaller than those of the other two codes. However, this gain in local errors comes at the price of an increased $P_{th}$: a larger $L$ implies that a longer embedding curve is packed into the same neural representation space, and therefore the minimum distance $d_{\min}$ between distinct segments of the embedding curve decreases. Thus, numerical simulations with two-module GPCs corroborate the theoretical prediction that a code with a larger $L$ tends to have a smaller $d_{\min}$ and, consequently, a higher $P_{th}$.

#### 4.1.2 Four-Module Code

Next, the $P_{th}$ value was measured for the four-module code, $C_{1234}$. As the phase noise level increased, the $P_{th}$ of $C_{1234}$ (dashed line in Figure 3B) remained significantly lower than that of any of the two-module codes and stayed close to zero for a wider range of noise levels. This shows that the composition of two-module codes leads to a code with a stronger error-correcting capability.

Furthermore, the composite code ($N=4$) showed qualitatively different properties as a result of combining the constituent codes ($N=2$). The different modules of the GPC represent phases with respect to different periods, and therefore a failure in one module may lead to a complete decoding failure (Fiete et al., 2008; Burak & Fiete, 2009; Sreenivasan & Fiete, 2011). Thus, one may expect the $P_{th}$ of the composite code to be close to the largest $P_{th}$ of the constituent codes. However, the results showed the opposite: the $P_{th}$ value of the composite code is close to the smallest $P_{th}$ value of the constituent codes and is even smaller than it over a wide range of noise levels. Such an increased error-correcting capability of the composite code is not explained by the properties of the individual constituent codes, which we explore further in the following sections.

### 4.2 Convergence Analysis

To further understand the threshold errors of the two-module codes and their composite code, the initial and final beliefs of the decoding algorithm were investigated. In all the simulations, the decoding algorithm converged to a fixed point within 15 iterations and remained constant until the end of the maximum 20 iterations. This is consistent with the theoretically proven convergence of belief propagation decoding for GPCs (Yoo & Vishwanath, 2015). This good convergence indicates that the iterative update of beliefs in the decoding algorithm leads to the closest point on the embedding curve, which satisfies all the congruence relationships between $x$ and $\varphi_n$ for $n = 1, 2, \cdots, N$ in equation 3.1 (Yoo & Vishwanath, 2015).
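Such a convergence check can be implemented generically: iterate the belief update until successive beliefs agree within a tolerance. The sketch below is a generic fixed-point loop, not the paper's decoder; the `update` callable stands in for one round of belief propagation:

```python
import numpy as np

def run_until_converged(update, belief, max_iter=20, tol=1e-9):
    """Iterate a belief update; return the final belief and the iteration count."""
    for it in range(1, max_iter + 1):
        new = update(belief)
        if np.allclose(new, belief, atol=tol):   # fixed point reached
            return new, it
        belief = new
    return belief, max_iter

# Toy contraction standing in for one belief propagation round;
# its fixed point is 2.0.
fixed_point, n_iter = run_until_converged(lambda b: 0.1 * b + 1.8, 0.0)
```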

Nevertheless, convergence to a fixed point does not necessarily imply correct decoding, because the estimate may suffer from a threshold error. In other words, when the noise is greater than a certain threshold, the closest point on the embedding curve, as determined by the decoding algorithm, may correspond to a completely different encoded variable. To further understand the gap between convergence and correct decoding, the initial and final beliefs were analyzed for a fixed noise size $\sigma_\varphi = 0.01$, as follows.

#### 4.2.1 Two-Module Codes

It was discovered that in early iterations, there existed periodically arranged local maxima of similar sizes in the ridge of the intramodule belief $\rho_{12}$. This is clearly shown in the intermodule belief $\alpha_{21}$, which conveys information about the result of the local competition in the neighboring module pair, propagated to update $\rho_{21}$ of the next module pair.^{1} For example, $\alpha_{21}(j,k)$ is the maximum of $\rho_{12}(i,j)$ over $i$ for a given $j$. Thus, $\alpha_{21}$ after the first iteration (blue in the right-hand column of Figure 4) shows the maxima within the vertical strips $\rho_{12}(i,j)$ for fixed $j$ in the left-hand column of Figure 4. For all the two-module codes, the initial beliefs from a pair of modules provide little information about the encoded variable in the absence of intermodule beliefs.
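This maximum operation is a max-product message: each column of the intramodule belief is collapsed to its strongest hypothesis before being passed on. A toy illustration with a random belief matrix (sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
rho12 = rng.random((8, 8))    # toy intramodule belief over index pairs (i, j)

# Intermodule message: the maximum of rho12(i, j) over i for each fixed j,
# i.e., only the strongest hypothesis in each vertical strip survives.
alpha21 = rho12.max(axis=0)
```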

This ambiguity was resolved by exchanging beliefs across modules over more iterations. After 20 iterations, a single maximum was prominent in $\rho_{12}$ (see Figure 4, middle) and $\alpha_{21}$ (see Figure 4, right, red). This shows that both intra- and intermodule beliefs always converged to fixed points in the numerical simulations.

Although all the two-module codes showed good convergence, their decoding errors showed different patterns. Whereas $C_{12}$ and $C_{23}$ showed local errors, $C_{34}$ showed a threshold error. The top two rows of Figure 4 show that $C_{12}$ and $C_{23}$ yield qualitatively similar decoding results. One difference is that the number of local maxima in the initial $\rho_{12}$ is larger for $C_{23}$ than for $C_{12}$ (see Figure 4, left), because the former has smaller spatial periods than the latter. Regardless of this difference, the maximum of $\alpha_{21}$ after 20 iterations coincides with the true value (see the dashed vertical lines in Figure 4, right) for both $C_{12}$ and $C_{23}$, indicating local errors. In contrast, the threshold error for $C_{34}$ (the code with the largest $L$ value) is indicated by the mismatch between the location of the maximum of the belief $\alpha_{21}$ and the true index (see the vertical dashed line in Figure 4, bottom).

The decoding failure for $C_{34}$ can also be explained by the trade-off between $L$ and $d_{\min}$. The spatial periods $\lambda_n$ of $C_{34}$ are smaller than those of $C_{12}$ or $C_{23}$, resulting in a larger $L$ but a smaller $d_{\min}$. The increased $L$ reduces the effect of noise on the local errors; conversely, the smaller $d_{\min}$ increases the frequency of threshold errors. This is consistent with the observation that, among the pairwise codes studied, $C_{34}$ has the largest number of local maxima in the beliefs $\alpha_{21}$ in early iterations (see Figure 4, left), which leads to threshold errors when the phase noises are above a certain threshold. This is also consistent with the threshold error analysis presented in section 4.1.

#### 4.2.2 Composite Code

The composite code ($C_{1234}$) includes all the module pairs of the two-module codes ($C_{12}$, $C_{23}$, and $C_{34}$) as its components, and the same input ($x$) and phase noise ($\xi_n$) in equation 3.1 were used as for the two-module codes in the numerical simulations. The additional couplings between the modules in the composite code allowed beliefs to propagate among all four modules. In one iteration of decoding, beliefs flow sequentially from $G_1$ to $G_4$ and then from $G_4$ back to $G_1$.^{2}

The right-hand column of Figure 5 shows the intermodule beliefs $\alpha$ after the first (blue) and the last (red) iterations. As for the two-module codes, $\alpha$ shows numerous local maxima in the early iterations. Specifically, $\alpha_{23}$ after the first iteration is identical to $\alpha_{21}$ of $C_{12}$. One difference is that, for $\alpha_{34}$ and $\alpha_{43}$ in the composite code, the single maximum corresponding to the true index (vertical dashed line) begins to be prominent among the other local maxima even after only the first iteration. This is because the additional coupling in the composite code allows beliefs to propagate from the very first iteration. After 20 iterations, each $\alpha$ contains a single maximum at the true location (see the vertical dashed line).

A stark difference between the two- and four-module codes is the failure to decode $C_{34}$ in contrast to the success of decoding the corresponding module pair, $G_3$ and $G_4$, in the composite code. The shortest spatial periods of $G_3$ and $G_4$ result in a larger number of local maxima in $\alpha$ and therefore more frequent threshold errors. When this module pair was independently simulated as the two-module code $C_{34}$, a threshold error occurred (see Figure 4, bottom right). In contrast, in the composite code, the same module pair with identical inputs was correctly decoded when beliefs from the other modules were provided (see Figure 5, bottom right). Thus, the lower $P_{th}$ value of the composite code shown previously is due to the additional coupling of multiple modules in the composite code.

### 4.3 Concatenating More Modules Introduces Complex Dynamics

To further investigate the dynamics of decoding multiscale neural representations, the exchanged beliefs for a representative simulation were traced during the decoding iterations from the beginning until convergence. These transient dynamics were analyzed on two scales: local dynamics in the intramodule beliefs ($\rho$) and global dynamics in the intermodule beliefs ($\alpha$). More specifically, changes in $\rho$ reflect the dynamics due to the local competition between a pair of neighboring modules. Conversely, $\alpha$ contains information about more global dynamics: the result of the local competition among $\rho$ gives rise to $\alpha$, which is passed from one pair to another. Hence, the beliefs $\rho$ and $\alpha$ demonstrate the local and global dynamics of decoding multiscale codes, respectively.

#### 4.3.1 Two-Module Codes

These different convergence characteristics of the two-module codes can be explained by their different spatial periods. Because $C_{12}$ has the largest spatial periods among the two-module codes, it contains information about the location on a coarse scale, consistent with the smallest number of peaks in $\alpha_{21}$ in Figure 4. Thus, $C_{12}$ suffers from a lower level of ambiguity than the other two-module codes when decoding the spatial location from the noisy phases, which causes faster convergence to a larger belief value.

#### 4.3.2 Composite Code

The composite code showed more complex decoding dynamics. The left-hand panel of Figure 6B shows the chosen intramodule beliefs $\rho^*_{n,n+1}$ as a function of the iterations. The intramodule belief $\rho^*_{12}$ of the composite code (blue solid line), which corresponds to $\rho^*_{12}$ of the two-module code $C_{12}$, increases faster and converges earlier than that of any other module pair. Conversely, the intramodule belief $\rho^*_{34}$ (green solid line) decreases in the early iterations before increasing and saturating to a certain value. Notably, this local decrease in the chosen belief for one pair of modules is accompanied by an increase in the sum of the beliefs over all pairs (black dashed line).

One interpretation of this global increase, together with occasional local decreases in beliefs, is that the local beliefs of $\rho_{34}$ are less reliable than the other local beliefs and produce a smaller intermodule belief $\alpha_{43}$, which contributes less to the sum of beliefs. This allows the decoding algorithm to make a slightly poorer choice locally for modules $G_3$ and $G_4$ when a greater gain occurs in the other modules, $G_1$ and $G_2$. This is consistent with the intuition that modules having smaller spatial periods contain information about the spatial location on finer scales, but at the cost of higher probabilities of threshold errors. These flexible dynamics are due to the belief propagation across more modules with diverse scales, which prevents decoding failure in the case of the composite code.

Such a coupling effect was clearly seen in the trace of the chosen intermodule beliefs during decoding. The right-hand panel of Figure 6B shows the selected beliefs $\alpha^*_{n,n+1}$ as a function of the iterations for the different pairs (blue, red, and green lines) and their sum (black line). In contrast to the smooth increase in $\alpha^*_{21}$ of the two-module codes (see the right-hand panel of Figure 6A), the beliefs from the same modules of the composite code show more complex dynamics (see the right-hand panel of Figure 6B). Specifically, in the first few iterations, $\alpha^*_{23}$ (the output belief of $G_1$ and $G_2$, which corresponds to $\alpha^*_{21}$ of $C_{12}$) rapidly increases (red line), contributing to the swift increase in the sum (black line). During later iterations, $\alpha^*_{34}$ (the output belief of $G_2$ and $G_3$, which corresponds to $\alpha^*_{21}$ of $C_{23}$) and $\alpha^*_{43}$ (the output belief of $G_3$ and $G_4$, which corresponds to $\alpha^*_{21}$ of $C_{34}$) increase faster than the others in an arbitrary order. Such irregular changes in the selected beliefs $\alpha^*_{n,n+1}$ demonstrate the more complex interactions between modules with different scales in the composite code.

## 5 Discussion and Conclusion

The consideration of both the encoding and the decoding processes in a unified framework allowed us to investigate the multiscale neural representation of self-location in the EC under more realistic conditions than those used in previous studies. For the encoding model, multiple modules of grid cells having geometrically scaling spatial periods were used to match previous experimental observations. For the realistic decoding model, a belief propagation algorithm was used with sparse connectivities between modules.

The key insight obtained in this study is that multiscale neural codes yield synergy effects by combining heterogeneous modules, effects that are not explained by the individual module pairs. The effect of coupling more modules was investigated by comparing the two-module codes and their composite code with four modules. Numerical simulations showed that the $P_{th}$ value of the composite code was lower than those of the component codes. To understand this coupling effect further, static and dynamic analyses were performed on the converged beliefs and on the transient belief dynamics, respectively. The implications of the individual findings are discussed as follows.

First, the lower $P_{th}$ value of the composite code compared with the constituent codes demonstrates the positive synergy effect of the modular structure. This is consistent with the results of existing theoretical studies, which predicted that the multiscale representation with multiple modules in the EC gives rise to strong error-correcting capabilities. However, previous studies considered only the encoding structure and performed asymptotic analyses under the assumption of a sufficiently large number of modules. In this study, the existing theory was extended in two directions. First, we included both the encoding and the decoding stages and confirmed that the multiscale neural representation can be efficiently decoded by downstream neurons. Second, numerical simulations were performed with a realistic number of modules and sparse connectivities across modules.

The convergence analysis shows that a composite code is more than the sum of its constituent codes. The belief propagation decoding algorithm showed good convergence to a fixed point, identified by a unimodal peak in the intermodule beliefs $\alpha$ after 20 iterations. When the two-module codes were decoded separately, only some of them were correctly decoded. When the two-module codes were combined to form a composite code, the beliefs were propagated across all four modules, which led to different maxima in the intermodule beliefs $\alpha$, coinciding with the true values. For the same input and noise values, decoding of a two-module code failed because of a threshold error, whereas decoding of the composite code succeeded with only local errors.

Coupling more modules yields more complex dynamics due to the interaction among heterogeneous modules. The diversity in spatial periods allows different modules to convey complementary information about the spatial location in terms of the phases with respect to different spatial periods. Consequently, different modules produce diverse beliefs, the interaction of which produces unusual dynamics during the decoding of multiscale codes. For example, at a certain iteration, the beliefs from one module may be the most reliable and dominate the increase in the total belief; however, as beliefs are propagated across modules, a different module may provide stronger evidence at another iteration. Thus, the heterogeneous structure of GPCs allows the most informative module to make the largest contribution at each iteration, which leads to an increase in the total belief.

This flexible interaction during decoding facilitates efficient and effective decoding of multiscale codes for spatial location. The modules with longer spatial periods ($\lambda_n$) have a smaller $L$ but a larger $d_{\min}$, whereas those with shorter $\lambda_n$ have a larger $L$ but a smaller $d_{\min}$. The former convey coarser information about the encoded variable $x$ but are robust to threshold errors, whereas the latter provide information about $x$ on a finer scale but are susceptible to threshold errors. Thus, when module pairs are decoded separately, $x$ can be recovered only on coarse scales from the modules with a longer $\lambda_n$, because GC modules with a shorter $\lambda_n$ tend to produce frequent threshold errors and provide little information about $x$. In contrast, when heterogeneous GC modules are coupled and beliefs propagate across all the modules, intermodule beliefs from the GC modules with a larger $\lambda_n$ selectively increase the intramodule beliefs of the GC modules with a shorter $\lambda_n$ in the vicinity of the correct $x$. As a result, the beliefs of the GC modules with a shorter $\lambda_n$ converge to the correct value, and fine-scale information from these modules can be extracted without a threshold error.

We proposed a unified framework for the multiscale modular neural representation of self-location. In our study, end-to-end numerical simulations, which included both the encoding and the decoding stages, were performed. A holistic understanding of spatial information processing was enabled by adopting a distributed decoding algorithm based on simple computations involving local beliefs and their propagation over sparse connectivities. This unified framework provides a concrete computational model for the interaction between the EC and HPC.

The proposed model for the information flow from the EC to the HPC is tightly integrated with the hippocampal anatomy. Specifically, the grid cell modules with different spatial periods in the superficial layers of the medial EC encode the animal's spatial location (Hafting et al., 2005; McNaughton et al., 2006; Witter & Moser, 2006; Moser et al., 2008). These grid cells project axons to the dentate gyrus and the CA3 area of the HPC through the perforant path (Amaral & Witter, 1989; Witter, 1993). This input from the EC to the HPC was modeled as the linear summation in previous studies (McNaughton et al., 2006; Solstad, Moser, & Einevoll, 2006). We extended this simple readout model with the sparse feedforward network to incorporate the localized connectivities across the dorsal-ventral axis (Amaral & Witter, 1989; Witter, 1993). Thus, the sparse feedforward network produces local likelihoods based on only a few grid cell modules, which are iteratively combined by the recurrent network. It is postulated that this recurrent network is located in the CA3 of the HPC, where neighboring pyramidal cells are densely interconnected (Cajal, 1893; de Nó, 1934; Andersen, 2007).

Another new contribution of the unified framework is a more principled account of the involvement of the recurrent network in the decoding process. In contrast to a simple rectification of linear inputs (McNaughton et al., 2006; Solstad et al., 2006) or separate winner-take-all dynamics (Sreenivasan & Fiete, 2011), the recurrent network investigated in this study iteratively combines the local likelihoods provided by the sparse feedforward network and produces an estimated spatial location. This iterative update process is mathematically derived from the belief propagation algorithm (Yoo & Vishwanath, 2015), and a more biologically plausible iterative decoding has also been demonstrated (Yoo & Kim, 2017).

An additional contribution of this letter is an explanation of the different dynamics of the EC and HPC networks. Whereas grid cells in the EC very quickly form spatial fields in a new environment, place fields in the HPC appear considerably more slowly (Leutgeb, Leutgeb, Treves, Moser, & Moser, 2004; Hafting et al., 2005; Colgin, Moser, & Moser, 2008). The proposed model provides a concrete explanation for this difference. Sensory inputs from the cortex converge to the medial EC, where the multiscale modular representation of spatial location is formed (McNaughton et al., 2006; Moser et al., 2008, 2014). In each module in the EC, the network response with a fixed scale is readily formed by sparse but strong inputs that have local inhibitions and are updated by self-motion cues (Fyhn et al., 2004; Hafting et al., 2005; McNaughton et al., 2006; Fiete et al., 2008). However, in a novel environment, sensory information is not sufficient, and the spatial phases of some modules may be incorrect even if the network response in each EC module is stable and produces reliable spatial fields.

We postulate that this inconsistency across modules in the EC is resolved by iterative decoding in the HPC. Immediately after the animal is exposed to a novel environment, the spatial phases in the EC are incompatible, and therefore the HPC fails to decode the spatial location, corresponding to a threshold error. As the animal becomes familiar with the new environment, the phase discrepancy in the EC becomes small enough to be decoded without a threshold error. Even after the discrepancy in the EC phases becomes smaller, resolving the remaining offsets in the phases by the iterative decoding algorithm takes more time, as shown in Figure 6B. A stable spatial field in the HPC would appear only after such conflicting beliefs are resolved. Thus, the inclusion of both the EC and HPC in a unified framework allowed us to propose this interpretation of the different dynamics of the biological networks in the EC and HPC.

Our future work will extend the unified framework by including feedback from the HPC to the EC (Kloosterman, Van Haeften, Witter, & Lopes da Silva, 2003). Closure of the loop in the information flow between the EC and HPC should advance our understanding of the interaction between the EC and HPC for spatial navigation. Consideration of neural encoding and decoding in a unified framework in different areas of the brain and the investigation of the dynamics of information flows could be additional directions for future work.

## Notes

^{1}

Because only two modules exist for simulations with $N=2$, only one pair of modules exists. However, an identical module pair may contain different beliefs depending on the direction of belief propagation. Specifically, $\rho_{12}$ and $\rho_{21}$ correspond to the same module pair but depend on different external beliefs, $\alpha_{12}$ and $\alpha_{21}$. The latter contains more “processed” information, because beliefs are updated in the following order: $\rho_{12} \rightarrow \alpha_{21} \rightarrow \rho_{21} \rightarrow \alpha_{12} \rightarrow \rho_{12}$.

^{2}

Beliefs were propagated sequentially for simplicity. In an alternative implementation, each module pair could update its beliefs in parallel based on the beliefs from neighboring module pairs in the previous iteration. This difference is similar to that between the Gauss–Seidel and Jacobi methods in numerical linear algebra. However, our previous study showed that no qualitative difference exists between the two update strategies (Yoo & Vishwanath, 2015).

## Acknowledgments

This work was supported by an Incheon National University Research Grant in 2018.

## References

*Trends in Neurosciences*,