Grid cells of the rodent entorhinal cortex are essential for spatial navigation. Although their function is commonly believed to be either path integration or localization, the origin or purpose of their hexagonal firing fields remains disputed. Here they are proposed to arise as an optimal encoding of transitions in sequences. First, storage requirements for transitions in general episodic sequences are examined using propositional logic and graph theory. Subsequently, transitions in complete metric spaces are considered under the assumption of an ideal sampling of an input space. It is shown that memory capacity of neurons that have to encode multiple feasible spatial transitions is maximized by a hexagonal pattern. Grid cells are proposed to encode spatial transitions in spatiotemporal sequences, with the entorhinal-hippocampal loop forming a multitransition system.
Decades of research unearthed neurons that represent spatial information. For instance, place cells (PCs) encode mostly singular locations (O'Keefe & Dostrovsky, 1971; O'Keefe, 1979), head direction cells (HDs) show preferential tuning toward head directions (Chen, Lin, Green, Barnes, & McNaughton, 1994; Ranck, 1984; Preston-Ferrer, Coletta, Frey, & Burgalossi, 2016), and grid cells (GCs) fire at hexagonally arranged locations of an environment (Hafting, Fyhn, Molden, Moser, & Moser, 2005). Together, they are thought to form a cognitive map (Fyhn, Solstad, & Hafting, 2008; Moser, Kropff, & Moser, 2008), anticipated by Edward Tolman as early as 1948 (Tolman, 1948).
GCs, stellate cells of the entorhinal cortex, are believed to convey critical metric information during spatial navigation (Fyhn et al., 2008; Moser & Moser, 2008). It was discovered recently that their fields of activity with respect to the environment, called grid fields, are not only hexagonally distributed but also that the sizes of grid fields vary in discrete steps (Stensola, Stensola, Froland, Moser, & Moser, 2012). GCs are typically characterized by their firing field sizes, orientation, and shift of the hexagonal pattern relative to an arbitrary coordinate system. Cells that share the same orientation and field sizes are denoted as belonging to the same grid module. Close to the peak of the hierarchical organization of the cortex, they are considered to be an ideal vehicle to understand abstract cortical representations (Moser et al., 2014). However, the origin and purpose of the hexagonal fields and discrete scales are still insufficiently understood and controversial. While several models propose recurrent dynamics as the origin for the arrangement (Fuhs & Touretzky, 2006; Burak & Fiete, 2009; Couey et al., 2013), others use integration of oscillations to form hexagonal fields (Burgess, Barry, & O'Keefe, 2007). Yet others suggest that the fields form due to spatially modulated afferents and as a result of a self-organization process (Gorchetchnikov & Grossberg, 2007; Kropff & Treves, 2008; Stepanyuk, 2015). Extended overviews of such models can be found in Giocomo, Moser, and Moser (2011); Zilli (2012) and Shipston-Sharman, Solanka, and Nolan (2016).
Most researchers assume that GCs perform one of two functions, although both have subtle, yet significant, issues. First, their hexagonal pattern was reported to be suitable for path integration (Burak & Fiete, 2009) and even to provide an error-correction mechanism (Sreenivasan & Fiete, 2011). However, real-world experiments showed that such models quickly accumulate noise and require external resetting (Mulas, Waniek, & Conradt, 2016). Second, theoretical studies showed that GCs grossly outperform PCs during localization when the location is decoded using Bayesian inference (Mathis, Herz, & Stemmler, 2012; Stemmler, Mathis, & Herz, 2015). However, PCs are known to play an essential role during localization and navigation (Morris, Garrud, Rawlins, & O'Keefe, 1982). It remains unclear why there should be two subsystems for localization—GCs and PCs—especially given that neural networks are energetically expensive (Niven & Laughlin, 2008). In either case, researchers disagree about how downstream neurons should resolve the ambiguities that are due to the hexagonally repeating pattern. Many models use integration of multiple scales to form PCs (Solstad, Moser, & Einevoll, 2006) and simply add more scales to resolve said ambiguities. This does not appear to be a reasonable or generalizable solution, as it merely shifts the problem out of sight. Finally, most researchers neglect temporal aspects of spatial information, although the hippocampal formation (HF) is crucial for episodic memories (Scoville & Milner, 1957).
2 Related Work
The HF has long been known to be essential for the formation and storage of episodic memories (Scoville & Milner, 1957) and was studied extensively both in this context (Tulving, 1972; Jarrard, 1993; Buzsaki, 2015) and in spatial information processing (O'Keefe & Dostrovsky, 1971; Morris et al., 1982; Hafting et al., 2005). Recently, several studies have addressed sequence learning or transition encoding. They examined transitions and the stability of sequences during the formation of temporal memories in spiking neural networks (Hayashi & Igarashi, 2009; Hattori & Kobayashi, 2016) and proposed sophisticated models for the acquisition of episodic memories and the interaction of subareas within the HF (Cheng, 2013). Others proposed that spatial transition encoders are particularly useful during path planning operations and presented a biologically plausible model that was evaluated on a robotic platform in real-world scenarios (Cuperlier, Laroque, Gaussier, & Quoy, 2004; Cuperlier, Quoy, Giovannangeli, Gaussier, & Laroque, 2006). Later, this model was extended to select optimal trajectories from a number of candidate solutions using reinforcement learning (Hirel, Gaussier, Quoy, & Banquet, 2010). However, none of these studies examined the optimality of transition encodings.
While many models use the integration of ego motion for the formation of hexagonal grid fields (Zilli, 2012), some researchers suggest that their origin lies in afferents from spatially modulated inputs. For instance, rate adaption was used successfully for the stable formation of hexagonal grid fields using PC-like presynaptic activity in Euclidean space (Kropff & Treves, 2008). Others used dendritic computation to cover a spatially modulated input space hexagonally and self-organization principles to arrange several GCs coherently to form proper grid modules (Kerdels & Peters, 2013). Unfortunately none of these studies addressed the concerns already mentioned.
3 Episodic Memories and Multitransition Systems
A recent study observed preplay of PCs to nearby target locations while animals were at rest (Pfeiffer & Foster, 2013). Presumably reflecting some form of mental travel or route selection, this observation sparked the following motivating idea. When navigating to a destination, an animal travels along a trajectory of intermediate places. To plan its trajectory, it requires accumulated knowledge about such locations, evidently memorized and processed in PCs (O'Keefe & Dostrovsky, 1971; Moser, Rowland, & Moser, 2015). However, and at least equally important, it needs insight into feasible (spatial) movements between them. In the basic case of minimizing distances toward targets, the animal also needs to know approximately how far apart the places are. Without any a priori knowledge, the animal needs to sample its surroundings, learn the relationship between places, and later, during recall, infer these distances from the accumulated data. The acquisition of such data may have happened during an exploration phase and should have happened in some optimal, but also general, manner to apply to arbitrary environments.
3.1 Model and Methods
The novel computational model to solve this task is depicted in Figure 1. It proposes that the entorhinal-hippocampal loop forms an interacting hierarchy of computations with the purpose of optimally storing and retrieving spatiotemporal sequences. Inputs (black arrows) from a suitable sensor space (e.g., boundary vector information, bottom row) or other spatially modulated neurons directly project to PCs that learn to represent locations and rewards (second row; receptive fields indicated as blue circles). Furthermore, movements or transitions between locations are memorized and retrieved in two different layers of the hierarchy. One computational layer (top row) records episodic memories of actually performed sequences, for instance, so that they can be replayed in order. Another layer (third row) learns the relationship between spatially close locations. While learning temporal adjacency is supposed to require only interactions between PCs and transition encoders (arrows between the top and second rows), acquiring knowledge about spatial transitions requires projections from the sensory representation as well as afferents from the PC layer to bind the appropriate spatially close PCs (arrows from the bottom row to the third and arrows between the second and third rows).
The remainder of this letter examines the logic and memory consumption of transition storage in this model from a mathematical point of view. The axiomatic system it introduces uses symbols to represent spatial locations and transitions to model movements from one symbol to another, both well known in computer science from automata theory, labeled transition systems, or Markov processes. Note, however, that the sensory input space (see the bottom row of Figure 1) is not explicitly modeled. In fact, any location is assumed to induce a unique sensory representation. The consequences as well as the plausibility of these abstractions and the simplification of the input space are discussed in section 4.
The deliberately abstract notation used in the following analysis allows treating goal-directed navigation as a particular instance of a general memorization task: the formation of episodic memories. The results presumably apply beyond spatial navigation or the entorhinal-hippocampal loop. Furthermore, it enables reasoning about the algorithmic level of the computation independent of the physical realization. Throughout this letter, several brief examples and discussions facilitate understanding the notation and logical analysis, both of them inspired by communicating sequential processes (CSP; Hoare, 1978), the analysis of time in distributed computing systems (Lamport, 1978), and the analysis of causality in theoretical computer science (Halpern, 2015).
3.2 Symbols, Alphabets, and Sequences
Consider an animal that moves across three rooms. The trajectory of the animal can be described by the sequence of symbols $(a, b, c)$, each representing one room. The meaning of a symbol is not predetermined, though; rather, it depends on the system being analyzed. For instance, symbols could also represent the event of perception of each corresponding room, particular views of a room or objects within a room, or other modalities as long as they are distinguishable. Moreover, symbols can describe various other forms of sequences, such as the production of a particular series of sounds or steps to perform a certain action. Although the examples in this letter use spatial navigation, the formal system generalizes to other applications.
The entirety of symbols forms an alphabet and their consecutive ordering a sequence, both captured in the following:
Definition 1 (alphabet and sequence). An alphabet $\Sigma$ is a finite set of symbols. A sequence (or word) $s$ is an ordered tuple of symbols $\sigma_i \in \Sigma$, that is, $s = (\sigma_1, \ldots, \sigma_N) \in \Sigma^+$, where $\cdot^+$ is the Kleene plus operator.
Thereby a trajectory of an animal is described by a sequence of symbols, just as motivated in the introductory example. However, immediate repetition of a single symbol is disallowed, formally specified as follows:
Axiom 1 (nonstationarity). A sequence $s$ is locally nonstationary if any two successive symbols $\sigma_k$ and $\sigma_{k+1}$ are distinguishable, that is, $\sigma_k \neq \sigma_{k+1}$.
This constraint is inspired by neural dynamics. Specifically, the refractory period of neurons prohibits a continuous representation of a state by a single neuron during short timescales. Also, it is behaviorally relevant for the generation of a sequence. Consider an animal that tries to reach a certain goal location under the pressure of a nearby predator; it needs to find a sequence where symbols correspond to locations. If the animal were to recall a sequence that contains repetitions of locations, it would likely come to a halt at a repeated symbol and fall prey to the predator.
However, the axiom does not limit general capabilities. Two consecutive but distinct symbols of a sequence can have the same associated meaning, depending on the sequence that needs to be encoded. For instance, the perception of a certain room or location within a room in the case of spatial navigation or a certain actuator state in the case of motor commands could be encoded in two consecutive symbols. Generally the associated meaning of a symbol is independent of the symbol itself.
Moreover, this constraint does not introduce explicit information about real time or prevent repetition of a symbol within a sequence at a later point. In fact, the axiom requires only that any two consecutive symbols are different. Consider an animal that stays in a room for a longer period of time and records its movement in terms of distinguishable locations. Without any additional information or an extension to explicitly incorporate real time, the recorded sequence contains only symbols of consecutive places that the animal's perceptual system can differentiate, regardless of when the change happened.
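This recording principle can be illustrated with a minimal Python sketch. The function name and the use of strings as symbols are illustrative only; the sketch collapses a stream of perceived symbols into a locally nonstationary sequence by keeping only changes and discarding dwell times:

```python
from itertools import groupby

def record_sequence(samples):
    # Keep only changes between distinguishable symbols; dwell times
    # and immediate repetitions are discarded (cf. axiom 1).
    return [symbol for symbol, _ in groupby(samples)]
```

For example, a stream in which the animal lingers in rooms a and b becomes the nonstationary sequence (a, b, a, c) regardless of how long each room was occupied.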
The directional ordering of a sequence is expressed using the arrow notation $\sigma_a \rightarrow \sigma_b$, which ignores time. In fact, the temporal order of evaluation needs to be stated explicitly, as will be shown further below. Thereby, symbols and transitions form propositions. Consider the example $\sigma_a \rightarrow \sigma_b$, which means that the symbol $\sigma_b$ causally follows after symbol $\sigma_a$—in other words, if $\sigma_a$ is true, it follows that $\sigma_b$ is also true. Conversely, if $\sigma_a$ is false, so is $\sigma_b$. Hence they form a chain of causality. In addition to $\rightarrow$, the arrow $\twoheadrightarrow$ exists; for example, $\sigma_a \twoheadrightarrow \sigma_c$ means that there exists a path from $\sigma_a$ to $\sigma_c$ that bridges intermediate symbols. The negations of the notation are $\not\rightarrow$ and $\not\twoheadrightarrow$. Finding a path to a target requires the following constraint, though:
Axiom 2 (coherency). Let $s = (\sigma_1, \ldots, \sigma_N)$ be a sequence of symbols. $s$ is coherent if and only if $\sigma_k \rightarrow \sigma_{k+1}$ for all $k \in \{1, \ldots, N-1\}$.
Coherency is necessary for goal-directed navigation. Consider an animal that intends to travel to a remote target. In terms of the basic idea for this letter, it has to plan a trajectory without any significant gaps. Otherwise, it will get stuck or lost, and it may express undefined behavior or displacement activity. When the animal is not navigating to a specific goal, it is assumed that novel symbols are acquired for future planning operations by explorative movements. The animal's task is thus either to find a valid sequence to its target or acquire more knowledge.
This is not to be confused with definitions of automata in computer science. As defined above, a coherent sequence has symbols that deterministically follow one after the other. However, multiple sequences between two symbols may exist at the same time or different sequences may be generated at different times. It is therefore possible to specify, for instance, a nondeterministic automaton or Markov process that accepts or generates a coherent sequence, respectively.
Definition 2 (validity). A sequence $s$ is valid or acceptable if it is both nonstationary and coherent.
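Under these definitions, validity is directly checkable. The following Python sketch is illustrative (transitions are modeled as a set of ordered pairs, and the function names are my own); it tests nonstationarity and coherency for a candidate sequence:

```python
def is_nonstationary(seq):
    # Axiom 1: no two successive symbols are identical.
    return all(a != b for a, b in zip(seq, seq[1:]))

def is_coherent(seq, transitions):
    # Axiom 2: every successive pair must be a feasible transition.
    return all((a, b) in transitions for a, b in zip(seq, seq[1:]))

def is_valid(seq, transitions):
    # A sequence is valid if it is both nonstationary and coherent.
    return is_nonstationary(seq) and is_coherent(seq, transitions)
```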
Using these notations and axioms, goal-directed navigation from a start $\sigma_s$ to a goal $\sigma_g$ is a program that expands the path $\sigma_s \twoheadrightarrow \sigma_g$ into any valid sequence $s = (\sigma_s, \ldots, \sigma_g)$, if it exists. The next sections describe how to expand a path, give several examples, and address the memory requirements for storing transitions.
3.3 On Universal Multitransition Systems
The arrow notation specifies relations between two symbols. Consider the transition $\sigma_a \rightarrow \sigma_b$, which can also be written as the tuple $\tau = (\sigma_a, \sigma_b)$. The concept of transitions is well known—for instance, from reinforcement learning (RL; Sutton & Barto, 1998) or computer science (Van Benthem & Bergstra, 1994; Thomas, 2006). There, it is usually denoted as a transition function $T : S \times A \rightarrow S$, mapping one state to another given a set of actions $A$ (Sutton & Barto, 1998). This concept will now be extended to allow simultaneous encoding of multiple feasible transitions. The motivation for the extension will be stated further below.
Definition 3 (transition set and bundle). A set $\Pi = \{\pi_0, \pi_1, \ldots\}$ is called a transition set and contains sets $\pi_i$, called transition bundles. In turn, a transition bundle $\pi_i$ is a set of transitions $\{\tau_{i,0}, \tau_{i,1}, \ldots\}$, where $\tau_{i,j}$ is called the $j$th transition point of $\pi_i$.
Indices will be dropped if they are clear from the context. In addition, the following terminology and notation will be used:
A transition $\tau$ from $\sigma_a$ to $\sigma_b$ can be written $\sigma_a \xrightarrow{\tau} \sigma_b$ or $\tau = (\sigma_a, \sigma_b)$.
$\tau$ is defined for $\sigma_a$ and leads to $\sigma_b$, written $\operatorname{def}(\tau) = \sigma_a$ and $\operatorname{lead}(\tau) = \sigma_b$, respectively. The notation is transitive to bundles and sets: $\operatorname{def}(\pi) = \{\operatorname{def}(\tau) \mid \tau \in \pi\}$ and $\operatorname{lead}(\pi) = \{\operatorname{lead}(\tau) \mid \tau \in \pi\}$, respectively.
A bundle $\pi$ forms a tuple $(\operatorname{def}(\pi), \operatorname{lead}(\pi))$ with start and target symbols $\operatorname{def}(\pi)$ and $\operatorname{lead}(\pi)$, respectively.
If a transition bundle $\pi$ is true, then so are all contained transitions $\tau \in \pi$.
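A transition bundle can be rendered as a plain container. The Python class below is an illustrative sketch of the definitions above; the method names defined_for and leads_to are my own labels for the defined-for and leads-to sets, not notation from this letter:

```python
class TransitionBundle:
    """A transition bundle: a set of transition points (start, target).
    If the bundle is active (true), all contained transitions are active."""

    def __init__(self, transitions=()):
        self.transitions = set(transitions)

    def add(self, start, target):
        self.transitions.add((start, target))

    def defined_for(self):
        # Set of symbols the bundle is defined for.
        return {a for a, _ in self.transitions}

    def leads_to(self):
        # Set of symbols the bundle leads to.
        return {b for _, b in self.transitions}
```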
Transitions form propositional terms that are independent of symbols. Consider the symbol $\sigma_a$ and the transition $\tau = (\sigma_a, \sigma_b)$—both of them propositions—in the expression $\sigma_a \wedge \tau$. If $\sigma_a$ is true, it can be deduced logically that $\sigma_b$ is also true, written $\sigma_a \wedge \tau \Rightarrow \sigma_b$. Here, $\wedge$ is the logical and operator, and $\sigma_a$ forms a precondition for the transition $\tau$. As $\sigma_a$ is true, the precondition is met and so the transition is also true. Hence, $\sigma_b$ is the conclusion of the entire term.
Since order of evaluation is not specified during logical deduction, sequential evaluation of transitions is made explicit as follows:
The following section gives examples for transition evaluation using these definitions and discusses how they can be implemented in principle in a neural network.
3.4 Interpretations and Implementations of Universal MTS
The example presented in Figure 2 resembles the parallel execution of a breadth-first search in a directed graph and exposes a relationship to message-passing algorithms such as belief propagation in factor graphs. Note that a specific sequence is neither selected in the figure, nor is such a technique presented, as this would require some form of reward mechanism, which is beyond the scope of this letter. In principle, however, this could be implemented with well-known algorithms such as Dijkstra's algorithm or the Bellman-Ford algorithm. The latter was used in Hirel et al. (2010), who also proposed that the HF stores transitions and presented a biologically plausible implementation thereof.
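The parallel expansion just described behaves like breadth-first search over the transition structure. The following sketch is illustrative (names and the pair encoding of transitions are my own); it expands a path from a start to a goal symbol into a valid coherent sequence, if one exists:

```python
from collections import deque

def expand_path(start, goal, transitions):
    # Breadth-first expansion of start ->> goal into a coherent
    # sequence of symbols, if one exists (cf. Figure 2).
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for a, b in transitions:
            if a == path[-1] and b not in visited:
                visited.add(b)
                frontier.append(path + [b])
    return None  # goal unreachable: no valid sequence exists
```

Selecting among several candidate sequences would additionally require a cost or reward mechanism, which, as noted above, is beyond the scope of the letter.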
The other panels of Figure 3 depict possible wiring diagrams of individual neurons of the two memories $\Sigma$ and $\Pi$. Note that the diagrams ignore any inhibitory interneurons, assume that each connection is subject to a temporal delay such as axonal transmission, and depict only a few connections to reduce the complexity of the drawings. In each panel, neurons of $\Sigma$ require local recurrent excitation to form an autoassociative memory.
The schematics differ in the way the recurrent connectivity from $\Pi$ to $\Sigma$ is implemented. Direct coupling, as depicted in panel B, appears to be unlikely as such a system would ignore the start of a transition. More likely candidates are shown in panel C, which uses a heterosynaptic connectivity (indicated as two lines converging on a singular triangular endpoint), or in panel D, which uses an interneuron to integrate the state of the symbol for which a transition is defined as well as the transition. Recall that for a succeeding symbol to activate, both the symbol for which a transition is defined and the transition itself need to be logically true. Hence, the computation performed by the heterosynaptic connectivity or the interneuron is the logical and, which was used to evaluate transitions and symbols as propositional terms in section 3.3. In a spiking neural network implementation, this would require the spike times of both symbol and transition neurons to appear in a suitable temporal integration window to either fire the interneuron or activate the next symbol using the heterosynaptic connection. The study of necessary temporal dynamics as well as a concrete implementation of such a network will be left for future work.
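The coincidence detection described here can be caricatured in a few lines. The sketch below is a toy abstraction rather than a spiking model (the function name and spike-time representation are my own); it reports the times at which a symbol spike and a transition spike fall into a common integration window, implementing the logical and:

```python
def and_gate(symbol_spikes, transition_spikes, window):
    # Times at which a downstream unit (interneuron or heterosynaptic
    # site) responds because a symbol spike and a transition spike
    # coincide within the temporal integration window.
    return sorted(
        max(s, t)
        for s in symbol_spikes
        for t in transition_spikes
        if abs(s - t) <= window
    )
```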
In the general case, receptive fields of neurons in $\Sigma$ depend on external excitatory afferents and are subject to the meaning associated with symbols. Neurons in $\Pi$, however, coactivate with neurons of $\Sigma$ and thus inherit the receptive fields of $\Sigma$. Consider the example where neurons of $\Sigma$ represent places; that is, they form PCs. Their receptive fields are driven by spatially modulated input and generate distinct place fields. Although neurons of $\Pi$ do not receive spatially modulated input, they will express place fields due to their coactivation with the PCs of $\Sigma$.
Networks of the form illustrated in Figure 3 have already been reported in the literature, often to model the behavior of synfire chains (Abeles, 1991). A particularly appealing network model was presented in Wennekers and Palm (2009), as it is not only structurally similar to the proposed implementation depicted in panel A of the figure but was also used to generate syntactic sequences.
One important question that needs to be answered is which of the depicted connectivity schemes appears in the HF. Le Duigou et al. (2014) observed that local recurrent excitation of pyramidal cells in cornu ammonis 3 (CA3) is weak, whereas interneuron excitation appears to be quite effective. Hence, their finding is in favor of the implementation depicted in Figure 3D. Nevertheless, the variant depicted in Figure 3C is also possible, and additional studies are required to verify whether, and if so, which type of implementation is present in local microcircuits of the HF.
It also remains to investigate how many neurons are required to store either symbols or, more important, transitions of a universal MTS. This is the focus of the next section.
3.5 Encoding Capacity of the Universal MTS
The definition of $\Pi$ introduced a bundling trick. Transition bundling provides several benefits when analyzing the computational logic and storage requirements of an MTS, especially in the light of neural encodings. Consider the following thought experiment. Suppose that the generation of a bundle (e.g., a neuron) is energetically expensive; however, the addition of a transition point (e.g., a dendritic spine) to an existing bundle is comparably cheap. To avoid evolutionary pressure (Niven & Laughlin, 2008), the goal is thus to minimize the overall cost. This corresponds to maximizing the number of transition points while minimizing the number of bundles. As will be shown now, it is not possible to merge arbitrary transition points in one bundle without violating the axioms already introduced.
Theorem 1. Let $M$ be an MTS on the alphabet $\Sigma$, $\Pi$ the corresponding transition set, and $\pi \in \Pi$ a transition bundle. $M$ generates valid sequences if and only if the following condition holds:
The sets of symbols for which a bundle $\pi$ is defined and to which it leads must be mutually exclusive: $\operatorname{def}(\pi) \cap \operatorname{lead}(\pi) = \emptyset$.
Proof. (1) From axiom 1, it follows immediately that any transition that is defined for $\sigma$ and leads to $\sigma$ violates the nonstationarity condition. (2) Without loss of generality, consider the three symbols $\sigma_a$, $\sigma_b$, and $\sigma_c$ with $\sigma_a \rightarrow \sigma_b$ and $\sigma_b \rightarrow \sigma_c$ but $\sigma_a \not\rightarrow \sigma_c$. This yields the transition points $\tau_1 = (\sigma_a, \sigma_b)$ and $\tau_2 = (\sigma_b, \sigma_c)$. Assume further that $\tau_1$ and $\tau_2$ are bundled in $\pi$ and that $\sigma_a$ and $\pi$ are true. It follows that $\sigma_c$ is true, that is, $\sigma_c$ follows directly after $\sigma_a$. However, $\sigma_a \not\rightarrow \sigma_c$, and thus $\sigma_c$ must be false. This contradicts the assumption and violates the coherency constraint.
Definition 4 (minimality, universality). An MTS $M$ is minimal if there exists only one transition bundle $\pi$ for any symbol $\sigma$, that is, $|\{\pi \in \Pi \mid \sigma \in \operatorname{def}(\pi)\}| = 1$ for any $\sigma \in \Sigma$. In a universal MTS $M$, any arbitrary transition between two symbols is possible.
Corollary 1. The input set $\operatorname{def}(\pi)$ of a transition bundle $\pi$ is singleton for a minimal and universal MTS $M$.
Proof. Assume $\sigma_a, \sigma_b \in \operatorname{def}(\pi)$ and $\sigma_a \neq \sigma_b$. In a universal MTS, the transition $\sigma_a \rightarrow \sigma_b$ is feasible and, by minimality, must be contained in $\pi$, that is, $\sigma_b \in \operatorname{lead}(\pi)$. According to theorem 1, $\operatorname{def}(\pi) \cap \operatorname{lead}(\pi) = \emptyset$, which is a contradiction.
Corollary 2. Let $\Sigma$ be an alphabet of size $N = |\Sigma|$ and $\Pi$ a transition set of transition bundles for a minimal universal MTS $M$ in which a transition between any two symbols is feasible. Then $|\Pi| = N$.
The corollary can be proved by reduction to the graph-coloring problem. In this problem, each node of a graph is assigned a color such that no two neighboring nodes share the same color (Cormen, Leiserson, Rivest, & Stein, 2009). The number of required colors is called the chromatic number of the graph.
Construct the graph $G$ of $M$ in which each symbol is represented by a node and any transition by a directed edge. $G$ is a complete digraph; that is, each pair of nodes is connected by a pair of directed edges. By merging any such pair of directed edges, $G$ can be reduced to a simple complete graph.
According to theorem 1, $\operatorname{def}(\pi) \cap \operatorname{lead}(\pi) = \emptyset$ for any $\pi \in \Pi$. Therefore, only transitions of symbols that are not connected by an edge in $G$ can be bundled. The minimal number of bundles is equivalent to the chromatic number of the graph, which is equal to the number of nodes in a complete graph (Cormen et al., 2009).
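The proof strategy can be replayed numerically with a greedy coloring. The sketch below is illustrative (names are my own, and greedy coloring gives an upper bound that happens to be tight for complete graphs): symbols are nodes, feasible transitions define edges, and each color class corresponds to one transition bundle. For a universal MTS, the number of colors, and hence bundles, equals the number of symbols:

```python
def min_bundles(symbols, feasible):
    # Nodes are symbols; feasible transitions define (undirected) edges.
    neighbors = {s: set() for s in symbols}
    for a, b in feasible:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Greedy coloring: adjacent symbols may not share a bundle (color).
    color = {}
    for s in symbols:
        taken = {color[n] for n in neighbors[s] if n in color}
        color[s] = next(c for c in range(len(symbols)) if c not in taken)
    return color
```

With fewer feasible transitions, the graph is sparser and fewer bundles suffice; this observation is what the spatial analysis in the next sections exploits.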
3.6 Multitransition Systems in Euclidean Space
The space that is constructed by symbols and transitions above is the discrete topological space with the induced discrete metric. However, the world in which animals reside is not discrete, and, more important, arbitrary jumps between any two locations are infeasible. In particular, the perceived environment is a complete metric space, namely the Euclidean space. For brevity, this will simply be called metric space from now on. Hence, an MTS that encodes transitions in a metric space has different constraints than a universal MTS does.
Encoding transitions between locations in a metric space depends on the detection of two consecutive positions. The following analysis is based on the assumption that there exists a continuous signal that depends on and uniquely identifies each possible location of the animal. In terms of the Euclidean plane $\mathbb{R}^2$, this corresponds to locations $x \in \mathbb{R}^2$. Certainly an animal does not have access to coordinates; however, other stimuli are likely to provide the necessary information. For instance, geometrical information combined with head direction signals is sufficient to represent singular locations, as demonstrated in the boundary vector (BV) cell model (Barry et al., 2006).
According to definition 1, an alphabet is finite. This can be understood to correspond to a finite number of neurons that have to represent locations. However, the alphabet of spatial symbols has to represent the continuous signals of the input space $\mathbb{R}^2$. Clearly, this corresponds to the setting of the well-known sampling theorem.
Definition 5 (spatial symbol, enablement, and assignment). Let $\sigma_i \in \Sigma$ be spatial symbols according to a sampling process of a complete metric space $(X, d)$. Each $\sigma_i$ is centered at a point $c_i \in X$.
A point $x \in X$ enables $\sigma_i$ if it is within the support of $\sigma_i$ given by the open ball of radius $r$: $d(x, c_i) < r$.
The point $x$ is assigned to the closest $\sigma_i$, that is, the $\sigma_i$ for which $d(x, c_i)$ is minimal. Given two adjacent symbols $\sigma_i$ and $\sigma_j$ with $d(c_i, c_j) = r$, a point is thus assigned to $\sigma_i$ only if $d(x, c_i) \leq r/2$, describing a ball of radius $r/2$.
The definitions of enablement and assignment can be interpreted in the following way. The region in which a spatial symbol is enabled can be understood as its receptive field. In contrast, assignment identifies the closest symbol—for instance, as a result of a WTA mechanism. Due to the definition, spatial symbols are allowed to have overlapping receptive fields. Nevertheless, a single symbol is representative for any location, and transitions can be detected and learned when the winning symbol changes. Before examining the optimal distribution of spatial transition bundles, it is necessary to determine the placement of symbols.
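Enablement and assignment translate directly into code. In the following Python sketch (the names and the use of coordinate tuples for centers are illustrative), enablement collects all symbols whose open ball contains a point, while assignment is the WTA choice of the single closest center:

```python
import math

def enabled(x, centers, r):
    # Indices of symbols whose receptive field (open ball of radius r)
    # contains the point x; fields may overlap.
    return [i for i, c in enumerate(centers) if math.dist(x, c) < r]

def assigned(x, centers):
    # WTA assignment: index of the closest symbol center.
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
```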
According to the Petersen-Middleton theorem (Petersen & Middleton, 1962), the ideal sampling strategy for two-dimensional continuous signals and therefore placement of spatial symbols is a hexagonal arrangement. From a different point of view, the sampling process can be understood as a solution to the problem of packing spheres with diameter $r$ as densely as possible. The sphere-packing problem also yields a hexagonal lattice in the two-dimensional case (Conway, Sloane, & Bannai, 1987; Leech & Sloane, 1999).
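For intuition, a hexagonal arrangement of symbol centers can be generated row by row: rows are spaced r*sqrt(3)/2 apart, and every other row is shifted by r/2. The sketch below is purely illustrative and makes no claim about how an animal arrives at such a packing:

```python
import math

def hexagonal_lattice(width, height, r):
    # Centers of a hexagonal (densest) packing of discs with
    # diameter r inside a width x height rectangle.
    centers = []
    dy = r * math.sqrt(3) / 2  # vertical distance between rows
    row = 0
    y = 0.0
    while y <= height:
        x = (r / 2) if row % 2 else 0.0  # odd rows are shifted
        while x <= width:
            centers.append((x, y))
            x += r
        y += dy
        row += 1
    return centers
```

Every pair of neighboring centers in this arrangement is exactly r apart, which is the defining property of the densest two-dimensional packing.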
Given such an ideal sampling process for spatial symbols, the optimal distribution of spatial transition bundles follows immediately.
Theorem 2. Let $(X, d)$ be a Euclidean space. Let $M$ be a minimal transition system on $X$ such that the countable alphabet $\Sigma$ corresponds to the densest optimal covering with respect to the radius $r$. Then the following hold:
The number of transition bundles is constant.
The occurrence of any transition bundle is periodic.
The theorem is proved by reduction to its corresponding graph-coloring problem, which was introduced above.
The densest arrangement of spatial symbols according to the Petersen-Middleton theorem is a hexagonal lattice (Petersen & Middleton, 1962). Furthermore, transitions are possible only between adjacent symbols. Consequently, the corresponding transition graph is not complete; only neighboring symbols are connected by edges. Due to the hexagonal arrangement of symbols, the chromatic number of the resulting graph is 3, and the occurrence of colors is periodic.
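The periodic three-coloring used in this proof can be written down explicitly. On the triangular lattice spanned by the basis vectors a1 = (1, 0) and a2 = (1/2, sqrt(3)/2), the assignment below (an illustrative sketch; the function name is my own) gives neighboring sites different colors, and each color class, that is, each transition bundle, recurs periodically, which is exactly the hexagonally repeating firing pattern predicted for GCs:

```python
def bundle_of(i, j):
    # Periodic assignment of triangular-lattice sites (i, j) to one of
    # three transition bundles; the six neighbors of a site differ from
    # it by (+-1, 0), (0, +-1), or (+-1, -+1) and never share its color.
    return (i - j) % 3
```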
The following section presents distributions of symbols in environments that are commonly used during rodent experiments and suggests a neural implementation for the dendritic computation.
3.7 An Online Method for Dense Packing of Spatial Symbols
An animal, however, does not appear to have access to global information or a global optimization procedure a priori. Rather, a (near) optimal distribution of spatial symbols needs to be established while the animal is actively exploring an environment.
A novel axiomatic system was used to investigate transition encoding in arbitrary and spatially confined sequences. Moreover, possible neural implementations were presented, and simulations showed how spatial symbols optimally arrange in two-dimensional environments.
Although the model was presented in the context of spatial navigation, the results of both the universal MTS and the spatial MTS are general and possibly apply to other representations. In particular, any system in which transitions in arbitrary spaces need to be encoded is subject to the results obtained for the universal MTS. Systems in which the input can be mapped to a Euclidean space and in which transitions should be bundled optimally will express behavior like the spatial MTS. Observations of GCs in other brain regions that process sequences are therefore expected.
In the following, the benefits of hexagonal packings and reasons, as well as implications for a separation of the representation, will be discussed. First, however, the results of the N-body simulation will be used to derive a neural model for a spatial MTS, which will then be integrated with one of the proposed computational models for a universal MTS of Figure 3.
4.1 Proposed Neural Model of Spatiotemporal Sequence Encoding
Although the N-body simulation is not a biologically plausible neural network, its results can be used to guide the design of a suitable neuron model. Note that the system dynamics of the model that I propose will be described and evaluated in detail only in a following paper in the series. Nevertheless, it is included here to show how a neural network can in principle implement an MTS for spatiotemporal sequences.
Recall the functional levels of the model presented in Figure 1 and described in section 3, where it was suggested that GCs learn transitions in spatiotemporal sequences and bind the appropriate PCs that are spatially nearby. In other words, GCs need to learn about spatial relationships given suitable sensory information and convey this information to PCs to which they are recurrently connected. This arrangement not only learns spatial transitions but also decouples PCs from explicit information about spatial relationships. This latter point is particularly beneficial during recall and is discussed in section 4.4. To perform this task, GCs are proposed to learn a dense packing of spatial symbols in a suitable sensory space as part of their dendritic computation during exploration of an environment. Specifically, assume that GCs express multiple receptive fields that behave the same as the symbols of the N-body simulation with only a minor extension: the receptive fields represent transition information based on dense packing of symbols.
What do these receptive fields look like, and how can the dendritic computation self-organize appropriately and biologically plausibly within both a single cell and a network of GCs?
Local recurrent connectivity with on-center and off-surround regions is common within continuous attractor neural network (CAN) models of GCs (Burak & Fiete, 2009; Shipston-Sharman et al., 2016). In particular, a recently published model found that precisely the proposed form of local center-surround interactions leads to stable formation of grid fields (Weber & Sprekeler, 2018). In contrast to these models, however, the purpose of GCs is suggested not to be localization or integration of distances. Rather, a single neuron is suggested to encode as many transitions as efficiently as possible. Hence, a network of GCs assigned this task also needs to adhere to the constraints of MTS.
The local microcircuit of GCs needs to establish winner-take-all (WTA) dynamics to minimize the number of transition bundles and reduce the ambiguities of transitions. Fast local recurrent inhibition appears to be an ideal solution, as it avoids computational complexities and, in particular, the temporal delays that would be the consequence of mechanisms that compare firing rates. It is also likely to align the responses of a network of GCs. Local inhibition is well supported by the findings of Couey et al. (2013), who found that the predominant interaction within the entorhinal cortex (EC) is via inhibitory interneurons.
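A minimal rate-based sketch can illustrate how fast recurrent inhibition yields WTA behavior among competing transition bundles. The update rule, inhibition weight, and drive values below are illustrative assumptions and not parameters derived from the model; the point is only that shared subtractive inhibition silences all but the most strongly driven unit.

```python
# Minimal rate-based sketch of winner-take-all (WTA) dynamics via fast
# recurrent inhibition. All names and constants are illustrative
# assumptions, not parameters from the proposed GC model.

def wta(drives, w_inh=1.0, steps=50):
    """Iterate shared subtractive inhibition until one unit dominates."""
    act = list(drives)
    for _ in range(steps):
        total = sum(act)
        # Each unit receives its feedforward drive minus inhibition from
        # all *other* active units; rectification keeps rates non-negative.
        act = [max(0.0, d - w_inh * (total - a)) for d, a in zip(drives, act)]
    return act

# Three transition bundles compete; the one with the strongest sensory
# drive suppresses the others and converges to its feedforward drive.
print(wta([1.0, 0.6, 0.3]))  # [1.0, 0.0, 0.0]
```

Note that the winner settles at its own feedforward drive while the losers are driven to zero, mirroring the selection of a single transition bundle without any explicit comparison of firing rates.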
Finally, GCs are suggested to interact with PCs similar to the way they do with episodic transition neurons. The proposed model with all local interactions is presented in Figure 7B, which uses interneurons to compute the logical and operations that are necessary during evaluation of transitions.
4.2 Requirements and Predictions for Neurons and Dendritic Trees
Following corollary 2, an implementation of a universal minimal MTS requires as many entities to store transition bundles as it has symbols. Due to their dependence on their associated symbols during learning or expansion of a path, transition neurons inherently coactivate with their symbols. This is also visible in the proposed neural model depicted in Figure 3. While symbols receive external afferents, transition encoders receive only feedforward drive from, and recurrently project back to, symbols. Hence, this could provide an explanation for the two types of PCs found in regions cornu ammonis 1 (CA1) and CA3: one population of PCs acts as spatial symbols, whereas the other encodes temporal transitions. A qualitative change to one population is therefore immediately reflected in the other. This could potentially lead to novel insights into PC remapping (Colgin et al., 2008; Solstad, Yousif, & Sejnowski, 2014), or when and how it is induced. In addition, the difference between place remapping and grid realignment (Fyhn, Hafting, Treves, Moser, & Moser, 2007) is expected to be a result of their independent input sampling processes and the suggested abstraction layer that GCs provide. Due to the proposed separation of temporal and spatial transitions, place remapping, and thus recall of a different set of temporal sequences, can be performed independently of remapping spatial transitions.
To decorrelate from their target symbols (i.e., to fulfill theorem 1), transition neurons require individual receptive fields per branch and complex dendritic computations. Several recent studies suggest that neurons are capable of this form of sophistication. For instance, dendritic spines were found to express individual structural plasticity (Bosch & Hayashi, 2012), as well as local synaptic plasticity (Segal, 2005; Cichon & Gan, 2015; Weber et al., 2016). Furthermore, dendrites were found to be capable of encoding multiple sensory stimuli (Varga, Jia, Sakmann, & Konnerth, 2011). In combination with the complex intrinsic organization of the EC (Canto, Wouterlood, & Witter, 2008; Witter, Doan, Jacobsen, Nilssen, & Ohara, 2017), it appears likely that GCs perform multiple distinct computations within, and self-organization of, their dendritic trees, corresponding to the proposed multiple receptive fields for transition encoding. So far, it is unclear whether the ideal sampling that is the basis of the proposed self-organization should also be reflected in the activity of PCs. If this were the case, PCs should express peak activities that are distributed hexagonally in the absence of other spatial cues.
The model assumes that sensory afferents from boundary vector (BV) and HD cells lead to unique sensory representations from which spatial symbols were sampled. Therefore, these cell types directly influence the formation of symbols. In fact, sampling symbols from a sensory space that is spanned by BV and HD cells could explain the findings of Derdikman et al. (2009), who reported that GCs repeat their grid fields in every other corridor of a hairpin maze. The features common to every other corridor are the direction in which the animal was running, as well as the HD-dependent geometry of the corridor. Moreover, this could explain the results presented by Krupic, Bauza, Burton, Barry, and O'Keefe (2015) and Krupic, Bauza, Burton, and O'Keefe (2016), who discovered that the geometry of an environment influences the responses of GCs. It is important to note that because the N-body simulation sampled directly from Euclidean coordinates and not from a BV-HD space, it did not yield any deformations or displacements of fields.
Although the optimality results for MTS were obtained for packing symbols and not entire transition encoders, the results apply to GCs in the following way. The on-center and off-surround receptive fields are ideally circular in spaces that provide unique sensory stimuli due to the Petersen-Middleton theorem. Therefore, their densest packing also follows the sphere packing problem. Furthermore, each part of the proposed neural model (i.e., individual dendrites and the entire neuron) corresponds to entities of MTS (i.e., transitions and bundles). Consequently, the optimality results are believed to carry over to the proposed neuron model. However, care must be taken with respect to the finite capabilities of a dendritic tree, the sensory afferents, and the discretization of space.
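The density advantage behind the sphere packing argument can be checked numerically. In two dimensions, equally sized circular receptive fields cover a larger fraction of the input space on a hexagonal lattice than on a square one; the closed-form densities below are standard results, computed here only for comparison.

```python
# Illustrative check of the 2D packing densities behind the argument:
# one unit disk per lattice cell, with lattice spacing 2 so neighboring
# disks touch. These are the standard closed-form densities.
import math

def square_lattice_density():
    # One unit disk (area pi) per 2 x 2 square cell.
    return math.pi / (2.0 * 2.0)

def hexagonal_lattice_density():
    # One unit disk per hexagonal-lattice cell of area 2 * sqrt(3)
    # (spacing 2 within a row, row height sqrt(3)).
    return math.pi / (2.0 * math.sqrt(3.0))

print(round(square_lattice_density(), 4))     # 0.7854
print(round(hexagonal_lattice_density(), 4))  # 0.9069
```

The roughly 10% gain in coverage is exactly why the densest packing of circular receptive fields, and hence the proposed grid field arrangement, is hexagonal.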
4.3 Discrete Spaces, Hexagons, and Sphere Packings
Using symbols in MTS discretizes the input space. Moreover, transitions were assumed above to be binary, that is, to indicate only whether a transition exists. However, it is conjectured that the obtained optimality results and the Petersen-Middleton theorem still apply even if symbols are not discrete and transitions are associated with a transition probability. Concretely, assume that symbols are represented by bell-shaped tuning curves, similar to optimal representations found in neural networks (Jazayeri & Movshon, 2006; Butts & Goldman, 2006). If these curves are well chosen, the activities of symbol neurons correspond to probabilities. In combination with transition probabilities, this is expected to result in definitions of MTS that are similar, if not identical, to Markov chains or message-passing algorithms. The latter is particularly evident in Figure 2, which shows a bipartite graph of symbols and transitions precisely in the way that factor graphs are commonly depicted and used in probabilistic graphical models (Kschischang, Frey, & Loeliger, 2006). Given symbol and transition probabilities, transition evaluation is thus expected to resemble belief propagation. To conclude this argument, it appears feasible to extend the presented discrete MTS to a probabilistic MTS.
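The conjectured probabilistic reading can be made concrete in a few lines. In the hypothetical sketch below, bell-shaped tuning curves turn an input location into a normalized activity vector over symbols, and evaluating transitions is a single Markov-chain step; the centers, widths, and transition matrix are toy assumptions, not values from the model.

```python
# Hypothetical sketch of a probabilistic MTS step: symbol activities act
# as a belief over states, and transition evaluation resembles one step
# of a Markov chain. All constants below are toy assumptions.
import math

def gaussian_tuning(x, centers, sigma=1.0):
    """Bell-shaped symbol activities, normalized to a probability vector."""
    a = [math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for c in centers]
    s = sum(a)
    return [v / s for v in a]

def propagate(belief, transitions):
    """One transition evaluation: new_belief[j] = sum_i belief[i] * P(j|i)."""
    n = len(belief)
    return [sum(belief[i] * transitions[i][j] for i in range(n)) for j in range(n)]

centers = [0.0, 1.0, 2.0]
# Toy transition probabilities between neighboring symbols only; each row
# sums to 1, so propagation preserves total probability mass.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
belief = gaussian_tuning(0.2, centers)  # mass concentrated near the first symbol
belief = propagate(belief, P)
```

Because the rows of the transition matrix are normalized, the propagated activities remain a valid probability vector, which is the sense in which transition evaluation resembles belief propagation.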
A notable feature of a (discrete) spatial MTS is the minimal number of three transition bundles for continuous spaces. It is important to note that this applies only in the mathematical treatment, where a transition bundle may contain arbitrarily many transition points and symbols perfectly discretize the input space. Furthermore, this would require a perfect WTA mechanism to select the appropriate transition bundle. Instantaneous local inhibition is, however, highly unlikely in a real neural network. It is therefore expected that the amount of overlap between grid fields depends on the temporal delay until local recurrent inhibition acts, which is determined by the synaptic strength of spatial afferents to GCs and by the mechanism of inhibition. Moreover, a real neuron is limited in its number of dendrites and synapses, is subject to noisy afferents, and can therefore realistically cover only a fraction of an entire input space. This leads to two important observations.
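The three-bundle minimum can be illustrated combinatorially: on a hexagonally packed arrangement of symbols, assigning each symbol to one of three bundles by a modular rule guarantees that no bundle contains two neighboring symbols. The axial hex-grid coordinates and neighbor set below are standard conventions, used here only for illustration.

```python
# Illustrative check that three transition bundles suffice on a
# hexagonally packed (triangular) arrangement of symbols: the assignment
# bundle = (q - r) mod 3 in axial coordinates never places two
# neighboring symbols in the same bundle.

# The six neighbors of a cell in axial hex-grid coordinates.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def bundle(q, r):
    return (q - r) % 3

def no_bundle_contains_neighbors(radius=5):
    """Exhaustively verify the property on a finite patch of the lattice."""
    for q in range(-radius, radius + 1):
        for r in range(-radius, radius + 1):
            for dq, dr in NEIGHBORS:
                if bundle(q, r) == bundle(q + dq, r + dr):
                    return False
    return True

print(no_bundle_contains_neighbors())  # True
```

Every neighbor step changes q - r by +-1 or +-2, never by a multiple of 3, which is why exactly three bundles resolve all transition ambiguities in the idealized mathematical treatment.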
First, it is expected that the number of GCs found in the rodent EC depends on the dendritic capabilities and the size of their grid fields, and that it follows a law that aims to cover the typical habitat of a rat. Without exact numbers on available synapses and on how the recurrent connectivity with PCs is implemented, it is difficult to predict the expected number of GCs. Nonetheless, an assessment of expected numbers is presented in section 4.4.
Second, continued exploration of an environment is expected to increase the synaptic strength between presynaptic sensory afferents and GCs. Consequently, grid fields may be fuzzy and their overlap larger at the beginning of an exploration phase due to uncertainties encoded by low synaptic strengths. Given sufficient exploration and suitable plasticity, synaptic strengths will increase. In turn, this should reduce the time to spike of GCs and, consequently, the latency of local recurrent inhibition. It is therefore predicted that grid fields separate more strongly over time due to faster local inhibition.
An intimately related question is why it is beneficial to encode multiple transitions within a single neuron instead of encoding each transition separately. First, neurons are energetically expensive (Niven & Laughlin, 2008). Minimizing their number appears to be a likely optimization problem that is solved by the brain in an effort to reduce energy consumption. Second, learning transitions requires plasticity suitable for the timescales of spatial exploration. Although adult neurogenesis is reported for the hippocampus (Zhao, Deng, & Gage, 2008), only spike-based plasticity rules operate on timescales applicable to spatial navigation and exploration. Consider an animal that explored an environment once and now needs to trace its original trajectory back home. Because exploration and retrieval may be only a few minutes apart, the entire process needs to rely on fast short-term memory. Third, and finally, transition bundles require less physical space. Knowledge of a feasible transition is little more than a bit of information. If each transition required a separate neuron, the number of required neurons would explode. This is especially true for the universal MTS, where, given n symbols, the system would require on the order of n^2 transition neurons. It is also true, however, for a spatial MTS. Ideally, it would require only 6 transition neurons per symbol. Consider, however, the N-body simulation for the square maze with symbols of size 20 cm. The converged solution consists of 115 symbols and would require about 690 transition neurons. Extrapolating from these numbers to a larger area, one still well below the roaming areas of rats (Harper & Rutherford, 2016), this would require 1,725,000 transition neurons. Bundling transitions reduces this number significantly. As mentioned above, the true number of required transition bundles is difficult to assess due to missing numbers for the synapse count.
Another important point to address is why a dense hexagonal packing of receptive fields is beneficial for spatial navigation or for spatiotemporal sequences in general. Consider the information that is encoded in circular receptive fields for transition encoding: it informs about constant-cost operations to interact with surrounding states. This is depicted in Figure 8B for the case of spatial navigation. Given its current location, a transition informs the animal about the surrounding neighborhood with which it can interact directly (i.e., with constant cost, because distances are uniform in all directions). This drastically simplifies algorithms that work on such data, as they do not have to consider corner cases or distinguish between certain configurations of symbols.
To make this argument concrete, consider an animal that explored an environment and wishes to find its way back home. Ignoring any other reward signals, the animal wants to plan the shortest trajectory to minimize energy consumption. On a flat surface, a basic task for the animal is thus to approximately compute distances between two arbitrary locations. To be useful, however, the mechanism needs to work in any environment without a priori knowledge. Because symbols are circular and densely packed, an algorithm that operates on these data can expect any two neighboring symbols to be equidistant in sensory space. Consequently, no mechanism is required to distinguish between different neighboring symbols or to align symbols to the surrounding world. The observed alignment and shearing effects of grid fields with respect to the geometry of environments, as reported in Stensola et al. (2015), Krupic, Bauza, Burton, Barry, et al. (2015), and Krupic, Bauza, Burton, and O'Keefe (2016), are, in fact, considered to be an artifact of densely packing spherical receptive fields in a suitable sensory space.
Now consider a different animal for which the packing is not hexagonal but, for instance, a square lattice. The fundamental difference is that this lattice loses the constant-neighborhood property: a transition no longer informs about an area that is equidistant in every direction, because points across a diagonal of the square lattice are farther away than others. Any algorithm now also needs to know how to reach other symbols depending on their position on the lattice, for instance, via a vertex or via an edge. Again, this is a result of the nonconstant neighborhood. Finally, it is unclear to which external frame of reference the lattice should be aligned, if at all. Essentially, lattices other than the hexagonal one introduce avoidable complexities for spherical receptive fields.
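The constant-neighborhood argument can be stated numerically: on a hexagonal lattice all six nearest neighbors lie at exactly one lattice spacing, whereas the eight surrounding symbols of a square lattice split into two distance classes (edge versus diagonal). A small sketch, using unit-spaced lattices for illustration:

```python
# Numerical version of the constant-neighborhood argument: count the
# distinct neighbor distances on a hexagonal versus a square lattice.
import math

def neighbor_distances(offsets):
    """Distinct Euclidean distances to the given neighbor offsets."""
    return sorted({round(math.hypot(dx, dy), 6) for dx, dy in offsets})

# Six hexagonal-lattice neighbors at 60-degree increments, unit spacing.
hex_offsets = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3))
               for k in range(6)]
# Eight surrounding cells of a unit square lattice.
square_offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]

print(neighbor_distances(hex_offsets))     # [1.0] -> one distance class
print(neighbor_distances(square_offsets))  # [1.0, 1.414214] -> two classes
```

An algorithm operating on the hexagonal arrangement can therefore treat every neighbor identically, whereas the square lattice forces it to distinguish edge from diagonal moves.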
In conclusion, dense packing allows information about an environment to be learned in a general bottom-up fashion, without a priori knowledge and without having to deal with corner cases. Later, this information can be used in top-down algorithms. It is also likely that the information acquired in this bottom-up fashion is used, for instance, to prune unnecessary information and to learn abstractions during a later stage.
One may wonder if there are other representations that could be used to represent space. For instance, previous work showed how PCs can be used to triangulate exact locations and how the hippocampus forms a topological map (Dabaghian, Memoli, Frank, & Carlsson, 2012; Dabaghian, Brandt, & Frank, 2014). Still, this method requires a process that learns distances or relations of points in the topological map. An alternative representation is to use the fewest neurons to encode the largest volume of space. This, as well as the previous encoding scheme, would require a distinct decoding mechanism to infer the exact location. An efficient implementation of either approach could use rank-order codes similar to the ones proposed for the visual cortex (Thorpe & Gautrais, 1998); that is, the relative spike times of PCs inform about the exact location. This decoding mechanism limits the distance between locations that can be represented, though, because the temporal integration windows of decoding neurons are not arbitrarily long. Furthermore, it would require a priori knowledge about distances to properly tune the spike times. Another decoding scheme could infer the location based on rate coding. Spatial navigation, however, is an operation that requires fast execution times, on the order of a fraction of a second, for instance, when finding the shortest path toward a safe location while at risk from a predator. Encoding transitions explicitly reduces the retrieval time to a bounded factor between neighboring locations (i.e., the axonal transmission time) and yields acceptable times for short and medium distances. In addition, it provides the necessary information about spatial relationships for topological representations without having to deal explicitly with the geometry of the environment.
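Retrieval over explicitly encoded transitions amounts to a breadth-first expansion in which every hop to a neighboring symbol has constant cost (one axonal transmission). The toy symbol graph below is hypothetical; it only illustrates how recursive expansion recovers a previously explored trajectory.

```python
# Sketch of retrieval over explicitly encoded transitions: expanding a
# path home is a breadth-first traversal in which every hop has constant
# cost. The symbol graph below is a hypothetical explored trajectory.
from collections import deque

def retrieve_path(transitions, start, goal):
    """BFS over the learned transition graph; returns a shortest symbol path."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in transitions.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # goal not reachable through learned transitions

# Symbols A..E with the transitions learned during one exploration run.
learned = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
           "D": ["C", "E"], "E": ["D"]}
print(retrieve_path(learned, "E", "A"))  # ['E', 'D', 'C', 'B', 'A']
```

Note that only transitions acquired during exploration can be expanded, which is exactly the limitation discussed in section 4.5: the traversal cannot invent shortcuts that were never experienced.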
Nevertheless, recursive expansion is an issue for long trajectories and will be addressed in the following paper of the series, which introduces a technique to optimally accelerate retrievals in MTS.
4.4 On the Separation of Spatial and Transitional Information
Sequence and transition learning in the hippocampal formation (HF) was suggested previously (Hayashi & Igarashi, 2009; Hattori & Kobayashi, 2016). However, these studies ignored spatial information and GCs. Spatial transitions were the focus of other studies with biologically plausible models of the HF (Cuperlier et al., 2004, 2006; Hirel et al., 2010). Nevertheless, these models did not differentiate between spatial and temporal transition systems, sequences were not defined rigorously, and optimality of transition encoding was not their concern.
Why, though, should there be a distinction between spatial and transitional information in the first place? Several observations are in favor of this. First, numerous failed (and thus unreported) experiments using spiking neuron models indicated that it is difficult to construct a network with only a single neuron population that is capable of both maintaining the activity of a representation indefinitely and toggling state transitions arbitrarily. The parameters necessary for a stable network were biologically highly unlikely, especially when neurons were realistically noisy.
Second, the dendritic tree of a PC is certainly not infinite. Thus, the integration of presynaptic afferents is limited by the number of synapses. Given that PCs integrate a plethora of cues, for instance, olfactory information (Zhang & Manahan-Vaughan, 2015), receive projections from the entorhinal cortex (Witter, 2007; Witter et al., 2017), and are directly or indirectly coupled with the prefrontal cortex (PFC) (Swanson & Kohler, 1986; Jay & Witter, 1991; Varela, Kumar, Yang, & Wilson, 2014; Ito, 2018), it seems unlikely that sufficiently many synapses remain to also encode transitions. Certainly, the number of synapses is also limited for GCs. In turn, this limitation allows estimating the number of expected GCs depending on the size of their grid fields. Because the number of reported synapses per neuron varies significantly in the literature, the following estimate should be taken with due care. Consider a neuron that has 15,000 synapses, a number at the lower bound of reported synapse counts for pyramidal neurons in rats (DeFelipe, Alonso-Nanclares, & Arellano, 2002; Markham & Greenough, 2004). Furthermore, assume that not every synapse associates with presynaptic sensory states; some belong, for instance, to recurrent projections from place cells and other neurons in the local microcircuit, and more than a single synapse is required to drive the neuron to its spiking threshold. In the following estimate, it is therefore assumed that only one-fifth of the total synaptic capacity contributes to learning transitions. Hence, a single neuron can bind up to 3000 individual presynaptic symbols. Now consider the N-body simulation, which packed 115 symbols with a diameter of 20 cm into the square maze; this extrapolates to 287,500 symbols for a correspondingly larger area.
Despite this number of symbols, only 96 neurons are required to perform transition bundling, which is several orders of magnitude smaller than the 1,725,000 neurons needed to store each transition individually. Clearly, the estimate lacks knowledge about other properties of presynaptic neurons, such as bursting behavior, which could trigger a postsynaptic spike with fewer synapses; about how many different states the entirety of presynaptic neurons can represent; or whether the reportedly rich dendritic organization of cells in layer II of the EC (Lingenhohl & Finch, 1991) exhibits a significantly larger number of synapses. Still, the number of expected transition neurons is on the order of a few hundred or, at most, a few thousand for realistic or even large environments, which is compatible not only with the number of pyramidal and stellate cells found in the EC but also with the observation that only a few of these cells express grid-like behavior (Gatome, Slomianka, Lipp, & Amrein, 2010). Moreover, the dependency of the number of neurons on the size of their fields could explain the small number of GCs with large grid fields (Stensola et al., 2012).
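The arithmetic of this estimate can be reproduced directly. The constants below are the assumptions stated in the text (15,000 synapses per neuron, one-fifth usable for transition learning, 287,500 extrapolated symbols, and 6 transitions per symbol in an unbundled spatial MTS).

```python
# Back-of-the-envelope reproduction of the synapse-budget estimate in
# the text. All constants are the assumptions stated in section 4.4.

SYNAPSES_PER_NEURON = 15_000       # lower bound reported for rat pyramidal cells
symbols_per_neuron = SYNAPSES_PER_NEURON // 5   # one-fifth usable -> 3000 symbols

n_symbols = 287_500                # extrapolated symbol count from the text
# Ceiling division: each bundling neuron binds up to 3000 symbols.
bundling_neurons = -(-n_symbols // symbols_per_neuron)
# Unbundled spatial MTS: about 6 transition neurons per symbol.
unbundled_neurons = n_symbols * 6

print(symbols_per_neuron)  # 3000
print(bundling_neurons)    # 96
print(unbundled_neurons)   # 1725000
```

The contrast between 96 bundling neurons and 1,725,000 unbundled transition neurons is the several-orders-of-magnitude saving referred to above.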
Third, the architecture of the HF appears to be a combination of auto- and hetero-associative memories (McNaughton & Morris, 1987; Káli & Dayan, 2000; Papp, Witter, & Treves, 2007; Le Duigou et al., 2014). While the first type can be used to maintain and recall memories even from noisy inputs (Palm, 1980), the latter is ideal for storing state transitions. In fact, a combination of the two was already used to learn sequences via Hebbian plasticity (Wennekers & Palm, 2009).
Fourth, a separation increases fault tolerance and provides computational benefits. If a transition neuron vanishes, the spatial knowledge is retained, and vice versa. Furthermore, both PCs and GCs are suggested to acquire their representations independently from afferents that carry spatial information, as depicted in Figure 7B. As discussed above, while PCs are thought to learn to identify singular locations for both temporal and spatial purposes, and thus directly correspond to both temporal and spatial symbols, GCs are believed to associate not only with their presynaptic spatial afferents but also with coactive PCs. This has the benefit that GCs learn transitions not only in their spatial input space but also in the symbolic space established by PCs. When operating with reduced presynaptic inputs (e.g., in total darkness), the accumulated errors in the GC representation are expected to be reset by strong presynaptic stimuli that override the afferents from PCs, similar to the realignment model proposed by Mulas et al. (2016). Due to this proposed coactivity learning, sensory activation of presynaptic neurons is not required during recall and path planning, which only require recursive activation of GCs and PCs. Another benefit of separating spatial and transitional representations is that they can vary independently of each other. This form of abstraction layer is widely used in computer science due to its power and is known as the bridge or mediator pattern (Gamma, 1995). More important, though, this predicts two distinct modes of operation: learning and recall. During learning, GCs require sufficient drive from PCs as well as afferents from upstream neurons that represent spatial information. During recall, they can rely on activation from PCs alone. Consequently, GCs or the local microcircuit within the EC are expected to expose a mechanism to toggle between these two states.
I expect these states to be represented heterosynaptically as logical and or logical or operations, respectively, both of which can be implemented easily in neural networks (Koch, 2004) or via interneurons. Early data suggest that GCs indeed expose at least these two modes of operation (R.G. Morris, private communication, October 6, 2016).
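The two proposed operating modes reduce to simple Boolean gating, which can be sketched in a few lines. The thresholds, drive values, and the explicit mode switch below are illustrative assumptions, not a claim about the actual biophysical mechanism.

```python
# Hypothetical sketch of the two proposed GC operating modes: during
# learning, place-cell drive AND sensory afferents must agree (logical
# and); during recall, place-cell drive alone suffices (logical or).
# Thresholds and the explicit mode flag are illustrative assumptions.

def gc_active(pc_drive, sensory_drive, mode, threshold=0.5):
    pc_on = pc_drive >= threshold
    sens_on = sensory_drive >= threshold
    if mode == "learning":
        return pc_on and sens_on  # and: both pathways required
    return pc_on or sens_on       # recall, or: either pathway suffices

assert gc_active(0.9, 0.9, "learning")       # exploration with sensory input
assert not gc_active(0.9, 0.0, "learning")   # no sensory input -> no learning
assert gc_active(0.9, 0.0, "recall")         # recall via PC drive alone
```

In this reading, the mode toggle corresponds to heterosynaptically switching the dendritic integration rule between the two Boolean operations.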
Fifth, and finally, PCs mature earlier during postnatal development than GCs do (Langston et al., 2010; Wills, Cacucci, Burgess, & O'Keefe, 2010). The data suggest that GCs appear as soon as rats start exploring the space around them. This clearly indicates that they perform computations that are relevant only after places can be identified. In particular, temporal sequences, stored in a universal minimal MTS, are considered to be behaviorally more important than spatial sequences to preweanling rats.
4.5 Current Limitations and Possible Extensions
The introduced MTS restricts the behavioral capabilities of animals. In its current form, an animal equipped with a system similar to the one presented in Figure 7 can exactly recall only previously explored trajectories. The reason is that appropriate places need to be visited consecutively to learn transitions, and any additional transition cannot be acquired without another exploration phase. Furthermore, the system cannot find shortcuts between places that are beyond the distance of two spatial symbols. For instance, assume that an animal walks a U-shaped trajectory in an environment without walls, where the start and target locations are at the start and the end of the U, respectively. If the two locations are too far apart, the system is unable to recognize that there is a shortcut between them and merely follows the episodic memory. One solution is to use probabilistic symbols and transitions instead of discretized spaces. Then, depending on the tails of the probability distribution, symbols that are far away may retain some nonzero activity. This may also lead to the discovery of some shortcuts and to more realistic trajectories than the exact succession of discrete symbols. Another solution is to introduce transition encoders that can recognize places over longer distances. This solution also addresses another issue with transition systems: questionable run times when evaluating long trajectories. It, as well as the extension to probabilistic representations, will be presented in detail in the following paper in the series.
Another shortcoming is the assumption of a process that uniquely identifies any location. However, this simplification allowed assessing transition encoding theoretically in an idealized case without considering precise neural dynamics, and it will be extended to more realistic input spaces in the future. In particular, the model presented in Barry et al. (2006) already showed that BV cells contain sufficient information to uniquely encode spatial locations. The evaluated environments were mostly of sizes and shapes similar to real-world experiments: square or circular, with no or only a few obstacles. A spatial sampling process that is inherent in the dendritic computations of GCs suggests, though, that their fields depend on the uniqueness of the perceived stimuli. Recent studies showing that GC firing strongly depends on the geometry of the environment are clearly in favor of this assumption (Krupic et al., 2015, 2016).
5 Conclusion and Outlook
This letter proposed that GCs optimally encode transitions in Euclidean space. For this purpose, an axiomatic system was introduced to examine the logic and memory consumption of transition encoding in sequences. Furthermore, a novel bundling trick was presented that allows analysis of entities that are capable of encoding multiple transitions at the same time. Finally, the results of the theoretical derivation were discussed in detail, shortcomings of the analysis were addressed, and future work was pointed out. For instance, transition bundling was argued to be performed by dendritic computation and local spatial sampling. In turn, this allowed making predictions and explaining several recent observations of real GCs.
This letter is part of a series that proposes transition coding as the core functionality of GCs. The following work will address multiple scales of transitions, demonstrate why a scale increment of √2 is optimal, and present a biologically plausible model of dendritic self-organization for transition bundling.
I sincerely thank the two anonymous reviewers whose helpful comments and constructive feedback helped to improve and clarify this letter. I also thank Christoph Richter and Jörg Conradt for invaluable discussions and feedback during the research, as well as their support and suggestions on the manuscript. This work was partially funded by EU FET project GRIDMAP 600725.