Abstract

The dentate gyrus forms a critical link between the entorhinal cortex and CA3 by providing a sparse version of the signal. Concurrent with this increase in sparsity, a widely accepted theory suggests the dentate gyrus performs pattern separation: similar inputs yield decorrelated outputs. Although the dentate gyrus (DG) is an active subject of study and theory, few logically rigorous arguments detail its coding. We suggest a theoretically tractable, combinatorial model for this action. The model provides formal methods for a highly redundant, arbitrarily sparse, and decorrelated output signal. To explore the value of this framework, we assess how suitable it is for two notable aspects of DG coding: how it can handle the highly structured grid cell representation in the input entorhinal cortex region and the presence of adult neurogenesis, which has been proposed to produce a heterogeneous code in the DG. We find that tailoring the model to grid cell input yields expansion parameters consistent with the literature. In addition, the heterogeneous coding reflects the activity gradation observed experimentally. Finally, we connect this approach with more conventional binary threshold neural circuit models via a formal embedding.

1  Introduction

The dentate gyrus (DG) region of the hippocampus has long been thought to provide a pattern separation role during memory formation (McNaughton & Morris, 1987; O’Reilly & McClelland, 1994; Treves & Rolls, 1992). This separation role has typically been assumed to arise from several basic properties of the circuit, including substantial feedforward and feedback inhibition (Freund & Buzsáki, 1996), sparse and powerful mossy fiber projections (Henze, Urban, & Barrionuevo, 2000), and a large number of neurons relative to its entorhinal cortex (EC) inputs and CA3 outputs (Amaral, Scharfman, & Lavenex, 2007; O’Reilly & McClelland, 1994). Indeed, experimental evidence supports these theories, showing that the activity of the principal granule cells in the DG is both sparse and decorrelated relative to other regions (Henze, Wittner, & Buzsáki, 2002; Jung & McNaughton, 1993; Leutgeb, Leutgeb, Moser, & Moser, 2007). Nonetheless, there is little consensus on what encoding scheme is actually used by the DG; suggestions range from random sparse projections to information-preserving schemes to lossy hash coding approaches (Treves, Tashiro, Witter, & Moser, 2008). Sparse coding of some sort is suggested as a function of many different brain regions, ranging from birdsong production (Hahnloser, Kozhevnikov, & Fee, 2002) to olfactory systems (Laurent, 2002). A particularly notable example of this is the visual cortex, where sparse coding has long been suggested to be a useful model of visual feature extraction (Olshausen & Field, 1997). The Olshausen and Field (OF) model formulated sparse coding as the identification of a sparse basis that best represented information from the original basis given some desired level of sparsity. This perspective has had considerable influence on subsequent studies of visual processing (Olshausen & Field, 2004), both in terms of understanding the complex processing of the visual cortex and in the development of visual cortex-inspired machine learning algorithms (Lee, Battle, Raina, & Ng, 2006).

There are several reasons to consider that the DG’s version of sparse coding would be different from this sensory function. For one, the inputs to the hippocampus (and thus the DG) are quite different from the information provided to sensory systems. If one accepts the sparse coding premise for sensory cortices, by the time information reaches the EC, the extraction of statistical features from correlated sensors (such as the extraction of objects and motion from raw pixels) will have already been performed. Instead, the EC appears to have highly refined representations of information; the medial EC represents spatial information with a sophisticated grid-cell scheme (Hafting, Fyhn, Molden, Moser, & Moser, 2005; Moser, Kropff, & Moser, 2008), and the lateral EC, while not as well characterized, appears to contain high-level information about objects and other cognitive features (Deshmukh & Knierim, 2011; Hargreaves, Rao, Lee, & Knierim, 2005). Likewise, the DG is essentially a feedforward layer typically thought of as preconditioning cortical inputs for the encoding in downstream regions that are believed to have more recurrent autoassociative properties (Marr, 1971; Treves & Rolls, 1992), which is somewhat distinct from the canonical hierarchical structure in cortex. From this perspective, the lossless encoding of information by the DG would appear to have a particularly high importance (Aimone, Deng, & Gage, 2011). In addition to these differences in inputs and outputs, unlike early sensory cortices, the DG shows considerable levels of neuronal plasticity. In addition to being among the first regions in which synaptic plasticity was observed (Bliss & Lømo, 1973), the DG is among the very few regions that exhibit ongoing neurogenesis throughout adulthood (Aimone et al., 2014). The implication of continual neurogenesis on sparse coding is of particular concern due to evidence that new neurons are more, rather than less, active than mature neurons in the DG (Aimone et al., 2011; Aimone, Wiles, & Gage, 2006; Sahay, Wilson, & Hen, 2011; Danielson et al., 2016).

Here, we sought to develop a theoretical framework for sparse coding in the DG that both guarantees the desired decorrelation function of the biological system and illustrates the necessity of plasticity in the network. Formal argument has been previously applied to abstract combinatorial neural codes (Curto, Itskov, Morrison, Roth, & Walker, 2013), but our model examines a code within the DG context. Such a model would have value in both providing a formal basis on which one can interpret biological data about the DG, as well as in providing a starting point for designing algorithms based on hippocampal coding. Specifically, we sought a model with the following related properties:

  • Lossless encoding. Encoding of input representations should preserve information (i.e., be reversible).

  • Pattern separation. Output representations should be strictly less correlated than input representations.

  • Adaptable coding for novel inputs. The code should demonstrate an ability to reframe its computation to match the structure of novel input representations.

This letter has two main sections. First, we discuss the general model’s construction and its basic properties in section 2. These properties are more abstract and include results on sparsity and error correction. Next, in section 3, we move to incorporate more biological concepts and generalize the model as needed. We shape the inputs to represent grid cells of EC in section 3.1, incorporate neurogenesis/mixed coding in section 3.2, and provide an equivalence with more traditional models in section 3.3. We conclude with a discussion placing our formal approach into the context of current hippocampus models and an appendix of proofs.

2  Models

In this section, we introduce our idealized sparse coding method. By beginning with a general formulation, we can establish fundamental properties. These properties apply to this coding method regardless of the context, and in the following section, we discuss special cases that put the coding into more biologically relevant context. While our coding method is one of many potential coding schemes, it is parsimonious and provides a balance between theoretical tractability and biological relevance.

Let and be positive integers and be a set of inputs. We consider a code of to be a map with such that where is a linear transformation and is the component-wise threshold function for , otherwise. The linear component represents synaptic weights, and we include the nonlinear threshold function as a reflection of the synaptic thresholds. In addition, including a threshold function will allow for a many-to-one pullback function, which we put to use in section 2.2. In comparison to what is found biologically in the EC-to-DG connection, we have made the assumption that neurons take binary values (0 for no spike or 1 for spike). Similarly, we are particularly interested in the case where is much larger than .

We can now define a general version of our sparse coding. Let be a collection of subsets of with . With , define the matrix where if and only if and 0 otherwise. The map is our generalized sparse coding. In practice, each dimension of the target space represents a subset , and if and only if for all . As such, can be thought of as performing a logical AND over each set . The encoding captures information about the input vectors relative to , and so a different selection of will convey different information into the target space.
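
As a concrete illustration, the following minimal Python sketch implements this conjunctive encoding on a toy example. The names (n, k, subsets, encode) and the parameter values are illustrative assumptions, not notation fixed by the text.

```python
from itertools import combinations

import numpy as np

n, k = 6, 2                                # input dimension and subset size (illustrative)
subsets = list(combinations(range(n), k))  # one output dimension per k-sized subset

def encode(x, subsets):
    """Output entry i is 1 iff every input index in subsets[i] is active (a logical AND)."""
    return np.array([int(all(x[j] for j in S)) for S in subsets])

x = np.array([1, 1, 0, 1, 0, 0])
y = encode(x, subsets)
print(len(subsets), int(y.sum()))          # 15 output dimensions (C(6,2)); 3 of them active
```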

A diagram of the general sparse coding model is presented in Figure 1. In the top row, we see examples of how each DG neuron corresponds to a set of EC neurons over which the DG neuron samples. Each set is an ; in the general model, there are no size restrictions on . In the event that all EC neurons within a given spike (have corresponding entries equal 1), they each contribute the synaptic weight so that the DG neuron’s threshold is reached. This is indicated by the edges’ labels. In the bottom row, points correspond to activity patterns, entire vectors that are elements of (left) or (right).

Figure 1:

Patterns of EC activity form a densely populated space. These points are separated via the DG. Each output point has a neighborhood of radius separating it from all other valid output points. A DG neuron samples a subset of EC neurons, and differences in EC are amplified by selecting overlapping samples. (Example subsets are color-coded.) Synaptic weights are normalized with respect to size of the sample set and so that all corresponding presynaptic EC neurons must fire to drive the postsynaptic DG neuron. Overall activity in DG is sparse.

For large collections , the fidelity of the encoding is good, but allowing for to be any subset of the power set is too poorly constrained for our theoretical results. At first, we make a simple choice for , namely, all -sized subsets of , and we will see that the encoded vectors provide a complete representation of the input data outside a neighborhood around 0. This is a strong initial assumption, but it allows enough structure to have provable results. We show the first few results under this assumption but then generalize to a less constrained situation. We use biological concepts to guide our modifications so that the model becomes more realistic and maintains rigor.

Our first result shows that given some , the sparse coding losslessly encodes .

Theorem 1.

Let . With being the collection of all -sized subsets of , there exists a such that for all , . Namely, losslessly decodes .

Proof.

Define and if , and otherwise (applied component-wise). Set . Examine a particular . Then, if and only if for every . In addition, for all . Hence, , for all , exactly when , for all . Since was chosen arbitrarily, this holds for all .

In some sense, is maximal given the constraint . All vectors outside (viewed as being noise around the zero vector) are mapped to the zero vector. This definition of coincides with noise filtering. If only a few entries in a given are nonzero (), then we may not want the sparse coding to carry these data. By adjusting , we adjust the size of the target space and the amount of noise filtering. Unless specified, we assume the case where is all -sized subsets.
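
A small sketch of the encoding-decoding round trip and the accompanying noise filtering may help fix ideas: an input bit is recovered as 1 whenever some active output dimension samples it, which is one pullback consistent with the construction above. All names and parameter values are illustrative assumptions.

```python
from itertools import combinations

import numpy as np

n, k = 6, 3
subsets = list(combinations(range(n), k))

def encode(x):
    return np.array([int(all(x[j] for j in S)) for S in subsets])

def decode(y):
    """Recover an input bit as 1 whenever some active output dimension samples it."""
    x_hat = np.zeros(n, dtype=int)
    for yi, S in zip(y, subsets):
        if yi:
            for j in S:
                x_hat[j] = 1
    return x_hat

x_dense = np.array([1, 0, 1, 1, 0, 1])    # four active bits (>= k): recovered exactly
x_noise = np.array([0, 1, 0, 0, 1, 0])    # two active bits (< k): filtered to zero
print(np.array_equal(decode(encode(x_dense)), x_dense))  # True
print(int(decode(encode(x_noise)).sum()))                # 0
```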

2.1  DG Code Guarantees Sparsity and Decreases Correlations

The map is a simple, combinatorial code. Nonetheless, its simplicity allows for remarkable, biologically inspired properties that can be shown formally. The first of these properties is sparsity. As a measure of sparsity, we adopt (if or , then ), and with , we overload the notation to
formula
That is, measures sparsity of the output vectors via normalized correlations.
Proposition 1.
Let and be all -sized subsets of . For each , , we denote , and suppose they agree at exactly  indices. The following computes the sparsity function :
formula
Proof.
Recall if and only if for every . There are exactly sets that satisfy this condition. Hence, . Given any two , , we have that is equal to the number of entries where and agree. Moreover, and agree at a particular index if for all , . There are exactly such . Therefore, we have
formula
The result follows by definition of .

With this, it is easy to show that our sparse coding does indeed nontrivially increase the sparsity of our original input data, and we do so formally.

Theorem 2.
In the case being -sized () subsets of , the action of on increases sparsity (decreases ) in that . More specifically, for any two nonzero , (with images and , respectively),
formula

Theorem 2 establishes that our sparse coding method decreases the normalized correlation for any pair of vectors. Figure 2 shows a plot of the decrease in the estimated average value of . It is evident that even small values for lead to significant decreases in the normalized dot products. We quantify the average value of below.

Figure 2:

Estimated average is computed for 200 pairs of random vectors (chosen with 1-norm at least 3) in the input space, target space with , and target space with .
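
The following Monte Carlo sketch mirrors the estimate plotted in Figure 2: it compares the average normalized dot product over random input pairs with that of their encoded images. The specific parameters (input dimension, subset size, 200 pairs with 1-norm at least 3) are assumptions chosen for illustration.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n, k, pairs = 10, 3, 200                   # illustrative parameters
subsets = list(combinations(range(n), k))

def encode(x):
    return np.array([int(all(x[j] for j in S)) for S in subsets])

def norm_dot(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def random_input():
    while True:                            # mirror the figure's restriction: 1-norm at least 3
        x = rng.integers(0, 2, n)
        if x.sum() >= 3:
            return x

before, after = [], []
for _ in range(pairs):
    x, y = random_input(), random_input()
    before.append(norm_dot(x, y))
    after.append(norm_dot(encode(x), encode(y)))
print(np.mean(before), np.mean(after))     # the encoded pairs are markedly less correlated
```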

Theorem 3.

Averaging over all nonzero vectors in , the mean value of is given by .

With theorem 2, it is justified to apply our sparse coding method repeatedly, each time increasing the sparsity. However, we may lose fidelity depending on the subsequent choice of .

Remark 1.

The number of input vectors lost as noise at each step is . In the case where , .

2.2  Formal Sparse Code Provides Error-Correcting Properties

Traditional sparse coding algorithms act linearly on the data (Lee et al., 2006). In contrast, the threshold functions and provide a more accurate representation of neural systems than sparse coding without thresholding. Mathematically the threshold functions also allow our coding to be fault tolerant. Correction for false negatives is inherent in the construction of and , and the central idea is that in general, each nonzero entry in is coded many times in .

Proposition 2.

Let , , , and be defined as previously. Consider with , and denote . If (for some fixed ) , then there are entries in that represent .

Proof.

The claim rests on the fact that if and only if for every . Hence, for a particular , we merely need to count the such that for all and . Since is fixed, we have options for a set of elements. That is, there are sets and therefore, entries in that represent .

By proposition 2, we know that entries in carry the information for a single nonzero entry in . Moreover, from the definition of , only one of these is required to reconstruct that entry in . It is worth noting that when . This results in the least fault-tolerant situation, as only a single entry in is nonzero. However, this is truly a worst-case scenario; it is easy to find examples where the coding is much more compliant. In particular, proposition 2 does not indicate how the entries in may carry overlapping information. Consider , and let . Since , . We notice, however, that with , , the entirety of can be expressed by only two entries in (both and must be 1). The feedback found within the DG may be a biological mechanism for this error correcting.
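
A brief sketch of the redundancy count in proposition 2: for an input with w active bits, a fixed active bit is carried by C(w - 1, k - 1) active output entries. The setup reuses the illustrative encoder from the earlier sketches; the particular vector and index are arbitrary choices.

```python
from itertools import combinations
from math import comb

import numpy as np

n, k = 8, 3
subsets = list(combinations(range(n), k))

def encode(x):
    return np.array([int(all(x[j] for j in S)) for S in subsets])

x = np.zeros(n, dtype=int)
x[[0, 2, 3, 5, 7]] = 1                     # w = 5 active input bits
y = encode(x)

j = 3                                      # any fixed active input index
carriers = sum(1 for yi, S in zip(y, subsets) if yi and j in S)
print(carriers, comb(int(x.sum()) - 1, k - 1))   # both equal C(4, 2) = 6
```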

This ability to correct errors has implications in the stability of downstream attractor networks (namely, those modeling CA3). In an iterative attractor network, it is possible that the iterative process converges to a spurious fixed point, which is usually a linear combination of the stored data. This tendency affects the overall capacity of the system. Moreover, without structural constraints, generically sparse stored data are especially susceptible to this linear combination issue.

By incorporating the error correction from our encoding-decoding process, we can attempt to alleviate the problem. The attracting fixed points need to be consistent (i.e., elements of ). It is unlikely that an arbitrary linear combination is an admissible vector. In some cases, it is impossible.

Theorem 4.

Let . If there exist distinct, nonzero , such that , then .

2.3  Knowledge of Input Structure Can Restrict Code

As developed thus far, the encoding process has two interrelated issues. The dimension of the target space is potentially unreasonably large, and we have little control over information encoded in the domain space. We address these problems concurrently. Motivated by the notion that certain combinations of neurons will not fire at the same time, we move to selectively restrict , thereby curtailing the growth of the target space. We restrict the set to a set regarding some set of indices by , the idea being that never occurs, and so it does not need to be encoded. With , there are sets of -sized that do not need to be encoded in the target space. We can then reduce to where . Since we are essentially eliminating superfluous zero components in the target space, the normalized dot product is unaffected, and so the value of is preserved. The process works for a specific . We now take a more generalized approach.

Let be a collection of vectors in , each of which represents a valid state for some network of neurons. The set may be the entirety of , or it may be a strict subset determined by the biological context. We say that a set is observed in if there exists some such that for all . Define . The remaining parts of the encoding process follow analogously to the case . Let where if and only if for . Define , , , , and as before. The encoding-decoding process shares many properties with our original encoding-decoding:

  • The set is losslessly encoded.

  • The sparsity nontrivially increases from . Specifically, the value of is unchanged.

  • has the same redundancies as when .

This is a first method to generalize away from the assumption that consists of all -sized subsets. With the above method, we tailor our code to the input data and maintain the desired properties. Our self-imposed requirement is 100% fidelity; however, with knowledge of the input distribution, the code can be made correspondingly leaner with proportional fidelity. In the following section, we discuss the benefits of applying this method to well-structured, biologically grounded input data.

3  Results

3.1  Applications to Grid Cells

To examine the biological plausibility of our model, we consider how the model reacts to biologically realistic input. Namely, we proceed through a formal application of the process described in section 2.3, where the restrictions on the input reflect observed activity patterns. The primary input to the DG comes from the EC, and one of the best-studied cell types within the EC is the grid cell. Moreover, the canonical one-dimensional model for grid cells due to Fiete, Burak, and Brookings (2008) has strong theoretical underpinnings, which allows us to easily adopt an equivalent representation compatible with our requirements for an input space. The grid cell model is well known, and a one-dimensional grid cell model is theoretically similar to a two-dimensional model but yields a much clearer exposition. We draw particular attention to the size of the target space. Our general model has useful properties but suffers from a potentially exponential (and thereby unrealistic) increase in dimension. We will see, however, that after applying constraints of the biological context, the target dimension closely fits experimental results and is at most a polynomial increase.

Suppose groups of grid cells have relatively prime phases or frequencies , . The standard discrete model for capturing a point is given by
formula
This modular structure allows for a small number of neurons to represent very large values, and the arithmetic benefits of a modular system have been noted in the literature (Fiete et al., 2008; Yoo, Koyluoglu, Vishwanath, & Fiete, 2012).
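
As a concrete illustration of this modular scheme, the sketch below encodes a position as its residues modulo a few relatively prime periods and recovers it with a brute-force Chinese-remainder-style search. The periods and the position are illustrative values, not parameters from the text.

```python
from math import prod

periods = [3, 5, 7, 11]                    # pairwise relatively prime module periods (illustrative)

def to_residues(t):
    return [t % p for p in periods]

def from_residues(residues):
    """Recover t modulo the product of the periods by a brute-force CRT search."""
    for t in range(prod(periods)):
        if all(t % p == r for p, r in zip(periods, residues)):
            return t
    raise ValueError("inconsistent residues")

t = 519                                    # any position below 3 * 5 * 7 * 11 = 1155
assert from_residues(to_residues(t)) == t
print(to_residues(t))                      # a handful of small residues encodes the position
```
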
To uniquely describe , we require . The space
formula
is endowed with a group structure under modular component-wise addition. We can apply an equivalent group structure to our input space.
While it is a standard exercise to place an operation on so that it becomes group isomorphic to , we detail the construction explicitly for two reasons. First, we hope to emphasize the difference between the group structure on and the standard vector space addition. Second, this provides an abstract, binary, and equivalent representation of the group structure inherent in Fiete’s model. Define , and denote if and only if . Since , is well defined. We form the group where , recognizing . Of course, is isomorphic to the cyclic group of order , but we choose this one-hot vector representation to better match the structure found in . With , define
formula
In other words, a vector is given by the concatenation of vectors . As such, the induce a map on . The vector with if and only if for some generates the cyclic group under . The vector with if and only if for some is the identity. The group is isomorphic to the Cartesian product of our as well as :
formula
By detailing the construction of , we hope the isomorphism is apparent, though for completeness, we define via where if and only if for some . The map is an isomorphism of groups.

From a mathematical viewpoint, the ordering of our entries is arbitrary. However, the condition aligns with the finding that spacing of grid cells increases monotonically, progressing toward ventrolateral locations (Moser et al., 2008).

Taking , we get an exact representation of the grid cell resolution described in the literature. In particular, our system carries exactly the same information content in a minimal unary representation. There are difficulties with this representation. First, when choosing , we achieve a clear one-to-one representation of all possible states. However, real biological systems are error prone, and this method leaves no redundancy. Second, although we reduce the size of the target space considerably, can still be very large. Consider . Then we have , , and . So while is two orders of magnitude smaller than , the value is still unreasonably large. We solve both problems with a simple generalization. Let . By choosing a small , we gain the robust error correction described earlier. Proposition 2 ensures that with for all vectors in , there are entries in the target space carrying any single piece of information. We now turn to the issue of target space dimension. For large and much larger , it follows from Stirling’s approximation that asymptotically approaches in , and so the target space dimension grows quickly with respect to . Following our example of , if , then the dimension of our target space is computed to be a much more reasonable 3670, and the sparseness (ratio ) is .0005. This sparseness is well within the range required by information-theoretic arguments (Treves et al., 2008; Treves & Rolls, 1992).

It is easy to get a crude bound for . The bound is polynomial in the largest prime.

Remark 2.
Let be a collection of prime numbers such that and . Let and be the space induced by for primes . Then the dimension of the target space is bounded above by
formula
3.1

In Figure 3, it is easy to see that the most considerable reduction in target dimension occurs when maintaining a value for and increasing as opposed to holding and varying . That is, the method of target space reduction as described is most beneficial in controlling with respect to an increase in .

Figure 3:

Taking to be the bound provided by equation 3.1 in remark 2, the function provides a lower bound for the reduction in the target space dimension.

More generally, the number of distinct grid cell periods (also referred to as the number of grid cell modules) has been estimated, in rats, to be in the upper single digits (Stensola et al., 2012; Moser, Moser, & Roudi, 2014). Grid cell spacing ranges from approximately 30 cm to several meters (Brun et al., 2008). Moreover, the increase in period from module to module is geometric: the period doubles each time (Moser et al., 2014). So even assuming a constant resolution of 6 cm (taken from the minimum lattice period, similar to Fiete et al., 2008), the entire range can easily be discretized using , . Using remark 2, small values of yield within expectations, as the rat DG contains approximately neurons (Amaral et al., 2007). Hence, each dimension can be carried by many neurons without exceeding biological scope. In addition, recall that remark 2 is an upper bound overestimating the true size of . The target dimension is hard to estimate because it is sensitive to . However, in this case, we can estimate by assuming that each period is approximately twice the previous period (). With the same resolution and range as before, choosing yields and yields .
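
One simple way to sketch such an estimate is below: assuming one-hot activity within each module and that any combination of residues across distinct modules can occur, the target dimension for a given subset size k is the number of k-sized subsets drawn from k distinct modules. The period values (in units of a 6 cm resolution) and the choices of k are illustrative assumptions.

```python
from itertools import combinations
from math import prod

# Roughly doubling module periods, in units of the 6 cm resolution (illustrative values).
periods = [5, 10, 20, 40, 80, 160, 320]

def target_dim(periods, k):
    """Count k-sized EC subsets whose members lie in k distinct modules."""
    return sum(prod(choice) for choice in combinations(periods, k))

for k in (2, 3):
    print(k, target_dim(periods, k))
```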

3.2  Mixed Coding and Adult Neurogenesis

Another challenge that has long faced models of DG is how to account for adult-born neurons. In particular, the observation that immature neurons are differentially activated in the network (Danielson et al., 2016; Aimone et al., 2014) suggests that they use a separate coding scheme from the mature population (Aimone et al., 2011). Assume the setup from theorem 1 where is a collection of -sized subsets of so that forms a sparse coding from into where . We can generalize our coding so that it supports structured mixed coding. Essentially, we replace one subset with smaller subsets of that set. Formally, let , and select . Denote the -sized subsets of with . Define . We can now define and as we have defined and , except with in place of .

Since , our target space has increased in dimension. The set on which is injective expands as well. Denote this expanded set with and so
formula
Let
formula
and
formula
The encoding-decoding process of is lossless on . All points in can be thought of as noise and are mapped to 0. Points in are mapped with selective accuracy, with emphasis on . This neurogenesis process could be used to selectively accommodate novel input, as shown in Figure 4.
Figure 4:

Illustrated are the supports of the dimensions in a target space. For the single-value coding case, each is the same size . If is large, is relatively selective and tightly tuned. If is small, is relatively unselective and broadly tuned. By using a mixed coding, sets in have varied sizes and consequently provide greater tuning control.

We notice that the target dimensions that correspond to are less selective than those corresponding to . If for all , then for all for any . However, the converse does not hold.
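
A minimal sketch of the mixed-coding construction follows: one k-sized sampling set is replaced by all of its l-sized subsets, and an input that activates only part of the original set drives one of the new, broadly tuned dimensions while leaving the original, tightly tuned dimension silent. Names and parameters are illustrative assumptions.

```python
from itertools import combinations

import numpy as np

n, k, l = 6, 3, 2
subsets = list(combinations(range(n), k))

# Replace the first k-sized sampling set with all of its l-sized subsets ("immature" units).
replaced = subsets[0]
mixed = list(combinations(replaced, l)) + subsets[1:]

def encode(x, sets):
    return np.array([int(all(x[j] for j in S)) for S in sets])

x = np.zeros(n, dtype=int)
x[list(replaced[:l])] = 1                  # activates one broadly tuned unit, not the original set
print(int(encode(x, subsets).sum()), int(encode(x, mixed).sum()))   # 0 versus 1
```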

We can show a formal benefit of this mixed coding in certain situations, as described in the following proposition.

Proposition 3.
Let and be two nonzero, encoded vectors with norms and , respectively, that agree at entries. Suppose the encoding is expanded (using the notation above) by , resulting in and . If is chosen so that
formula
where , then the normalized dot product of their images decreases.
Proof.
Note that
formula
so that we have
formula
if and only if
formula

The incorporation of a mixed coding scheme further removes us from the initial assumption that be all -sized subsets. As formulated, we replace a single with -sized subsets of , but this process can be repeated on other elements of the now-modified with different values for . Combined with the method presented in sections 2.3 and 3.1, our encoding can be quite sophisticated and based on variously sized subsets.

3.3  Disjunctive Connections

Thus far, our model assumes that a postsynaptic neuron fires only in the case that all presynaptic neurons fire. This assumption seems strong. In more classical models, postsynaptic activity is usually dependent on an “or” condition as a consequence of the synaptic weights (McCulloch & Pitts, 1943). In this section, we formally incorporate these more classical models into our current model, thereby weakening the assumption.

Viewing neural activity from the perspective of logical operations provides a convenient high-level abstraction of synaptic weights. For example, suppose a postsynaptic neuron has a threshold of 1, and synapses from presynaptic neurons , , and have weights .4, .7, and .4, respectively. Then the sufficient firing conditions are “ and ” or “ and .” (The condition “ and and ” is logically redundant.) The action of any combination of weights and thresholds can be fully described using such logical statements, and, beneficially, our arguments avoid unnecessary granularity of detail. Similarly, here it will be convenient to generalize to a generic system, by which we mean two layers of neurons connected by a synaptic mapping. We assume the activity in the presynaptic layer is excitatory and is mapped deterministically to the activity in the postsynaptic layer. We show in this section that regardless of the structure of the disjunctive operations defining the mapping and without restriction on the sizes of sampling subsets, we can embed the system into a conjunction-based system compatible with the coding model previously discussed: the space and map . We begin with an example.

Consider the four-neuron system from before, where the postsynaptic neuron fires if either presynaptic neurons and fire or if the presynaptic neurons and fire. Note that the presynaptic activity space is . Denote the set on which fires with . Denote sets and similarly. Then the firing condition for the postsynaptic neuron can be written as inclusion in the set . The space of all possible activities is represented in the Venn diagram in Figure 5. Using distributive properties, the expression can be rewritten as . That is, we can recast the network into a conjunction-based system compatible with our coding model. Note that all activity patterns of , , and produce the same output using either system.

Figure 5:

The Venn diagram represents the presynaptic activity space. The set corresponds to neuron firing; and are similar. The postsynaptic neuron fires if and fire or if and fire. Equivalently, the activity input must be in the set (shaded region). We can rewrite this condition using intersections of sets (conjunctions of neurons) as the outermost operator: .
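
The worked example can be checked exhaustively. The sketch below verifies that the weighted-threshold rule (weights .4, .7, .4 and threshold 1), its disjunction-outermost form, and one conjunction-outermost rewriting agree on all eight presynaptic activity patterns; variable names are illustrative.

```python
from itertools import product

# Weights .4, .7, .4 and threshold 1, as in the example above.
for n1, n2, n3 in product((0, 1), repeat=3):
    weighted = 0.4 * n1 + 0.7 * n2 + 0.4 * n3 >= 1
    disjunction_outermost = (n1 and n2) or (n2 and n3)
    conjunction_outermost = n2 and (n1 or n3)
    assert bool(weighted) == bool(disjunction_outermost) == bool(conjunction_outermost)
print("all eight presynaptic activity patterns agree")
```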

The relationship between these dual systems is actually deeper than it appears. Given any disjunction-based system , we can produce a corresponding conjunction-based system and embed, up to an equivalence relation, the space of presynaptic activity within it via some mapping . Namely, Figure 6 commutes. We require some formalism before proceeding.

Figure 6:

Consider a disjunction-based system , where is the space of presynaptic activity and is that of postsynaptic activity. There exists a space , an induced map on into , and the composition of a quotient map with an injection such that for all .

We assume an input layer of neurons where . The product space is the space of all possible input activity. For a neuron , define the cylinder set . Further, let operate on cylinder sets of using unions and intersections.1 That is, each component of is the indicator function for a set generated by finite combinations of unions and intersections of sets .

With the mapping , define an equivalence relation by if and only if , and define where is the equivalence class of under . The requirement for projecting onto equivalence classes reflects that in general, a disjunction-based system is many-to-one, and the space can be interpreted as the space of all discernible input activity. It is this quotient space that we embed into .

Let be the sets corresponding to each of the components of . Using distributive laws, each has a representation where is some finite union of cylinder sets from . The space is defined as where is the count of all , and we associate each dimension of with a corresponding set . Denote the corresponding dimension index . Define and as before with enumerated in the same order as .2

Theorem 5.
Assuming the notation above, the function defined by
formula
is injective, and . Hence, the system can be embedded, up to an equivalence relation, into the system .
Proof.

From our definition of , is obviously injective. The claim that follows readily as well. If , then with , , and . Hence, . A symmetric argument holds for when .

Given a classical neural system with synaptic weights, we can represent it as a system . This system induces equivalence classes based on differentiation in the output layer. Specifically, two inputs differ to the point where the output-layer activity changes if and only if those two inputs sit in different equivalence classes. Finally, theorem 5 essentially says that we can take these equivalence classes and embed the action into the previously defined model. There are two consequences. First, by studying the conjunction-based systems of our model, we incorporate the classical system as well. Results using our model inform, in a particular way, about the projected, classical model. Second, given moderately sparse firing conditions, the map in general expands to a higher-dimensional space , and so the classical system computes a select subset of the information from the conjunction-based system while using fewer neurons and a lower level of activity.

4  Discussion

One challenge with identifying potential neural coding schemes for regions that have historically been explored behaviorally is that coarse functional predictions such as pattern separation (Yassa & Stark, 2011) are in reality quite limited in their utility as an algorithmic constraint (Aimone et al., 2011). Indeed, pattern separation alone is not adequate to ascertain a code’s value, as there are many different approaches to getting one layer of neurons to decorrelate its inputs. While the sparse coding scheme described here is a clear abstraction of the biological complexities of the DG region, it illustrates how combining this separation goal with the goal of information preservation is sufficient to define a code that makes useful predictions for the DG and DG neurogenesis.

The DG code we describe here is distinct from the grandmother cell concept that occasionally is discussed in the context of the medial temporal lobe broadly and DG region specifically (Quiroga, Kreiman, Koch, & Fried, 2008; Shors, 2008). The grandmother cell hypothesis refers to extremely sparse neural representations in which inputs are represented by a unique neuron, which in turn represents only one input. While it is simple to understand, obvious capacity and robustness considerations preclude this as a neurobiological possibility (Quiroga et al., 2008). Consistent with this, the code we describe here does not use grandmother cells; each input is represented by many neurons, and each neuron can respond to many inputs. However, because the code we describe is lossless, the representations are actually grandmother code–like. In this sense, the combination of neurons in the output is unique for a given input, and that combination is activated by only that input. While this is also probably an unrealistically strong constraint for the biological system, it is more consistent with widely held views of DG and hippocampal function.

One critical insight of this model is the formal illustration that neurogenesis, represented here as nodes with distinct tuning curves, is computationally compatible with an information-preserving decorrelation function. This analytical description is an important contribution to the ongoing debates over neurogenesis function (Aimone et al., 2011; Sahay et al., 2011). At first glance, a neurogenesis process in which young neurons are more active than mature neurons is incompatible with a desired pattern separation function (Aimone, Wiles, & Gage, 2009). Indeed, this supposition would be true if the only metric of importance were output correlation; our model would suggest that simply having a larger DG would be preferable to neurogenesis if only output correlation were considered. However, once information is considered, simply having more equivalent neurons is unnecessary after an input space is adequately encoded and is of limited utility for encoding potentially novel inputs. Rather, because of the essentially unbounded range of novel inputs, it is far more effective to guard against these with more scaling-friendly “immature” neurons (Aimone et al., 2011).

A key prediction of this model is that the structure of cortical inputs should be very clearly reflected within DG input weights and the neurons’ population behavior. Indeed, our model is computationally practical only if there is considerable structure within the cortical representations; without such structure, the model would require an exponential expansion of neurons to adequately encode its inputs. This is seen clearly by our observation that a DG code designed to encode mEC grid cells requires a DG of a scale comparable to that observed biologically. That the DG would use a neural code with such a requirement is not that surprising; indeed, the observation of grid cells is indicative that the deeper areas of cortex that project to the hippocampus are highly refined in their representation (Hafting et al., 2005). Although equivalent coding schemes have not yet been identified within other areas of EC, it is reasonable to predict that other aspects of episodic memory (e.g., objects, context, emotional affect) are similarly highly structured in whatever representation they rely on. Our work illustrates that if the DG development is guided by its inputs’ structure, its representations can be highly efficient.

Finally, it is worth considering the differences between this abstract model and the real system. Biologically, the DG is a highly complex structure (Amaral et al., 2007); it not only uses a substantial number of feedforward and feedback inhibitory populations, but there is even an excitatory feedback population—the mossy cells—that would be expected to make DG dynamics quite complex. It is reasonable to believe that these numerous interneuron populations are providing complex positive and negative activity regulation in lieu of the tight algorithmic control used here. Such mechanisms would be expected to interact with mature and immature neurons differently, potentially leading to effects similar to those modeled here.

Appendix:  Proofs of Theorems 2, 3, and 4

Proof of Theorem 2.
Denote the number of nonzero entries in and with and , respectively, and denote the number of entries where and agree with . Using proposition 1, the claim reduces to showing
formula
We begin with the fact that
formula
which holds as (as ) and . The remainder is basic algebraic manipulation:
formula
We conclude that
formula
Proof of Theorem 3.
As before, let , be nonzero, , , respectively, and , . For convenience of notation, denote and . We have
formula
A.1
We now reduce the inner term,
formula
and substituting into equation A.1 yields
formula
Of course, there are sets in , and the claim is shown.
Proof of Theorem 4.

Suppose for contradiction that , for some . There exists an integer such that and . Hence, there exists a -sized subset of with some where and . Similarly, there exist , , and where , , , and . Select a -sized set such that . Then, since and , . Since and , for all . However, by construction, we have , a contradiction.

Acknowledgments

This work was supported by Sandia National Laboratories’ Laboratory Directed Research and Development Program under the Hardware Acceleration of Adaptive Neural Algorithms Grand Challenge. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

References

Aimone, J. B., Deng, W., & Gage, F. H. (2011). Resolving new memories: A critical look at the dentate gyrus, adult neurogenesis, and pattern separation. Neuron, 70(4), 589–596.
Aimone, J. B., Li, Y., Lee, S. W., Clemenson, G. D., Deng, W., & Gage, F. H. (2014). Regulation and function of adult neurogenesis: From genes to cognition. Physiological Reviews, 94(4), 991–1026.
Aimone, J. B., Wiles, J., & Gage, F. H. (2006). Potential role for adult neurogenesis in the encoding of time in new memories. Nature Neuroscience, 9(6), 723–727.
Aimone, J. B., Wiles, J., & Gage, F. H. (2009). Computational influence of adult neurogenesis on memory encoding. Neuron, 61(2), 187–202.
Amaral, D. G., Scharfman, H. E., & Lavenex, P. (2007). The dentate gyrus: Fundamental neuroanatomical organization (dentate gyrus for dummies). Progress in Brain Research, 163, 3–22.
Bliss, T. V., & Lømo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 232(2), 331–356.
Brun, V. H., Solstad, T., Kjelstrup, K. B., Fyhn, M., Witter, M. P., Moser, E. I., & Moser, M.-B. (2008). Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex. Hippocampus, 18(12), 1200–1212.
Curto, C., Itskov, V., Morrison, K., Roth, Z., & Walker, J. L. (2013). Combinatorial neural codes from a mathematical coding theory perspective. Neural Computation, 25(7), 1891–1925.
Danielson, N. B., Kaifosh, P., Zaremba, J. D., Lovett-Barron, M., Tsai, J., Denny, C. A., … Kheirbek, M. A. (2016). Distinct contribution of adult-born hippocampal granule cells to context encoding. Neuron, 90(1), 101–112.
Deshmukh, S. S., & Knierim, J. J. (2011). Representation of non-spatial and spatial information in the lateral entorhinal cortex. Frontiers in Behavioral Neuroscience, 5.
Fiete, I. R., Burak, Y., & Brookings, T. (2008). What grid cells convey about rat location. Journal of Neuroscience, 28(27), 6858–6871.
Freund, T. F., & Buzsáki, G. (1996). Interneurons of the hippocampus. Hippocampus, 6(4), 347–470.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052), 801–806.
Hahnloser, R. H., Kozhevnikov, A. A., & Fee, M. S. (2002). An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature, 419(6902), 65–70.
Hargreaves, E. L., Rao, G., Lee, I., & Knierim, J. J. (2005). Major dissociation between medial and lateral entorhinal input to dorsal hippocampus. Science, 308(5729), 1792–1794.
Henze, D. A., Urban, N. N., & Barrionuevo, G. (2000). The multifarious hippocampal mossy fiber pathway: A review. Neuroscience, 98(3), 407–427.
Henze, D. A., Wittner, L., & Buzsáki, G. (2002). Single granule cells reliably discharge targets in the hippocampal CA3 network in vivo. Nature Neuroscience, 5(8), 790–795.
Jung, M., & McNaughton, B. (1993). Spatial selectivity of unit activity in the hippocampal granular layer. Hippocampus, 3(2), 165–182.
Laurent, G. (2002). Olfactory network dynamics and the coding of multidimensional signals. Nature Reviews Neuroscience, 3(11), 884–895.
Lee, H., Battle, A., Raina, R., & Ng, A. Y. (2006). Efficient sparse coding algorithms. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 19 (pp. 801–808). Cambridge, MA: MIT Press.
Leutgeb, J. K., Leutgeb, S., Moser, M.-B., & Moser, E. I. (2007). Pattern separation in the dentate gyrus and CA3 of the hippocampus. Science, 315(5814), 961–966.
Marr, D. (1971). Simple memory: A theory for archicortex. Philosophical Transactions of the Royal Society of London, 262(841), 23–81.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.
McNaughton, B. L., & Morris, R. G. (1987). Hippocampal synaptic enhancement and information storage within a distributed memory system. Trends in Neurosciences, 10(10), 408–415.
Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells, and the brain’s spatial representation system. Annual Review of Neuroscience, 31, 69–89.
Moser, E. I., Moser, M.-B., & Roudi, Y. (2014). Network mechanisms of grid cells. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369(1635), 20120511.
Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311–3325.
Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487.
O’Reilly, R. C., & McClelland, J. L. (1994). Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus, 4(6), 661–682.
Quiroga, R. Q., Kreiman, G., Koch, C., & Fried, I. (2008). Sparse but not grandmother-cell coding in the medial temporal lobe. Trends in Cognitive Sciences, 12(3), 87–91.
Sahay, A., Wilson, D. A., & Hen, R. (2011). Pattern separation: A common function for new neurons in hippocampus and olfactory bulb. Neuron, 70(4), 582–588.
Shors, T. J. (2008). From stem cells to grandmother cells: How neurogenesis relates to learning and memory. Cell Stem Cell, 3(3), 253–258.
Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.-B., & Moser, E. I. (2012). The entorhinal grid map is discretized. Nature, 492(7427), 72–78.
Treves, A., & Rolls, E. T. (1992). Computational constraints suggest the need for two distinct input systems to the hippocampal CA3 network. Hippocampus, 2(2), 189–199.
Treves, A., Tashiro, A., Witter, M. P., & Moser, E. I. (2008). What is the mammalian dentate gyrus good for? Neuroscience, 154(4), 1155–1172.
Yassa, M. A., & Stark, C. E. (2011). Pattern separation in the hippocampus. Trends in Neurosciences, 34(10), 515–525.
Yoo, Y., Koyluoglu, O. O., Vishwanath, S., & Fiete, I. (2012). Dynamic shift-map coding with side information at the decoder. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing (pp. 632–639). Piscataway, NJ: IEEE.

Notes

1. This can easily be generalized to include relative complement (inhibitory synapses) as well.

2. The association and the map are defined noncanonically.