Shannon's seminal 1948 work gave rise to two distinct areas of research: information theory and mathematical coding theory. While information theory has had a strong influence on theoretical neuroscience, ideas from mathematical coding theory have received considerably less attention. Here we take a new look at combinatorial neural codes from a mathematical coding theory perspective, examining the error correction capabilities of familiar receptive field codes (RF codes). We find, perhaps surprisingly, that the high levels of redundancy present in these codes do not support accurate error correction, although the error-correcting performance of receptive field codes catches up to that of random comparison codes when a small tolerance to error is introduced. However, receptive field codes are good at reflecting distances between represented stimuli, while the random comparison codes are not. We suggest that a compromise in error-correcting capability may be a necessary price to pay for a neural code whose structure serves not only error correction, but must also reflect relationships between stimuli.
Shannon's seminal work (Shannon, 1948) gave rise to two distinct though related areas of research: information theory (Cover & Thomas, 2006) and mathematical coding theory (MacWilliams & Sloane, 1983; Huffman & Pless, 2003). While information theory has had a strong influence on theoretical neuroscience (Atick, 1992; Borst & Theunissen, 1999; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999; Quiroga & Panzeri, 2009), ideas central to mathematical coding theory have received considerably less attention. This is in large part because the neural code is typically regarded as a description of the mapping, or encoding map, between stimuli and neural responses. Because this mapping is not in general understood, identifying which features of neural responses carry the most information about a stimulus is often considered to be the main goal of neural coding theory (Bialek, Rieke, de Ruyter van Steveninck, & Warland, 1991; deCharms & Zador, 2000; Jacobs et al., 2009; London, Roth, Beeren, Häusser, & Latham, 2010). In particular, information-theoretic considerations have been used to suggest that encoding maps ought to maximize information and minimize the redundancy of stimulus representations (Attneave, 1954; Barlow, 1961; Adelsberger-Mangan & Levy, 1992; Atick, 1992; Rieke et al., 1999), although recent experiments point increasingly to high levels of redundancy in retinal and cortical codes (Puchalla, Schneidman, Harris, & Berry, 2005; Luczak, Barthó, & Harris, 2009).
In contrast, mathematical coding theory has been primarily motivated by engineering applications, where the encoding map is always assumed to be well known and can be chosen at will. The primary function of a “code” in Shannon's original work is to allow accurate and efficient error correction following transmission across a noisy channel. “Good codes” do this in a highly efficient manner, so as to achieve maximal channel capacity while allowing arbitrarily accurate error correction. Mathematical coding theory grew out of Shannon's challenge to design good codes, a question largely independent of either the nature of the information being transmitted or the specifics of the encoding map. In this perspective, redundancy is critical to the function of a code, as error correction is possible only because a code introduces redundancy into the representation of transmitted information (MacWilliams & Sloane, 1983; Huffman & Pless, 2003).
Given this difference in perspective, can mathematical coding theory be useful in neuroscience? Because of the inherent noise and variability that is evident in neural responses, it seems intuitive that enabling error correction should also be an important function of neural codes (Schneidman, Berry, Segev, & Bialek, 2006; Hopfield, 2008; Sreenivasan & Fiete, 2011). Moreover, in cases where the encoding map has become more or less understood, as in systems that exhibit robust and reliable receptive fields, we can begin to look beyond the encoding map and study the features of the neural code itself. An immediate advantage of this new perspective is that it can help clarify the role of redundancy. From the viewpoint of information theory, it may be puzzling to observe so much redundancy in the way neurons are representing information (Barlow, 1961), although the advantages of redundancy in neural coding are gaining appreciation (Barlow, 2001; Puchalla et al., 2005). Experimentally, redundancy is apparent even without an understanding of the encoding map from the fact that only a small fraction of the possible patterns of neural activity is actually observed in both stimulus-evoked and spontaneous activity (Luczak et al., 2009). On the other hand, it is generally assumed that redundancy in neural responses, as in good codes, exists primarily to allow reliable signal estimation in the presence of noisy information transmission. This is precisely the kind of question that mathematical coding theory can address: Does the redundancy apparent in neural codes enable accurate and efficient error correction?
To investigate this question, we take a new look at neural coding from a mathematical coding theory perspective, focusing on error correction in combinatorial codes derived from neurons with idealized receptive fields. These codes can be thought of as binary codes, with 1s and 0s denoting neurons that are “on” or “off” in response to a given stimulus, and thus lend themselves particularly well to traditional coding-theoretic analyses. Although it has been recently argued that the entorhinal grid cell code may be very good for error correction (Sreenivasan & Fiete, 2011), we show that more typical receptive field codes (RF codes), including place field codes, perform quite poorly as compared to random codes with matching length, sparsity, and redundancy. The error-correcting performance of receptive field codes catches up, however, when a small tolerance to error is introduced. This error tolerance is measured in terms of a metric inherited from the stimulus space and reflects the fact that perception of parametric stimuli is often inexact. We conclude that the nature of the redundancy observed in receptive field codes cannot be fully explained as a mechanism to improve error correction, since these codes are far from optimal in this regard. On the other hand, the structure of receptive field codes does allow them to naturally encode distances between stimuli, a feature that could be beneficial for making sense of the transmitted information within the brain. We suggest that a compromise in error-correcting capability may be a necessary price to pay for a neural code whose structure not only serves error correction but must also reflect relationships between stimuli.
2. Combinatorial Neural Codes
2.1. Receptive Field Codes.
Neurons in many brain areas have activity patterns that can be characterized by receptive fields. Abstractly, a receptive field is a map $f_i : X \to \mathbb{R}_{\geq 0}$ from a space of stimuli $X$ to the average (nonnegative) firing rate of a single neuron, $i$, in response to each stimulus. Receptive fields are computed by correlating neural responses to independently measured external stimuli. We follow a common abuse of language, where both the map and its support (i.e., the subset of $X$ where $f_i$ takes on positive values) are referred to as receptive fields. Convex receptive fields are convex subsets of $X$. The main examples we have in mind pertain to orientation-selective neurons and hippocampal place cells. Orientation-selective neurons have tuning curves that reflect a neuron's preference for a particular angle (Watkins & Berkley, 1974; Ben-Yishai, Bar-Or, & Sompolinsky, 1995). Place cells are neurons that have place fields: each cell has a preferred (convex) region of the animal's physical environment where it has a high firing rate (O'Keefe & Dostrovsky, 1971; McNaughton, Battaglia, Jensen, Moser, & Moser, 2006). Both tuning curves and place fields are examples of receptive fields.
2.2. Comparison Codes.
In order to analyze the performance of RF codes, we use two types of randomly generated comparison codes with matching size, length, and sparsity. In particular, these codes have the same redundancy as their corresponding RF codes. We choose random codes as our comparison codes for three reasons. First, as demonstrated by Shannon (1948) in the proof of his channel coding theorem, random codes are expected to have near-optimal performance. Second, the parameters can be tuned to match those of the RF codes; we describe the two ways in which we do this. Finally, random codes are a biologically reasonable alternative for the brain, since they may be implemented by random neural networks.
2.2.1. Shuffled Codes.
Given an RF code $\mathcal{C}$, we generate a shuffled code $\widetilde{\mathcal{C}}$ in the following manner. Fix a collection of permutations $\{\pi_c\}_{c \in \mathcal{C}}$ of the $n$ coordinates such that $\pi_c(c) \neq \pi_{c'}(c')$ for all distinct $c, c' \in \mathcal{C}$, and set $\widetilde{\mathcal{C}} = \{\pi_c(c)\}_{c \in \mathcal{C}}$. The shuffled code $\widetilde{\mathcal{C}}$ has the same length, size, and weight distribution (and hence the same sparsity and redundancy) as $\mathcal{C}$. In our simulations, each permutation is chosen uniformly at random with the modification that a new permutation is selected if the resulting shuffled codeword has already been generated. This ensures that no two codewords of $\mathcal{C}$ correspond to the same word in the shuffled code.
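The shuffling procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: codewords are represented as tuples of 0s and 1s, the input codewords are distinct, and the function name `shuffled_code` and the toy code are hypothetical rather than taken from our simulations.

```python
import random


def shuffled_code(code, rng=None):
    """Permute the bits of each codeword independently and uniformly at random.

    A new permutation is drawn whenever the shuffled word has already been
    generated, so distinct codewords always map to distinct shuffled words.
    Assumes the input codewords are distinct (otherwise the loop may not end).
    """
    rng = rng or random.Random()
    used = set()
    result = []
    for word in code:
        while True:
            bits = list(word)
            rng.shuffle(bits)          # uniformly random permutation of positions
            candidate = tuple(bits)
            if candidate not in used:  # resample on collision
                break
        used.add(candidate)
        result.append(candidate)
    return result


# Toy code on 6 neurons (illustrative only).
toy_code = [
    (1, 1, 0, 0, 0, 0),
    (0, 1, 1, 0, 0, 0),
    (0, 0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1, 0),
    (0, 0, 0, 0, 1, 1),
    (0, 1, 0, 0, 0, 0),
]
toy_shuffled = shuffled_code(toy_code, random.Random(0))
```

Because each word is only permuted, the weight of every codeword is preserved, which is why the shuffled code matches the original in length, size, and weight distribution.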
2.2.2. Random Constant-Weight Codes.
Constant-weight codes are subsets of $\{0,1\}^n$ in which all codewords have the same weight. Given an RF code $\mathcal{C}$ on $n$ neurons, we compute the average weight of the codewords in $\mathcal{C}$ and round this to obtain an integer w. We then generate a constant-weight code by randomly choosing subsets of size w from [n]. These subsets give the positions of the codeword that are assigned a 1, and the remaining positions are all assigned zeros. This process is repeated until $|\mathcal{C}|$ distinct codewords are generated, and the resulting code is then a random constant-weight code with the same length, size, and redundancy as $\mathcal{C}$, and approximately the same sparsity as $\mathcal{C}$.
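A minimal Python sketch of this construction follows. The function name `random_constant_weight_code` is hypothetical, codewords are assumed to be tuples of 0s and 1s, and the rounding rule is an assumption (Python's built-in `round`, which rounds exact halves to the nearest even integer), since the text above does not specify one.

```python
import math
import random


def random_constant_weight_code(code, rng=None):
    """Generate a random constant-weight code matched to `code`.

    The weight w is the rounded average weight of `code`; supports of size w
    are drawn at random until as many distinct words as in `code` are found.
    """
    rng = rng or random.Random()
    n = len(code[0])
    size = len(code)
    w = round(sum(sum(word) for word in code) / size)  # assumed rounding rule
    if math.comb(n, w) < size:
        raise ValueError("not enough weight-w words of length n to match the code size")
    words = set()
    while len(words) < size:
        support = set(rng.sample(range(n), w))  # positions assigned a 1
        words.add(tuple(1 if i in support else 0 for i in range(n)))
    return sorted(words)


# Toy code on 6 neurons (illustrative only); its average weight rounds to 2.
toy_code = [
    (1, 1, 0, 0, 0, 0),
    (0, 1, 1, 0, 0, 0),
    (0, 0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1, 0),
    (0, 0, 0, 0, 1, 1),
    (0, 1, 0, 0, 0, 0),
]
toy_constant_weight = random_constant_weight_code(toy_code, random.Random(1))
```

Collecting words in a set makes the "repeat until distinct" step implicit: duplicates are simply absorbed and the loop continues.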
3. Stimulus Encoding and Decoding
3.1. The Mathematical Coding Theory Perspective.
The central goal of this letter is to analyze our main examples of combinatorial neural codes, 1D and 2D RF codes, from a mathematical coding theory perspective. We draw on this field because it provides a complementary perspective on the nature and function of codes that is unfamiliar to most neuroscientists. We first discuss the standard paradigm of coding theory and then explain the function of codes from this perspective. Note that to put neural codes into this framework, we must discretize the stimulus space and encoding map so that we have an injective map from the set of stimuli to the code; we describe this in the next section.
Figure 2A illustrates the various stages of information transmission using the standard coding theory paradigm, adapted for RF codes. A stimulus $x \in X$ gets mapped to a neural codeword $c \in \mathcal{C}$ under an (injective) encoding map $\varphi : X \to \mathcal{C}$, where $X$ is the (discretized) stimulus space. This map sends each stimulus to a neural activity pattern that is considered to be the ideal response of a population of neurons. The codeword, viewed as a string of 0s and 1s, then passes through a noisy channel, where each 0/1 bit may be flipped with some probability. A $1 \to 0$ flip corresponds to a neuron in the ideal response pattern failing to fire, while a $0 \to 1$ flip corresponds to a neuron firing when it is not supposed to. The resulting word is not necessarily a codeword and is referred to as the received word. This noisy channel output is then passed through a decoder to yield an estimate $\hat{c} \in \mathcal{C}$ for the original codeword $c$, corresponding to an estimate of the ideal response. Finally, if an estimate of the original stimulus is desired, the inverse of the encoding map may be applied to the estimated codeword $\hat{c}$. Because the brain has access only to neural activity patterns, we will consider the ideal response as a proxy for the stimulus itself; the estimated neural codeword thus represents the brain's estimate of the stimulus, and so we can ignore this last step.
The mathematical coding theory perspective on stimulus encoding and decoding has several important differences from the way neuroscientists typically think about neural coding. First, a clear distinction is made between a code, which is simply a set of codewords (or neural response patterns) devoid of any intrinsic meaning, and the encoding map, which is a function that assigns a codeword to each element in the set of objects to be encoded. Second, this map is always deterministic, as the effects of noise are considered to arise purely from the transmission of codewords through a noisy channel. For neuroscientists, the encoding of a signal into a pattern of neural activity is itself a noisy process, and so the encoding map and the channel are difficult to separate. If we consider the output of the encoding map to be the ideal response of a population of neurons, however, it is clear that actual response patterns in the brain correspond not to codewords but rather to received words. (The ideal response, on the other hand, is always a codeword and corresponds intuitively to the average response across many trials of the same stimulus.) In the case of RF codes, a natural encoding map sends each stimulus to the codeword corresponding to the subset of neurons that contain the stimulus in their receptive fields. In the case of the random comparison codes, an encoding map that assigns codewords to stimuli is chosen randomly (details are given in the next section).
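To make the natural encoding map for RF codes concrete, here is a small Python sketch for a toy 1D stimulus space. All specifics are assumptions made for illustration: the stimulus space is the circle [0, 1), receptive fields are arcs of equal radius around evenly spaced centers, stimuli are sampled on a uniform grid, and one stimulus is kept per distinct codeword so that the map is injective. The construction used in our simulations (appendix B) may differ, and the names `rf_codeword` and `rf_code` are hypothetical.

```python
def circular_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rf_codeword(x, centers, radius):
    """Binary word with a 1 for every neuron whose receptive field contains x."""
    return tuple(1 if circular_distance(x, c) <= radius else 0 for c in centers)


def rf_code(centers, radius, grid_size):
    """Injective encoding map {stimulus: codeword} on a uniform grid of stimuli.

    Only the first grid point producing each codeword is kept (an assumed way
    of discretizing the stimulus space).
    """
    first_stimulus = {}
    for k in range(grid_size):
        x = k / grid_size
        first_stimulus.setdefault(rf_codeword(x, centers, radius), x)
    return {x: word for word, x in first_stimulus.items()}


# 8 neurons with evenly spaced centers; 16 grid stimuli (illustrative values).
centers = [i / 8 for i in range(8)]
encoding = rf_code(centers, radius=0.1, grid_size=16)
# encoding[0.0] is (1, 0, 0, 0, 0, 0, 0, 0): only the neuron centered at 0 is active.
```

In this toy example, stimuli at a field center activate one neuron and stimuli halfway between two centers activate two, so neighboring stimuli receive codewords that differ in a single position.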
Another important difference offered by the coding theory perspective is in the process of decoding. Given a received word, the objective of the decoder is to estimate the original codeword that was transmitted through the channel. In the case of neural codes, this amounts to taking the actual neural response and producing an estimate of the ideal response, which serves as a proxy for the stimulus. The function of the decoder is therefore to correct errors made by transmission through the noisy channel. In a network of neurons, this would be accomplished by network interactions that evolve the original neural response (the received word) to a closely related activity pattern (the estimated codeword) that corresponds to an ideal response for a likely stimulus.
This leads us to the coding theory perspective on the function (or purpose) of a code. Error correction is possible only when errors produced by the channel lead to received words that are not themselves codewords, and it is most effective when codewords are “far away” from each other in the space of all words, so that errors can be corrected by returning the “nearest” codeword to the received word. The function of a code, therefore, is to represent information in a way that allows accurate error correction in a high percentage of trials. The fact that there is redundancy in how a code represents information is therefore a positive feature of the code rather than an inefficiency, since it is precisely this redundancy that makes error correction possible.
3.2. Encoding Maps and the Discretization of the Stimulus Space.
In the case of the comparison codes, we use the same discretized stimulus space $X$ as in the corresponding RF code and associate a codeword to each stimulus using a random (one-to-one) encoding map $\varphi : X \to \mathcal{C}$. This map is generated by ordering both the stimuli in $X$ and the codewords in the random code $\mathcal{C}$, and then selecting a random permutation to assign a codeword to each stimulus.
3.3. The Binary Asymmetric Channel.
In all our simulations, we model the channel as a binary asymmetric channel (BAC). As seen in Figure 2B, the BAC is defined by a false-positive probability p, the probability of a 0 being flipped to a 1, and a false-negative probability q, the probability of a 1 being flipped to a 0. Since errors are always assumed to be less likely than faithful transmission, we assume p, q<1/2. The channel operates on each individual bit, but it is customary to extend it to operate on full codewords via the assumption that each bit is affected independently. This is reasonable in our context because it is often assumed (though not necessarily believed) that neurons within the same area experience independent and identically distributed noise. The BAC has as special cases two other channels commonly considered in mathematical coding theory: p=q gives the binary symmetric channel (BSC), and p=0 reduces to the Z-channel.
Note that the probability of an error across this channel depends on the sparsity of the code. For a given bit (or neuron), the probability of an error occurring during transmission across the BAC is p(1−s)+qs, assuming that all codewords are transmitted with equal probability and all neurons participate in approximately the same number of codewords.
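The channel model and the per-bit error probability can be sketched in Python as follows. The function names `bac_transmit` and `bit_error_probability` are hypothetical, codewords are assumed to be tuples of 0s and 1s, and s denotes the sparsity, taken here to be the fraction of 1s.

```python
import random


def bac_transmit(word, p, q, rng):
    """Send a binary word through the BAC, flipping each bit independently.

    p: probability that a 0 is flipped to a 1 (false positive).
    q: probability that a 1 is flipped to a 0 (false negative).
    """
    received = []
    for bit in word:
        flip_probability = q if bit == 1 else p
        flipped = rng.random() < flip_probability
        received.append(1 - bit if flipped else bit)
    return tuple(received)


def bit_error_probability(p, q, s):
    """Per-bit error probability p(1 - s) + q s for a code with sparsity s."""
    return p * (1 - s) + q * s


rng = random.Random(0)
sent = (1, 0, 0, 1, 0, 0, 0, 0)
received = bac_transmit(sent, p=0.05, q=0.1, rng=rng)
# For p = 0.1, q = 0.2, s = 0.25 the formula gives 0.1 * 0.75 + 0.2 * 0.25 = 0.125.
error_rate = bit_error_probability(0.1, 0.2, 0.25)
```

Setting p equal to q in `bac_transmit` recovers the BSC, and setting p to 0 recovers the Z-channel.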
3.4. The ML and MAP Decoders.
3.5. An Approximation of MAP Decoding for Sparse Codes.
In cases where all codewords are sent with equal probability, it is easy to see from Bayes’ rule that ML and MAP decoding coincide (see appendix A.2). When codewords are not equally likely, MAP decoding will outperform ML decoding, but it is impractical in the neural context because we cannot know the exact probability distribution on stimuli. In some cases, however, it may be possible to approximate MAP decoding, leading to a decoder that outperforms ML while being just as easy to implement. Here we illustrate this possibility in the case of sparse codes, where sparser (lower-weight) codewords are more likely.
Figure 3 shows the results of two simulations comparing the MAP approximation to ML decoding on a 2D RF code. In the first case, Figure 3A, the probability distribution is biased toward sparser codewords, corresponding to stimuli covered by fewer receptive fields. Here we see that the MAP approximation significantly outperforms ML decoding. In the second case, Figure 3B, all codewords are equally likely. As expected, ML decoding outperforms the MAP approximation in this case, since it coincides with MAP decoding. When we consider a biologically plausible probability distribution that is biased toward codewords with larger regions in the stimulus space, we find that ML decoding again outperforms the MAP approximation (see appendix A.2 and Figure 8), even though there is a significant correlation between larger region size and sparser codewords. Thus, we will restrict ourselves to considering ML decoding in the sequel; for simplicity, we will assume all codewords are equally likely.
4. The Role of Redundancy in RF Codes
The function of a code from the mathematical coding theory perspective is to represent information in a way that allows errors in transmission to be corrected with high probability. In classical mathematical coding theory, decoding reduces to finding the closest codeword to the received word, where “closest” is measured by a metric appropriate to the channel. If the code has large minimum distance between codewords, then many errors can occur without affecting which codeword will be chosen by the decoder (Huffman & Pless, 2003). If the elements of a binary code are closely spaced within $\{0,1\}^n$, errors will be more difficult to decode because there will often be many candidate codewords that could have reasonably resulted in a given received word.
When the redundancy of a code is high, the ratio of the number of codewords to the total number of vectors in $\{0,1\}^n$ is low, and so it is possible to achieve a large minimum distance between codewords. Nevertheless, high redundancy of a code does not guarantee large minimum distance, because even highly redundant codes may have codewords that are spaced closely together. For this reason, high redundancy does not guarantee good error-correcting properties. This leads us to the natural question: Does the high redundancy of RF codes result in effective error correction? The answer depends to some extent on the particular decoder that is used. In the simulations that follow, we use ML decoding to test how well RF codes correct errors. We assume that all codewords within a code are equally likely, and hence ML decoding is equivalent to (optimal) MAP decoding. It has been suggested that the brain may actually implement ML or MAP decoding (Deneve et al., 1999; Ma et al., 2006), but even if this decoder were not biologically plausible, it is the natural decoder to use in our simulations as it provides an upper bound on the error-correcting performance of RF codes.
4.1. RF Code Redundancy Does Not Yield Effective Error Correction.
To test the hypothesis that the redundancy of RF codes enables effective error correction, we generated 1D and 2D RF codes having 75 neurons each (see appendix B). For each RF code, we also generated two random comparison codes: a shuffled code and a random constant-weight code with matching parameters. These codes were tested on the BAC for a variety of channel parameters (values of p and q). For each BAC condition and each code, 10,000 codewords selected uniformly at random were sent across the noisy channel and then decoded using ML decoding. If the decoded word exactly matched the original sent word, the decoding was considered “correct”; if not, there was a failure of error correction.
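The simulation loop described above can be sketched in Python as follows. This is a toy illustration, not the code used for Figure 4. It assumes the standard definition of ML decoding (return a codeword maximizing the probability of the received word given that codeword), breaks ties by taking the first maximizer (an assumption), and requires 0 < p, q < 1/2 so that all logarithms are finite. All function names and the small example code are hypothetical.

```python
import math
import random


def bac_transmit(word, p, q, rng):
    """Flip each 0 to 1 with probability p and each 1 to 0 with probability q."""
    return tuple(
        1 - bit if rng.random() < (q if bit == 1 else p) else bit for bit in word
    )


def log_likelihood(received, codeword, p, q):
    """log P(received | codeword) on the BAC with independent bit flips."""
    total = 0.0
    for r, c in zip(received, codeword):
        if c == 0:
            total += math.log(p) if r == 1 else math.log(1 - p)
        else:
            total += math.log(q) if r == 0 else math.log(1 - q)
    return total


def ml_decode(received, code, p, q):
    """Return a codeword of maximal likelihood (first maximizer on ties)."""
    return max(code, key=lambda c: log_likelihood(received, c, p, q))


def fraction_correct(code, p, q, trials, rng):
    """Fraction of transmissions whose ML estimate equals the sent codeword."""
    correct = 0
    for _ in range(trials):
        sent = rng.choice(code)  # codewords are sent uniformly at random
        received = bac_transmit(sent, p, q, rng)
        if ml_decode(received, code, p, q) == sent:
            correct += 1
    return correct / trials


# A small, well-separated toy code (minimum Hamming distance 3).
toy_code = [
    (0, 0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (1, 1, 1, 1, 1, 1),
]
performance = fraction_correct(toy_code, p=0.05, q=0.1, trials=500, rng=random.Random(0))
```

The exhaustive search in `ml_decode` costs one likelihood evaluation per codeword, which is acceptable for small codes but is the main cost of such a simulation as the code grows.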
Figure 4 shows the fraction of correctly decoded transmissions for fixed values of q and a range of p values in the case of 1D RF codes (see Figure 4A) and 2D RF codes (see Figure 4B), together with the performance of the comparison codes. In each case, the RF codes had significantly worse performance (less than 80% correct decoding in all cases) than the comparison codes, whose performances were near-optimal for low values of p. Repeating this analysis for different values of q yielded similar results (not shown).
As previously mentioned, in the case of the BSC, nearest neighbor decoding with respect to Hamming distance coincides with ML decoding. Thus, in the case of a symmetric channel, codes perform poorly precisely when their minimum Hamming distance is small. Even though nearest neighbor decoding with respect to Hamming distance does not coincide with ML decoding on the BAC when p≠q, decoding errors are still more likely to occur if codewords are close together in Hamming distance. Indeed, the poor performance of RF codes can be attributed to the very small distance between a codeword and its nearest neighbors. Since codewords correspond to regions defined by overlapping receptive fields, the Hamming distance between a codeword and its nearest neighbor is typically 1 in an RF code, which is the worst-case scenario. In contrast, codewords in the random comparison codes are distributed much more evenly throughout the ambient space $\{0,1\}^n$. While there is no guarantee that the minimum distance on these codes is high, the typical distance between a codeword and its nearest neighbor is high, leading to near-optimal performance.
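The nearest-neighbor observation is easy to check on a toy example. The Python sketch below builds a small 1D RF code on a circle (arc-shaped fields with evenly spaced centers, an assumed construction used only for illustration) and computes, for each codeword, the Hamming distance to its nearest neighbor. The function names are hypothetical.

```python
def circular_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def hamming(u, v):
    """Number of positions in which two binary words differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest_neighbor_distances(code):
    """Hamming distance from each codeword to its closest other codeword."""
    return [
        min(hamming(c, other) for other in code if other != c) for c in code
    ]


# Toy 1D RF code: 8 neurons, arc radius 0.1, 16 grid stimuli (illustrative values).
centers = [i / 8 for i in range(8)]
stimuli = [k / 16 for k in range(16)]
toy_rf_code = [
    tuple(1 if circular_distance(x, c) <= 0.1 else 0 for c in centers)
    for x in stimuli
]
distances = nearest_neighbor_distances(toy_rf_code)
```

In this toy code every entry of `distances` equals 1, because moving the stimulus across the boundary of a single receptive field changes exactly one bit of the codeword.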
4.2. RF Code Redundancy Reflects the Geometry of the Stimulus Space.
Given the poor error-correcting performance of RF codes, it seems unlikely that the primary function of RF code redundancy is to enable effective error correction. As outlined in the previous section, the poor performance of RF codes is the result of the very small Hamming distances between a codeword and its nearest neighbors. While these small Hamming distances are problematic for error correction, they may prove valuable in reflecting the distance relationships between stimuli, as determined by a natural metric on the stimulus space.
To characterize the relationship between dstim and dH on RF codes, we performed correlation analyses between these metrics on 2D RF codes and corresponding random comparison codes. For each code, we computed dstim and dH for all pairs of codewords and then computed the correlation coefficient between their values. Figure 5A shows a scatter plot of dstim versus dH values for a single 2D RF code; the high correlation is easily seen by eye. In contrast, the same analysis for a corresponding shuffled code (see Figure 5B) and a random constant-weight code (see Figure 5C) revealed no significant correlation between dstim and dH. Repeating this analysis for the receptive field and comparison codes used in Figure 4 yielded very similar results (see Figure 5D). Thus, the codewords in RF codes appear to be distributed across $\{0,1\}^n$ in a way that captures the geometry of the underlying stimulus space rather than in a manner that guarantees high distance between neighboring codewords.
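The correlation analysis can be sketched in Python on a toy example. The sketch below departs from the analysis above in several labeled ways: it uses a 1D RF code on a circle rather than a 2D RF code, takes dstim to be the circular distance between the stimuli assigned to two codewords, uses the Pearson correlation coefficient, and builds the comparison code by shuffling the bits of each codeword in place. All names and parameter values are hypothetical.

```python
import itertools
import math
import random


def circular_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def hamming(u, v):
    """Number of positions in which two binary words differ."""
    return sum(a != b for a, b in zip(u, v))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def distance_correlation(stimuli, words):
    """Correlate stimulus distance with Hamming distance over all codeword pairs."""
    pairs = list(itertools.combinations(range(len(stimuli)), 2))
    d_stim = [circular_distance(stimuli[i], stimuli[j]) for i, j in pairs]
    d_ham = [hamming(words[i], words[j]) for i, j in pairs]
    return pearson(d_stim, d_ham)


# Toy 1D RF code: 20 neurons, arc radius 0.12, 40 grid stimuli.
centers = [i / 20 for i in range(20)]
stimuli = [k / 40 for k in range(40)]
rf_words = [
    tuple(1 if circular_distance(x, c) <= 0.12 else 0 for c in centers)
    for x in stimuli
]

# Comparison: shuffle the bits of each word, resampling on collisions.
rng = random.Random(0)
shuffled_words = []
for word in rf_words:
    while True:
        bits = list(word)
        rng.shuffle(bits)
        if tuple(bits) not in shuffled_words:
            break
    shuffled_words.append(tuple(bits))

r_rf = distance_correlation(stimuli, rf_words)
r_shuffled = distance_correlation(stimuli, shuffled_words)
```

In the RF code the Hamming distance grows with stimulus distance until the two receptive-field neighborhoods no longer overlap, which produces a strong positive correlation; shuffling destroys this relationship.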
Previous work has shown that the structure of a place field code (i.e., a 2D RF code) can be used to extract topological and geometric features of the represented environment (Curto & Itskov, 2008). We hypothesize that the primary role of RF code redundancy may be to reflect the geometry of the underlying stimulus space and that the poor error-correcting performance of RF codes may be a necessary price to pay for this feature. This poor error correction may be mitigated, however, when we reexamine the role that stimulus space geometry plays in the brain's perception of parametric stimuli.
5. Decoding with Error Tolerance in RF Codes
5.1. Error Tolerance Based on the Geometry of Stimulus Space.
5.2. RF Codes “Catch Up” to Comparison Codes When Decoding with Error Tolerance.
We next investigated whether the performance of RF codes improved, as compared to the comparison codes with matching parameters, when an error tolerance was introduced. For each 1D RF code and each 2D RF code used in Figure 4, we repeated the analysis using fixed channel parameters and varying instead the error tolerance with respect to the induced stimulus space metric dstim. We found that RF codes quickly catch up to the random comparison codes when a small tolerance to error is introduced (see Figures 6A and 6B). In some cases, the performance of the RF codes even surpasses that of the random comparison codes.
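As a rough illustration of scoring with an error tolerance, the Python sketch below counts a decoding as correct whenever the decoded stimulus lies within a tolerance of the sent stimulus. This acceptance rule is an assumption made for the sketch, as are the circular stimulus space, the function names, and the example values; none of it reproduces the simulations behind Figure 6.

```python
def circular_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def fraction_within_tolerance(pairs, tolerance, dist):
    """Fraction of (sent, decoded) stimulus pairs with dist(sent, decoded) <= tolerance.

    A tolerance of 0 reduces to exact decoding.
    """
    hits = sum(1 for sent, decoded in pairs if dist(sent, decoded) <= tolerance)
    return hits / len(pairs)


# Hypothetical (sent stimulus, decoded stimulus) pairs on the unit circle.
pairs = [(0.10, 0.10), (0.20, 0.225), (0.50, 0.90), (0.97, 0.0)]

exact = fraction_within_tolerance(pairs, 0.0, circular_distance)
tolerant = fraction_within_tolerance(pairs, 0.05, circular_distance)
```

With these example pairs only one decoding is exactly right, while three of the four fall within a tolerance of 0.05, which is the kind of gain that favors codes whose decoding errors land on nearby stimuli.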
In order to verify that the catch-up effect is not merely an artifact resulting from the assignment of random encoding maps to the comparison codes, we repeated the above analysis using Hamming distance dH instead of dstim, thus completely eliminating the influence of the encoding maps. The Hamming distance between codewords in a sparse code typically ranges from 0 to about twice the average weight, which corresponds to dH=25 for the 1D RF codes and dH=10 for the 2D RF codes considered here. The probability of correct decoding using an error tolerance measured by Hamming distance yielded similar results, with RF codes catching up to the random comparison codes for relatively small error tolerances (see Figures 6C and 6D). This suggests that errors in transmission and decoding for RF codes result in codewords that are close to the correct word not only in the induced stimulus space metric but also in Hamming distance.
The question that remains is now: Why do RF codes catch up?
5.3. ML Similarity and ML Distance.
Despite not being a metric on $\mathcal{C}$, dML is useful as an indicator of how close the ML decoder comes to outputting the correct idealized codeword. By definition, ML decoding errors will have large ML similarity to the correct codeword. In other words, even if the decoded codeword differs from the sent codeword, the ML distance between them will be relatively small. Unlike Hamming distance, dML naturally captures the notion that two codewords are close if they are likely to be confused after having been sent through the BAC and decoded with the ML decoder. In practice, however, dML is much more difficult to compute than Hamming distance. Fortunately, as we will see in the next section, there is a high correlation between dML and dH, so that dH may be used as a proxy for dML when using dML becomes computationally intractable.
5.4. Explanation of the Catch-Up Phenomenon.
The ML distance dML is defined so that ML decoding errors have small ML distance to the correct codeword, regardless of the code. However, tolerating small errors makes sense only if errors are quantified by distances between stimuli, given by the induced stimulus space metric dstim. The fact that RF codes catch up in error correction when an error tolerance with respect to dstim is introduced suggests that on these codes, dstim and dML correlate well, whereas on the comparison codes they do not. In other words, although the codewords in RF codes are not well separated inside $\{0,1\}^n$, decoding errors tend to return codewords that represent very similar stimuli, and are hence largely tolerable.
To verify this intuition, we performed correlation analyses between dstim and dML on 2D RF codes and corresponding random comparison codes. For each code, we computed dstim and dML for all pairs of codewords and then computed the correlation coefficient between these two measures. Because finding dML among all pairs of codewords in a code with many neurons was computationally intractable, we performed this analysis on short codes of length 10 (i.e., having only 10 neurons). Figure 7A shows a scatter plot of dstim versus dML values for a single 2D RF code; the high correlation is easily seen by eye. In contrast, the same analysis for a corresponding shuffled code (see Figure 7B) and a random constant-weight code (see Figure 7C) revealed no significant correlation between dstim and dML. Repeating this analysis for 10 matched sets of codes, each consisting of a 2D RF code, a corresponding shuffled code, and a corresponding random constant-weight code, yielded very similar results (see Figure 7D).
In order to test if the correlation between dstim and dML might continue to hold for our longer codes with n=75 neurons, we first investigated whether Hamming distance dH could be used as a proxy for dML, as the latter can be computationally intractable. Indeed, on all of our length 10 codes, we found near-perfect correlation between dH and dML (see Figure 7E). We then computed correlation coefficients using dH instead of dML for the length 75 2D RF codes and corresponding comparison codes that were analyzed in Figures 4, 5, and 6. As expected, there was a significant correlation between dstim and dH for RF codes but not for the random comparison codes (see Figure 7F). It is thus likely that dstim and dML are well correlated for the large RF codes that displayed the catch-up phenomenon (see Figure 6) but not for the comparison codes.
We have seen that although RF codes are highly redundant, they do not have particularly good error-correcting capability, performing far worse than random comparison codes of matching size, length, sparsity, and redundancy. This poor performance is perhaps not surprising when we consider the close proximity between RF codewords inside $\{0,1\}^n$, a feature that limits the number of errors that can be corrected. On the other hand, RF code redundancy seems well-suited for preserving relationships between encoded stimuli, allowing these codes to reflect the geometry of the represented stimulus space. Interestingly, RF codes quickly catch up to the random comparison codes in error-correcting capability when a small tolerance to error is introduced. The reason for this catch-up is that errors in RF codes tend to result in nearby codewords that represent similar stimuli, a property that is not characteristic of the random comparison codes. Our analysis suggests that in the context of neural codes, there may be a natural trade-off between a code's efficiency and error-correcting capability and its ability to reflect relationships between stimuli. It would be interesting to investigate whether RF codes are somehow optimal in this regard, though this is beyond the scope of this letter. Likewise, it would be interesting to test the biological plausibility of the error tolerance values that are required for RF codes to catch up. For visual orientation discrimination in human psychophysics experiments, the perceptual errors range from about 4 degrees to 12 degrees (out of 180 degrees) (Mareschal & Shapley, 2004; Li, Thier, & Wehrhahn, 2000); this is roughly consistent with a 5% error tolerance, a level that resulted in complete catch-up for the 1D RF codes (see Figure 6A).
Throughout this work, we have assumed that neurons are independent. This assumption arose as a consequence of using the BAC as a channel model for noise, which operates on each neuron independently (see section 3.3). While somewhat controversial (Schneidman, Bialek, & Berry, 2003), some experimental evidence supports the independence assumption (Gawne & Richmond, 1993; Nirenberg, Carcieri, Jacobs, & Latham, 2001), in addition to a significant body of theoretical work suggesting that ignoring noise correlations does not have a significant impact on the decoding of neural population responses (Abbott & Dayan, 1999; Averbeck & Lee, 2004; Latham & Nirenberg, 2005). Nevertheless, it is quite possible that the error-correcting capabilities of RF codes may increase (or decrease) if this assumption is relaxed (Averbeck, Latham, & Pouget, 2006). It would thus be interesting to explore a similar analysis for channel models that produce correlated noise, though this is beyond the scope of this letter.
We have also assumed a perfect understanding of the encoding map; however, it is possible that error-correcting capabilities vary significantly according to what aspect of the stimulus is being represented, similar to what has been found in information-theoretic analyses (Nemenman, Lewen, Bialek, & de Ruyter van Steveninck, 2008). Furthermore, in assessing the error-correcting properties of RF codes as compared to random comparison codes, we used a decoder that was optimal for all codes. If instead we used a biologically motivated decoder, such as those suggested in Deneve et al. (1999) and Beck et al. (2008), the performance of the random comparison codes may be significantly compromised, leading to a relative improvement in error correction for RF codes.
Mathematical coding theory has been very successful in devising codes that are optimal or nearly optimal for correcting noisy transmission errors in a variety of engineering applications (MacWilliams & Sloane, 1983; Wicker, 1994; Huffman & Pless, 2003). We believe this perspective will also become increasingly fruitful in neuroscience, as it provides novel and rigorous methods for analyzing neural codes in cases where the encoding map is relatively well understood. In particular, mathematical coding theory can help to clarify apparent paradoxes in neural coding, such as the prevalence of redundancy when it is assumed that neural circuits should maximize information. Finally, we believe the coding theory perspective will eventually provide the right framework for analyzing the trade-offs that are inherent in codes that are specialized for information transfer and processing in the brain.
Appendix A: ML and MAP Decoding
A.1. ML Decoding on the BAC.
A.2. Comparison of ML and MAP Decoding Using Bayes’ Rule.
In our simulations with 2D RF codes, we have found that the above MAP approximation outperforms ML decoding when codewords in the distribution of transmitted words are weighted in a manner dictated by the sparsity of the code (see Figure 3). What if the codeword distribution is instead weighted by the sizes of the stimulus space regions corresponding to each codeword? In this case, Figure 8 shows that ML decoding outperforms the MAP approximation, further justifying our use of ML decoding in our analysis of the error-correcting properties of RF codes.
A.3. Failure of the Triangle Inequality for dML.
Appendix B: Details of the Simulations
B.1. Generation of 1D RF Codes.
To generate the 1D RF codes used in our simulations, we took the length of the stimulus space to be 1 and identified the points 0 and 1 since the stimuli represent angles in [0, π). Each receptive field (tuning curve) was chosen to be an arc of the stimulus space. We chose our receptive fields to have a constant radius of 0.08, which corresponds to a radius of 14.4 degrees in the orientation selectivity model. This parameter closely matches that in Somers, Nelson, and Sur (1995), where tuning curves in the visual cortex were set to have half-width-half-amplitudes of 14.9 degrees, based on experimental data from Watkins and Berkley (1974) and Orban (1984). Each receptive field was specified by its center point. We used 75 receptive fields to cover the stimulus space, and so our codewords had length 75. The centers of the receptive fields were selected uniformly at random from the stimulus space, with the following modification: while the stimulus space remained uncovered, the centers were placed randomly in the uncovered region. This modification allowed us to guarantee that the stimulus space would be covered by the receptive fields; we used a fine grid of 300 uniformly spaced test points to find uncovered regions in the stimulus space.
By examining all pairwise intersections of receptive fields, we found all the regions cut out by the receptive fields, and each such region defined a codeword (see Figure 1A). Note that each codeword corresponds to a convex region of the stimulus space. The center of mass of a codeword is the center point of the interval to which the codeword corresponds.
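The 1D construction described in this section can be sketched as follows. This is an illustrative reading rather than the original simulation code: the function names, the jitter applied to centers placed at uncovered test points, and the decision to skip any sliver that no receptive field covers are all assumptions. Regions are found from the sorted arc endpoints, which in one dimension yields the same intervals as examining pairwise intersections.

```python
import numpy as np

def circ_dist(a, b):
    """Distance between points on a circle of circumference 1."""
    d = np.abs(a - b) % 1.0
    return np.minimum(d, 1.0 - d)

def generate_1d_rf_code(n=75, radius=0.08, n_grid=300, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.arange(n_grid) / n_grid            # test points for coverage
    covered = np.zeros(n_grid, dtype=bool)
    centers = np.empty(n)
    for k in range(n):
        uncovered = grid[~covered]
        if uncovered.size > 0:
            # Assumed reading of the modification: choose an uncovered test
            # point and jitter the center within that point's grid cell.
            jitter = (rng.random() - 0.5) / n_grid
            centers[k] = (rng.choice(uncovered) + jitter) % 1.0
        else:
            centers[k] = rng.random()            # uniform once covered
        covered |= circ_dist(grid, centers[k]) <= radius

    # Arc endpoints cut the circle into intervals on which the activity
    # pattern is constant; each interval yields one codeword.
    cuts = np.unique(np.concatenate([centers - radius, centers + radius]) % 1.0)
    lengths = (np.roll(cuts, -1) - cuts) % 1.0
    midpoints = (cuts + lengths / 2.0) % 1.0     # centers of mass

    code = {}
    for m in midpoints:
        word = tuple(int(b) for b in circ_dist(m, centers) <= radius)
        if any(word):                            # skip uncovered slivers
            code.setdefault(word, float(m))
    return centers, code, bool(covered.all())

centers, code, is_covered = generate_1d_rf_code(seed=1)
```

The returned dictionary maps each codeword (a length-75 tuple) to the midpoint of its interval, which serves as the codeword's center of mass.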
B.2. Generation of 2D RF Codes.
To generate the 2D RF codes used in our simulations, we took the stimulus space to be a square box environment. Each receptive field was the intersection of the stimulus space with a disk whose center lay within the stimulus space. All disks were chosen to have the same radius; this is consistent with findings that place fields in the dorsal hippocampus are generally circular and of similar sizes (Jung, Weiner, & McNaughton, 1994; Maurer, Vanrhoads, Sutherland, Lipa, & McNaughton, 2005). We chose the radius of our receptive fields to be 0.15, that is, 15% of the width of the stimulus space, to produce codes having a reasonable sparsity (mean s=0.069; see section B.3). As with the 1D RF codes, we generated 75 receptive fields to cover the space, with each receptive field identified by its center point. In our simulations, the center points of the receptive fields were dropped uniformly at random in the stimulus space, with the same modification as for the 1D RF codes: while the space remained uncovered, the centers of the disks were placed uniformly at random in regions of the space that had yet to be covered. We used a fine grid of uniformly spaced test points to find uncovered regions in the stimulus space.
Again, by examining all intersections of receptive fields, we found all regions cut out by the receptive fields, and each region defined a codeword (see Figure 1B). Unlike with the 1D RF codes, however, the codeword regions in the 2D RF codes were not guaranteed to be convex or even connected subsets of the stimulus space, although the typical region was at least connected. For the purpose of defining a stimulus space distance on these codes, we defined the center of mass of a codeword to be an appropriate approximation of the center of mass of the region corresponding to the codeword, regardless of whether that center lay within the region. When the codeword region was large enough to contain points from the fine grid, we took the center of mass of the codeword to be the center of mass of the grid points contained in the codeword region. A small number of codewords had regions that were narrow crescents or other small shapes that avoided all grid points; in these cases, the center of mass of the codeword was taken to be the center of mass of the receptive field boundary intersection points that defined the region.
For Figure 7, we generated 10 new 2D RF codes of length 10. For these smaller codes, the radius was chosen to be 0.25 to ensure reasonable coverage of the space. All other parameters were as described above.
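A grid-based approximation of the 2D construction is sketched below. It departs from the procedure described above in one important respect: regions are discovered by sampling the test grid rather than by computing intersections of receptive field boundaries, so small regions that avoid every grid point are missed. The function name and the grid resolution `grid_side=100` are assumptions, since the grid size is not stated here.

```python
import numpy as np

def generate_2d_rf_code(n=75, radius=0.15, grid_side=100, seed=0):
    rng = np.random.default_rng(seed)
    # grid_side is an assumed resolution for the test grid.
    ticks = (np.arange(grid_side) + 0.5) / grid_side
    gx, gy = np.meshgrid(ticks, ticks)
    pts = np.column_stack([gx.ravel(), gy.ravel()])       # (G, 2) test points
    active = np.zeros((len(pts), n), dtype=np.uint8)      # field membership
    for j in range(n):
        uncovered = np.flatnonzero(active.sum(axis=1) == 0)
        if uncovered.size > 0:
            # While the space is uncovered, drop the center on an
            # uncovered test point chosen uniformly at random.
            center = pts[rng.choice(uncovered)]
        else:
            center = rng.random(2)
        active[:, j] = np.linalg.norm(pts - center, axis=1) <= radius

    # Each distinct activity pattern on the grid is one codeword.
    words, inverse = np.unique(active, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    # Center of mass of a codeword: mean of the grid points in its region.
    coms = np.array([pts[inverse == k].mean(axis=0) for k in range(len(words))])
    return words.astype(int), coms

words, coms = generate_2d_rf_code(seed=1)
```

Because each center placed during the uncovered phase lies more than one radius from all earlier centers, a packing argument shows that 75 disks of radius 0.15 always cover the test grid. The same guarantee does not hold for the length 10 codes with radius 0.25.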
B.3. Details of Error Correction Simulations.
As a result of the chosen receptive field radii, the mean sparsity of the 1D RF codes was s=0.165, while the mean sparsity of the 2D RF codes was s=0.069. To test how effective each of these types of codes was compared to the random codes with matched parameters, we chose to make the error probabilities as high as possible while still abiding by our BAC constraints and maintaining a reasonable value for the expected number of errors in each transmission. Thus, we set q=0.20 for the 1D RF codes and q=0.10 for the 2D RF codes. To test the performance of these codes over varying degrees of channel asymmetry, the value of p was chosen to range from 0.05 to 0.15 in increments of 0.01 for the 1D RF codes, while p ranged from 0.01 to 0.06 in increments of 0.005 for the 2D RF codes.
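One transmission-and-decoding step of such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions: p is taken to be the probability of a 0 to 1 flip and q the probability of a 1 to 0 flip, each neuron is corrupted independently, and the function names `bac_transmit` and `ml_decode` are hypothetical. The decoder maximizes P(rec|sent) over the code and breaks ties uniformly at random.

```python
import numpy as np

def bac_transmit(word, p, q, rng):
    """Send a binary word through a BAC.

    Assumed convention: each 0 flips to 1 with probability p and
    each 1 flips to 0 with probability q, independently per neuron.
    """
    word = np.asarray(word)
    u = rng.random(word.shape)
    flip = np.where(word == 1, u < q, u < p)
    return np.where(flip, 1 - word, word)

def ml_decode(received, code, p, q, rng):
    """Index of a codeword maximizing P(rec|sent); requires 0 < p, q < 1."""
    code = np.asarray(code)
    received = np.asarray(received)
    t01 = ((code == 0) & (received == 1)).sum(axis=1)   # 0 sent, 1 received
    t10 = ((code == 1) & (received == 0)).sum(axis=1)   # 1 sent, 0 received
    t00 = ((code == 0) & (received == 0)).sum(axis=1)
    t11 = ((code == 1) & (received == 1)).sum(axis=1)
    loglik = (t01 * np.log(p) + t10 * np.log(q)
              + t00 * np.log1p(-p) + t11 * np.log1p(-q))
    best = np.flatnonzero(np.isclose(loglik, loglik.max()))
    return int(rng.choice(best))                        # random tie-breaking

# Hypothetical toy code with three codewords of length 6.
rng = np.random.default_rng(0)
toy_code = np.array([[0, 0, 0, 0, 0, 0],
                     [1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]])
received = bac_transmit(toy_code[1], p=0.05, q=0.20, rng=rng)
decoded_index = ml_decode(received, toy_code, p=0.05, q=0.20, rng=rng)
```

Repeating this step over many transmissions and recording how often the decoded codeword equals the sent one, or lies within a chosen stimulus-space tolerance of it, gives an estimate of the probability of correct decoding for each value of p.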
C.C. was supported by NSF DMS 0920845 and an Alfred P. Sloan Research Fellowship. V.I. was supported by NSF DMS 0967377 and NSF DMS 1122519. K.M. was supported by NSF DMS 0903517 and NSF DMS 0838463. Z.R. was supported by Department of Education GAANN grant P200A060126. J.L.W. was supported by NSF DMS 0903517.
In the coding theory literature, the rate of a code C of length n is given by log2(|C|)/n, so that the redundancy as we have defined it is simply 1 minus the rate. Because “rate” has a very different meaning in neuroscience than in coding theory, we will avoid this term and use the notion of redundancy instead.
In the vision literature, the term receptive field is reserved for subsets of the visual field. Here we use the term in a more general sense that is applicable to any modality, as in Curto and Itskov (2008).
Note that this is distinct from the notion of “dimension of a code” in the coding theory literature.
If the same permutation were used to shuffle all codewords, the resulting permutation-equivalent code would be nothing more than the code obtained from a relabeling of the neurons.
In engineering applications, one can always assume the encoding map is deterministic. In the neuroscience context, however, it may be equally appropriate to use a probabilistic encoding map.
Although many of the overlap regions will be nonconvex, instances of the center of mass falling outside the corresponding region will be rare enough that this pathological case need not be considered.
In all of our decoders, we assume that ties are broken randomly, with uniform distribution on equally optimal codewords.
In cases where the distribution of stimuli is not uniform, our analysis would proceed in exactly the same manner with one exception: instead of using the ML decoder, which may no longer be optimal, we would use the MAP decoder or an appropriate approximation to MAP that is tailored to the characteristics of the codeword distribution.
Note that this situation would be equally problematic if we considered the full firing rate information instead of a combinatorial code. This is because small changes in firing rates would tend to produce equally valid codewords, making error detection and correction just as difficult.
Another distance measure on neural codes was recently introduced in Tkačik, Granot-Atedgi, Segev, and Schneidman (2012).
This definition does not explicitly depend on the channel parameters, although details of the channel are implicitly used in the computation of P(rec|sent).
On the binary symmetric channel (BSC), Hamming distance does measure the likelihood of two codewords being confused after ML decoding of errors introduced by the channel.
In addition to the simulations shown here with the above parameters, we also tested the code performance over a range of both larger and smaller values of q and obtained similar results.