## Abstract

Within a given brain region, individual neurons exhibit a wide variety of different feature selectivities. Here, we investigated the impact of this extensive functional diversity on the population neural code. Our approach was to build optimal decoders to discriminate among stimuli using the spiking output of a real, measured neural population and compare its performance against a matched, homogeneous neural population with the same number of cells and spikes. Analyzing large populations of retinal ganglion cells, we found that the real, heterogeneous population can yield a discrimination error lower than the homogeneous population by several orders of magnitude and consequently can encode much more visual information. This effect increases with population size and with graded degrees of heterogeneity. We complemented these results with an analysis of coding based on the Chernoff distance, as well as derivations of inequalities on coding in certain limits, from which we can conclude that the beneficial effect of heterogeneity occurs over a broad set of conditions. Together, our results indicate that the presence of functional diversity in neural populations can enhance their coding fidelity appreciably. A noteworthy outcome of our study is that this effect can be extremely strong and should be taken into account when investigating design principles for neural circuits.

## 1 Introduction

Neurons are complex objects, bewildering in their anatomical and functional diversity. Neuroscience has managed to bring order to this chaos by recognizing that functional properties are often organized into spatial maps, where local neighborhoods have similar tuning, and that within a local neighborhood, the shapes of neurons often come in stereotyped patterns that can be divided into cell types. Still, local neural circuits are generally made up of neurons from many cell types, and, hence, a high degree of functional heterogeneity is present in them. To cite just two examples from visual areas, the retina is tiled by as many as 40 types of ganglion cells, so that any spot in visual space is monitored by a large number of ganglion cell types (Azeredo da Silveira & Roska, 2011; Baden et al., 2016; Robles, Laurell, & Baier, 2014; Seung & Sumbul, 2014). In turn, the diversity of their light responses results from the diversity in presynaptic circuits, made up of a dozen types of bipolar cells (Ghosh, Bujan, Haverkamp, Feigenspan, & Wassle, 2004) and over 25 types of amacrine cells (MacNeil & Masland, 1998; Marc et al., 2013). Primary visual cortex has great local diversity in receptive field shapes and spatial frequency tuning (Bonin, Histed, Yurgenson, & Reid, 2011; Ringach, Shapley, & Hawken, 2002), as was apparent even in the earliest studies (Hubel & Wiesel, 1962). In addition to the functional diversity that comes from dividing neurons into cell types, there is also variability within a given cell type, which comes from fluctuations in the morphology and chemical makeup of neurons, as well as fluctuations in the connectivity of local circuits (Asari & Meister, 2012; Brenowitz & Regehr, 2007; Dobrunz & Stevens, 1997, 1999; Prinz, Bucher, & Marder, 2004). 
Thus, the processing of information within local circuits is subject to both the heterogeneity that exists among cell types and the “random heterogeneity” coming from fluctuations in connectivity and biochemical processing within the same cell type.

One is naturally led to ask in which ways the heterogeneity among neurons participates in information processing. On the one hand, one can argue that a population of identical cells would favor coding by allowing the transmission of a well-averaged signal related to the tuning properties of the neurons. In this picture, random heterogeneity is a bug that results from a developmental inability to generate perfectly ordered circuits. But the fact that there exist perfectly ordered neural circuits in nature, such as the invertebrate ommatidia, and that in vertebrates, there appears to be a higher degree of heterogeneity in higher brain areas together indicate that this argument is too simplistic. On the other hand, one expects that heterogeneity can benefit coding because it endows a neural population with a broader range of “information sensors.”

In recent years, a number of studies have substantiated this latter line of thought. Studies of visual (Chelaru & Dragoi, 2008; Kastner, Baccus, & Sharpee, 2015; Osborne, Palmer, Lisberger, & Bialek, 2008), auditory (Holmstrom, Eeuwes, Roberts, & Portfors, 2010), and olfactory (Tripathy, Padmanabhan, Gerkin, & Urban, 2013) neurons have demonstrated that heterogeneity can enhance the processing of information by allowing for combinatorial codes. Theoretical work has also shown that splitting retinal ganglion cells into ON and OFF subpopulations is more efficient than splitting them into same-polarity subpopulations with different thresholds (Gjorgjieva, Sompolinsky, & Meister, 2014). Heterogeneity has also been shown to affect the dynamics of neural assemblies and thereby improve their coding properties (Hunsberger, Scott, & Eliasmith, 2014; Lengler, Jug, & Steger, 2013; Mejias & Longtin, 2012). In the case of coding with a population of broadly tuned neurons, cell-to-cell variability can counteract the harmful effects of correlated noise on the coding precision (Ecker, Berens, Tolias, & Bethge, 2011; Shamir & Sompolinsky, 2006; Wilke et al., 2001). Finally, increased heterogeneity of neural activity has been shown to correlate with better behavioral performance in a visual image recognition task (Montijn, Goltstein, & Pennartz, 2015).

Here, we provide complementary demonstrations that heterogeneity can benefit the coding of information appreciably. We analyze the activity of large populations of retinal ganglion cells in response to both artificial and natural stimuli, and we show that their heterogeneity is responsible for improved coding of visual information. A new aspect of this result, which we emphasize here, is that this effect—namely, the enhancement of coding accuracy due to heterogeneity—can be very large quantitatively and, hence, is a factor to take into consideration for understanding both the performance and design of neural codes. We also show that the Chernoff distance, another useful measure of coding fidelity (Kang & Sompolinsky, 2001), is appreciably larger in heterogeneous than homogeneous populations. Finally, we examine simple models that help us explore the mechanisms by which heterogeneity favors coding. We formulate mathematical arguments that demonstrate that heterogeneity is favorable quite generally. These arguments do not rely on particular forms of the tuning properties of neurons or other specific assumptions. Together, our results suggest that functional heterogeneity is not a bug but a feature of neural population codes.

## 2 Results

### 2.1 Choice of Stimuli and Population Responses

Any analysis of the role of heterogeneity in a population code must make some choice about the class of stimuli or experimental conditions to be studied. We start our analysis of the retinal population code with the case of spatially uniform stimulation. The virtue of this choice is that under these visual conditions, every retinal ganglion cell experiences exactly the same input. Thus, any differences in the response among cells can be attributed entirely to their heterogeneity. We use random flicker stimulation, where we sample a broad distribution of all possible temporal patterns of light intensity without imposing potentially limiting conditions from the start. Specifically, we randomly draw a value of light intensity from a gaussian distribution on a fast timescale (8.33 ms).

Under this stimulus ensemble, the response of retinal ganglion cells depends on the history of the stimulus going back over several hundred milliseconds into the past (Chichilnisky, 2001; Fairhall et al., 2006; Warland, Reinagel, & Meister, 1997). Given a frame time of 8.33 ms, this implies that the relevant stimulus is a vector of about 30 or more light intensity values. The entropy of these stimulus patterns is large enough that the same stimulus essentially never repeats under realistic experimental conditions. Thus, we can assume that the stimulus preceding every time bin in the neural response is different. Our task will be to use the activity of the retinal population to distinguish among stimuli. For the case of two discrete stimuli, this task amounts to distinguishing the set of firing probabilities in the population elicited at one point in time, $t_1$, which we call the “target” stimulus, from that elicited at another point in time, $t_2$, called the “distracter” (see Figure 1). We measure the firing probability in small time bins (20 ms) for each neuron, and we denote the set of firing probabilities in response to the target stimulus as $\{p_i\}$ and those in response to the distracter stimulus as $\{q_i\}$, where $p_i$ and $q_i$ are the firing probabilities of the $i$th cell in the two conditions, respectively.

Figure 2 shows the firing rate of populations of retinal ganglion cells under stimulation by either a 30 sec segment of spatially uniform flicker or a 120 sec natural movie clip. As has been reported before, the firing rate for an individual ganglion cell was vanishing at most points in time and then rose and fell rapidly in a sparse set of firing events (Berry, Warland, & Meister, 1997). The firing rate averaged over the entire population was highly heterogeneous across time (see Figure 2B). In addition, there were appreciable differences among cells in their overall firing rate, averaged across the entire stimulus ensemble. This led to a broad distribution of overall firing rates across cells, ranging from a lower bound of 0.01 spikes/sec (below which we do not trust our spike sorting) up to almost 10 spikes/sec; most cells had an overall firing rate of less than 1 spike/sec (median rate for natural movies = 0.27 spikes/sec; see Figure 2C). As a result of these two properties, the distribution of firing rates across cells and time bins was quite broad, with a prominent peak at 0, as the cells were mostly quite sparse, and a tail extending up to over 100 spikes/sec that approximately followed an exponential function (see Figure 2D). (This nearly exponential dependence has also been observed in the visual cortex. As this distribution has maximum entropy at a fixed mean, it has been suggested that this distribution represents a form of efficient coding using the firing rates of individual cells (Baddeley et al., 1997).) Similar forms of sparseness and heterogeneity have been observed in other neural systems as well (Chechik et al., 2006; Shoham, O'Connor, & Segev, 2006; Weliky, Fiser, Hunt, & Wagner, 2003). Together, these observations suggest that the retinal data that we analyze here have a structure to their population code similar to that in other brain regions.

### 2.2 How Relevant Is Heterogeneity for Population Coding?

In order to quantify the computational relevance of heterogeneity, we compared the performance of our measured neural population against an equivalent homogeneous population. Because the coding performance must increase as larger populations or higher overall firing rates are considered, we constructed our equivalent homogeneous populations so as to always have the same number of cells and average number of spikes as our real populations. Hence, every neuron in the homogeneous population had a firing rate equal to the average firing rate of the neurons in our measured population. Specifically, the firing probability in the homogeneous population given the target stimulus was $\bar{p} = \frac{1}{N}\sum_i p_i$, and similarly for distracter stimuli.
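As a minimal sketch of this construction (the function name `matched_homogeneous` and the toy probabilities are illustrative assumptions, not taken from the study), the matched population assigns every cell the population-averaged firing probability, which preserves both the number of cells and the expected spike count:

```python
def matched_homogeneous(p, q):
    """Return firing probabilities of the matched homogeneous population.

    p, q: per-cell firing probabilities for the target and distracter stimuli.
    Every cell in the matched population fires with the population average,
    so the cell count and the expected number of spikes are unchanged.
    """
    n = len(p)
    if n == 0 or len(q) != n:
        raise ValueError("p and q must be non-empty and of equal length")
    p_bar = sum(p) / n
    q_bar = sum(q) / n
    return [p_bar] * n, [q_bar] * n


# Toy example (hypothetical values): different cells respond to each stimulus.
p = [0.8, 0.0, 0.2, 0.0]
q = [0.0, 0.6, 0.0, 0.2]
p_hom, q_hom = matched_homogeneous(p, q)
```

In this toy case the heterogeneous patterns are carried by different cells, whereas the matched population retains only the two averages.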

To gain more intuition into this approach to formalizing the question, we can imagine a population of retinal ganglion cells with either the same or different contrast tuning functions. Then the question is whether one can better discriminate two contrasts using the heterogeneous or homogeneous population. Similarly for the visual cortex, we can imagine a population of neurons with either the same or different orientation tuning being used to discriminate between two different orientations. The answer to these questions will depend on the choice of contrasts and orientations, so we must carry out the calculation for many different choices of stimulus pairs. Of course, in our design, the two different stimuli do not differ by a single, known parameter, like contrast or orientation. However, this choice has the benefit of improving the generality of our possible conclusions because they apply over a broad range of realistic visual conditions rather than to a single, tightly controlled experimental task.

For a given pair of stimuli at times $(t_1, t_2)$, we constructed a maximum likelihood decoder to distinguish the population activity elicited by the target stimulus, $\{p_i\}$, from the activity elicited by the distracter, $\{q_i\}$ (see section 4). A similar calculation was performed for the matched homogeneous population. Unsurprisingly, the resulting error rates depended very strongly on the particular choice of $(t_1, t_2)$, as the neural activity elicited by some stimuli was quite high, while for many other stimuli, the population was not very active. To organize our results, we plotted the error rate as a function of the difference in firing probability for target and distracter stimuli, $\Delta \equiv |\bar{p} - \bar{q}|$, as we expect that the error rate should depend strongly on this quantity (see Figure 3). Indeed, under spatially uniform flicker, the error for the homogeneous population, $\varepsilon_1$, ranged from close to 0.5 (chance level) down to less than $10^{-3}$ for pairs of time points with very different firing probabilities (see Figure 3A, open red circles). The error rate for the real, heterogeneous population, $\varepsilon_N$, was strikingly lower, often by several orders of magnitude (see Figure 3A, solid blue circles). In order to focus more specifically on the difference in error rate between the homogeneous and heterogeneous populations, we calculated the ratio of the error rate for each pair of stimuli sampled, $\varepsilon_1/\varepsilon_N$. This ratio varied from close to one all the way up to almost $10^{6}$ (see Figure 3B). (Note that the maximum measurable error ratio was limited by the numerical methods that we used to sample errors.) We found this extreme difference to be surprising and noteworthy. A similar pattern of the error rate, as well as similar values of the error, was found under stimulation by natural movie clips (see Figures 3C and 3D), indicating that this result is not specific to spatially uniform flicker.
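The comparison can be illustrated with a small sketch. It assumes conditionally independent binary cells and equally likely stimuli, and it replaces the sampling methods used for the recorded population with exact enumeration of all $2^N$ spike patterns, which is feasible only for small $N$. The function names and the six-cell firing probabilities are hypothetical.

```python
from itertools import product
from math import comb


def pattern_probability(pattern, probs):
    """Probability of a binary spike pattern for independent cells."""
    prob = 1.0
    for spike, p in zip(pattern, probs):
        prob *= p if spike else 1.0 - p
    return prob


def ml_error_exact(p, q):
    """Exact maximum likelihood error for two equiprobable stimuli.

    The decoder picks the stimulus with the larger likelihood, so it errs
    with probability 0.5 * sum_r min(P(r), Q(r)). Enumerates all 2**N
    patterns, so it is practical only for small N.
    """
    return 0.5 * sum(
        min(pattern_probability(r, p), pattern_probability(r, q))
        for r in product((0, 1), repeat=len(p))
    )


def ml_error_homogeneous(p_bar, q_bar, n):
    """Exact error for n identical cells, using the spike count k."""
    total = 0.0
    for k in range(n + 1):
        weight = comb(n, k)
        p_k = weight * p_bar**k * (1.0 - p_bar) ** (n - k)
        q_k = weight * q_bar**k * (1.0 - q_bar) ** (n - k)
        total += min(p_k, q_k)
    return 0.5 * total


# Hypothetical firing probabilities for six cells.
p = [0.80, 0.10, 0.60, 0.05, 0.30, 0.02]
q = [0.10, 0.70, 0.05, 0.50, 0.02, 0.30]
n = len(p)
eps_het = ml_error_exact(p, q)
eps_hom = ml_error_homogeneous(sum(p) / n, sum(q) / n, n)
print(f"heterogeneous error: {eps_het:.4f}, homogeneous error: {eps_hom:.4f}")
```

Because the two averages in this example are nearly equal, the homogeneous error stays close to chance, while the heterogeneous error is bounded above by the error of the single most informative cell.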

We note that the finite sampling from which we estimate the experimental firing probability can, in itself, give rise to an apparent benefit of heterogeneity. One reason is that even if the true firing probability of a cell is identical for two stimuli, noise in the neural response will cause the estimated firing probabilities to differ. Another reason is that finite sampling will slightly enhance the degree of heterogeneity. One way to estimate the significance of this effect is to create a matched homogeneous neural response for two stimuli and resample the firing probabilities of all the cells via bootstrapping. The resulting resampled population will have different estimated firing probabilities for all of the cells, and hence should have a lower discrimination error due to finite sampling alone. We carried out this procedure using 10 bootstrap resamples for each stimulus pair (see section 4) and calculated the ratio of the error for the homogeneous population and the error for the resampled homogeneous population. For spatially uniform flicker, where we had 300 stimulus repeats, this effect was very small (see Figure 3B, gray circles). For the natural movie, where we had only 70 repeats, this ratio was slightly larger but still far below the effect of real heterogeneity in the overwhelming majority of instances (see Figure 3D, gray circles).
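The finite-sampling control can be sketched as follows. This is an illustrative parametric version, in which each cell's probability is re-estimated from simulated Bernoulli trials; the exact resampling procedure is described in section 4 and may differ. The function name, seed, and population size are assumptions.

```python
import random


def resample_homogeneous(p_bar, n_cells, n_repeats, rng):
    """Re-estimate each cell's firing probability from n_repeats simulated trials.

    Every cell truly fires with probability p_bar; the estimates differ across
    cells only because of finite sampling.
    """
    estimates = []
    for _ in range(n_cells):
        spikes = sum(rng.random() < p_bar for _ in range(n_repeats))
        estimates.append(spikes / n_repeats)
    return estimates


rng = random.Random(0)  # fixed seed so the sketch is reproducible
p_bar, n_cells = 0.2, 50  # hypothetical population
for n_repeats in (300, 70):  # repeat counts quoted in the text
    est = resample_homogeneous(p_bar, n_cells, n_repeats, rng)
    spread = max(est) - min(est)
    print(f"{n_repeats} repeats: spread of estimated probabilities = {spread:.3f}")
```

The resampled probabilities can then be passed to the same decoder as the real population to measure how much apparent heterogeneity arises from sampling noise alone.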

While the error rates in the homogeneous population were strongly determined by the difference in firing probabilities, $\Delta $, the error rates in the heterogeneous population varied widely for a given value of $\Delta $, especially those near zero. Inspection revealed that pairs of stimuli with very low error and $\Delta $ close to zero had neural activity with large but similar means for target and distracter stimuli. Such patterns of neural activity were nearly indistinguishable for the matched homogeneous population, but because different neurons were active in each activity pattern, the error was very low in the real, heterogeneous population.

Another way of quantifying the effect of heterogeneity is to calculate the mutual information per cell between pairs of target and distracter stimuli (see section 4). Carrying out this analysis, we found that the information for the matched homogeneous population had a substantial range, even at a given firing difference, $\Delta$ (see Figure 4A). But, again, the information for the real, heterogeneous population was larger and had an even greater range at a given $\Delta$. The results for the natural movie clip were similar to those for spatially uniform flicker (see Figure 4B). Plotting the heterogeneous information, $I_N$, versus the homogeneous information, $I_1$, showed that heterogeneity always improved the mutual information per cell, sometimes by large factors (see Figures 4C and 4D). As for the discrimination error, we estimated the effect that finite sampling has on increasing the mutual information by resampling the firing probabilities of the matched homogeneous population, which increased the information only marginally above the homogeneous value (see Figures 4C and 4D, gray dots). For many pairs of stimuli, the heterogeneous information was several orders of magnitude larger than the homogeneous information. This was especially true for activity patterns with nearly the same average firing probability (see Figures 4E and 4F), similar to our results for the discrimination error (see Figure 3).
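A minimal sketch of one way to compute such a quantity is given below. It assumes that the information per cell is the average, over cells, of the mutual information between a binary spike/no-spike response and two equally likely stimuli; the precise definition is given in section 4 and may differ. All names and numerical values are illustrative.

```python
from math import log2


def binary_entropy(x):
    """Entropy (bits) of a binary variable that equals 1 with probability x."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def cell_information(p_i, q_i):
    """Information (bits) one cell carries about two equiprobable stimuli."""
    return binary_entropy(0.5 * (p_i + q_i)) - 0.5 * (
        binary_entropy(p_i) + binary_entropy(q_i)
    )


def information_per_cell(p, q):
    """Average single-cell information across the population."""
    return sum(cell_information(a, b) for a, b in zip(p, q)) / len(p)


# Hypothetical example with equal mean firing probability for both stimuli.
p = [0.8, 0.0, 0.2, 0.0]
q = [0.0, 0.8, 0.0, 0.2]
p_bar, q_bar = sum(p) / len(p), sum(q) / len(q)
i_het = information_per_cell(p, q)
i_hom = cell_information(p_bar, q_bar)
print(f"I_N = {i_het:.3f} bit/cell, I_1 = {i_hom:.3f} bit/cell")
```

Under this assumed definition, the homogeneous value vanishes whenever the two averages coincide, while the heterogeneous value remains positive as long as individual cells respond differently to the two stimuli.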

Even at large $\Delta $, where there was substantial information in the homogeneous population, heterogeneity often enhanced the information per cell by factors of two or more. We emphasize that since this information is calculated per cell, even “small” multiplicative factors such as these imply an appreciable enhancement in the information contained in the entire population.

Next, we asked whether the effect of heterogeneity was greater when the population activity was higher or lower. To this end, we displayed the ratio of information in heterogeneous versus homogeneous populations in a color scale given by the average firing probability, $\frac{1}{2}(\bar{p}+\bar{q})$ (see Figures 4E and 4F). We found that the ratio was systematically higher when the population activity was greater. This result can be interpreted by noting that neural activity in the ganglion cell population is sparse. As a result, the most common case is one in which a given cell is silent in both time bins. In this case, that cell provides no discriminability. When neural activity is higher, fewer cells have zero firing for both stimuli, and hence the population information is higher. This comparison also addresses the question of the importance of heterogeneity when the stimulus is well tuned to drive neural responses. In this case, average neural activity would be higher, leading to a stronger effect of heterogeneity.

Another important question is whether the effects that we report depend strongly on individual cells having perfectly reliable responses. First, cells that apparently come with perfect reliability, $p_i=1$ and $q_i=0$, or vice versa, are an artifact of finite data and how we estimate probabilities from those data. No cell is truly perfectly reliable. More broadly, one way to probe the role of highly reliable cells quantitatively is to calculate the maximum information encoded by a cell and compare it against the sum of information across cells. If a single cell typically dominates the population information, the maximum will be nearly equal to the sum. What we found instead was that the maximum was roughly 0.1 of the sum for stimulus pairs with the largest information. So, overall, information about different pairs of stimuli was broadly distributed throughout the population. It is still possible that a small number of cells do dominate the discriminability for particular pairs of stimuli. But single-cell coding cannot account for our conclusions.

We can extend the generality of our results by considering the Chernoff distance as a measure quantifying coding fidelity in neural populations (Kang, Shapley, & Sompolinsky, 2004; Kang & Sompolinsky, 2001). Specifically, it describes the asymptotic limit of the mutual information that activity in a large neural population represents about an ensemble of stimuli. In this limit, the information is dominated by the distance between the “closest” pair of stimuli. Thus, the Chernoff distance can also be interpreted as a measure of coding that is defined between two stimuli. This measure ranges from zero, when no discriminability is possible, to infinity, when discrimination is perfect. Furthermore, in our case, the Chernoff distance is readily calculated in this asymptotic limit (see section 4).
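For reference, a sketch of this calculation is given here, using the standard definition of the Chernoff distance between two response distributions, $D_C = \max_{\alpha \in [0,1]} \left[-\ln \sum_r P(r)^{\alpha} Q(r)^{1-\alpha}\right]$, and assuming independent binary cells so that the sum over patterns factorizes across cells. That this matches the expression in section 4 is an assumption, and the grid search over $\alpha$ and the example probabilities are illustrative choices.

```python
from math import inf, log


def chernoff_distance(p, q, n_grid=1001):
    """Chernoff distance between two stimuli for independent binary cells.

    Computes the maximum over alpha in [0, 1] of
        -sum_i log(p_i**alpha * q_i**(1 - alpha)
                   + (1 - p_i)**alpha * (1 - q_i)**(1 - alpha))
    with a simple grid search over alpha.
    """
    best = 0.0
    for step in range(n_grid):
        alpha = step / (n_grid - 1)
        total = 0.0
        for p_i, q_i in zip(p, q):
            overlap = (p_i**alpha * q_i ** (1.0 - alpha)
                       + (1.0 - p_i) ** alpha * (1.0 - q_i) ** (1.0 - alpha))
            if overlap <= 0.0:
                return inf  # a perfectly reliable cell: no overlap at all
            total -= log(overlap)
        best = max(best, total)
    return best


# Hypothetical firing probabilities for six cells.
p = [0.80, 0.10, 0.60, 0.05, 0.30, 0.02]
q = [0.10, 0.70, 0.05, 0.50, 0.02, 0.30]
n = len(p)
d_het = chernoff_distance(p, q)
d_hom = chernoff_distance([sum(p) / n] * n, [sum(q) / n] * n)
print(f"heterogeneous: {d_het:.3f}, homogeneous: {d_hom:.4f}")
```

The function returns zero for identical distributions and infinity when some cell separates the two stimuli perfectly, in line with the range described above.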

We found that the Chernoff distance was systematically much larger for the fully heterogeneous population than the matched homogeneous population (see Figures 5A and 5B), consistent with our results on both the discrimination error and the mutual information (see Figures 3 and 4). Part of the reason for this consistency is the fact that the Chernoff distance tracks both the error and the information (see Figures 5C and 5D). Because the Chernoff distance is closely correlated with both quantities, its evaluation helps confirm that these different measures yield consistent results on the benefit of heterogeneity in neural populations.

### 2.3 Coding Fidelity for Graded Levels of Heterogeneity

So far, we have compared the real heterogeneous neural population to a matched population where every neuron had identical stimulus tuning. While there certainly are some contexts in which it has been fruitful to analyze neural populations in terms of their average firing rate—for example, integration of sensory evidence in cortical area LIP (Roitman & Shadlen, 2002)—this is a somewhat caricatured limit. For instance, many classic studies of neural coding, such as the discrimination of the direction of random dot motion in cortical area MT (Newsome, Britten, & Movshon, 1989), have divided the neural population into two pools: neurons with the stimulus tuned to their peak or preferred direction versus neurons with tuning in the antipreferred direction. It is thus of interest to compare a fully heterogeneous population to populations with a coarser form of heterogeneity, as in the case of a population divided into preferred and antipreferred pools.

We can accomplish this by defining one pool (“preferred”) as all of the neurons that have a higher firing probability for the target stimulus, $p_i \geq q_i$, and the other pool (“antipreferred”) as the remainder of the population (see Figure 6A). We then form a matched neural population with two pools by computing the average firing probability for target and distracter for the $N_1$ neurons in pool 1, $\bar{p}^{(1)}$ and $\bar{q}^{(1)}$, respectively, and similarly for the $N_2$ neurons in pool 2, $\bar{p}^{(2)}$ and $\bar{q}^{(2)}$ (depicted in Figure 6A by crosses). In this case, the two-pool population can be characterized by the spike count in pool 1, $k_1$, and that in pool 2, $k_2$, allowing us to calculate the discrimination error exactly (see section 4).

We can further subdivide the neural population into four pools. Here, we take the neurons in the preferred pool and divide them into equal groups having the top half versus the bottom half of the firing probabilities to the target stimulus (see Figure 6B). Similarly, the antipreferred pool can be divided into equal groups of neurons having the top half and the bottom half of firing probabilities for the distracter stimulus, respectively. The state of the four-pool neural population is uniquely described by four spike count variables, $\{k_1, k_2, k_3, k_4\}$, again allowing an exact calculation of the discrimination error. More generally, we can divide the neural population into any even number of pools, $L$, using an analogous method. For instance, we can form an eight-pool population by dividing the preferred pool into four groups with rank-ordered quartiles of target firing probabilities and the same for the antipreferred pool. For populations with more than four pools, it becomes unwieldy to perform the exact computation, so we instead relied on the same Monte Carlo sampling methods used for the real, fully heterogeneous population.
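The pooling procedure can be sketched as follows. The helper names, the handling of pools whose sizes are not evenly divisible, and the example probabilities are assumptions made for illustration; the procedure actually used is described in section 4.

```python
def split_evenly(items, n_groups):
    """Split a list into n_groups contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    chunks, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def make_pools(p, q, n_pools):
    """Group cell indices into n_pools pools (n_pools must be even).

    Preferred cells (p_i >= q_i) are rank-ordered by target probability,
    antipreferred cells by distracter probability; each half is then cut
    into n_pools / 2 groups. Empty groups are dropped.
    """
    if n_pools < 2 or n_pools % 2:
        raise ValueError("n_pools must be an even number >= 2")
    cells = range(len(p))
    preferred = sorted((i for i in cells if p[i] >= q[i]), key=lambda i: -p[i])
    antipreferred = sorted((i for i in cells if p[i] < q[i]), key=lambda i: -q[i])
    half = n_pools // 2
    groups = split_evenly(preferred, half) + split_evenly(antipreferred, half)
    return [g for g in groups if g]


def pooled_probabilities(p, q, pools):
    """Replace each cell's probabilities by the averages over its pool."""
    p_out, q_out = list(p), list(q)
    for pool in pools:
        p_mean = sum(p[i] for i in pool) / len(pool)
        q_mean = sum(q[i] for i in pool) / len(pool)
        for i in pool:
            p_out[i], q_out[i] = p_mean, q_mean
    return p_out, q_out


# Hypothetical firing probabilities for eight cells.
p = [0.80, 0.10, 0.60, 0.05, 0.30, 0.02, 0.40, 0.01]
q = [0.10, 0.70, 0.05, 0.50, 0.02, 0.30, 0.20, 0.25]
pools = make_pools(p, q, n_pools=4)
p4, q4 = pooled_probabilities(p, q, pools)
```

Averaging within pools leaves the total expected spike count unchanged, so populations with different numbers of pools remain matched in the sense used for the homogeneous comparison.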

We found that the discrimination error decreased in a graded fashion as we increased the degree of heterogeneity. Figure 6C shows example plots of error versus the number of pools subdividing the population, $L$, for several different choices of stimulus pairs. Although the error for the homogeneous case takes a range of values (as seen in Figure 3), the error decreased steadily as we divided the population into more pools.

This led us to compute another statistic, the improvement factor, $\lambda$, defined as the multiplicative factor by which the average error decreases when the number of pools is increased by a factor of 2, $\lambda_L \equiv \varepsilon_{L/2}/\varepsilon_L$. The improvement factor was relatively large when dividing the homogeneous population into 2 pools ($\lambda \sim 5.5$), then settled down to a value $\lambda \sim 2$ to $3$ up to $L=32$ pools, and finally rose again as full heterogeneity was obtained (see Figure 6D). Because a constant factor $\lambda$ per doubling of $L$ corresponds to $\varepsilon_L \propto L^{-\log_2 \lambda}$, this behavior implies that the error decreased roughly as a power law function of the pool number $L$ with an exponent of magnitude $\log_2 \lambda$, in the range of roughly 1 to 1.6.

We also calculated the mutual information per cell as a function of the degree of heterogeneity. We found that it rose gradually and monotonically from less than 0.02 bit/cell for the homogeneous population to more than 0.06 bit/cell for the real, fully heterogeneous population (see Figure 6E). A similar trend was obtained under natural movie stimulation, with somewhat lower overall values, presumably due to the lower average firing rate of ganglion cells for this stimulus ensemble (see Figure 6F; $0.91 \pm 0.43$ spikes/sec for spatially uniform stimulation versus $0.42 \pm 0.18$ spikes/sec for the natural movie clip). In fact, under natural movie stimulation, the information increased linearly with $\log(L)$. Together, these results indicate that increasing the degree of heterogeneity in the neural population increases the fidelity of the population code, in a graded fashion.

### 2.4 Heterogeneity Arising from Different Cell Types

The natural interpretation of the pools that we have formed is that they correspond to neurons having similar tuning properties. For instance, in the case of motion direction discrimination in area MT, neurons in the preferred and antipreferred pools have opposite direction selectivity and hence belong to different direction columns. In the retina, an obvious method of choosing functional pools is to assign ganglion cells of the same functional type to a single pool. Following previous classification methods for the salamander retina, we divided the ganglion cells into eight functional types based on their reverse correlation under spatially uniform flicker (Marre et al., 2012; Segev, Puchalla, & Berry, 2006; Warland et al., 1997) (see Figures 7A and 7B).

We can explore graded levels of heterogeneity by splitting the neural population into successively more refined pools, dividing the population according to increasingly fine criteria of selectivity. For instance, it is natural to split the entire population into two pools formed from ON and OFF cells. Next, we can form four pools by dividing the OFF cells into fast, medium, and slow OFF, which have been distinguished in previous studies (Chen et al., 2013; Keat, Reinagel, Reid, & Meister, 2001; Warland et al., 1997). We can also form more than 8 pools by further splitting the 8 main cell types into 16 or 32 types (see section 4). We note that salamander ganglion cells have never previously been divided into 16 or 32 cell types, and we are not claiming that we are providing evidence that the salamander truly possesses this many functional types of ganglion cells. But we have two motivations for performing this analysis. First, the most current estimates of the total number of ganglion cell types in several mammalian species are in the range of 20 to even 40 types (Baden et al., 2016; Seung & Sumbul, 2014), considerably more than the 7 or 8 types typically described in the salamander. Second, this allows us to study greater levels of heterogeneity corresponding to finer gradations of the functional differences among neurons.

Similar to the results of our analysis of the discrimination error as a function of the number of neural pools, $L$, here the error rate decreased monotonically as we split the retinal population into more cell types. For two to eight cell types, the error ratio $\lambda $ was modest, but significantly greater than one (see Figure 7C). Interestingly, when we split the population into 16 or 32 cell types, the error ratio was substantially larger. The mutual information per cell, as before, increased monotonically with the number of cell types, again showing the largest changes at 16 and 32 cell types (see Figure 7D). This analysis suggests that the presence of a large number of cell types among the retinal ganglion cells serves a beneficial purpose for encoding visual information.

### 2.5 Scaling of the Discrimination Error with Population Size

It is interesting to ask how the effect of heterogeneity varies with the size of the neural population, both because this allows us to relate our study to many others that have involved fewer neurons and because it gives us some expectation for what might be observed with even larger populations. We studied this trend by randomly selecting subsets of our recorded ganglion cells and calculating the discrimination error. We first chose two stimulus pairs, one with moderately low discrimination error (see Figure 8A) and the other with very low error (see Figure 8B). In each case, we carried out this calculation for different degrees of heterogeneity (different colors). For both examples, the error decreased approximately exponentially with increasing population size, $N$. But notably, the rate of this decrease depended appreciably on the degree of heterogeneity, with steeper slopes for greater heterogeneity. Averaging similar calculations over many choices of stimulus pair, we found that this effect was robust (see Figure 8C). (Because the discrimination error was more naturally distributed on a logarithmic than linear scale, all averages over error rates here and elsewhere in the letter were geometric means, not arithmetic means.)

A similar qualitative trend emerged from all the analyses: the rate at which the discrimination error decreased as a function of population size, $N$, was steeper for greater degrees of heterogeneity, parameterized by the number of pools, $L$, into which the population was divided. Given the large number of neurons available in any brain area, trends as a function of population size are important properties of the population neural code. In all cases, the functional dependence appeared to follow a simple exponential form. As we will see in the following section, this behavior is expected in populations of independent neurons. Thus, we define a characteristic population size, $N^*$, by fitting the discrimination error, $\varepsilon(N)$, to an exponential form, $\exp(-N/N^*)$. This scale, $N^*$, measures the number of neurons that must be added to the population for the error to be reduced by a factor of $e$.
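One simple way to extract $N^*$ from a set of errors is a least-squares line fit to $\ln \varepsilon$ versus $N$, sketched below. Allowing a free prefactor in front of the exponential and fitting in the logarithmic domain are assumptions of this sketch, and the error values are synthetic rather than measured.

```python
from math import exp, log


def fit_characteristic_size(sizes, errors):
    """Least-squares fit of log(error) = a - N / N_star; returns N_star.

    The free intercept a (a prefactor on the exponential) is an assumption
    of this sketch.
    """
    logs = [log(e) for e in errors]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, logs))
    slope = sxy / sxx
    return -1.0 / slope


# Synthetic errors (not data) generated with a known N* of 8 cells.
sizes = [10, 20, 40, 80]
errors = [0.4 * exp(-n / 8.0) for n in sizes]
n_star = fit_characteristic_size(sizes, errors)
print(f"fitted N* = {n_star:.2f}")
```

Fitting the logarithm of the error is consistent with the use of geometric means for averaging error rates, since both treat the error on a logarithmic scale.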

As the behavior of the error depended on the choice of stimulus pair as well as the number of pools, $L$, we calculated the characteristic size individually for each condition. In order to see how the error scaled with the degree of heterogeneity, we averaged values of $N^*$ across all pairs of stimuli for the same value of $L$. The characteristic size varied widely from $N^*=46.4\pm6.5$ (mean $\pm$ SE, $n=55$) for the matched homogeneous populations down to $N^*=8.1\pm0.77$ for full heterogeneity (see Figure 8D). After a steeper drop from one pool to two pools, the value of $N^*$ was fit well by a power law form with an exponent, $\gamma=-0.29\pm0.04$. Another way of thinking about this effect is as follows. If the number of neurons that process the population code is constrained, then it is advantageous to break up the population into pools with distinct functional properties. Equivalently, heterogeneity has an amplifying effect on the code, since by reducing $N^*$, it enhances the effective size of the population. When there are more distinct functional pools in a population, the extra performance gained by adding each neuron is boosted.

### 2.6 Why Does Heterogeneity Improve the Population Code?

In all the examples studied in our analysis of neural data, heterogeneity reduced the error in discriminating between two stimuli and increased the mutual information per neuron. The consistency of this result naturally led us to think that the beneficial effect of heterogeneity might not be a fortuitous consequence of the statistics of neural activity in retinal ganglion cells under particular visual conditions, but might instead be a rather general property of population neural codes. To explore the effect of heterogeneity in greater generality, we examined neural population coding theoretically, from different perspectives and using different models that we describe hereafter.

#### 2.6.1 Simple Illustration: Homogeneous versus Two-Pool Populations

We begin with a homogeneous neural population of $N$ neurons, where the firing probability is $p$ for the target stimulus and $q$ for the distracter. In this simple situation, the state of the population is defined by the spike count, $k$, the number of neurons that fire among the $N$ cells in the population, and we can easily write down its probability distribution (see equations 9a and 9b). Using this result, we can calculate the discrimination error as a function of $N$, which decayed exponentially for large $N$ (see Figure 9A). As the firing probabilities, $(p,q)$, are varied, the trend remains exponential, but the rate of decay changes. Similar to our analysis of real data described above, we were led to define a characteristic population size, $N^*(p,q)$, as the inverse exponential rate. This function has a nontrivial and strong dependence on the firing probabilities $(p,q)$, with values ranging from less than 1 for $p\sim1$ and $q\sim0$ and diverging as $p\to q$ (see Figure 9B). We can derive an analytic formula for the quantity $N^*$, which corresponds to the characteristic system size beyond which homogeneous coding becomes faithful, as a function of the firing probabilities $p$ and $q$ (see section 4). The analytic form agrees well with direct numerical calculations (see Figure 9B, solid lines).
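The calculation described above can be sketched numerically. The code below assumes equal prior probabilities for target and distracter and a maximum likelihood decision on the spike count $k$, so that the error is half the overlap of two binomial distributions. The characteristic size is computed here from the standard Chernoff exponent for two Bernoulli distributions; this is offered as a plausible stand-in, not as the analytic formula derived in section 4. All function names are hypothetical, and the firing probabilities must lie strictly between 0 and 1.

```python
import math

def binom_pmf(k, n, p):
    """Probability of k spikes among n independent cells with firing probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def homogeneous_error(n, p, q):
    """Error of a maximum likelihood decoder on the spike count (equal priors assumed)."""
    return 0.5 * sum(min(binom_pmf(k, n, p), binom_pmf(k, n, q))
                     for k in range(n + 1))

def chernoff_per_neuron(p, q, iters=100):
    """Chernoff exponent per neuron, maximized over alpha in [0, 1] by ternary search."""
    def log_overlap(a):
        # Convex in a, so a ternary search finds its minimum.
        return math.log(p**a * q**(1 - a) + (1 - p)**a * (1 - q)**(1 - a))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_overlap(m1) < log_overlap(m2):
            hi = m2
        else:
            lo = m1
    return -log_overlap(0.5 * (lo + hi))

def characteristic_size(p, q):
    """Inverse exponential decay rate of the error; diverges as p approaches q."""
    return 1.0 / chernoff_per_neuron(p, q)

errors = [homogeneous_error(n, 0.5, 0.1) for n in (10, 20, 40)]
n_star = characteristic_size(0.5, 0.1)
```

Under these assumptions, the exact error is bounded above by $\frac{1}{2}\exp(-N/N^*)$, so the exponent sets the asymptotic slope of the error on a logarithmic scale.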

Next, we introduced heterogeneity by splitting the homogeneous population into two pools. For clarity, we change only the firing probability for the target stimulus, $p\to(p_1,p_2)$, while keeping the mean number of spikes the same, $p=\frac{1}{2}(p_1+p_2)$. Examining the trend of error versus the number of neurons, $N$, we found that the rate of decay was the shallowest when $p_1=p_2$ (the homogeneous case) and became increasingly steep as the difference between $p_1$ and $p_2$ increased (see Figure 9C). This effect was particularly striking for $p_1=0.9$, $p_2=0.1$ (blue points); in this case, the neurons in the second pool offered no help at all in the stimulus discrimination, as the firing probability for the distracter was $q=0.1$. One way of understanding this result is by reference to the behavior of $N^*(p,q)$: this function is strongly nonlinear, such that the increased separation between $p_1$ and $q$ for one neural pool more than compensates for the decreased separation between $p_2$ and $q$ in the other pool. In other words, the enhancement of the coding performance of one pool with neurons having better separated firing rates (from the firing rate in response to the distracter stimulus) generically exceeds the suppression of the coding performance in the other pool having more similar firing rates.
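The two-pool comparison can be reproduced in a few lines. This sketch assumes two pools of equal size $N/2$, equal priors for the two stimuli, and a maximum likelihood decision on the pair of pool spike counts $(k_1,k_2)$; the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Probability of k spikes among n independent cells with firing probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def two_pool_error(n, p1, p2, q):
    """Discrimination error for two pools of n/2 cells each (equal priors assumed).

    Target stimulus: pool firing probabilities p1 and p2.
    Distracter stimulus: firing probability q in both pools.
    """
    half = n // 2
    error = 0.0
    for k1 in range(half + 1):
        for k2 in range(half + 1):
            target = binom_pmf(k1, half, p1) * binom_pmf(k2, half, p2)
            distracter = binom_pmf(k1, half, q) * binom_pmf(k2, half, q)
            # The maximum likelihood decoder errs with the smaller probability.
            error += 0.5 * min(target, distracter)
    return error

# Same mean target firing probability (0.5), same distracter probability (0.1)
homogeneous = two_pool_error(20, 0.5, 0.5, 0.1)
split = two_pool_error(20, 0.9, 0.1, 0.1)
```

With $p_1=p_2$ the likelihood ratio depends only on the total count $k_1+k_2$, so the function reduces to the homogeneous case; with $p_1=0.9$, $p_2=0.1$ the second pool is uninformative, yet the error is lower because the first pool separates the stimuli so well.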

#### 2.6.2 Suppression of the Discrimination Error by Neuron-to-Neuron Variability

Intuition suggests that the above argument carries over to cases with more general forms of heterogeneity. Here, we investigate one such case, in which heterogeneity takes the form of neuron-to-neuron variability. While we find that, again, heterogeneity favors coding, the approach provides us with a complementary picture of why this is true.

#### 2.6.3 Enhancement of the Mutual Information by Neuron-to-Neuron Variability

The benefit of heterogeneity can similarly be seen by considering the behavior of the mutual information. Again, we compare a homogeneous population to a heterogeneous population. But here, we consider general neuron-to-neuron variability in firing rate; in particular, the magnitude of this variability need not be small. A limit in which one can derive a powerful and general rule is the case where the target stimulus is rare; namely, it occurs with probability $P(T)=\rho$, with $\rho\ll1$. This is the case, for example, if one is trying to recognize one target stimulus versus all other stimuli (Schwartz, Macke, Amodei, Tang, & Berry, 2012) or if one is trying to recognize a target stimulus class that is a small subset of all possible stimuli (such as one person's face versus any other person in a large group of individuals).
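To make the rare-target limit concrete, the following is a sketch of the standard small-$\rho$ expansion of the mutual information between a binary stimulus variable $S$ and the population response $R$. It assumes conditionally independent binary neurons with firing probabilities $p_i$ (target, $\mathrm{T}$) and $q_i$ (distracter, $\mathrm{D}$), and it is not necessarily the form in which the derivation of section 4.8 is written.

$$
I(S;R) = \rho\, D_{\mathrm{KL}}\big(P(R\mid\mathrm{T})\,\|\,P(R)\big) + (1-\rho)\, D_{\mathrm{KL}}\big(P(R\mid\mathrm{D})\,\|\,P(R)\big), \qquad P(R) = \rho\, P(R\mid\mathrm{T}) + (1-\rho)\, P(R\mid\mathrm{D}).
$$

Expanding in $\rho$, the second term contributes only at order $\rho^2$, which leaves

$$
I(S;R) = \rho\, D_{\mathrm{KL}}\big(P(R\mid\mathrm{T})\,\|\,P(R\mid\mathrm{D})\big) + O(\rho^2), \qquad D_{\mathrm{KL}}\big(P(R\mid\mathrm{T})\,\|\,P(R\mid\mathrm{D})\big) = \sum_{i=1}^{N} \left[ p_i \log\frac{p_i}{q_i} + (1-p_i)\log\frac{1-p_i}{1-q_i} \right].
$$

Each term of the sum is jointly convex in $(p_i, q_i)$, so at fixed population averages of $p_i$ and $q_i$ the sum is smallest when all neurons are identical.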

## 3 Discussion

We have studied the role of functional heterogeneity in a population neural code using two mutually reinforcing approaches. First, we have analyzed experimental data from multielectrode recordings of populations of over 100 retinal ganglion cells. Here, we found that the error for discriminating between pairs of visual stimuli was often many orders of magnitude lower for the real heterogeneous population compared to a matched homogeneous population having the same number of cells and spikes (see Figure 3). Similarly, the mutual information about stimulus identity was also enhanced, often by more than an order of magnitude for the heterogeneous population versus the homogeneous one (see Figure 4). This heterogeneity effect depended strongly on the population size, with greater improvement for larger population sizes (see Figure 8).

Second, we have analyzed theoretically the fidelity of the neural population code in a simple model and in two broad limits. In one limit, we considered any arbitrary but small perturbation of each cell's firing rate away from a perfectly homogeneous population. We showed that this perturbation always decreases the discrimination error, regardless of the initial firing rates. In the other limit, we considered any possible set of firing rates within a neural population, but in the case in which the target stimulus was rare. Here, heterogeneity always increased the mutual information, that is, the information contained in the population about whether the target was present. These analytic proofs substantially increase the generality of the finding that heterogeneity benefits the neural population code.

In our analyses, we used a flexible framework in which the characteristics of the neural population were summarized by the set of firing probabilities for all cells, in response to a target stimulus, $\{p_i\}$, and to a distracter stimulus, $\{q_i\}$. These simple properties are readily measured in experiment and can be defined for any pair of stimuli or conditions. But at the same time, this approach does not address the potential trade-off implicit in the design of a neural circuit that attempts to achieve heterogeneous responses across an entire stimulus ensemble. Our results therefore complement previous studies that have considered this latter problem by assuming a specific form of tuning curve or receptive field model (Ecker et al., 2011; Kastner et al., 2015; Shamir & Sompolinsky, 2006; Wilke et al., 2001).

One strength of our approach is that we do not have to make any explicit assumptions about the response functions of neurons. For the benefit of tractability and concreteness, previous studies have often used models of the neural response that are incomplete or inaccurate. For instance, an orientation tuning curve for a V1 neuron does not contain any prediction about how the neuron will respond to a stimulus that is not an oriented grating, and the linear-nonlinear (LN) model of a ganglion cell's receptive field breaks down under many visual conditions (Barlow & Levick, 1965; Chen, Chou, Park, Schwartz, & Berry II, 2014; Clark, Benichou, Meister, & Azeredo da Silveira, 2013; DeVries, 2000; Olveczky, Baccus, & Meister, 2003; van Hateren, Ruttiger, Sun, & Lee, 2002). Many of the ways in which the real light responses of retinal ganglion cells deviate from simplified models, like the LN model, introduce additional heterogeneity among neurons. For instance, spatial hot spots within each receptive field reduce the redundancy of spatial information distributed among ganglion cells with similar, overlapping receptive fields (Soo, Schwartz, Sadeghi, & Berry, 2011), and realistic variations in the receptive field shape break the symmetry among ganglion cells in the same mosaic (Liu, Stevens, & Sharpee, 2009). Our results therefore imply that many of the complexities of neural circuits that are not captured by even state-of-the-art functional models can potentially play a positive role in improving the fidelity of the population neural code.

Another notable difference between our results and previous ones is the sheer effect size that we have observed: over 10-fold increases in the mutual information per neuron and over $10^5$-fold decreases in the discrimination error. One major source of this discrepancy is that we have analyzed larger populations than most previous studies have. This matters, because we have shown that the effect of heterogeneity depends strongly on the number of neurons in the population. There is every indication that this trend continues for even larger populations, making heterogeneity an even more relevant property for the realistically large neural populations that operate in many local neural circuits.

Other factors are more technical: we report the effect for discrimination between specific pairs of stimuli rather than for the average information over an entire stimulus ensemble. Since the effect of heterogeneity is, of course, negligible when neurons do not fire, averages that include frequent periods of silence will make the effect appear smaller than it is during periods of substantial neural activity. In any case, the large effect sizes that we observe point to an even greater potential than previously appreciated for neural circuits to use functional diversity in encoding information. In addition, we assessed the true coding fidelity of neural populations using maximum likelihood decoders based on measured firing rates. By contrast, studies that use cross-validated decoders cannot infer error rates that are smaller than the inverse number of trials (see section 4). However, we did estimate how much of the heterogeneity effect could be attributed to finite sampling in our measurement of each cell's firing rate: we resampled the responses of homogeneous neural populations to generate the level of apparent heterogeneity expected from finite sampling alone and then recalculated the coding fidelity. We found that the effect of heterogeneity in the measured neural activity was significantly larger than the effect produced by finite sampling alone.

### 3.1 Limitations of the Current Study

While our study has made strides in demonstrating a wide range of circumstances in which heterogeneity is beneficial to the population neural code, we have left unexplored two important directions. First, we have disregarded noise correlation and its possible role in coding. This is a broad topic that has been the subject of many previous studies. Using a framework of parameterized tuning curves and Fisher information to evaluate the fidelity of coding in neural populations, several important studies have found that heterogeneity in the tuning curves can help reduce the deleterious impact of positive noise correlation or synchrony (Ecker et al., 2011; Padmanabhan & Urban, 2010; Shamir & Sompolinsky, 2006; Wilke et al., 2001). Another study added realistic levels of noise correlation to experimentally measured tuning curves and found that the benefit of heterogeneity survived (Osborne et al., 2008). Yet another study explicitly added heterogeneity to the distribution of pairwise correlations and found that this enhanced coding fidelity (Azeredo da Silveira & Berry, 2014). Taken together, these results suggest that the beneficial effects of heterogeneity will extend to the case in which a correlation structure among neurons is included, and in fact the benefits may even be enhanced.

Second, we have treated the response of each neuron as a firing probability in a small time bin (here 20 ms). The choice of this time bin is appropriate for the retinal code, as ganglion cells have roughly this level of temporal precision (Berry et al., 1997; Uzzell & Chichilnisky, 2004; van Rossum, O'Brien, & Smith, 2003). In such a small time bin, most ganglion cells fire zero or one spike. While it is possible for a cell to fire two or more spikes, most of the coding power of the neural population is contained in the binary response of each neuron (see Schwartz et al., 2012, where discrimination errors were compared for a binary code versus a spike count code). However, one limitation of this approximation is that each neuron has a fixed Poisson level of noise. It would be interesting to see how the effect of heterogeneity may vary with non-Poisson noise statistics.

### 3.2 What Is the Purpose of So Many Ganglion Cell Types?

One of the puzzles about the organization of the vertebrate retina is why there are so many different types of ganglion cells with overlapping receptive fields. Of course, some ganglion cell types project to unique brain centers and carry out qualitatively distinct visual computations, like ON direction-selective cells that project only to the accessory optic system (Vaney, Peichl, Wassle, & Illing, 1981) in the brain stem and convey an error signal corresponding to retinal image slip to the cerebellum, relevant to adjusting the gain of the vestibulo-ocular reflex (Raymond, Lisberger, & Mauk, 1996). Other cell types appear to have a clear function, such as the M1 melanopsin-containing cell, which measures light level over a long integration time (Berson, 2003) and projects not only to the suprachiasmatic nucleus, where it helps entrain the circadian rhythm, but also to other brain regions, like the superior colliculus (Hattar et al., 2006). However, many ganglion cell types project to the two major visual brain centers: the lateral geniculate nucleus and the superior colliculus (Berson, 2008; Dacey, Peterson, Robinson, & Gamlin, 2003). Visual information encoded by these different cell types is then combined by downstream neural circuits, for example, in the primary visual cortex, yielding a modified code representing the same region of visual space (Berson, 2008; Rodieck, 1979). So the question remains: Why are there so many ganglion cell types?

Our work offers one possible interpretation: a multiplicity of cell types helps to form a heterogeneous population code that can represent visual information more faithfully or over a broader range of stimulus patterns, as compared to a neural code using the same number of less diverse neurons. Specifically, we found that when we divided our recorded ganglion cell population into more and more cell types, the fidelity of the population code increased (see Figure 7). A functionally broad array of “sensors” allows a diverse population to capture more aspects of the input. By contrast, in a less diverse population, noise in the output of the subsets of identical sensors is averaged out more thoroughly. Our results indicate that this trade-off is biased significantly in favor of diversity. Because the effect of heterogeneity is so strong, it overcompensates for the deleterious effect of increased noise.

One mechanism to achieve functional diversity in neural populations is that of developmental noise: neurons can have the same genetic program that determines their synaptic contacts, but various sources of biophysical noise can still cause some degree of variability in the cell's synaptic circuit. However, this mechanism might not be sufficient to harness the full benefits of functional heterogeneity; components of, for instance, thermal noise acting on a scale much more modest than that of the neuron will sum to a small collective effect due to the law of large numbers. Instead, a better mechanism might be to have a set of different developmental programs that force neurons within the population to specialize their function and thereby achieve greater heterogeneity (see Figure 10). Given the broad range of conditions in which heterogeneity benefits the population neural code, developmental noise can then be expected to provide additional benefit even in a population divided into many cell types. And, in fact, we found that the experimental, fully heterogeneous population substantially outperformed a code with 32 cell types (in Figure 6C, the visual information encoded was 0.061 bits/cell for $L=111$ cells, while in Figure 7D, the information was 0.032 bits/cell for $L=32$ cell types). Developmental noise may thus be capitalized on by downstream circuits to represent information at a finer resolution.

## 4 Methods

### 4.1 Multielectrode Recording

We used a multielectrode array to record spike trains from large populations of retinal ganglion cells in the larval tiger salamander, a method that has been described elsewhere (Marre et al., 2012; Puchalla, Schneidman, Harris, & Berry, 2005). In brief, we euthanized animals according to institutional standards (IACUC protocol 1828: rapid decapitation following ice water anesthesia), dissected the retina out of the eye, cut a piece roughly one-third the size of the entire retina, and placed the tissue ganglion-side down against the array. Retinas were held in place with a dialysis membrane that was mounted on a gantry that allowed precise vertical displacements by turning a screw. Oxygenated Ringer's solution was perfused over the tissue to keep it alive for many hours. Spike sorting was carried out with a custom-written algorithm (Marre et al., 2012).

### 4.2 Visual Stimulation

Visual stimuli were generated on a computer monitor whose light was focused on the photoreceptor layer of the retina (Puchalla et al., 2005). The mean light level was 11 mW/m$^2$, which corresponds to photopic vision in the salamander. Spatially uniform flicker consisted of light intensity values that were randomly drawn from a gaussian distribution every 8.33 ms. The width of the gaussian distribution defined a temporal contrast of 33% of the mean. A 30 sec segment was repeated 300 times. The natural movie consisted of fish swimming in a tank against a background of aquatic plants, a visual environment that the larval tiger salamander encounters in its natural life cycle. Example frames of a similar movie have appeared elsewhere (Tkacik et al., 2014). A 120 sec segment of this movie was repeated 70 times.

### 4.3 Maximum Likelihood Decoding

The activity of a neural population of $N$ cells in a given time bin is denoted by $R=\{r_i\}$, where $r_i$ is the activity of neuron $i$. The firing probability in each time bin was estimated from the peristimulus time histogram (PSTH) over many repeated presentations of the same stimulus (see above). In our treatment, we considered only binary neural activity in a single time bin, $r_i\in\{0,1\}$. This is expected to be a good approximation for small time bins, $\Delta t$, such as the 20 ms bins used in this study. In fact, a previous study found that even with 100 ms time bins, this binary approximation captured most of each ganglion cell's visual information about a spatial coding task (Schwartz et al., 2012).

Notice that our method does not rely on cross-validated decoders. The reason is that error rates smaller than $\sim 1/(\text{number of trials})$ cannot be estimated by cross-validation methods. With cross-validation, the estimate of the error rate would therefore be artificially constrained by the practical details of our neurophysiology experiment. These details do not apply to how the animal uses its own neural populations, as the animal potentially has access to much longer sampling periods. Because we are interested in the true coding fidelity of neural populations, we constructed maximum likelihood decoders based on the measured firing rates of each cell. Of course, with any finite sample of measured neural responses, there will be uncertainty in the firing rate of each cell. We address this issue with bootstrap resampling, as described in section 4.5 below.

### 4.4 Selection of Stimulus Pairs

For many purposes, we wanted to average over a selection of many pairs of time points representing the population neural response to different pairs of visual stimuli. As it was not practical for us to sum over all possible pairs, we selected a representative subset of times covering the full range of average population firing rates. The resulting averages allow for significant comparisons among conditions, such as number of pools $L$ or neurons $N$, but should not be interpreted as true ensemble averages for each stimulus condition.

### 4.5 Correction for Sampling Bias

Even if all neurons had exactly the same underlying firing probability for a given stimulus, we would observe some heterogeneity in their estimated firing probabilities due to finite sampling. This heterogeneity, while spurious, would appear to benefit coding. In order to evaluate the significance of this effect, we started with a matched homogeneous population and used bootstrap resampling to generate an apparently heterogeneous population. Specifically, if each cell has an average firing probability of $p$ for a given stimulus and if our experiment had $M$ repeated trials of this stimulus, then the total spike count would be $n_{\text{count}}=Mp$. We expect that this resampling procedure will generate fluctuations in the apparent spike count $\sim\sqrt{n_{\text{count}}}$, which will change the firing probability by $\sim\sqrt{p/M}$ in each cell, while preserving the same average firing probability. We then calculated the discrimination error and mutual information between stimuli for these apparently heterogeneous neural populations.
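One way to read this procedure is sketched below: for each cell, $M$ binary trials are redrawn with firing probability $p$, and the firing probability is re-estimated from the redrawn spike count. Resampling $M$ trials with replacement from a record containing a fraction $p$ of spikes gives the same binomial distribution of counts. The function name, the seed, and the example values of $p$ and the number of cells are assumptions for illustration.

```python
import random

def resample_firing_probabilities(p, n_trials, n_cells, seed=0):
    """Apparent firing probabilities of identical cells after finite sampling.

    Each cell's spike count over n_trials is redrawn as a binomial variable,
    so the estimates scatter around p although every cell is truly identical.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_cells):
        spikes = sum(1 for _ in range(n_trials) if rng.random() < p)
        estimates.append(spikes / n_trials)
    return estimates

# Illustrative values: 300 trials as for the flicker stimulus, p chosen arbitrarily
apparent = resample_firing_probabilities(p=0.2, n_trials=300, n_cells=2000)
mean = sum(apparent) / len(apparent)
spread = (sum((x - mean) ** 2 for x in apparent) / len(apparent)) ** 0.5
```

The resulting spread is close to $\sqrt{p(1-p)/M}$, which reduces to the $\sqrt{p/M}$ scaling quoted above when $p$ is small.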

### 4.6 Discrimination Error in a Homogeneous Population of Neurons

### 4.7 Model of Neuron-to-Neuron Variability

### 4.8 Mutual Information for a Heterogeneous Population of Neurons

*log sum inequality*. This completes the demonstration that the mutual information between neural response and stimulus is smallest in the case of a uniform population, to first order in the probability of occurrence of a rare target stimulus.

### 4.9 Chernoff Distance

In the limit of $p\to1$ and $q\to0$, or vice versa, the Chernoff distance diverges. Of course, this is not a conceptual problem because with finite sampling, one cannot have confidence that $p\to1$ or $q\to0$. However, the divergence in the Chernoff distance was not numerically well behaved, in the sense that choosing $p=1-\delta$ would have values that depended strongly on $\delta$. For this reason, we left these stimulus pairs out of Figure 5. Such divergences were quite rare. In the spatially uniform stimulus, there were 526 out of 166,500 time points with $p=1$, which is 0.3% of all times; in the natural movie, there were only 66 out of 924,000 time points with $p=1$, which is 0.007% of all times.
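For reference, the origin of this divergence can be seen from the standard form of the Chernoff distance between the response distributions of a population of independent binary neurons. The expression below is the textbook definition under that independence assumption and is given as a sketch; it may differ in notation from the expression used elsewhere in this section.

$$
D_C = \max_{0\le\alpha\le1} \left[ -\sum_{i=1}^{N} \log\left( p_i^{\alpha}\, q_i^{1-\alpha} + (1-p_i)^{\alpha}\,(1-q_i)^{1-\alpha} \right) \right].
$$

For any $0<\alpha<1$, both terms inside the logarithm vanish when $p_i\to1$ and $q_i\to0$, so the contribution of cell $i$ grows without bound. Setting $p_i=1-\delta$ and $q_i=0$ gives a contribution of $-\alpha\log\delta$, which makes the dependence on $\delta$ explicit.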

## Appendix A: Cell Types

Following previous studies, we used the reverse correlation to random flicker stimulation to group ganglion cells into functional types (Marre et al., 2012; Segev et al., 2006). This study differs somewhat from previous studies in that we used spatially uniform flicker, which engages both the receptive field center and surround, while previous studies mostly used checkerboard flicker and found the temporal profile of the center alone. This difference could well have influenced our results.

Previous studies have found six to eight functional types. In a similar vein, we identified eight types here (see Figures 7A and 7B). These included fast, medium, and slow OFF, as well as fast, medium, and slow ON, as have been found in most previous studies. We also identified an OFF type with a reverse correlation barely different from average, which we called a “weak OFF” type. This may correspond to weak receptive field cells seen previously (Marre et al., 2012). Unlike previous studies, we also found an ON type with an uncommonly large reverse correlation, which we named “big ON.” The separation of the ganglion cell population into these eight functional types can be visualized by plotting the average reverse correlation of all cells of the same type along with error bars showing the standard error at each time point (see Figures 7A and 7B). The fact that these standard errors are clearly well separated at many time points serves as an a posteriori justification for our classification.

In order to study greater degrees of heterogeneity, we further split these 8 functional types into as many as 32 types (see Figure 11). This was accomplished by grouping together sets of ganglion cells with exceptionally similar reverse correlations or with qualitatively unusual features, such as a double-peaked, monophasic structure (e.g., type II double). The purpose was to further subdivide the neural population. When we plotted the average reverse correlation of cells within the same fine cell type, we found that their standard error was well separated from that of other fine cell types at many time points, as shown for our primary classification of 8 cell types. Here, we do not present any further evidence that these are true cell types that generalize across multiple retinas; indeed, some “types” comprise just a single cell.

Fast OFF cells were subdivided into eight subtypes (see Figure 11A): regular (or monophasic) OFF ($n=9$); biphasic OFF cells ($n=7$), which have been described before; big OFF cells ($n=3$), with a large-amplitude reverse correlation; type Ia ($n=4$), with a late peak in the reverse correlation; type Ib ($n=6$), with a slightly larger reverse correlation; type Ic ($n=3$), with a late shoulder to its reverse correlation; type Id ($n=2$), with a smaller shoulder; and small OFF ($n=2$), with a smaller-amplitude reverse correlation.

Medium OFF cells were subdivided into 11 subtypes (see Figures 11C and 11D): type II regular ($n=8$); type II slow ($n=3$), with a longer latency peak in the reverse correlation; type II fast ($n=2$), with a shorter latency peak; type II big ($n=2$), with a larger-amplitude reverse correlation; type II great ($n=1$), with a larger and broader reverse correlation; type II double1 ($n=3$), with a second narrow peak in the reverse correlation; type II double2 ($n=2$), similar to double1 but with a larger amplitude; type II lobe ($n=2$), with a pronounced, late biphasic peak in the reverse correlation; type II weak1 ($n=2$), with a very small amplitude and somewhat biphasic reverse correlation; type II weak2 ($n=2$), with a small amplitude, monophasic reverse correlation; and type II outlier ($n=1$), with a double-peaked reverse correlation.

Slow OFF cells were subdivided into three subtypes (see Figure 11B): regular ($n=5$), big ($n=4$), and rebound ($n=3$), with a pronounced, second peak in the reverse correlation. Weak OFF cells were divided into three subtypes (see Figure 11B): biphasic ($n=7$), monophasic ($n=5$), and unresponsive ($n=2$). ON cells had fast ($n=3$), medium ($n=8$), and big ON ($n=5$) subtypes, as before, but the remaining cells were subdivided into three outliers (data not shown): slow ON1, slow ON2, and biphasic ON.

In order to form 16 cell types, we merged several fine cell types together into four fast OFF types, four medium OFF types, three slow OFF types, one weak OFF type, and four ON types. For fast OFF cells, types Ia and Ib were merged ($n=10$); types Ic, Id, and big were merged ($n=10$); and regular OFF ($n=9$) and biphasic OFF ($n=7$) remained the same. For medium OFF cells, types II regular, II slow, and II fast were merged into type II core ($n=13$); types II double1, II double2, II big, and II outlier were merged into type II double ($n=8$); types II weak1, II weak2, and II great were merged into type II other ($n=5$); and type II lobe ($n=2$) remained the same. For slow OFF cells, regular and big types were merged into slow OFF type ($n=9$); slow OFF rebound type ($n=2$) remained the same; and three outlier cells were split away into slow OFF outlier type. Weak OFF cells were all merged together ($n=14$), and ON cells remained the same.

## Acknowledgments

M.B. acknowledges support from NEI grant EY014196 and NSF grant 1504977, and R.AdS. acknowledges support from Princeton University through the Global Scholars Program and from the CNRS through UMR 8550.

## References

## Author notes


A.Z. is currently at SRI International, Princeton, NJ, U.S.A.; F.L. is currently at Saint-Gobain, Paris, France.