Neurons integrate inputs from many neighbors when they process information. Inputs to a given neuron are thus indistinguishable from one another. Under the assumption that neurons maximize their information storage, indistinguishability is shown to place a strong constraint on the distribution of connection strengths between neurons. The distribution of individual synapse strengths is found to follow a modified Boltzmann distribution with strength proportional to $e^{-\beta w}/w$. The model is shown to be consistent with experimental data from the Caenorhabditis elegans connectome and in vivo synaptic strength measurements. The $1/w$ dependence helps account for the observation of many zero or weak connections between neurons, that is, the sparsity of the neural network.
The vast majority of neurons in the brain are disconnected from each other (Song, Sjöström, Reigl, Nelson, & Chklovskii, 2005; Buzsáki & Mizuseki, 2014; Cossell et al., 2015). Why might this be? Neurons evolved to observe, integrate, and interpret incoming stimuli, so it would seem that more connectivity is never worse for a neuron and may well always be better. Yet in observed brains, the connectivity rate between pairs of cortical neurons is about 10% to 20%, even when the two neurons overlap in the same physical region (Varshney, Sjöström, & Chklovskii, 2006; Lefort, Tomm, Floyd Sarria, & Petersen, 2009; Clopath & Brunel, 2013). One resolution to the sparseness paradox is that neurons operate in a competitive environment with space and resource constraints that inherently limit their connectivity (Schröter, Paulsen, & Bullmore, 2017). Alternatively, there may be an advantage to sparse connectivity. For example, if one assumes that neurons are gaussian channels, then the amount of information represented by all-to-all connectivity is not worth the cost of those connections in terms of space and resources (Varshney et al., 2006). The insight that sparseness is an advantage for information storage was further developed to show that perceptron models that maximize their storage have sparse networks (Brunel, Hakim, Isope, Nadal, & Barbour, 2004; Clopath & Brunel, 2013; Brunel, 2016). However, the assumptions of information theory are, at best, only approximately valid in the neuronal setting (Silver, 2010).
Here, an alternative approach is developed in which the sparseness of connectivity compensates for the loss of information incurred when neurons lose track of their inputs. When signals arrive at a dendritic arbor, the identity of the sender is lost or at least confused (Spruston, 2008; Jan & Jan, 2010). At the cell body, neurons perform a summation and nonlinear transformation that further decorrelates the output from the identity of the input neurons (Larkum, Nevian, Sandler, Polsky, & Schiller, 2009; Krueppel, Remy, & Beck, 2011). Summation tends to drive total weight upward and dramatically decreases the number of distinguishable inputs (Kouh, 2017).
As we show, this indistinguishability of inputs leads to a distribution of individual synapse strengths that is strongly peaked near zero, and such distributions are indeed observed as sparse neural networks (Yoshimura, Dantzker, & Callaway, 2005; Bullmore & Sporns, 2009) in biological systems (Song et al., 2005; Varshney, Chen, Paniagua, Hall, & Chklovskii, 2011) and in silico models (Brunel et al., 2004; Clopath & Brunel, 2013; Brunel, 2016). The model makes four major assumptions (discussed further in Figure 1): First, neurons store information in a Hebbian manner as connection strengths or weights. Second, synaptic inputs are unlabeled, and the cell cannot tell them apart when constructing its output. Third, inputs are linearly combined, or summed, to control the nonlinear output. Fourth, the total number of distinguishable configurations of neurons is maximized.
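As a toy illustration of the second and fourth assumptions (this sketch is not from the original text; the discrete weight levels and input count are arbitrary), unlabeled inputs collapse many labeled weight configurations into a single distinguishable state, sharply reducing the count that the neuron can maximize:

```python
from itertools import combinations_with_replacement, product

# Hypothetical toy neuron: n_inputs synapses, each taking one of a few
# discrete weight levels (values chosen only for illustration).
levels = [0, 1, 2, 3]
n_inputs = 4

# Labeled synapses: orderings matter, so count all tuples of weights.
labeled = sum(1 for _ in product(levels, repeat=n_inputs))

# Unlabeled synapses: the cell only sees the multiset of weights.
unlabeled = sum(1 for _ in combinations_with_replacement(levels, n_inputs))

print(labeled, unlabeled)  # 256 labeled states collapse to 35 multisets
```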
2.1 Distinguishable Synapses
Under the Hebbian assumption, each labeling of the weights represented by equation 2.1 corresponds to a potentially different use of the neural network, that is, a memory or some desired dynamic behavior. Regardless of the specific use of the network, it is assumed here that the system maximizes the number of potential states $\Omega$. The more potential configurations of the network there are, the more information can be stored and manipulated. Thus, assuming information storage and manipulation is the primary function of neurons, optimization processes will drive the system to the maximum of $\Omega$.
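For reference, the maximum-entropy step implied here is the standard one; a sketch (the multipliers $\lambda$ and $\beta$ and the partition function $Z$ are conventional notation, not taken from the original text):

```latex
% Maximize the entropy of the total-weight distribution subject to
% normalization and a fixed mean total weight <W>:
\mathcal{L} = -\sum_{W} p(W)\ln p(W)
              - \lambda\Big(\sum_{W} p(W) - 1\Big)
              - \beta\Big(\sum_{W} W\,p(W) - \langle W\rangle\Big)
% Setting d\mathcal{L}/dp(W) = 0 yields the Boltzmann form for total weights:
p(W) = \frac{e^{-\beta W}}{Z}, \qquad Z = \sum_{W} e^{-\beta W}
```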
2.2 Indistinguishable Synapses
It seems as though the problem is solved, and nearly trivially: neurons should have Boltzmann-distributed total strength. However, this describes the total strength of the cell, not the strengths of its individual synapses.
2.3 Two Synapses
To be clear, $P(W)$ refers to the probability that the active inputs sum to a total weight $W$. What is measured, and what matters most for determining neuron behavior, is the pairwise strength. Intuitively, to generate a Boltzmann distribution on the sums of a set of weights, the individual weights should be decreased by some amount. In the simplest example, if all neurons had two active synapses and a total strength of 10 units, then each individual synapse should have a strength of 5 units: the individual parts are necessarily smaller than the whole. However, neurons are not that simple. They have a variable number of active inputs and, as shown above, should tend toward a total strength that follows a Boltzmann distribution. Thus, the problem reduces to this: Given a Boltzmann distribution of total weights over all cells, how are the individual synapses distributed?
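For the two-synapse case, one way to see the answer (a numerical check under the stated assumptions; the value of $\beta$ is arbitrary) is that if each synapse is drawn from a gamma density of shape 1/2, $p(w) \propto e^{-\beta w}/\sqrt{w}$, then the sum of two such weights is exactly exponential:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0        # inverse "temperature"; illustrative value only
n = 100_000

# Each synapse ~ Gamma(shape=1/2, scale=1/beta): density ~ exp(-beta*w)/sqrt(w).
w1 = rng.gamma(shape=0.5, scale=1 / beta, size=n)
w2 = rng.gamma(shape=0.5, scale=1 / beta, size=n)

# Gamma shapes add under convolution: 1/2 + 1/2 = 1, i.e. an exponential.
total = w1 + w2
print(total.mean(), total.var())  # ~1/beta and ~1/beta**2, as for an exponential
```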
2.4 N Synapses
Finally, this is the primary theoretical result: under the assumptions listed above, the distribution of synapse sizes for large neurons should approach the damped exponential $p(w) \propto e^{-\beta w}/w$ (see equation 2.3), with far more weak connections than expected.
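A numerical sketch of this limit (my construction, consistent with the Laplace-transform argument in the appendix; parameter values are arbitrary): if the total weight is to be exponential, each of $N$ i.i.d. synapses follows a gamma density of shape $1/N$, $p(w) \propto w^{1/N-1}e^{-\beta w}$, which approaches the damped exponential as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n_syn, n_cells = 1.0, 20, 50_000   # illustrative values only

# Individual synapses ~ Gamma(1/N, 1/beta): density ~ w**(1/N - 1) * exp(-beta*w).
w = rng.gamma(shape=1 / n_syn, scale=1 / beta, size=(n_cells, n_syn))

# Shapes add to 1 across a cell, so total strengths are exponential.
totals = w.sum(axis=1)
print(totals.mean(), totals.var())   # ~1/beta and ~1/beta**2

# Individual weights pile up near zero: the sparsity of the network.
print((w < 1e-3).mean())             # a large fraction of near-zero weights
```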
3 Comparison to Experiment
3.1 C. elegans
One experimental test of the predicted functional form of the synapse distribution comes from the C. elegans model system, where the entire connectome is available (WormAtlas, Altun, Herndon, Crocker, Lints, & Hall, 2002–2017; Varshney et al., 2011). (See Figure 2.) There are some subtleties involved in applying the theory described here to C. elegans. Most significantly, the C. elegans connectome is essentially static, with very little rewiring, so the weight distribution is determined by evolutionary processes.
The first prediction is that the overall connectivity is exponential; this is shown in Figure 2A. From the full connectome, the total number of synaptic connections onto (or out of) each neuron is calculated and binned into a histogram. This total number of connections is used as a proxy for connection strength. In principle, the distributions may differ for input and output, so both are examined. One difficulty is that some connections (gap junctions) are not clearly input or output; they are ignored here, although including them as both input and output (not shown) preserves the exponential distribution. The distributions are exponential, supporting the theory that the total weight of any neuron is exponentially distributed. This suggests that evolution has favored genomes that generate neurons with maximally efficient information storage.
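A minimal sketch of this analysis (the file name and loading step are hypothetical; the real inputs come from the published connectome): fit an exponential to the per-neuron totals by maximum likelihood and compare it to the histogram:

```python
import numpy as np

# Hypothetical input: one total synapse count per neuron, e.g. derived
# from the Varshney et al. (2011) adjacency data (file name is made up).
totals = np.loadtxt("total_synapses_per_neuron.txt")

# Maximum-likelihood exponential fit: beta_hat = 1 / sample mean.
beta_hat = 1.0 / totals.mean()

# Empirical density versus the fitted exponential, bin by bin.
density, edges = np.histogram(totals, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[centers, density, beta_hat * np.exp(-beta_hat * centers)])
```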
The stronger claim is that the individual connections are given by a modified Boltzmann distribution (see equation 2.3). The predicted individual synapse distribution fits the data very well (see Figure 2B). The theory presented here accounts for both the heavy tail and the large number of neuron pairs with zero contacts. Note that these data are the union of two different reconstructions; taking them individually has no effect on the distribution. Also, only one distribution is shown because the input and output distributions must be the same. Figures 2B and 2C compare the data to fits of the test functions. Only the delta-gaussian and the modified Boltzmann forms are capable of capturing the large number of small weights, and the modified Boltzmann also captures the correct inflection of the tail, although none of the models entirely explain the long tail. Thus, in C. elegans, currently the only complete connectome, the theory presented here is strongly supported.
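For the individual connections, a fit of the modified Boltzmann form can be sketched as follows (my construction, not the paper's code: the cutoff choice, file name, and optimizer are assumptions; the normalization uses the exponential integral $E_1$, since $\int_\epsilon^\infty e^{-\beta w}/w\,dw = E_1(\beta\epsilon)$):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import exp1

# Hypothetical input: synapse count per connected neuron pair (w > 0).
w = np.loadtxt("synapses_per_connection.txt")   # file name is made up

eps = w.min()   # small-weight cutoff; e^{-beta*w}/w is not integrable at 0

def nll(beta):
    # p(w) = exp(-beta*w) / (w * E1(beta*eps)) for w >= eps.
    return -np.sum(-beta * w - np.log(w) - np.log(exp1(beta * eps)))

fit = minimize_scalar(nll, bounds=(1e-6, 10.0), method="bounded")
print(fit.x)    # maximum-likelihood beta for the modified Boltzmann form
```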
3.2 Rat Visual Cortical Columns
While complete reconstructions of a cortical column are not available, there are data sets in which multiple measurements of connection strengths between neurons exist (Song et al., 2005; Sjöström, 2005). These data bear directly on the strongest claim made here: that pairs of neurons will exhibit modified Boltzmann-distributed connection strengths. The data are obtained by taking pairs of nearby neurons, in vivo, and measuring the sensitivity of one neuron to the stimulation of the other and vice versa; the strength of the connection is thus directly estimated. A histogram of strengths fits the theory presented here remarkably well (see Figure 3 and Table 1). Note that this is not a test of the exponential distribution of a neuron's total strength, since that would require stimulating all of the neuron's neighbors.
Table 1: Best-fit models and parameter estimates for each data set (columns: Data Set, Model, Parameter).
Note: The numbers in parentheses are 95% confidence intervals from a parametric bootstrap.
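The parametric bootstrap behind such intervals can be sketched generically (this is a standard recipe, not necessarily the authors' exact procedure; the exponential case is shown, and the data file is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.loadtxt("connection_strengths.txt")   # hypothetical file
beta_hat = 1.0 / data.mean()                    # ML fit to the real data

# Refit on synthetic data sets drawn from the fitted model.
boot = []
for _ in range(2000):
    fake = rng.exponential(scale=1 / beta_hat, size=data.size)
    boot.append(1.0 / fake.mean())

lo, hi = np.percentile(boot, [2.5, 97.5])       # 95% confidence interval
print(beta_hat, (lo, hi))
```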
Assuming that cells behave as unlabeled integrate-and-fire neurons, the theory presented here predicts that the total strengths of cells follow a Boltzmann distribution, which maximizes their entropy. More interestingly, restricting the cells to respond to linear combinations of their inputs implies that the individual inputs will have a distribution weighted toward smaller-than-expected values. Under relatively modest assumptions, the strengths of the individual synapses, the inputs, are given by a modified Boltzmann distribution, $p(w) \propto e^{-\beta w}/w$. Consequently, the connection strength between any two random neurons is expected to strongly favor zero strength and very weak connections. This zero-peaked pairwise connectivity distribution was observed in the C. elegans connectome data and in electrophysiological data from cortical neurons.
One potential complication with the theory is that dendrites may do much more than linearly combine their inputs. Indeed, there is evidence that correlations between neighboring synapses on dendrites provide local inhibition or nonlinear combination (Sjöström et al., 2008; Polsky, Mel, & Schiller, 2004; Silver, 2010). This adds significant complexity to the calculations presented above. Mathematically, local correlations would make the coefficients in the linear combiner state- or time-dependent, and numerical solutions would likely be most fruitful. A second complication is that the theory presented here ignores the network structure of the neurons (Song et al., 2005; Russo, Herrmann, & de Arcangelis, 2014). Including network structure would have the greatest impact on the enumeration of possible states. For example, in a fully connected order 3 network (i.e., a triangle), rotational symmetry means that distinguishing the labeling a, b, c from b, c, a may not be possible. This would change the overall connectivity between cells from the predicted exponential distribution (Brunel et al., 2004; Newman, 1988), and incorporating such network constraints remains an open problem. Also, the assumption that neurons have a well-defined average number of inputs could be improved on by, for example, switching to a grand canonical ensemble to allow synapse numbers to fluctuate explicitly. Regardless, the basic idea of indistinguishability of inputs remains and would likely still drive connection weights down, and the mean field theory presented here appears to recover at least the gross characteristics of experimental systems.
Normalization of the modified Boltzmann distribution requires some care because the integral of $e^{-\beta w}/w$ does not converge on the range 0 to $\infty$, so some finite cutoff must be introduced. Such a modification makes the distribution look more like the log-normal distribution commonly used to model synapse strength in the literature (Song et al., 2005; Ma, Kohashi, & Carlson, 2013; Buzsáki & Mizuseki, 2014; Cossell et al., 2015). But the theory presented here predicts the distribution rather than treating the problem as mainly phenomenological (Barbour, Brunel, Hakim, & Nadal, 2007): the large number of zero or small connection strengths stems directly from neurons combining their inputs. In addition to the sparse connectivity, the modified Boltzmann distribution has a fat tail at large strengths and eventually becomes much larger than either a log-normal (Cossell et al., 2015) or a modified gaussian (Brunel, 2016). Such large outlier connections have been observed in vivo (Lefort et al., 2009; Schröter et al., 2017), and they are partially accounted for here.
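With a cutoff, the normalization can in fact be written in closed form; a short derivation (the cutoff $\epsilon$ is my notation):

```latex
% Normalizing e^{-beta w}/w on [eps, infinity) via the exponential integral:
\int_{\epsilon}^{\infty} \frac{e^{-\beta w}}{w}\,dw
  \;\overset{u=\beta w}{=}\; \int_{\beta\epsilon}^{\infty} \frac{e^{-u}}{u}\,du
  \;=\; E_1(\beta\epsilon),
\qquad
p(w) = \frac{e^{-\beta w}}{w\,E_1(\beta\epsilon)}, \quad w \ge \epsilon
```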
Finally, a significant implication of the modified Boltzmann distribution is that it resolves the question of why neurons have such sparse connectivity (Yoshimura et al., 2005; Bullmore & Sporns, 2009; Barabási, 2009; Bullmore & Sporns, 2012). Sparse connectivity has been shown to be most efficient in simulations of neural networks where the neurons were allowed to determine their own weights (Brunel et al., 2004; Clopath & Brunel, 2013; Brunel, 2016), and it has been noted that summation tends to drive down connection strengths (Kouh, 2017). At first glance, anything less than all-to-all connectivity would seem to restrict the capacity of a neural network because removing a potential contact limits the number of inputs. Why, then, are observed neural networks sparse or scale free (Bullmore & Sporns, 2012)? Certainly space constraints would seem to have a significant impact (Varshney et al., 2006; Bullmore & Sporns, 2012; Schröter et al., 2017), but there is some circularity to such arguments because they take as given that neurons look the way they do. Brains could conceivably have evolved all-to-all connectivity given some fantastic neuronal shape or an entirely different architecture. The modified Boltzmann weight, with its heavy emphasis on sparse connectivity, takes a step back and demonstrates that, given cells that linearly combine their inputs, sparsity is the optimal configuration for neural networks.
Appendix: Inverse Laplace Transform
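The appendix body is not reproduced above; a minimal sketch of the standard argument it names, consistent with the text (notation is mine):

```latex
% If the total weight W = w_1 + ... + w_N of N i.i.d. synapses is exponential,
% the Laplace transform of the total factorizes over synapses:
\tilde{P}(s) = \frac{\beta}{\beta + s} = \big[\tilde{p}(s)\big]^{N}
\quad\Rightarrow\quad
\tilde{p}(s) = \Big(\frac{\beta}{\beta + s}\Big)^{1/N}.
% Inverting gives a gamma density of shape 1/N for each synapse,
p(w) = \frac{\beta^{1/N}}{\Gamma(1/N)}\,w^{1/N-1}e^{-\beta w},
% whose shape approaches the damped exponential e^{-beta w}/w (up to a
% cutoff-dependent normalization) as N grows large.
```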
This work was supported in part by NSF grant SMA 1041755 to the Temporal Dynamics of Learning Center, an NSF Science of Learning Center.