## Abstract

Simultaneous recordings from the cortex have revealed that neural activity is highly variable and that some variability is shared across neurons in a population. Further experimental work has demonstrated that the shared component of a neuronal population's variability is typically comparable to or larger than its private component. Meanwhile, an abundance of theoretical work has assessed the impact that shared variability has on a population code. For example, shared input noise is understood to have a detrimental impact on a neural population's coding fidelity. However, other contributions to variability, such as common noise, can also play a role in shaping correlated variability. We present a network of linear-nonlinear neurons into which we introduce a common noise input to model, for instance, variability resulting from upstream action potentials that are irrelevant to the task at hand. We show that by applying a heterogeneous set of synaptic weights to the neural inputs carrying the common noise, the network can improve its coding ability as measured by both Fisher information and Shannon mutual information, even in cases where this results in amplification of the common noise. With a broad and heterogeneous distribution of synaptic weights, a population of neurons can remove the harmful effects imposed by afferents that are uninformative about a stimulus. We demonstrate that some nonlinear networks benefit from weight diversification up to a certain population size, above which the drawbacks from amplified noise dominate over the benefits of diversification. We further characterize these benefits in terms of the relative strength of shared and private variability sources. Finally, we study the asymptotic behavior of the mutual information and Fisher information analytically in our various networks as a function of population size. We find some surprising qualitative changes in the asymptotic behavior as we make seemingly minor changes in the synaptic weight distributions.

## 1 Introduction

Variability is a prominent feature of many neural systems: neural responses to repeated presentations of the same external stimulus typically vary from trial to trial (Shadlen & Newsome, 1998). Furthermore, neural variability often exhibits pairwise correlations, so that pairs of neurons are more (or less) likely to be co-active than they would be by chance if their fluctuations in activity to a repeated stimulus were independent. These so-called noise correlations (which we also refer to as “shared variability”) have been observed throughout the cortex (Averbeck, Latham, & Pouget, 2006; Cohen & Kohn, 2011), and their presence has important implications for neural coding (Zohary, Shadlen, & Newsome, 1994; Abbott & Dayan, 1999).

If the activities of individual neurons are driven by a stimulus shared by all neurons but corrupted by noise that is independent for each neuron (so-called private variability), then the signal can be recovered by simply averaging the activity across the population (Abbott & Dayan, 1999; Ma, Beck, Latham, & Pouget, 2006). If instead some variability is shared across neurons (i.e., there are noise correlations), naively averaging the activity across the population will not necessarily recover the signal, no matter how large the population (Zohary et al., 1994). An abundance of theoretical work has explored how shared variability can be either beneficial or detrimental to the fidelity of a population code (relative to the null model of only private variability among the neurons), depending on its structure and relationship with the tuning properties of the neural population (Zohary et al., 1994; Abbott & Dayan, 1999; Yoon & Sompolinsky, 1999; Sompolinsky, Yoon, Kang, & Shamir, 2001; Averbeck & Lee, 2006; Cohen & Maunsell, 2009; Cafaro & Rieke, 2010; Ecker, Berens, Tolias, & Bethge, 2011; Moreno-Bote et al., 2014; Nogueira et al., 2020).
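As a minimal worked example of this point, assume (for illustration only) that neuron $i$ responds as $a_i = s + \eta + \epsilon_i$. Here $\eta$ is a fluctuation shared by all neurons with variance $\sigma_\eta^2$, and the $\epsilon_i$ are private fluctuations with variance $\sigma_P^2$, all mutually independent. The population average then satisfies

$$
\bar a = \frac{1}{N}\sum_{i=1}^{N} a_i = s + \eta + \frac{1}{N}\sum_{i=1}^{N}\epsilon_i,
\qquad
\operatorname{Var}\left[\bar a \mid s\right] = \sigma_\eta^2 + \frac{\sigma_P^2}{N} \;\xrightarrow{N\to\infty}\; \sigma_\eta^2 .
$$

Averaging therefore removes the private term but leaves a noise floor set by the shared term, however large $N$ becomes.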

Theoretical work has singled out *differential correlations* (correlations proportional to the products of the derivatives of the tuning functions; see Figure 1a, right) as particularly harmful to the performance of a population code (Moreno-Bote et al., 2014). While differential correlations are consequential, they may constitute only a small part of a population's total shared variability, leaving “nondifferential correlations” as the dominant component of shared variability (Kohn, Coen-Cagli, Kanitscheider, & Pouget, 2016; Montijn et al., 2019; Kafashan et al., 2020).
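A short derivation shows why these correlations limit information. Suppose, as an illustration, that the noise covariance is $\Sigma = \Sigma_0 + \epsilon\, f' f'^{\top}$, where $f'$ is the vector of tuning-curve derivatives and $\epsilon > 0$. Write $I_0 = f'^{\top}\Sigma_0^{-1} f'$ for the linear Fisher information without the differential term. The Sherman-Morrison formula then gives

$$
I = f'^{\top}\left(\Sigma_0 + \epsilon f' f'^{\top}\right)^{-1} f'
  = I_0 - \frac{\epsilon\, I_0^2}{1 + \epsilon I_0}
  = \frac{I_0}{1 + \epsilon I_0} < \frac{1}{\epsilon}.
$$

So even if $I_0$ grows without bound with population size, $I$ saturates at $1/\epsilon$.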

The sources of neural variability, and their respective contributions to the private and shared components, will have a significant impact on shaping the geometry of the population's correlational structure, and therefore its coding ability (Brinkman, Weber, Rieke, & Shea-Brown, 2016). For example, private sources of variability such as channel noise or stochastic synaptic vesicle release could be averaged out by a downstream neuron receiving input from the population (Faisal, Selen, & Wolpert, 2008). However, sources of variability shared across neurons, such as the variability of presynaptic spike trains from neurons that synapse onto multiple neurons, would introduce shared variability and place different constraints on a neural code (Shadlen & Newsome, 1998; Kanitscheider, Coen-Cagli, & Pouget, 2015). In particular, differential correlations are typically induced by shared input noise (i.e., noise carried by a stimulus) or suboptimal computations (Beck, Ma, Pitkow, Latham, & Pouget, 2012; Kanitscheider et al., 2015).

Past work has examined the contributions of private and shared sources to variability in cortex (Arieli, Sterkin, Grinvald, & Aertsen, 1996; DeWeese & Zador, 2004). These studies partitioned the subthreshold variability of a neural population into two parts. The private components comprise synaptic, thermal, and channel noise in the dendrites, along with other local sources of variability; the shared components comprise variability induced by afferent connections. They found that the private component of the total variability was quite small, while the shared component could be much larger (see Figures 1b and 1c). Thus, neural populations must contend with the large shared component of each neuron's variability. The incoming structure of shared variability and its subsequent shaping by the computation of a neural population is an important consideration for evaluating the strength of a neural code (Zylberberg, Pouget, Latham, & Shea-Brown, 2017).

Moreno-Bote et al. (2014) demonstrated that shared input noise is detrimental to the fidelity of a population code. Here, we instead examine sources of shared variability that do not necessarily result in differential correlations, because they do not appear as shared input noise. Such sources can therefore be shaped by features of neural computation such as synaptic weighting. We refer to these noise sources as “common noise” to distinguish them from the special case of shared input noise (Vidne et al., 2012; Kulkarni & Paninski, 2007). For example, a common noise source could be an upstream neuron whose action potentials are noise in the sense that they are irrelevant to the current stimulus. Because common noise is shaped by synaptic weighting, it can serve as a source of nondifferential correlations (see Figure 1a, middle). It can thereby have either a beneficial or a harmful impact on the strength of the population code. We aim to better elucidate the nature of this impact.

We consider a linear-nonlinear architecture (Paninski, 2004; Karklin & Simoncelli, 2011; Pillow, Paninski, Uzzell, Simoncelli, & Chichilnisky, 2005). We explore how its neural representation is affected by both a common source of variability and private noise sources that affect individual neurons independently. This simple architecture allowed us to assess coding ability analytically using both Fisher information (Abbott & Dayan, 1999; Yoon & Sompolinsky, 1999; Wilke & Eurich, 2002; Wu, Nakahara, & Amari, 2001) and Shannon mutual information. We evaluated the coding fidelity of two representations: the linear representation, and the nonlinear representation obtained after a quadratic nonlinearity (Adelson & Bergen, 1985; Emerson, Korenberg, & Citron, 1992; Sakai & Tanaka, 2000; Pagan, Simoncelli, & Rust, 2016). We studied each as a function of the distribution of synaptic weights that shape the shared variability within the representations. We find that the coding fidelity of the linear-stage representation improves with diverse synaptic weighting, even if the weighting amplifies the common noise in the neural circuit. Meanwhile, the nonlinear-stage representation also benefits from diverse synaptic weighting in a regime where common noise may be amplified, but not too strongly. Moreover, we find that the distribution of synaptic weights that optimizes the network's performance depends strongly on the relative amount of private and shared variability. In particular, the neural circuit's coding fidelity benefits from diverse synaptic weighting when shared variability is the dominant contribution to the variability. Together, our results highlight the importance of diverse synaptic weighting when a neural circuit must contend with sources of common noise.

## 2 Methods

The code used to conduct the analyses described in this article is publicly available on GitHub (https://github.com/pssachdeva/neuronoise).

### 2.1 Network Architecture

### 2.2 Measures of Coding Strength

In order to assess the fidelity of the population code represented by $\ell$ or $r$, we turn to the Fisher information and the Shannon mutual information (Cover & Thomas, 2012). The former has been used extensively in the context of sensory decoding and correlated variability (Abbott & Dayan, 1999; Averbeck et al., 2006; Kohn et al., 2016), while the latter has been well studied in the context of efficient coding (Attneave, 1954; Barlow, 1961; Bell & Sejnowski, 1997; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999).
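For reference, the standard definitions are as follows. Consider a response $r$ with conditional density $p(r \mid s)$, tuning curve $f(s) = \mathbb{E}[r \mid s]$, and noise covariance $\Sigma(s) = \operatorname{Cov}[r \mid s]$. Then

$$
I_F(s) = \mathbb{E}\left[\left(\frac{\partial}{\partial s}\log p(r \mid s)\right)^{2} \,\middle|\, s\right],
\qquad
I_F^{\mathrm{lin}}(s) = f'(s)^{\top}\,\Sigma(s)^{-1} f'(s),
$$

$$
I[s, r] = \iint p(s, r)\,\log\frac{p(s, r)}{p(s)\,p(r)}\,ds\,dr .
$$

The linear Fisher information $I_F^{\mathrm{lin}}$ is the information available to a locally optimal linear decoder. It coincides with $I_F$ when $r \mid s$ is gaussian with a stimulus-independent covariance.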

### 2.3 Structured Weights

Additionally, we consider cases in which $k$ is of order $N$, for example, $k=N/2$. Allowing $k$ to grow with $N$ ensures that typical values for the weights grow with the population size. This contrasts with the case in which $k$ is a constant, such as $k=4$, which sets a maximum weight value independent of the population size.
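The full definition of the structured weights in section 2.3 is not reproduced in this section. The sketch below therefore assumes one parameterization consistent with the description here and in section 3.1: every weight is at least 1, the maximum weight is $k$, and $k = 1$ gives uniform weights. Concretely, the $N$ weights take the integer values $1, \dots, k$ in $k$ equal-sized groups. Under that assumption, $\lVert w \rVert^2 = N(k+1)(2k+1)/6$. This grows linearly in $N$ for constant $k$ but cubically when $k = N/2$. The function name `structured_weights` is hypothetical.

```python
import numpy as np


def structured_weights(n_neurons, k):
    """Hypothetical structured weights: the values 1, ..., k in k equal-sized groups.

    Assumes k divides n_neurons (an assumption of this sketch).
    """
    if n_neurons % k != 0:
        raise ValueError("this sketch requires k to divide n_neurons")
    return np.repeat(np.arange(1, k + 1, dtype=float), n_neurons // k)


for n in (100, 200, 400):
    w_fixed = structured_weights(n, 4)         # k constant: max weight stays at 4
    w_growing = structured_weights(n, n // 2)  # k = N/2: max weight grows with N
    print(n, w_fixed.max(), w_growing.max(), w_fixed @ w_fixed, w_growing @ w_growing)
```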

### 2.4 Unstructured Weights

## 3 Results

We consider the network's coding ability after both the linear stage ($\ell$) and the nonlinear stage ($r$). The linear stage can be viewed as the output of the network when each function $g_i(\ell_i)$ is the identity. Furthermore, by the data processing inequality, the qualitative conclusions we obtain from the linear stage should apply to any one-to-one nonlinearity.
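The network architecture itself (section 2.1) is not reproduced in this section. As a minimal sketch, assume the following model:

- The linear stage is $\ell_i = v_i s + \sigma_C w_i \xi_C + \sigma_P \xi_{P,i}$.
- The stimulus is $s \sim \mathcal{N}(0, \sigma_S^2)$.
- $\xi_C$ and the $\xi_{P,i}$ are independent standard normal variables.
- The output is $r_i = g_i(\ell_i)$.

The function `simulate_network` and its argument names are hypothetical. Passing the identity for `g` recovers the linear stage.

```python
import numpy as np


def simulate_network(v, w, n_samples, sigma_s=1.0, sigma_c=1.0, sigma_p=1.0,
                     g=np.square, rng=None):
    """Draw (s, ell, r) from an assumed linear-nonlinear network.

    Assumed model (hypothetical for this sketch):
        s ~ N(0, sigma_s^2), xi_C ~ N(0, 1), xi_P ~ N(0, I)
        ell = v * s + sigma_c * w * xi_C + sigma_p * xi_P
        r = g(ell), applied elementwise
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    s = sigma_s * rng.standard_normal(n_samples)
    xi_c = rng.standard_normal(n_samples)             # one common draw per trial
    xi_p = rng.standard_normal((n_samples, v.size))   # independent draw per neuron
    ell = np.outer(s, v) + sigma_c * np.outer(xi_c, w) + sigma_p * xi_p
    return s, ell, g(ell)


rng = np.random.default_rng(0)
# Linear stage: use the identity in place of the nonlinearity.
s, ell, r_linear = simulate_network([1, 1, 1], [1, 2, 3], 5, g=lambda x: x, rng=rng)
print(r_linear)
```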

### 3.1 Linear Stage

Examining equation 3.2 reveals that increasing the norm of $v$ without changing its direction (that is, without changing $\theta$) will increase the Fisher information. Increasing the norm of $w$ without changing its direction will either decrease or maintain information (since $0 \le \sin^2\theta \le 1$). Additionally, if $v$ and $w$ become more aligned while their norms are unchanged, the Fisher information will decrease (since $\sin^2\theta$ will decrease). This decrease in Fisher information is consistent with the observation that alignment of $v$ and $w$ produces differential correlations. If $v$ and $w$ are changed in a way that modulates both their norms and their directions, the impact on Fisher information is less transparent.
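Equation 3.2 is not reproduced in this section. Assume the linear-stage noise model $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^{\top}$ with $f' = v$. The Sherman-Morrison formula then gives

$$I_F = \frac{\lVert v\rVert^2}{\sigma_P^2}\,\frac{\sigma_P^2 + \sigma_C^2\lVert w\rVert^2\sin^2\theta}{\sigma_P^2 + \sigma_C^2\lVert w\rVert^2},$$

which has exactly the properties listed above. The sketch checks this closed form against a direct linear solve. The function names are hypothetical.

```python
import numpy as np


def fisher_direct(v, w, sigma_c, sigma_p):
    """v^T Sigma^{-1} v with Sigma = sigma_p^2 I + sigma_c^2 w w^T, by a linear solve."""
    cov = sigma_p**2 * np.eye(v.size) + sigma_c**2 * np.outer(w, w)
    return float(v @ np.linalg.solve(cov, v))


def fisher_from_angle(v, w, sigma_c, sigma_p):
    """Same quantity written in terms of |v|, |w| and the angle theta between v and w."""
    v_sq, w_sq = v @ v, w @ w
    sin_sq = 1.0 - (v @ w) ** 2 / (v_sq * w_sq)
    return float(v_sq / sigma_p**2
                 * (sigma_p**2 + sigma_c**2 * w_sq * sin_sq)
                 / (sigma_p**2 + sigma_c**2 * w_sq))


rng = np.random.default_rng(0)
v = rng.normal(size=50)
w = rng.normal(size=50)
print(fisher_direct(v, w, 0.5, 1.0), fisher_from_angle(v, w, 0.5, 1.0))
# Aligning w with v while keeping its norm lowers the information.
w_aligned = v * np.linalg.norm(w) / np.linalg.norm(v)
print(fisher_from_angle(v, w_aligned, 0.5, 1.0))
```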

To better understand the Fisher information, we impose a parameterized structure on the weights that allows us to increase weight diversity without decreasing the magnitude of any of the weights. This weight parameterization, which we call the structured weights, is detailed in section 2.3. We chose this parameterization for two reasons. First, we desired a scheme in which an increase in diversity must be accompanied by an amplification of common noise. We chose this behavior so that any improvement in coding ability can only be explained by the increase in diversity rather than a potential decrease in common noise. Second, we desired analytic expressions for the Fisher information as a function of population size, which is possible with this form of structured weights.

Under the structured weight parameterization, equations 3.1 and 3.3 can be explored by varying the choice of $k$ for both $v$ and $w$ (we refer to these as $k_v$ and $k_w$, respectively). It is simplest and most informative to examine these quantities by setting $k_v = 1$ while allowing $k_w$ to vary. Amplifying and diversifying $v$ only increases coding ability, for predictable reasons (Shamir & Sompolinsky, 2006; Ecker et al., 2011), and this is indeed the case for our network. Increasing $k_w$ boosts the overall amount of noise added to the neural population, but it also changes the direction of the noise in the high-dimensional neural space. Thus, while we might expect that adding more noise to the system would hinder coding, the relationship between the directions of the noise and stimulus vectors in the neural space also plays a role.

The analytical expressions for the structured regime reveal the asymptotic behavior of the information quantities. Neither quantity saturates as a function of the number of neurons, $N$, except in the case of $k_w = 1$ (see Figures 3a and 3b). Whenever $k_w > 1$, then, increasing the population size also enhances coding fidelity. Furthermore, both quantities are monotonically increasing functions of the common noise synaptic heterogeneity, $k_w$ (see Figures 3c and 3d). Decoding is therefore enhanced even though the amplitude of the common noise is magnified for larger $k_w$. Our analytical results show linear growth of the Fisher information and logarithmic growth of the mutual information, as one might expect in the case of gaussian noise (Brunel & Nadal, 1998). These qualitative results hold for essentially any choice of $(\sigma_S, \sigma_P, \sigma_C)$.
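To illustrate the two growth laws, the sketch below evaluates two quantities for $k_v = 1$ and several $k_w$: the linear-stage Fisher information, computed via Sherman-Morrison, and the gaussian mutual information $I[s, \ell] = \tfrac{1}{2}\log\left(1 + \sigma_S^2 I_F\right)$. It relies on two assumptions: the model $\ell = v s + \sigma_C w \xi_C + \sigma_P \xi_P$ with gaussian $s$, and structured weights taking the values $1, \dots, k$ in equal groups. It is an illustration, not the article's analysis code, and the function names are hypothetical.

```python
import numpy as np


def structured_weights(n, k):
    """Assumed structured weights: values 1..k in k equal groups (requires k to divide n)."""
    return np.repeat(np.arange(1, k + 1, dtype=float), n // k)


def linear_stage_information(n, k_w, sigma_s=1.0, sigma_c=1.0, sigma_p=1.0):
    """Return (Fisher information, mutual information in nats) for k_v = 1."""
    v = np.ones(n)
    w = structured_weights(n, k_w)
    vw = v @ w
    # Sherman-Morrison: v^T (sigma_p^2 I + sigma_c^2 w w^T)^{-1} v
    fisher = (v @ v - sigma_c**2 * vw**2 / (sigma_p**2 + sigma_c**2 * (w @ w))) / sigma_p**2
    # Gaussian stimulus and gaussian noise: I[s; ell] = 0.5 * log(1 + sigma_s^2 * I_F)
    mutual_info = 0.5 * np.log1p(sigma_s**2 * fisher)
    return float(fisher), float(mutual_info)


for k_w in (1, 2, 4):
    for n in (300, 1200, 4800):
        print(k_w, n, linear_stage_information(n, k_w))
```

With $k_w = 1$ the Fisher information stays below $1/\sigma_C^2$. With $k_w > 1$ it roughly doubles when $N$ doubles, while the mutual information grows by about $\tfrac{1}{2}\log 2$.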

In the case of $k_w = 1$, the signal and common noise are perfectly aligned in the neural representation, so the common noise becomes equivalent in form to shared input noise. As a consequence, both Fisher information and mutual information saturate as a function of population size. This saturation implies the existence of differential correlations, consistent with the observation that information-limiting correlations arise in the presence of shared input noise (Kanitscheider et al., 2015).
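Assume the linear-gaussian model with $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^{\top}$ and structured weights taking the values $1, \dots, k$. The case $k_v = k_w = 1$ then makes $v = w = \mathbf{1}$, so $\lVert v\rVert^2 = \lVert w\rVert^2 = N$ and $\sin^2\theta = 0$. The common noise therefore enters as a differential-correlation term $\sigma_C^2 f' f'^{\top}$ with $f' = v = \mathbf{1}$, and the saturation can be written out explicitly:

$$
I_F = \frac{N}{\sigma_P^2}\cdot\frac{\sigma_P^2}{\sigma_P^2 + \sigma_C^2 N} = \frac{N}{\sigma_P^2 + N\sigma_C^2} \xrightarrow{N\to\infty} \frac{1}{\sigma_C^2},
\qquad
I[s,\ell] = \tfrac{1}{2}\log\left(1 + \sigma_S^2 I_F\right) \xrightarrow{N\to\infty} \tfrac{1}{2}\log\left(1 + \frac{\sigma_S^2}{\sigma_C^2}\right).
$$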

The structured weight distribution we described allows us to derive analytical results, but restricting the weights to a fixed number of discrete values is not realistic for biological networks. We therefore also use unstructured weights, described in section 2.4, in which the synaptic weights are drawn from a log-normal distribution. In this case, we estimate the linear Fisher information and the mutual information over many random draws according to $w_i \sim \Delta + \mathrm{Lognormal}(\mu, \sigma^2)$. We are primarily concerned with varying $\mu$, because increasing this quantity uniformly increases the mean, median, and mode of the log-normal distribution (see Figure 3e, inset), akin to increasing $k_w$ for the structured weights.
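A minimal Monte Carlo sketch of this averaging at the linear stage follows, assuming the linear-gaussian model $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^{\top}$. NumPy's `Generator.lognormal(mean, sigma)` is parameterized by the mean and standard deviation of the underlying normal, which matches $\mathrm{Lognormal}(\mu, \sigma^2)$. The mean, median, and mode of this distribution are $e^{\mu + \sigma^2/2}$, $e^{\mu}$, and $e^{\mu - \sigma^2}$, so all three increase with $\mu$. The choice $v = \mathbf{1}$ and the default values of $\sigma$, $\Delta$, and the noise strengths are illustrative assumptions, not the article's settings. The function names are hypothetical.

```python
import numpy as np


def unstructured_weights(n, mu, sigma, delta, rng):
    """Draw w_i = delta + Lognormal(mu, sigma^2); numpy's mean/sigma refer to the underlying normal."""
    return delta + rng.lognormal(mean=mu, sigma=sigma, size=n)


def linear_fisher(v, w, sigma_c, sigma_p):
    """v^T (sigma_p^2 I + sigma_c^2 w w^T)^{-1} v via the Sherman-Morrison formula."""
    vw = v @ w
    return (v @ v - sigma_c**2 * vw**2 / (sigma_p**2 + sigma_c**2 * (w @ w))) / sigma_p**2


def average_fisher_unstructured(n, mu, sigma=1.0, delta=0.0, sigma_c=1.0, sigma_p=1.0,
                                n_draws=1000, seed=0):
    """Average linear-stage Fisher information over independent draws of w."""
    rng = np.random.default_rng(seed)
    v = np.ones(n)  # hypothetical choice: homogeneous stimulus weights
    draws = [linear_fisher(v, unstructured_weights(n, mu, sigma, delta, rng), sigma_c, sigma_p)
             for _ in range(n_draws)]
    return float(np.mean(draws))


for mu in (-1.0, 0.0, 1.0):
    print(mu, average_fisher_unstructured(200, mu, delta=1.0, n_draws=200))
```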

Our numerical analysis demonstrates that increasing $\mu $ increases the average Fisher information and average mutual information across population sizes (see Figures 3e and 3f: bold lines). In addition, the benefits of larger weight diversity are felt more strongly by larger populations (see Figures 3e and 3f: different colors).

In the structured weight regime, our analytical results show that weight heterogeneity can ameliorate the harmful effects of *additional* information-limiting correlations induced by common noise that mimics shared input noise. They do not imply that weight heterogeneity prevents differential correlations in general, because the common noise in this model is shaped by synaptic weighting, in contrast with true shared input noise. For unstructured weights, we once again observe that larger heterogeneity affords the network improved coding performance, despite the increased noise in the system. Together, these results show that linear networks could manipulate common noise to prevent it from inducing differential correlations. Some neural circuits must perform other computations that may dictate the structure of the weights on the common noise inputs. Even these circuits can still achieve good decoding performance, provided that their synaptic weights are heterogeneous.

### 3.2 Quadratic Nonlinearity

We next consider the performance of the network after a quadratic nonlinearity $g_i(x) = x^2$ for all neurons $i$. This nonlinearity has been used in a neural network model to perform quadratic discriminant analysis (Pagan et al., 2016) and as a transfer function in complex cell models (Adelson & Bergen, 1985; Emerson et al., 1992; Sakai & Tanaka, 2000). We also chose this nonlinearity because it allowed us to calculate the linear Fisher information (an approximation to the Fisher information) analytically; see appendix A.3 for a numerical analysis with an exponential nonlinearity. The mutual information, however, does not appear to be analytically tractable, so we approximated it numerically using simulated data.

#### 3.2.1 Linear Fisher Information

An analytic expression for the linear Fisher information is derived in appendix A.1.3. Its form is too complicated to restate here, so we examine it numerically for both the structured and unstructured weights. At this stage, the qualitative behavior of the Fisher information depends on the magnitudes of the common variability ($\sigma_C$) and private variability ($\sigma_P$) in a more complicated way. At the linear stage, by contrast, it depends on these variables primarily through their ratio $\sigma_C/\sigma_P$. We therefore consider separately how common and private variability affect coding efficacy under various synaptic weight structures.
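Appendix A.1.3 is not reproduced here. Assume a gaussian linear stage, $\ell \mid s \sim \mathcal{N}(v s, \Sigma)$ with $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^{\top}$. The ingredients of the linear Fisher information after $g_i(x) = x^2$ then follow from standard gaussian moment identities:

- $\mathbb{E}[\ell_i^2 \mid s] = v_i^2 s^2 + \Sigma_{ii}$, so $f_i'(s) = 2 s v_i^2$.
- $\operatorname{Cov}[\ell_i^2, \ell_j^2 \mid s] = 2\Sigma_{ij}^2 + 4 v_i v_j s^2 \Sigma_{ij}$.

The sketch assembles $f'^{\top}\Sigma_r^{-1} f'$ from these two results. It is our own illustration rather than the appendix's expression, and it is evaluated at a chosen stimulus value $s$; note that $f'$ vanishes at $s = 0$. The function names are hypothetical.

```python
import numpy as np


def quadratic_stage_moments(v, w, s, sigma_c, sigma_p):
    """Tuning-curve derivative f'(s) and covariance of r = ell**2 given s.

    Assumes ell | s ~ N(m, Sigma) with m = v * s and
    Sigma = sigma_p^2 I + sigma_c^2 w w^T (hypothetical model for this sketch).
    """
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    m = v * s
    sigma = sigma_p**2 * np.eye(v.size) + sigma_c**2 * np.outer(w, w)
    # E[ell_i^2 | s] = m_i^2 + Sigma_ii, whose derivative in s is 2 s v_i^2.
    f_prime = 2.0 * s * v**2
    # Cov(ell_i^2, ell_j^2 | s) = 2 Sigma_ij^2 + 4 m_i m_j Sigma_ij (elementwise products).
    cov_r = 2.0 * sigma**2 + 4.0 * np.outer(m, m) * sigma
    return f_prime, cov_r


def quadratic_linear_fisher(v, w, s, sigma_c=1.0, sigma_p=1.0):
    """Linear Fisher information f'^T Cov[r | s]^{-1} f' after g_i(x) = x^2."""
    f_prime, cov_r = quadratic_stage_moments(v, w, s, sigma_c, sigma_p)
    return float(f_prime @ np.linalg.solve(cov_r, f_prime))


n = 120
v = np.ones(n)
for k_w in (1, 2, 3, 4):
    w = np.repeat(np.arange(1, k_w + 1, dtype=float), n // k_w)  # assumed structured weights
    print(k_w, quadratic_linear_fisher(v, w, s=1.0))
```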

The information saturation (or growth) for various $k_w$ can be understood in terms of the geometry of the covariance describing the neural population's variability. Information saturation occurs under two conditions: the principal eigenvector(s) of the covariance align closely (but not necessarily exactly) with the differential correlation direction, $f'$, and the remaining eigenvectors quickly become orthogonal to $f'$ as the population size increases (Moreno-Bote et al., 2014; see appendix A.2 for more details). When $k_w = 1$, the common noise aligns perfectly with the stimulus, and so the principal eigenvector of the covariance aligns exactly with $f'$ (as in Figure 1a, right). When $k_w > 1$, the principal eigenvector aligns closely, but not exactly, with the differential correlation direction. For $k_w = 2$, the remaining eigenvectors still become orthogonal to $f'$ quickly enough for information to saturate; this does not occur when $k_w > 2$. The case of $k_w \sim O(N)$, meanwhile, is slightly different. Here, the variances (the diagonal entries of the covariance matrix) scale with population size, so the neurons simply exhibit too much variance for any meaningful decoding to occur. However, we believe it is unreasonable to expect the synaptic weights of a neural circuit to scale with population size, making this scenario biologically implausible.
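The alignment of the principal eigenvector can be checked numerically. The sketch below rests on three assumptions: the gaussian linear stage, structured weights with values $1, \dots, k_w$, and $k_v = 1$. It computes the covariance of $r = \ell^2$ at a fixed stimulus and reports the absolute cosine of the angle between its top eigenvector and $f'$. For $k_w = 1$ this covariance has the form $\alpha I + \beta \mathbf{1}\mathbf{1}^{\top}$ with $\beta > 0$, so the top eigenvector is exactly $f' \propto \mathbf{1}$; for $k_w > 1$ the overlap falls below 1. The function name is hypothetical.

```python
import numpy as np


def principal_alignment(n, k_w, s=1.0, sigma_c=1.0, sigma_p=1.0):
    """|cos| between the top eigenvector of Cov[r | s] and f'(s), for r = ell**2 and k_v = 1."""
    v = np.ones(n)
    w = np.repeat(np.arange(1, k_w + 1, dtype=float), n // k_w)  # assumed structured weights
    sigma = sigma_p**2 * np.eye(n) + sigma_c**2 * np.outer(w, w)
    m = v * s
    cov_r = 2.0 * sigma**2 + 4.0 * np.outer(m, m) * sigma  # elementwise products
    f_prime = 2.0 * s * v**2
    _, eigvecs = np.linalg.eigh(cov_r)  # eigenvalues are returned in ascending order
    top = eigvecs[:, -1]
    return float(abs(top @ f_prime) / np.linalg.norm(f_prime))


for k_w in (1, 2, 3, 4):
    print(k_w, principal_alignment(120, k_w))
```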

When private variability dominates, we observe qualitatively different finite network behavior ($\sigma_P = 5$; see Figure 4b). For $N = 1000$, both $k_w = 1$ and $k_w = 2$ perform better than larger values of $k_w$ (by contrast, the case with $k_w \sim O(N)$ quickly saturates). We note that, unsurprisingly, the increase in private variability decreases the Fisher information for all cases we considered compared to $\sigma_P = 1$ (compare the scales of Figures 4a and 4b). Our main interest, however, is identifying effective synaptic weighting strategies given some amount of private and common variability.

The introduction of the squared nonlinearity produces qualitatively different behavior at the finite network level. In contrast with Figure 3, increased heterogeneity does not automatically imply improved decoding. In fact, there is a regime in which increased heterogeneity improves Fisher information, beyond which we see a reduction in decoding performance (see Figure 4d). If the private variability is increased, this regime shrinks or becomes nonexistent, depending on the population size (see Figure 4e). Furthermore, entering this regime for higher private variability requires smaller $k_w$ (i.e., less weight heterogeneity).

The results shown in Figures 4d and 4e suggest an interesting relationship among the network's decoding ability, its private variability, and its synaptic weight heterogeneity $k_w$. To explore this further, we examine the Fisher information at a fixed population size ($N = 1000$) as a function of both $\sigma_P$ and $k_w$ (see Figure 4c). An increase in private variability always decreases the Fisher information. To account for this, we calculate the *normalized* Fisher information: for a given choice of $\sigma_P$, each Fisher information is divided by the maximum across a range of $k_w$ values. The normalized Fisher information thus shows which level of synaptic weight heterogeneity maximizes coding fidelity, given a particular level of private variability $\sigma_P$.
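The normalization itself is simple. The sketch below applies it to the quadratic-stage linear Fisher information computed from gaussian moment identities, under the assumed linear-gaussian model with structured weights (values $1, \dots, k_w$) and $k_v = 1$. The population size, stimulus value, and parameter grids are hypothetical; $N$ is kept far smaller than $N = 1000$ for speed.

```python
import numpy as np


def quadratic_fisher(n, k_w, s=1.0, sigma_c=1.0, sigma_p=1.0):
    """Linear Fisher information of r = ell**2 at stimulus s (k_v = 1, assumed structured w)."""
    v = np.ones(n)
    w = np.repeat(np.arange(1, k_w + 1, dtype=float), n // k_w)
    sigma = sigma_p**2 * np.eye(n) + sigma_c**2 * np.outer(w, w)
    m = v * s
    cov_r = 2.0 * sigma**2 + 4.0 * np.outer(m, m) * sigma  # elementwise products
    f_prime = 2.0 * s * v**2
    return float(f_prime @ np.linalg.solve(cov_r, f_prime))


def normalized_fisher(n, sigma_ps, k_ws, **kwargs):
    """Rows: private noise levels; columns: k_w. Each row is divided by its maximum."""
    grid = np.array([[quadratic_fisher(n, k, sigma_p=sp, **kwargs) for k in k_ws]
                     for sp in sigma_ps])
    return grid / grid.max(axis=1, keepdims=True)


sigma_ps = [0.5, 1.0, 2.0, 5.0]
k_ws = [1, 2, 3, 4, 5, 6]
normalized = normalized_fisher(120, sigma_ps, k_ws)  # 120 is divisible by every k_w
for sp, row in zip(sigma_ps, normalized):
    print(sp, "best k_w:", k_ws[int(np.argmax(row))])
```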

Figure 4c highlights three interesting regimes. When the private variability is small, the network benefits from larger weight heterogeneity on the common noise. But as the neurons become noisier, the “Goldilocks zone” in which the network can leverage larger noise weights narrows. When the private variability is large, the network achieves superior coding fidelity with less heterogeneous weights, despite the threat of induced differential correlations from the common noise. Between these regimes, there are transitions at which many choices of $k_w$ result in equally good decoding performance.

It is important to point out that Figures 4a to 4e capture only finite network behavior. We therefore extended our analysis by examining the asymptotic behavior of the Fisher information as a function of the private noise, using its asymptotic series as $N \to \infty$ (see Figure 4f). For $k_w = 1, 2$, the coefficient of the linear term is zero for any choice of $\sigma_P$, implying that the Fisher information always saturates. In addition, when the common noise weights increase with population size (i.e., $k_w \sim O(N)$), the asymptotic series is always sublinear (not shown in Figure 4f). Thus, there are multiple cases in which the structure of synaptic weighting can induce differential correlations in the presence of common noise. Increased heterogeneity allows the network to escape these induced differential correlations and achieve linear asymptotic growth. If $k_w$ becomes too large, however, the coefficient of the linear term begins to decrease. Once $k_w$ scales with the population size, differential correlations are once again significant.
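For the simplest case, the saturation can be seen in closed form. Assume a gaussian linear stage with $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^{\top}$, the nonlinearity $g_i(x) = x^2$, and $v = w = \mathbf{1}$ (i.e., $k_v = k_w = 1$). Using $\operatorname{Cov}[\ell_i^2, \ell_j^2 \mid s] = 2\Sigma_{ij}^2 + 4 v_i v_j s^2 \Sigma_{ij}$, the response covariance takes the form

$$
\Sigma_r = \alpha I + \beta\,\mathbf{1}\mathbf{1}^{\top},
\qquad
\alpha = 2\sigma_P^2\left(\sigma_P^2 + 2\sigma_C^2\right) + 4 s^2 \sigma_P^2,
\qquad
\beta = 2\sigma_C^4 + 4 s^2 \sigma_C^2 .
$$

Because $f' = 2s\mathbf{1}$ is an eigenvector of $\Sigma_r$ with eigenvalue $\alpha + \beta N$,

$$
I_F^{\mathrm{lin}} = f'^{\top}\Sigma_r^{-1} f' = \frac{4 s^2 N}{\alpha + \beta N}
\xrightarrow{N\to\infty} \frac{4 s^2}{\beta} = \frac{2 s^2}{\sigma_C^2\left(\sigma_C^2 + 2 s^2\right)} .
$$

The limit does not depend on $\sigma_P$, so the coefficient of the linear term in the asymptotic series vanishes for every level of private noise.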

To summarize these results for the unstructured weights, we once again plot the normalized Fisher information for a range of private variabilities (see Figure 5c). This time it is normalized across choices of $\mu$ and averaged over 1000 samples from the log-normal distribution. The heat map exhibits a similar transition at a specific level of private variability. At this transition, a wide range of values of $\mu$ provide the network with similar decoding ability. For smaller $\sigma_P$, we see behavior comparable to Figure 5a, where there exists a regime of improved Fisher information. Beyond the transition, the network performs better with less diverse synaptic weighting, though this preference becomes less stringent as $\sigma_P$ increases. The behavior exhibited by this heat map is similar to Figure 4c but contains fewer uniquely identifiable regions. This may imply that the additional regions in Figure 4c are an artifact of the structured weights.

We calculated the normalized Fisher information across a range of common noise strengths to determine the optimal synaptic weight distribution. The results for structured and unstructured weights are shown in Figures 6c and 6d, respectively. While these strongly resemble Figures 4c and 5c, they exhibit the opposite qualitative behavior. As before, there are three identifiable regions in Figure 6c, separated by abrupt transitions where many choices of $k_w$ are equally good for decoding. For small common noise, coding fidelity is improved with less heterogeneous weights, but as the common noise increases, the network enters the Goldilocks region. After another abrupt transition near $\sigma_C \approx 0.34$, network performance is greatly improved by heterogeneous weights.

Thus, common noise and private noise seem to have opposite impacts on the optimal choice of synaptic weight heterogeneity. When private noise dominates, the Fisher information is maximized under a set of homogeneous weights, since coding ability is harmed by amplification of common noise. When common noise dominates, the network coding is improved under diverse weighting: this prevents additional differential correlations and helps the network cope with the punishing effects on coding due to the amplified noise.

How should we choose the synaptic weight distribution between the extremes of private or common noise dominating? We assess the behavior of the Fisher information as both $\sigma_P$ and $\sigma_C$ are varied over a wide range. For the structured weights, we calculate the choice of $k_w$ that maximizes the network's Fisher information (within the range $k_w \in [1, 10]$; see Figure 6e). For the unstructured weights, we calculate the choice of $\mu$ that maximizes the network's average Fisher information over 1000 draws of $w$ from the log-normal distribution specified by $\mu$ (see Figure 6f).
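The search over $k_w$ for the structured case can be sketched as a plain grid search. This sketch makes several assumptions for illustration: the linear-gaussian model followed by $g_i(x) = x^2$, structured weights with values $1, \dots, k_w$, $k_v = 1$, a fixed stimulus value, and a small population ($N = 60$, chosen so that every $k_w$ in the grid divides $N$). The function names are hypothetical.

```python
import numpy as np


def quadratic_fisher(n, k_w, s, sigma_c, sigma_p):
    """Linear Fisher information of r = ell**2 at stimulus s (k_v = 1, assumed structured w)."""
    v = np.ones(n)
    w = np.repeat(np.arange(1, k_w + 1, dtype=float), n // k_w)
    sigma = sigma_p**2 * np.eye(n) + sigma_c**2 * np.outer(w, w)
    m = v * s
    cov_r = 2.0 * sigma**2 + 4.0 * np.outer(m, m) * sigma  # elementwise products
    f_prime = 2.0 * s * v**2
    return float(f_prime @ np.linalg.solve(cov_r, f_prime))


def best_k_w(n, sigma_ps, sigma_cs, k_ws, s=1.0):
    """For each (sigma_p, sigma_c) pair, return the k_w that maximizes the Fisher information."""
    best = np.empty((len(sigma_ps), len(sigma_cs)), dtype=int)
    for i, sp in enumerate(sigma_ps):
        for j, sc in enumerate(sigma_cs):
            fisher = [quadratic_fisher(n, k, s, sc, sp) for k in k_ws]
            best[i, j] = k_ws[int(np.argmax(fisher))]
    return best


best = best_k_w(60, [0.5, 2.0, 8.0], [0.5, 1.0, 2.0], [1, 2, 3, 4, 5, 6])
print(best)  # rows: sigma_P values, columns: sigma_C values
```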

Figures 6e and 6f reveal that the network is highly sensitive to the values of $\sigma_P$ and $\sigma_C$. Figure 6e exhibits a bandlike structure, with abrupt transitions in the value of $k_w$ that maximizes Fisher information. This bandlike structure would most likely continue for smaller $\sigma_P$ if we allowed $k_w > 10$. One might expect the bandlike structure to be due to the artificial structure in the weights; however, Figure 6f also exhibits these bands. Note that the regime of interest for us is the one in which private variability contributes less to the total variability than common variability. In this regime, Figures 6e and 6f imply that a population of neurons will be best served by a diverse set of synaptic weights, even if the weights amplify irrelevant signals.

Together, these results highlight how introducing the nonlinearity reveals an intricate relationship among the amount of shared variability, the amount of private variability, and the optimal synaptic weight heterogeneity. In the presence of common noise, whether the network benefits from increased synaptic weight heterogeneity depends on two factors. The first is the size of the network (see Figures 4a and 4b and 6a and 6b). The second is the amount of private and shared variability (see Figures 4c, 6c, and 6d). In particular, when shared variability is the more significant contribution to the overall variability, the coding performance of the network benefits from increased heterogeneity, whether the weights are structured or unstructured (see Figures 6e and 6f). At the same time, in contrast to the linear network, there exist regimes in which increasing synaptic weight heterogeneity beyond a certain point harms coding ability (see Figures 4d and 4e and 5a and 5b). This demonstrates a trade-off between the benefits of synaptic weight heterogeneity and the amplification of common noise it may introduce.

#### 3.2.2 Mutual Information

When the network possesses a quadratic nonlinearity, the mutual information $I[s,r]$ is far less tractable than for the linear case. Therefore, we computed the mutual information numerically on data simulated from the network, using an estimator built on $k$-nearest neighbor statistics (Kraskov et al., 2004). We refer to this estimator as the KSG estimator.
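The sketch below implements the first estimator of Kraskov et al. (2004),

$$\hat I = \psi(k) + \psi(n) - \langle \psi(n_x + 1) + \psi(n_y + 1)\rangle,$$

where $n_x$ and $n_y$ count the samples that lie strictly closer, in the max-norm, than each point's $k$-th nearest neighbor in the joint space. It is a minimal NumPy-only sketch, not the implementation used for the figures. It computes all pairwise distances by brute force rather than with a $k$-d tree, so it is practical only for a few thousand samples. It also evaluates the digamma function at integers through harmonic numbers. The function names are hypothetical.

```python
import numpy as np


def chebyshev_distances(a):
    """Pairwise max-norm distances between the rows of a 2-D array (n_samples x n_dims)."""
    dist = np.zeros((a.shape[0], a.shape[0]))
    for column in a.T:
        np.maximum(dist, np.abs(column[:, None] - column[None, :]), out=dist)
    return dist


def ksg_mutual_information(x, y, k=3):
    """KSG estimate (Kraskov et al., 2004, first estimator) of I[x; y] in nats.

    x and y hold n paired samples (1-D arrays or n x d arrays). Brute-force O(n^2)
    distances replace the k-d tree search, so keep n to a few thousand.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    dist_x = chebyshev_distances(x)
    dist_y = chebyshev_distances(y)
    dist_joint = np.maximum(dist_x, dist_y)  # max-norm in the joint space
    for dist in (dist_x, dist_y, dist_joint):
        np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Distance from each point to its k-th nearest neighbour in the joint space.
    eps = np.partition(dist_joint, k - 1, axis=1)[:, k - 1]
    # Number of other points strictly within eps in each marginal space.
    n_x = np.sum(dist_x < eps[:, None], axis=1)
    n_y = np.sum(dist_y < eps[:, None], axis=1)
    # psi(m) = H_{m-1} - Euler's gamma for integer m >= 1; the gamma terms cancel below.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n + 1))))
    estimate = harmonic[k - 1] + harmonic[n - 1] - np.mean(harmonic[n_x] + harmonic[n_y])
    return float(estimate)


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.9 * x + np.sqrt(1 - 0.9**2) * rng.standard_normal(1000)
print(ksg_mutual_information(x, y), -0.5 * np.log(1 - 0.9**2))  # estimate vs. exact value
```

To apply this to the network, `x` would hold the stimulus samples and `y` the $n \times N$ array of responses $r$. The memory cost grows as $n^2$.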

We applied the KSG estimator to 100 unique data sets, each containing 100,000 samples drawn from the linear-nonlinear network, and obtained a separate mutual information estimate for each data set. The computational bottleneck for the KSG estimator lies in finding nearest neighbors in a $k$-d tree, which becomes prohibitive in high dimensions ($\sim 20$). We therefore considered much smaller population sizes than in the case of Fisher information. Furthermore, the KSG estimator encountered difficulties when samples became too noisy, so we limited our analysis to smaller values of $(\sigma_P, \sigma_C)$. Because of these constraints, we are only able to probe the finite network behavior of the mutual information.

Decreasing the private variability increases mutual information (see Figure 7b). Moreover, the network gains more information from diverse weighting when $\sigma_P$ is small. This is consistent with the small-$\sigma_P$ regime highlighted in Figure 4c: the smaller the private variability, the more the network benefits from larger synaptic weight heterogeneity. Similarly, decreasing the common variability increases mutual information (see Figure 7c). If the common variability is small enough (e.g., $\sigma_C = 1$), then larger $k_w$ harms the encoding, because the noise amplification that accompanies an increase in $k_w$ dominates. Only when the common variability becomes the dominant contribution to the variability does the diversification provided by larger $k_w$ improve the mutual information.

For the unstructured weights, we calculated the mutual information $I[s,r]$ over 100 synaptic weight distributions drawn from the aforementioned log-normal distribution. For each synaptic weight distribution, we applied the KSG estimator to 100 unique data sets, each consisting of 10,000 samples. We then computed the mutual information estimate for that network by averaging the individual estimates across the 100 data sets. With this procedure, we explored how the mutual information behaves as a function of the private noise variability, the common noise variability, and the parameter $\mu$ of the log-normal distribution.

Together, these results highlight that there exist regimes in which neural coding, as measured by the Shannon mutual information, benefits from increased synaptic weight heterogeneity. Furthermore, as with the linear Fisher information, the improvement in coding is more pronounced when shared variability is large relative to private variability.

## 4 Discussion

We have demonstrated in a simple model of neural activity that if the synaptic weighting of common noise inputs is broad and heterogeneous, coding fidelity is actually improved despite the inadvertent amplification of common noise inputs. We showed that for a squaring nonlinearity, there exists a regime of heterogeneous weights for which coding fidelity is maximized. We also found that the relative magnitudes of private and shared variability are vital for determining the ideal amount of synaptic heterogeneity. In neural circuits where shared variability is dominant, as has been reported in some parts of the cortex (DeWeese & Zador, 2004), larger weight heterogeneity results in better coding performance (see Figure 6e).

Why does increased synaptic weight heterogeneity improve neural coding? An increase in heterogeneity, as we have defined it, ensures that the common noise is magnified in the network. At the same time, however, increased heterogeneity alters the structure of the correlated variability induced by the common noise. Previous work has demonstrated that the relationship between signal correlations and noise correlations is important in assessing decoding ability. For example, the sign rule states that noise correlations are beneficial if they are opposite in sign to the signal correlation (Hu et al., 2014). Geometrically, the sign rule is a consequence of the intuitive observation that decoding is easier when the noise correlations lie perpendicular to the signal manifold (Averbeck et al., 2006; Zylberberg, Cafaro, Turner, Shea-Brown, & Rieke, 2016; Montijn et al., 2016).
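The sign rule can be seen in the smallest possible example: two neurons with unit-variance noise and noise correlation $\rho$.

- If their tuning slopes share a sign, $f' = (1, 1)$ (a positive signal correlation, locally), the linear Fisher information is $2/(1+\rho)$, so negative noise correlations help.
- If the slopes have opposite signs, $f' = (1, -1)$, it is $2/(1-\rho)$, so positive noise correlations help.

The function below is a hypothetical illustration of this computation.

```python
import numpy as np


def two_neuron_fisher(rho, slopes=(1.0, 1.0)):
    """Linear Fisher information f'^T Sigma^{-1} f' for two unit-variance neurons."""
    f_prime = np.asarray(slopes, dtype=float)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return float(f_prime @ np.linalg.solve(cov, f_prime))


for rho in (-0.5, 0.0, 0.5):
    same = two_neuron_fisher(rho, (1.0, 1.0))       # positive signal correlation
    opposite = two_neuron_fisher(rho, (1.0, -1.0))  # negative signal correlation
    print(f"rho={rho:+.1f}  same-sign slopes: {same:.3f}  opposite-sign slopes: {opposite:.3f}")
```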

The linear stage of the network constitutes a noisy projection of two signals (one of which is not useful to the network) into a high-dimensional space. Thus, we can assess the entire population by examining the relationship between the projecting vectors $v$ and $w$. We might expect decoding to improve when these signals are farther apart in the $N$-dimensional space (Kanerva, 2009). For a chosen $k_v$, this occurs as $k_w$ is increased when the weights are structured. When the weights are unstructured, the average angle between the stimulus and weight vectors becomes large as either $\mu_v$ or $\mu_w$ increases. Increased heterogeneity implies access to a more diverse selection of weights, thus pushing the two signals apart. From this perspective, the nonlinear stage acts as a mapping on the high-dimensional representation. Because no noise is added after the nonlinear processing stage, if the nonlinearities were one-to-one, the data processing inequality would ensure that the results from the linear stage still hold. But as we observed earlier, the nonlinear stage benefits from increased heterogeneity only in certain regimes. Thus, the behavior of the nonlinearity is important: the quadratic nonlinearity, which discards the sign of each linear response, restricts the region of the high-dimensional space that the neural code can occupy, and thus limits the benefits of diverse synaptic weighting. Validating and characterizing these observations for other nonlinearities (such as an exponential nonlinearity or a squared rectified linear unit) and within the framework of a linear-nonlinear-Poisson cascade model will be interesting to pursue in future studies. For example, we performed a simple numerical experiment assessing the linear Fisher information under an exponential nonlinearity. We observed that synaptic weight heterogeneity benefits coding, but information may saturate over a wide range of $k_w$ (see appendix A.3). Thus, the choice of nonlinearity may affect coding performance in the presence of common noise.
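The geometric argument can be made concrete at the linear stage. Assume, as a simplified stand-in for the network, that the linear responses are $\ell = v s + \sigma_C \xi_C w + \sigma_P \xi_P$ with independent unit-variance Gaussian noises, so that $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^T$. The Sherman-Morrison identity then gives

$$I_F = v^T \Sigma^{-1} v = \frac{\|v\|^2}{\sigma_P^2}\,\frac{\sigma_P^2 + \sigma_C^2 \|w\|^2 \sin^2\theta}{\sigma_P^2 + \sigma_C^2 \|w\|^2},$$

where $\theta$ is the angle between $v$ and $w$. For fixed $\theta$, scaling up $w$ can only lower $I_F$, but never below $\|v\|^2 \sin^2\theta / \sigma_P^2$; what helps is that heterogeneity increases $\theta$. The sketch below assumes hypothetical structured weights, $v = \mathbf{1}$ and $w$ taking the values $1, \dots, k_w$ in equal blocks, and checks the closed form against a direct linear solve.

```python
import numpy as np


def linear_stage_fisher(v, w, sigma_c=1.0, sigma_p=1.0):
    """Closed-form v^T Sigma^{-1} v for Sigma = sigma_p^2 I + sigma_c^2 w w^T (Sherman-Morrison)."""
    vv, ww, vw = v @ v, w @ w, v @ w
    return (vv - sigma_c**2 * vw**2 / (sigma_p**2 + sigma_c**2 * ww)) / sigma_p**2


def linear_stage_fisher_direct(v, w, sigma_c=1.0, sigma_p=1.0):
    """Same quantity computed by building Sigma explicitly and solving a linear system."""
    cov = sigma_p**2 * np.eye(len(v)) + sigma_c**2 * np.outer(w, w)
    return float(v @ np.linalg.solve(cov, v))


def structured_weights(n, k):
    """Hypothetical structured weights: values 1..k, each repeated n // k times."""
    return np.repeat(np.arange(1, k + 1, dtype=float), n // k)


N = 120  # divisible by every k_w below
v = np.ones(N)
for k_w in range(1, 7):
    w = structured_weights(N, k_w)
    cos_theta = np.clip(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_theta))  # clip guards against round-off above 1
    print(f"k_w={k_w}  |w|^2={w @ w:7.0f}  angle={angle:5.1f} deg  "
          f"I_F={linear_stage_fisher(v, w):6.3f}")
```

With these choices $\|w\|^2$ grows from 120 to 1820 as $k_w$ goes from 1 to 6, so the common noise is amplified, yet $I_F$ rises from about 0.99 to about 23.1 because the angle between $v$ and $w$ grows.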

In this work, we considered the coding ability of a network in which a stimulus is corrupted by a single common noise source. However, cortical circuits receive many inputs and likely must contend with multiple common noise inputs. Thus, it is important to examine how our analysis changes as the number of inputs increases. Naively, the neural circuit could structure its weights to collapse all common noise sources onto a single subspace, but this strategy fails if the circuit must perform multiple tasks (e.g., if it must decode many of the inputs using the same set of weights). Furthermore, there are pathways in which dimensionality is drastically reduced, such as from cortex to striatum (a 10-to-1 reduction) or from striatum to basal ganglia (a 300-to-1 reduction; Bar-Gad, Morris, & Bergman, 2003; Seger, 2008). In these cases, the number of inputs may scale with the size of the neural circuit. In such an underconstrained system, linear decoding will be unable to properly extract estimates of the relevant stimulus. This implies that linear Fisher information, which relies on a linear decoder, may be insufficient to judge the coding fidelity of these populations. Thus, future work could examine how the synaptic weight distribution affects neural coding with multiple common noise inputs, both when the number of common noise sources is smaller than the population size and when the two are of similar scale; the latter case may require alternative coding strategies (Davenport, Duarte, Eldar, & Kutyniok, 2012; Garfinkle & Hillar, 2019).
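To illustrate why additional common noise sources strain a linear readout, assume for illustration (this is not the model analyzed in this work) that the linear responses receive $K$ independent common noise sources with weight matrix $W \in \mathbb{R}^{N \times K}$, so that $\Sigma = \sigma_P^2 I + \sigma_C^2 W W^T$. Each added source contributes a positive semidefinite term to $\Sigma$, so the linear Fisher information $v^T \Sigma^{-1} v$ can only decrease as $K$ grows. The sketch draws hypothetical lognormal weights and tracks this decline as $K$ approaches $N$.

```python
import numpy as np


def linear_fisher(v, cov):
    """Linear Fisher information v^T Sigma^{-1} v."""
    return float(v @ np.linalg.solve(cov, v))


rng = np.random.default_rng(0)
N = 50
sigma_c, sigma_p = 1.0, 1.0
v = np.ones(N)                                            # hypothetical stimulus weights
W_all = rng.lognormal(mean=0.0, sigma=1.0, size=(N, N))  # hypothetical heterogeneous noise weights

info_by_k = {}
for K in (1, 5, 10, 25, 40, 50):
    W = W_all[:, :K]  # keep the first K common noise sources (nested sets)
    cov = sigma_p**2 * np.eye(N) + sigma_c**2 * W @ W.T
    info_by_k[K] = linear_fisher(v, cov)
    print(f"K={K:2d}  linear Fisher information = {info_by_k[K]:.3f}")
```

Because the noise sources are nested, the printed values are non-increasing in $K$. This calculation only bounds what a linear decoder can achieve; it says nothing about nonlinear decoders, which is exactly the limitation raised above.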

It may seem unreasonable to suppose that a neural circuit can weight its common noise inputs. However, excitatory neurons throughout the brain receive many excitatory synapses, and some subset of the common inputs to a neural population will undoubtedly be irrelevant to the underlying neural computation, even if these signals are not, strictly speaking, “noise” and could be useful for other computations. Thus, these populations must contend with common noise sources that contribute to their overall shared variability and can hamper their ability to encode a stimulus. Our work demonstrates that neural circuits, armed with a good set of synaptic weights, need not suffer adverse impacts from inadvertently amplifying potential sources of common noise. Instead, broad, heterogeneous weighting projects the signal and the common noise into a high-dimensional space in a way that benefits decoding. This observation agrees with recent work that explored the relationship between heterogeneous weighting and degrees of synaptic connectivity (Litwin-Kumar, Harris, Axel, Sompolinsky, & Abbott, 2017). Furthermore, a synaptic input that is irrelevant on one trial may carry the signal on the next: heterogeneous weighting provides a general, robust principle for neural circuits to follow.

We chose a simple network architecture to maintain analytic tractability, which allowed us to explore the rich patterns of behavior it exhibits. Our model is limited, however, and it is worthwhile to assess whether our qualitative conclusions hold with added network complexity. For example, interesting avenues to consider include recurrence, spiking dynamics, and global fluctuations. These networks could also be equipped with varying degrees of sparsity and with inhibitory connections. Importantly, the balance of excitation and inhibition in networks has been shown to be vital in decorrelating neural activity (Renart et al., 2010). Past work has explored how to approximate both information-theoretic quantities studied here in networks with some subset of these features (Beck, Bejjanki, & Pouget, 2011; Yarrow, Challis, & Seriès, 2012). Thus, analyzing how common noise and synaptic weighting interact in more complex networks is of interest for future work.

We established correlated variability structure in the linear-nonlinear network by taking a linear combination of a common noise source and private noise sources (though our model ignores any noise potentially carried by the stimulus). This was sufficient to produce the low-dimensional shared variability observed in neural circuits. As a consequence, however, our model enforces stimulus-independent correlated variability. Recent work has demonstrated that correlated variability is in fact stimulus dependent. This work used both phenomenological (Lin, Okun, Carandini, & Harris, 2015; Franke et al., 2016) and mechanistic (Zylberberg et al., 2016) models to fit the stimulus-dependent correlated variability. These models all share a doubly stochastic noise structure, stemming from both additive and multiplicative sources of noise (Goris, Movshon, & Simoncelli, 2014). It is therefore worthwhile to examine how additive and multiplicative modulation interact with synaptic weighting to influence neural coding. For example, Arandia-Romero et al. (2016) demonstrated that additive and multiplicative modulation, which depends on overall population activity, can redirect information to specific neuronal assemblies, increasing information for some but decreasing it for others. Synaptic weight heterogeneity, tuned by plasticity, could serve as a mechanism for additive and multiplicative modulation, thereby gating information for specific assemblies.
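A minimal sketch of why multiplicative modulation yields stimulus-dependent correlations, assuming a generic doubly stochastic model (an illustrative choice loosely in the spirit of the cited models, not their fitted forms): $r = g\,f(s) + a\mathbf{1} + \xi_P$, with independent gain $g$ (mean 1, variance $\sigma_g^2$), additive term $a$ (variance $\sigma_a^2$), and private noise (variance $\sigma_P^2$). Then $\mathrm{Cov}(r \mid s) = \sigma_g^2 f(s) f(s)^T + \sigma_a^2 \mathbf{1}\mathbf{1}^T + \sigma_P^2 I$, so only the multiplicative term makes the correlation structure depend on $s$. The Gaussian tuning curves and parameter values below are hypothetical.

```python
import numpy as np


def tuning(s, centers, width=1.0, baseline=0.1):
    """Hypothetical Gaussian tuning curves plus a baseline rate."""
    return np.exp(-((s - centers) ** 2) / (2.0 * width**2)) + baseline


def response_cov(s, centers, sigma_g, sigma_a, sigma_p):
    """Cov(r | s) for r = g f(s) + a 1 + private noise, with g, a, and the noise independent."""
    f = tuning(s, centers)
    ones = np.ones_like(f)
    return (sigma_g**2 * np.outer(f, f)        # multiplicative (gain) term: depends on s
            + sigma_a**2 * np.outer(ones, ones)  # additive shared term: independent of s
            + sigma_p**2 * np.eye(len(f)))      # private noise


def pairwise_corr(cov, i=0, j=1):
    """Correlation coefficient between neurons i and j."""
    return cov[i, j] / np.sqrt(cov[i, i] * cov[j, j])


centers = np.array([-1.0, 1.0])  # preferred stimuli of two hypothetical neurons
for sigma_g in (0.0, 0.5):       # purely additive vs. additive plus multiplicative
    corrs = [pairwise_corr(response_cov(s, centers, sigma_g, 0.3, 0.5)) for s in (-1.0, 0.0)]
    print(f"sigma_g={sigma_g}: corr at s=-1 is {corrs[0]:.3f}, at s=0 is {corrs[1]:.3f}")
```

With $\sigma_g = 0$ the pairwise correlation is the same at every stimulus, mirroring the stimulus-independent structure of our model; any nonzero gain variance makes it vary with $s$.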

## A Appendix

### A.1 Calculation of Fisher and Mutual Information Quantities

#### A.1.1 Calculation of Fisher Information, Linear Stage

#### A.1.2 Calculation of Mutual Information, Linear Stage

#### A.1.3 Calculation of Linear Fisher Information, Quadratic Nonlinearity

### A.2 Information Saturation and Differential Correlations

In section 3.2.1, we observed that the Fisher information saturates in particular instances of the nonlinear network. Specifically, for the nonlinear network, Fisher information saturates for $k_w=1$ and $k_w=2$, but not for $k_w>3$. Additionally, Fisher information saturates for $k_w \sim O(N)$. To understand why we observe saturation in some cases and not others, it is helpful to examine the eigenspectrum of the covariance matrix $\Sigma$ describing the neural responses. Here, we rely on an analysis in the supplement of Moreno-Bote et al. (2014).
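The eigenspectrum argument can be illustrated at the linear stage only; this sketch does not reproduce the nonlinear-network results described above. Assume $\Sigma = \sigma_P^2 I + \sigma_C^2 w w^T$ and signal direction $v$. Writing $\Sigma = \sum_k \lambda_k e_k e_k^T$ gives $I_F = \sum_k (v^T e_k)^2 / \lambda_k$. The only eigenvalue above $\sigma_P^2$ is $\sigma_P^2 + \sigma_C^2 \|w\|^2$, along $w$, and it grows with $N$. If $w \parallel v$, the entire signal lies along that growing eigenvector and $I_F$ saturates (at $1/\sigma_C^2$ for $v = w = \mathbf{1}$), which is the differential-correlation case; otherwise, the component of $v$ orthogonal to $w$ sees only the private eigenvalue $\sigma_P^2$ and $I_F$ keeps growing with $N$. The weight choices below are hypothetical.

```python
import numpy as np


def fisher_via_eigs(v, w, sigma_c=1.0, sigma_p=1.0):
    """I_F = sum_k (v . e_k)^2 / lambda_k for Sigma = sigma_p^2 I + sigma_c^2 w w^T."""
    cov = sigma_p**2 * np.eye(len(v)) + sigma_c**2 * np.outer(w, w)
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are orthonormal eigenvectors
    proj = eigvecs.T @ v                    # components of v along each eigenvector
    return float(np.sum(proj**2 / eigvals))


for N in (50, 200, 800):
    v = np.ones(N)
    w_parallel = np.ones(N)                  # w parallel to v (hypothetical)
    w_hetero = np.repeat([1.0, 2.0], N // 2)  # hypothetical two-valued weights
    print(f"N={N:4d}  parallel w: {fisher_via_eigs(v, w_parallel):.3f}  "
          f"heterogeneous w: {fisher_via_eigs(v, w_hetero):.3f}")
```

With these weights the parallel case stays just below $1/\sigma_C^2 = 1$ for every $N$, while the two-valued weights, for which $\sin^2\theta = 0.1$, give information that grows roughly as $0.1N$.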

### A.3 Linear Fisher Information under an Exponential Nonlinearity

## Acknowledgments

We thank Ruben Coen-Cagli for useful discussions. P.S.S. was supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program. J.A.L. was supported through the Lawrence Berkeley National Laboratory-internal LDRD “Deep Learning for Science” led by Prabhat. M.R.D. was supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under Contract No. W911NF-13-1-0390.