## Abstract

The activity of neurons is correlated, and this correlation affects how the brain processes information. We study the neural circuit mechanisms of correlations by analyzing a network model characterized by strong and heterogeneous interactions: excitatory input drives the fluctuations of neural activity, which are counterbalanced by inhibitory feedback. In particular, excitatory input tends to correlate neurons, while inhibitory feedback reduces correlations. We demonstrate that heterogeneity of synaptic connections is necessary for this inhibition of correlations. We calculate statistical averages over the disordered synaptic interactions and apply our findings to both a simple linear model and a more realistic spiking network model. We find that correlations at zero time lag are positive and of magnitude *K*^{−1/2}, where *K* is the number of connections to a neuron. Correlations at longer timescales are of smaller magnitude, of order *K*^{−1}, implying that inhibition of correlations occurs quickly, on a timescale of order *K*^{−1/2}. The small magnitude of correlations agrees qualitatively with physiological measurements in the cerebral cortex and basal ganglia. The model could be used to study correlations in brain regions dominated by recurrent inhibition, such as the striatum and globus pallidus.

## 1. Introduction

Simultaneous measurements of the activity of multiple neurons have shown significant correlations, and this observation has stimulated the debate on whether and how correlations contribute to neural computation. In principle, correlations allow robust signal processing, because redundancies across neurons can be exploited to separate the signal from the noise (Abbott & Dayan, 1999; Panzeri, Schultz, Treves, & Rolls, 1999). Experimental studies of the cerebral cortex suggest that correlations improve decoding of stimuli (Graf, Kohn, Jazayeri, & Movshon, 2011), but it remains unclear whether a parsimonious decoder should rely on correlations (Averbeck & Lee, 2003). A challenge to this hypothesis is the observation that correlations are reduced when animal subjects are actively engaged in discrimination (Cohen & Newsome, 2008; Cohen & Maunsell, 2009), and even when they simply start a movement (Poulet & Petersen, 2008). In addition, neurons with similar responses to stimuli show higher correlations (Zohary, Shadlen, & Newsome, 1994; Lee, Port, Kruse, & Georgopoulos, 1998; Maynard et al., 1999; Bair, Zohary, & Newsome, 2001; Constantinidis & Goldman-Rakic, 2002; Averbeck & Lee, 2003; Romo, Hernandez, Zainos, & Salinas, 2003; Kohn & Smith, 2005; Smith & Kohn, 2008; Huang & Lisberger, 2009; Ecker et al., 2010; Komiyama et al., 2010), implying that coding of stimuli should be worsened by correlations (Abbott & Dayan, 1999; Panzeri et al., 1999; Sompolinsky, Yoon, Kang, & Shamir, 2001; Wilke & Eurich, 2002; Averbeck, Latham, & Pouget, 2006; Gutnisky & Dragoi, 2008). Another caveat is that the neural code is largely unknown, and if the noise measured in physiological studies encodes some signal, then any correlation would decrease the available information (Nadal & Parga, 1994).

Besides the possible function of correlations in signal and information processing, their physiological causes remain unclear. It has been shown that the correlation between nearby neurons is driven by their correlated synaptic input (Lampl, Reichova, & Ferster, 1999; Poulet & Petersen, 2008). However, a quantitative understanding of the circuit mechanisms regulating correlations between cortical cells is still missing, and the goal of this study is to determine the dependence of correlations on different properties of the neural circuitry. The measured correlation between neurons depends on different factors and varies across studies (Cohen & Kohn, 2011): it increases with the proximity of neuron pairs (Maynard et al., 1999; Constantinidis & Goldman-Rakic, 2002; Smith & Kohn, 2008; Ecker et al., 2010; Komiyama et al., 2010), their activity (de la Rocha, Doiron, Shea-Brown, Josic, & Reyes, 2007), and the temporal window on which action potentials are counted (Bair et al., 2001; Reich, Mechler, & Victor, 2001; Constantinidis & Goldman-Rakic, 2002; Averbeck & Lee, 2003; Kohn & Smith, 2005; Smith & Kohn, 2008; Huang & Lisberger, 2009; Mitchell, Sundberg, & Reynolds, 2010). Figure 1 shows the correlation measured in eight different studies as a function of temporal window for spike counts. Results vary, although correlations are generally found positive and of small magnitude in both the cortex and the basal ganglia (Raz, Vaadia, & Bergman, 2000).

Previous modeling studies of neural circuits have found that the mean correlation between neurons is small, of the order of *N*^{−1}, where *N* is the number of neurons in the network. Small correlations have been observed, not surprisingly, in networks characterized by weak connection strengths (Ginzburg & Sompolinsky, 1994; Bernacchia & Amit, 2007). More surprisingly, the same result has been obtained in the case of strong connections, such as the high-conductance state (Destexhe, Rudolph, & Paré, 2003; van Vreeswijk & Sompolinsky, 1996), provided that the network includes a strong inhibitory feedback (Renart et al., 2010; Hertz, 2010; Tetzlaff, Helias, Einevoll, & Diesmann, 2012). Here we provide an analytical study of correlations in a simple linear model, and we apply our findings to predict correlations in a more realistic spiking network model. We confirm both the observed small correlation and the crucial effect of the inhibitory feedback in reducing it. In addition, we study the effect of the heterogeneity of connection strengths by using random matrix theory and a diagrammatic formalism, and we show that inhibition of correlations crucially depends on such heterogeneity. The model can be compared to brain regions dominated by recurrent inhibition, such as the striatum and globus pallidus.

We find that correlations at zero time lag are of magnitude *K*^{−1/2}, where *K* is the number of connections received by a neuron, while correlations of the activity integrated across time are of order *K*^{−1}, suggesting that inhibition of correlations operates on a timescale of order *K*^{−1/2}. These results are consistent with previous modeling studies, suggesting that a linear approximation is adequate to predict correlations in more realistic spiking models (Renart et al., 2010). In addition, our findings highlight the difference between the effect of the number of neurons *N* versus the number of connections *K* on correlations. The small correlations predicted by this and previous modeling studies qualitatively match the small correlations observed in neurons of the cerebral cortex and basal ganglia.

## 2. Methods

We consider both a linear model and a more realistic spiking network model. In both models, we consider a neural circuit of *N* neurons, receiving input from *N*_{ext} external neurons, where each neuron integrates the signal from other neurons weighted by the synaptic connection strength. In the linear model, the dynamics is

$$\frac{dx_i}{dt} = -x_i + \sum_{j} G_{ij}\, x_j + \sum_{j} G^{ext}_{ij}\, x^{ext}_j(t) \qquad (2.1)$$

where time is measured in units of the intrinsic neuronal time constant. Here *x*_{i} is the activity of neuron *i* in the local circuit, and *G*_{ij} is the strength of the synaptic connection from neuron *j* to neuron *i*. The external (feedforward) input to the circuit is provided by the activities *x*^{ext}_{j}, and the synaptic connection from the *j*th external neuron to the *i*th local neuron is given by the strength *G*^{ext}_{ij}. All neuronal activities evolve in time, while the connectivity matrices *G* and *G*_{ext} are fixed.

We define the average number of local connections received by a neuron as *K* and that of external connections as *K*_{ext}. We assume that the connectivity matrices are random, which makes the network akin to a disordered system, characterized by a random but fixed substrate. We consider two scenarios, represented schematically in Figures 2a and 2b:

- The network is fully connected (*K*=*N*, *K*_{ext}=*N*_{ext}) with random connection strengths (all-to-all; see Figure 2a), characterized by a gaussian distribution. The mean and variance of matrix elements are determined by the parameters *g* and Δ for the local connections and *g*_{ext} and Δ_{ext} for the external connections: $\langle G_{ij} \rangle = -g/\sqrt{N}$ and $\mathrm{var}(G_{ij}) = \Delta^2/N$, and similarly $\langle G^{ext}_{ij} \rangle = g_{ext}/\sqrt{N_{ext}}$ and $\mathrm{var}(G^{ext}_{ij}) = \Delta^2_{ext}/N_{ext}$ (equation 2.2).
- The network is sparse; only a fraction of connections exists (*k*=*K*/*N*, *k*_{ext}=*K*_{ext}/*N*_{ext}), and the others are set to zero (sparse; see Figure 2b). Connections are selected at random but of constant strength, equal to $-g/\sqrt{K}$ for recurrent connections and $g_{ext}/\sqrt{K_{ext}}$ for external connections. The distribution is Bernoullian. The mean and variance of matrix elements are $\langle G_{ij} \rangle = -k\,g/\sqrt{K}$ and $\mathrm{var}(G_{ij}) = k(1-k)\,g^2/K$, and similarly for the external matrix (equation 2.3).

Results are expressed in terms of the parameters *k*, *K*, and *N*: in the all-to-all network, *k*=1 and *K*=*N*. In the sparse network, for convenience of notation, we use the heterogeneity parameters $\Delta^2 = g^2(1-k)$ and $\Delta^2_{ext} = g^2_{ext}(1-k_{ext})$. The mean connection is negative for *G* (inhibitory) and positive for *G*_{ext} (excitatory), since *g* and *g*_{ext} are positive. Note that the connections are strong in the sense that the magnitude of the excitatory and of the inhibitory input to each neuron, which is of order $\sqrt{K}$, is much larger than their sum (which is of order one; see section 3).
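The two connectivity schemes can be sketched in code as follows. This is a minimal sketch under the scalings stated above, not the authors' implementation; the heterogeneity values `Delta` and `Delta_ext` in the all-to-all branch are hypothetical placeholders.

```python
import numpy as np

def make_connectivity(N, N_ext, K, K_ext, g, g_ext, sparse=True, rng=None):
    """Build recurrent (inhibitory) and external (excitatory) matrices.

    Scalings are chosen so that the total inhibitory input per neuron is
    of order sqrt(K)*g and the total excitatory input of order
    sqrt(K_ext)*g_ext, as in the strong-coupling regime described above.
    """
    rng = np.random.default_rng() if rng is None else rng
    if sparse:
        # Bernoulli connections: probability k = K/N, fixed strength -g/sqrt(K)
        G = -(g / np.sqrt(K)) * (rng.random((N, N)) < K / N)
        G_ext = (g_ext / np.sqrt(K_ext)) * (rng.random((N, N_ext)) < K_ext / N_ext)
    else:
        # All-to-all gaussian: mean -g/sqrt(N), std Delta/sqrt(N) per element
        Delta, Delta_ext = 0.5, 0.5   # hypothetical heterogeneity values
        G = -g / np.sqrt(N) + (Delta / np.sqrt(N)) * rng.standard_normal((N, N))
        G_ext = (g_ext / np.sqrt(N_ext)
                 + (Delta_ext / np.sqrt(N_ext)) * rng.standard_normal((N, N_ext)))
    return G, G_ext
```

In the sparse branch, the boolean mask is multiplied by the fixed strength, so every existing recurrent connection is exactly $-g/\sqrt{K}$ and every absent one is zero.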

The theoretical analysis also considers the case in which local connections can be either excitatory or inhibitory, with two separate populations of excitatory and inhibitory neurons. In general, the analysis considers the case in which the mean and variance of the synaptic strength depend on the presynaptic neuron. We show in appendix B that all theoretical results hold provided that the parameters *g* and Δ are substituted by their means across presynaptic neurons. However, we do not show results of simulations for that case, since that is outside the scope of this letter.

We assume that the external activity **x**_{ext}(*t*) is a stochastic process uncorrelated in both space and time, that is, a white noise characterized by mean $\overline{x^{ext}_i(t)} = \overline{x}_{ext}$ and covariance $\overline{\delta x^{ext}_i(t)\, \delta x^{ext}_j(t')} = \sigma^2_{ext}\, \delta_{ij}\, \delta(t-t')$ (the overbar denotes the average over different realizations of the noise, and δ denotes either the discrete Kronecker or the continuous Dirac delta function). Therefore, equation 2.1 corresponds to an Ornstein-Uhlenbeck stochastic process (Gardiner, 1985).

We test theoretical results by running numerical computer simulations of the linear model. We simulate the dynamics of equation 2.1 with a simple Euler integration method, where each simulation runs for 200,000 time steps and each time step is 0.002 (in units of the intrinsic time constant). For each set of parameter values, we use a single realization of the external input noise and a single realization of the random connectivity matrix. Since a simulation runs for a long time and the network is composed of a large number of neurons, we do not expect those specific realizations to affect the results significantly (in other words, we expect the system to be ergodic and self-averaging). We calculate sample mean and covariance by averaging across all time steps of a simulation. The correlation is calculated for each neuron pair by the standard Pearson formula. Finally, we calculate the spatial mean and variance of those quantities across neurons (for the temporal mean) or across neuron pairs (for the covariance and correlation). We also use a semianalytic control by applying the spatial mean and variance not to the full simulation but to the theoretical results following the average over temporal noise and preceding the average over spatial noise, namely, equations A.3 and A.9. The corresponding results are represented by filled symbols in the figures, while results of the full simulations are represented by open symbols.
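The Euler integration and the pairwise Pearson correlations described above can be sketched as follows; this assumes linear dynamics of the form dx/dt = (G − I)x + G_ext·x_ext with white-noise external input, and the function names are illustrative.

```python
import numpy as np

def simulate_linear(G, G_ext, x_ext_mean, sigma_ext, T=20000, dt=0.002, rng=None):
    """Euler integration of dx/dt = -x + G x + G_ext x_ext(t).

    The external input is white noise: its per-step standard deviation is
    sigma_ext/sqrt(dt), so that its covariance approximates a Dirac delta.
    Returns the activity trace of shape (T, N).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, N_ext = G_ext.shape
    x = np.zeros(N)
    trace = np.empty((T, N))
    for t in range(T):
        x_ext = x_ext_mean + sigma_ext * rng.standard_normal(N_ext) / np.sqrt(dt)
        x = x + dt * (-x + G @ x + G_ext @ x_ext)
        trace[t] = x
    return trace

def mean_pairwise_correlation(trace):
    """Pearson correlation averaged over distinct neuron pairs."""
    C = np.corrcoef(trace.T)                      # N x N matrix over time
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return off_diagonal.mean()
```

Averaging the time series over all time steps, as done here via `np.corrcoef`, corresponds to the sample mean and covariance used in the simulations.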

In the spiking network model, we simulate the dynamics of equations 2.8 and 2.9 with a time step of *dt*=0.02 ms. Equation 2.8 is similar to equation 2.1 of the linear model and describes the dynamics of the total current *I*_{i} received by neuron *i*, including both the external excitatory and the recurrent inhibitory input (both types of input are integrated according to the same synaptic time constant $\tau_s$):

$$\tau_s \frac{dI_i}{dt} = -I_i + \sum_j G_{ij}\, S_j(t) + \sum_j G^{ext}_{ij}\, S^{ext}_j(t) \qquad (2.8)$$

The matrices describing the synaptic strengths, *G* and *G*_{ext}, are defined in the same way as in the case of the linear model, in the fully connected case (see equations 2.2 and 2.3), although in the spiking model those matrices are given in units of 8 nA·ms. In those units, the parameters are *g*=1 and *g*_{ext}=1. The variable *S*_{i}(*t*) describes whether neuron *i* emits an action potential at time *t* or not: respectively, *S*_{i}(*t*)=1/*dt* or *S*_{i}(*t*)=0. Equation 2.9 describes the dynamics of the membrane potential *V*_{i} of neuron *i*, which integrates the total current linearly according to the capacitance *C*_{m} and conductance *g*_{m} of the membrane, where *V*_{L} is the resting potential:

$$C_m \frac{dV_i}{dt} = -g_m (V_i - V_L) + I_i \qquad (2.9)$$

If the membrane potential *V*_{i} exceeds the threshold potential *V*_{th} at time *t*, it is set to the reset potential *V*_{rs} and an action potential is emitted (*S*_{i}(*t*)=1/*dt*). The variable *S*^{ext}_{j}(*t*) describes the action potentials emitted by the external neurons. Their activity is modeled by a Poisson process characterized by an emission rate $\nu_{ext}$, which is constant in time and equal for all external neurons. Parameters used in simulations are *V*_{th}=−50 mV, *V*_{rs}=−70 mV, *V*_{L}=−70 mV, *C*_{m}=0.4 nF, and *g*_{m}=20 nS (the time constant of the membrane potential is *C*_{m}/*g*_{m}=20 ms).

We run simulations for different values of the network size, from *N*=50 to *N*=1000, with all other parameters fixed, each simulating 20 s of network activity (10^{6} time steps). For a given network size, *N*=200, we run five additional simulations at different values of the external input rate. We use those five simulations to determine the change of the total current as a function of the change in the external rate. This change is approximately linear and is quantified in terms of the four statistics studied in this work: the mean, the spatial variance, the temporal variance, and the covariance (see equations 3.1, 3.2, 3.4, and 3.5). All statistics are calculated with respect to the currents measured at the reference external input. For example, the spatial variance is calculated by recording the steady current for each neuron at the reference value and looking at the distribution across neurons of the difference between the reference current and the steady currents measured for the other values of the external input. Linear regression is applied to fit the linear change of the four statistics as a function of the external input, and the effective parameters (*g*, Δ, *g*_{ext}, Δ_{ext}) are determined by inverting the equations of the four statistics given by the linear model (equations 3.1, 3.2, 3.4, and 3.5), where the mean and variance of the change in external rate play the roles of the mean and variance of the external input. The effective parameters are used in equation 3.6 to predict correlations in the spiking model at variable network size.
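The regression step above can be sketched as a least-squares slope through the origin of each current statistic against the change in external rate; inverting equations 3.1 to 3.5 for the effective parameters is model specific and is not reproduced in this sketch.

```python
import numpy as np

def fit_slope(d_rate, d_stat):
    """Least-squares slope through the origin of a current statistic
    versus the change in external input rate (the change is assumed
    linear, as in Figures 8a to 8d)."""
    d_rate = np.asarray(d_rate, dtype=float)
    d_stat = np.asarray(d_stat, dtype=float)
    return float(d_rate @ d_stat) / float(d_rate @ d_rate)
```

For example, `fit_slope([1, 2, 3], [2.1, 3.9, 6.0])` returns a slope close to 2; one such slope per statistic (mean, spatial variance, temporal variance, covariance) is then fed into the inverted linear-model equations.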

## 3. Results

We study neural activity and correlations among neurons in a heterogeneous neural circuit model. Local recurrent connections are dominated by inhibition, while external feedforward projections are excitatory. Results are shown for a simple linear model, and at the end of the section, we also include simulations of a more realistic spiking network model. For the linear model we show the results of both theory and simulations, and we conclude by showing that the theory developed for the simple linear model can be used to predict correlations in the spiking network.

We consider two types of circuits: all-to-all connectivity with random strengths (see Figure 2a) and sparse random connections of fixed strengths (see Figure 2b). Results are displayed in a single notation for either case (see section 2). Figures 2c and 2d show the distribution of activity across neurons, and Figures 2e and 2f show the distribution of correlations across neuron pairs. The purpose of this work is to describe how the mean and variance of those distributions depend on the parameters of the neural circuit. The activity values *x* are interpreted as deviations from a steady state of the input currents to each neuron, around which the neural dynamics is approximately linear. If we denote the steady current as *I*_{0}, the input current is equal to *I*=*I*_{0}+*x*. As long as the linear approximation is valid, the correlations observed in the model are insensitive to the nature of the steady state (i.e., to the value of *I*_{0}).

The first quantity we compute is the temporal mean of the activity, averaged across neurons. When *N* is large, this is independent of the specific realization of the connectivity; therefore we perform its average over the distribution of connections, and we obtain (see equation A.5 in appendix A; angular brackets denote averaging over the random connectivity, the overline denotes the temporal average)

$$\langle \overline{x} \rangle = \frac{\sqrt{K_{ext}}\, g_{ext}\, \overline{x}_{ext}}{1 + \sqrt{K}\, g} \qquad (3.1)$$

The numerator of this expression is equal to the mean excitatory input received by a neuron, $\sqrt{K_{ext}}\, g_{ext}\, \overline{x}_{ext}$, while the denominator expresses the recurrent inhibition, whose total postsynaptic strength is $\sqrt{K}\, g$. Therefore, the strong recurrent inhibition counterbalances the large excitatory input and determines a relatively low activity, regardless of the network size. Note that the numbers of local and external connections, *K* and *K*_{ext}, are both large, but they tend to balance in the expression above. Figure 3a shows an example of the mean activity as a function of the number of connections. The mean activity is rather insensitive to the number of connections, which are taken equal to the external ones in each simulation (*K*_{ext}=*K*). The analytical result, equation 3.1, agrees with numerical simulations of the linear dynamics in both the all-to-all and the sparse networks.

When the heterogeneity of the recurrent connections is too large, the fixed point at **x**=0 destabilizes, and the linear approximation fails (see appendix A). Figure 3b shows an example of the spatial variance as a function of the variability of the recurrent connections. The analytical result, equation 3.2, agrees with numerical simulations of the linear dynamics in both the all-to-all and the sparse networks.

The temporal variance of the activity (see equation 3.4) is the sum of two terms: the first term decreases with the number of connections as *K*^{−1/2}, while the second term remains finite (its prefactor is close to one; see equation A.13). The first term indicates that recurrent inhibition (*g*) reduces temporal fluctuations. In fact, the inhibitory feedback not only reduces the mean activity (see equation 3.1) but also cuts down fluctuations by quickly counterbalancing the external excitatory input. This can be verified by calculating the instantaneous covariance between the external excitatory and the local inhibitory input, which is found to be large and negative. The second term implies that nonzero fluctuations arise even in large networks (large *K*), since inhibition cannot exert an instantaneous and exact balance for each neuron. However, fluctuations nearly vanish if the external connections are homogeneous (Δ_{ext}=0), in which case the inhibitory feedback would definitively counterbalance the homogeneous drive. Furthermore, as in the case of spatial fluctuations, temporal fluctuations increase with the heterogeneity of connections, both recurrent (Δ) and external (Δ_{ext}). Temporal fluctuations diverge when the network approaches the instability point, where the linear approximation fails (see appendix A).

In the limit of a large number of connections *K*, the correlated variance vanishes. The activities of neuron pairs tend to covary due to their shared external input, but the recurrent inhibition makes the covariance small, of order *K*^{−1/2}. In the sparse network, the covariance vanishes if the probability of external connections is small (*k*_{ext}→0), since the shared external input between neurons tends to zero in that case. In the all-to-all network, the mean covariance vanishes if the mean input connection is zero (*g*_{ext}=0). In that case, even if neurons receive a shared external input, neuron pairs may weight different inputs with the same or opposite signs, leading to, respectively, positive or negative covariance. Therefore, while the mean covariance across neuron pairs is zero, the covariance of single pairs may be positive or negative.

The mean correlation between neurons is also small, of order *K*^{−1/2}, despite the strong and shared excitatory input between neurons. However, this result holds only in the presence of the local recurrent inhibition (*g*>0) and provided that the external connections are heterogeneous (Δ_{ext}>0). Heterogeneity of the local connections (Δ>0) also contributes to decreasing the correlation.

Therefore, the inhibitory feedback and the random connectivity are responsible for the small correlation. If the inhibitory feedback is removed (*g*=0), the correlation becomes large. If the network heterogeneity is removed (Δ=Δ_{ext}=0), the correlation is equal to one, because the network is homogeneous and all neurons receive the same input (see equation A.13). Figure 4 shows an example of the mean correlation as a function of the number of connections and the heterogeneity of the network. The analytical result, equation 3.6, agrees with numerical simulations of the linear dynamics in both the all-to-all and the sparse networks. Insets in Figure 4 show the standard deviation of correlations, which appears to decrease with the number of connections as *K*^{−1/2} and to increase with the heterogeneity of the network.

The final issue that we address is the timescale of correlations. Neural activity integrates the input on multiple timescales because of the large number of neurons and the heterogeneity of their connections. Which timescales are responsible for correlations? What correlations characterize the activity integrated in time? Note that the mean correlation in equation 3.6 and Figure 4 is the correlation at zero lag, that is, the instantaneous correlation. We investigate the timescale of correlations in Figure 5, where the cross-correlation of neural activity is shown at different time lags. The correlation has a peak at zero lag and shows an exponential decay in time. As we have shown in Figure 4, the correlation at zero lag decreases with the number of connections as *K*^{−1/2}. Figure 5 shows that the timescale of correlation, determining its rate of decay, also decreases with the number of connections. In fact, we show in appendix A that the integrated correlation across all time lags, namely, the total area of the cross-correlation, is of magnitude *K*^{−1}. Since the total area is approximately equal to the correlation peak times its temporal width, the temporal width is of order *K*^{−1/2}. Therefore, inhibition decorrelates on a fast timescale, and integrating neural activity, even for a relatively short time, has the effect of further decreasing the magnitude of correlations (see equation A.22).
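The peak-versus-area argument can be checked numerically with a sample cross-correlogram; the following sketch normalizes the traces to zero mean and unit variance and estimates the temporal width as the ratio of the integrated area to the zero-lag peak.

```python
import numpy as np

def crosscorr_peak_and_area(x, y, dt, max_lag):
    """Sample cross-correlation of two traces at lags up to max_lag steps.

    Returns the zero-lag peak, the integrated area across lags, and a
    width estimate given by area / peak (as in the argument above, where
    total area ~ peak * temporal width).
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    T = len(x)
    lags = range(-max_lag, max_lag + 1)
    cc = np.array([np.dot(x[max(0, -l):T - max(0, l)],
                          y[max(0, l):T - max(0, -l)]) / T for l in lags])
    peak = cc[max_lag]                # lag-zero (instantaneous) correlation
    area = cc.sum() * dt              # correlation integrated across lags
    return peak, area, area / peak
```

Applied to pairs of simulated activity traces at increasing *K*, the peak should shrink roughly as *K*^{−1/2} and the area as *K*^{−1}, so the width estimate shrinks as *K*^{−1/2}.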

It is worth noting that while in other studies the results are often described in terms of the number of neurons *N*, here both *N* and the number of connections *K* play a role. For the sparse network, it is interesting to note that all the above results depend on the number of neurons *N* only through the heterogeneity parameter Δ, because *N* affects the sparsity of connections and therefore also their variance. The order of magnitude of correlations holds regardless of the number of neurons, which may even be taken to infinity for any fixed value of *K*. However, the dependence of correlations on the heterogeneity Δ, and therefore on *N*, may be quite substantial. Figure 6 shows how the mean correlation varies as a function of either *K* or *N*: for a relatively weak inhibition, the mean correlation depends mostly on the number of connections *K*, while for stronger inhibition, the mean correlation depends mostly on the number of neurons *N*.

### 3.1. Spiking Network Simulations.

We tested the predictions of the linear model in a more realistic spiking network, described by a current-based integrate-and-fire model (see section 2). The spiking network is characterized by the nonlinear dynamics inherent in the generation of action potentials. However, we tested the hypothesis that this nonlinear system, when displaying small fluctuations around a steady state, may be approximated by a linear system and therefore by the equations derived in the previous section. Figure 7 shows the dynamics of an example neuron's input current and membrane potential, and spike times (rasters) of that neuron and other neurons from the spiking network. Simulation results reproduce qualitatively the phenomenology observed in the cerebral cortex. Since the input current puts neurons close to the firing threshold, neurons are susceptible to noise and fire irregularly, with noisy spike emission times (Shadlen & Newsome, 1998). The distribution of firing rates across neurons is broad, with a higher proportion of neurons displaying low firing rates (Baddeley et al., 1997; Hromadka, DeWeese, & Zador, 2008).

Our goal is not to provide a formal theory for the linear approximation of a spiking model. Instead, we show the results of a quantitative comparison of the two models, and we briefly summarize the theoretical arguments underlying this comparison. Since all results of the linear model are stated in terms of the mean and variance of the synaptic matrix, we hypothesize that the linear response of the spiking network can be described by an effective set of synaptic parameters (*g*, Δ, *g*_{ext}, Δ_{ext}). For a given network size (*N*=200), we probed the response of the spiking network to small changes in the external input, and we used these linear responses to fit the effective parameters of the connectivity. Then we used these parameters to predict the mean correlation across a wide range of network sizes (from *N*=50 to *N*=1000), without fitting any additional parameter.

Figures 8a to 8d show the change in the total current (excitatory and inhibitory) as a consequence of the change in the external input rate. This change is approximately linear. The figure shows four statistics of the current: the change in mean current (see Figure 8a), spatial variance (see Figure 8b), temporal variance (see Figure 8c), and covariance (see Figure 8d). In the linear model, those quantities are calculated, respectively, in equations 3.1, 3.2, 3.4, and 3.5, where *x*_{ext} corresponds to the change in external input rate. We use linear regression to fit the slopes in Figures 8a to 8d and invert the equations of the linear model to obtain the effective values of the parameters of the connectivity (see section 2). The fitted values are *g*=2.44 and *g*_{ext}=1.99, to be compared with those used to generate the synaptic matrices in the spiking model (see section 2; *g*=1 and *g*_{ext}=1 in units of 8 nA·ms). Then we use these values to predict the mean correlation across a wide range of network sizes by using the formula for the linear model, equation 3.6. The result is plotted in Figure 8e, showing a remarkable agreement with theory. The fact that the theory provides a good fit for *N*=200 is expected, since the parameters are fit at that specific network size (although in a different simulation). However, the good fit across a wide range of network sizes suggests that the equations of the linear model provide a good instrument to probe spiking networks.

The theoretical arguments underlying the above analysis are based on the dynamics of the total current integrated by the membrane potential of a neuron, that is, both the inhibitory recurrent and the excitatory external current (see Figure 7, top). This dynamics is described by equation 2.8 in section 2, which is similar to the equation describing the linear system, equation 2.1. In the spiking network, the external stimulus is characterized by a sum over *N*_{ext} independent Poisson spike trains weighted by the matrix *G*^{ext}. If *N*_{ext} is large enough, this sum is approximately equal to a gaussian white noise process (Amit & Tsodyks, 1991) and therefore is equivalent to the external input of the linear model. The main difference between the linear and the spiking model is the input from neurons in the recurrent network, determined by the spike trains *S*_{i}(*t*). Those spike trains are nonlinear and noninstantaneous functions of the input currents, and they also provide an additional source of noise due to the discrete spike times. Nevertheless, we found that the parameters of an effective linear system, determined by the linear response of the spiking network, predict the correlations well.

## 4. Discussion

We found that inhibitory feedback and heterogeneous connections have important effects on the dynamics of the activity in a neural circuit. The strong excitatory input, shared between neurons, tends to drive the network to a highly active and correlated state. The inhibitory feedback is responsible for balancing the network activity and also for reducing temporal fluctuations, in particular, the correlated fluctuations across neurons. The heterogeneity of couplings plays a crucial role in reducing correlations, since homogeneous connections would determine homogeneous and therefore highly correlated activity. As a consequence, the observed mean correlation is positive and of small magnitude. The fact that the mean correlation is positive is obvious, since neurons in a large population cannot be anticorrelated on average.^{1} What is not obvious is that the mean correlation is of small magnitude.

The main contribution of our work is an analytical calculation of the effect of heterogeneity on correlations in terms of random connectivity or random synaptic strengths. In the presence of heterogeneous connections and inhibitory feedback, the mean correlation at zero time lag is small, and it decreases with the number of connections as *K*^{−1/2}. The mean correlation integrated on large timescales is even smaller, of order *K*^{−1}, indicating that inhibition downsizes correlations on a timescale of order *K*^{−1/2}. Other modeling studies have addressed the issue of correlations in neural circuits. In previous studies (Ginzburg & Sompolinsky, 1994; Hertz, 2010; Renart et al., 2010; Tetzlaff et al., 2012), the mean correlation was found to decrease with the number of neurons as *N*^{−1}. Ginzburg and Sompolinsky (1994) and Hertz (2010) studied a network in which connection strengths are of order *N*^{−1}, which implies a weak interaction between neurons and therefore a weak correlation. More surprisingly, Renart et al. (2010) and Tetzlaff et al. (2012) found weak correlations even in the case of strong interactions, with connection strengths of order *N*^{−1/2}. Both studies found a mean correlation of magnitude *N*^{−1}, provided that the correlation is integrated across large temporal windows. However, Renart et al. (2010) show that the mean correlation at zero time lag of membrane currents is of magnitude *N*^{−1/2}. These results are consistent with our findings (compare Figure 5 with Figure 2E of Renart et al., 2010, and Figure 3D of Tetzlaff et al., 2012). However, we highlight a potential difference by noting that the main parameter affecting correlations may be the number of connections *K* rather than the number of neurons *N*. The exclusive contribution of these two parameters has not been examined in detail in previous studies, and we have shown that it may depend on the network regime.

We also studied a more realistic spiking network model, and we compared the analytical solutions of the linear model with the simulations of the nonlinear spiking model. We looked at small, linear changes in the current as a function of changes in the external firing rate input to the spiking network and computed the effective parameters of the linear model able to explain those changes. We found that those effective linearized parameters are able to predict correlations accurately even when the network size changes significantly. This suggests that the linear approximation is adequate for studying correlations. We did not consider the problem of a complete linear theory of spiking models, which would address the issues of computing the linearized kernel and the effective interaction matrix. Those issues have been studied, for example, in Lindner, Doiron, and Longtin (2005), Trousdale, Hu, Shea-Brown, and Josić (2012), and Tetzlaff et al. (2012).

The small mean correlation observed in our study and previous modeling studies agrees qualitatively with the experimental observations. The model may be useful to investigate correlations in different brain areas, especially those dominated by inhibition, such as the striatum and globus pallidus. Interestingly, large correlations between pallidal neurons have been shown to be correlated with Parkinsonism (Raz et al., 2000; Wilson, Beverlin, & Netoff, 2011). The model is also consistent with the strong anticorrelations between excitatory and inhibitory inputs observed experimentally (Okun & Lampl, 2008; Cafaro & Rieke, 2010; Salinas & Sejnowski, 2000). However, different experimental studies report quantitative differences in measured correlations. For example, correlations depend on the temporal window on which action potentials are counted to determine a neuron's firing rate (see Figure 1). While neural dynamics occurs on a variety of timescales in our model,^{2} as well as in real neurons (Bernacchia, Seo, Lee, & Wang, 2011), additional modeling studies are necessary to capture the wide range of phenomena observed in the experimental measures of correlations, including the effects of distance between neurons, multiple timescales, and firing activity (Cohen & Kohn, 2011).

## Appendix A: Statistics of Random Networks

The dynamics of the network is given by the linear equation (equation A.1)

d**x**/d*t* = (*G* − *I*)**x** + *G*_{ext}**x**_{ext},

where **x** is the vector of local neural activities, **x**_{ext} is the vector of external neural activities, the matrices *G* (of size *N* × *N*) and *G*_{ext} (of size *N* × *N*_{ext}) express respectively the recurrent connections and the feedforward projections, and *I* is the identity matrix. The equation of dynamics is linear and, given the interaction matrices *G*, *G*_{ext} and the input signal **x**_{ext}, the neural activity can be expressed as a sum over the external input weighted by an exponential temporal decay,

**x**(*t*) = ∫_{−∞}^{*t*} d*t*′ e^{(*G*−*I*)(*t*−*t*′)} *G*_{ext} **x**_{ext}(*t*′).

We assumed that initial conditions have decayed and that the stability condition holds, to prevent network activity from growing in time without bounds. In the limit of large *N*, the real part of the eigenvalues of *G* is bounded (Gudowska-Nowak et al., 2003). If that bound exceeds one, some eigenvalues of (*G* − *I*) have a nonnegative real part, and the integral does not converge. This corresponds to an unstable fixed point at **x** = 0, and network activity grows in time without bounds.
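The stability argument can be checked numerically. The following is a minimal sketch, not the paper's simulation; the parameter values (*g* = −2, *g*_{ext} = 1, and the disorder magnitudes) are illustrative assumptions. With an inhibition-dominated mean and a subcritical disorder radius, Euler integration of the linear dynamics settles at the fixed point (*I* − *G*)^{−1}*G*_{ext}**x**_{ext}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_ext = 200, 100
g, g_ext = -2.0, 1.0   # illustrative: inhibition-dominated recurrence

# Connectivity: mean g/N (resp. g_ext/N_ext) plus fluctuations; the disorder
# radius sqrt(N * var) = 0.5 keeps all eigenvalues of (G - I) in the left half-plane.
G = g / N + rng.normal(0.0, 0.5 / np.sqrt(N), (N, N))
G_ext = g_ext / N_ext + rng.normal(0.0, 0.5 / np.sqrt(N_ext), (N, N_ext))

b = G_ext @ np.ones(N_ext)                    # constant external drive, x_ext = 1
fixed_point = np.linalg.solve(np.eye(N) - G, b)

# Euler integration of dx/dt = (G - I) x + G_ext x_ext
A = G - np.eye(N)
x = np.zeros(N)
dt = 0.05
for _ in range(4000):
    x += dt * (A @ x + b)

err = np.abs(x - fixed_point).max()
print(err)   # tiny: the activity converges to (I - G)^{-1} G_ext x_ext
```

If the mean coupling is made excitatory and supercritical, the same loop diverges, illustrating the unstable fixed point at **x** = 0.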

To compute the mean activity, we substitute the external input **x**_{ext}(*t*) with its temporal average and perform the integral, obtaining (the temporal average is denoted by an overbar; equation A.3)

**x̄** = (*I* − *G*)^{−1} *G*_{ext} **1** x̄_{ext},

where the vector **1** has all *N*_{ext} components equal to one. Because the matrices of connection strengths, *G* and *G*_{ext}, are heterogeneous, different neurons have a different mean activity. In order to calculate the spatially averaged activity, we compute the sample mean across neurons. For large *N*, this is independent of the specific realization of the spatial disorder; therefore, we perform its average over the distribution of connectivity strengths, namely (equation A.4),

⟨x̄⟩ = *N*^{−1} ⟨**1**^{⊤}(*I* − *G*)^{−1} *G*_{ext} **1**⟩ x̄_{ext}.

The average (angular brackets) is across all possible realizations of the random matrices *G* and *G*_{ext}. We denote by ⊤ the transpose operation. Note that in expression A.4, the row vector **1**^{⊤} has *N* components, while the column vector **1** has *N*_{ext}. In the following, we will use the same notation regardless of the dimension of **1**, since that can be determined by the dimension of the multiplied matrix. Since *G* and *G*_{ext} are independent, we can substitute *G*_{ext} with its mean. Furthermore, we show in appendix B, equation B.15, how to average the remaining bracket over *G*. The mean activity follows (equation A.5); this expression is used in the main text (see equation 3.1).
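The self-averaging argument can be illustrated numerically. This is a sketch under assumed parameters, not the paper's simulation: for *G* with mean element *g*/*N* and *G*_{ext} with mean element *g*_{ext}/*N*_{ext} (illustrative scaling), substituting *G*_{ext} by its mean and averaging the bracket over *G* predicts a population-mean activity close to *g*_{ext}/(1 − *g*) for constant unit external input.

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_ext = 300, 150
g, g_ext = -2.0, 1.0   # illustrative parameters

def population_mean_activity(rng):
    # One realization of the disordered network; x_ext = 1 for all external units
    G = g / N + rng.normal(0.0, 0.5 / np.sqrt(N), (N, N))
    G_ext = g_ext / N_ext + rng.normal(0.0, 0.5 / np.sqrt(N_ext), (N, N_ext))
    xbar = np.linalg.solve(np.eye(N) - G, G_ext @ np.ones(N_ext))
    return xbar.mean()

samples = [population_mean_activity(rng) for _ in range(50)]
prediction = g_ext / (1 - g)   # disorder average, with G_ext replaced by its mean
print(np.mean(samples), prediction)
```

Each single realization already lies close to the disorder average, which is the self-averaging property invoked in the text.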

To compute the spatial variance of the mean activity, we average the sum of squares of the mean activities across neurons and divide by *N* to obtain the second moment. We rewrite this expression by using the trace operator and its cyclic invariance. Namely, for any arbitrary matrices *A*, *B*, the following equations hold: **u**^{⊤} *A* **v** = Tr(*A* **v** **u**^{⊤}) and Tr(*AB*) = Tr(*BA*). We obtain the averaged second moment. Again, since *G* and *G*_{ext} are independent, we can average separately the factors involving the two matrices, and a simple calculation gives the average over *G*_{ext}. Furthermore, we show in equations B.27 and B.28 how to compute the two remaining averages over *G*. Using the expression of the mean activity, equation A.5, the spatial variance follows; this expression is used in the main text, equation 3.2.

In computing the average covariance, we cannot simply substitute *G* with its mean inside the covariance matrix *Q*, since *G* and *Q* are dependent and do not commute. Note also that equation A.9 represents the covariance at zero time lag. We will consider the case of finite time lag at the end of this section.
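For a concrete handle on *Q*, the following sketch assumes white-noise external input (an assumption of this example, not a statement of the paper's input statistics). Under that assumption the zero-lag covariance solves the Lyapunov equation (*G* − *I*)*Q* + *Q*(*G* − *I*)^{⊤} + *G*_{ext}*G*_{ext}^{⊤} = 0, which can be solved directly by vectorization; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_ext = 40, 20
g, g_ext = -2.0, 1.0   # illustrative parameters

G = g / N + rng.normal(0.0, 0.5 / np.sqrt(N), (N, N))
G_ext = g_ext / N_ext + rng.normal(0.0, 0.5 / np.sqrt(N_ext), (N, N_ext))

A = G - np.eye(N)
D = G_ext @ G_ext.T
# Row-major vectorization: vec(A Q + Q A^T) = (kron(A, I) + kron(I, A)) vec(Q)
L = np.kron(A, np.eye(N)) + np.kron(np.eye(N), A)
Q = np.linalg.solve(L, -D.reshape(-1)).reshape(N, N)

residual = np.abs(A @ Q + Q @ A.T + D).max()   # ~0: Q solves the Lyapunov equation
corr = Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
mean_corr = (corr.sum() - N) / (N * (N - 1))   # mean pairwise correlation
print(residual, mean_corr)
```

The disorder average of such solutions is what the appendix computes analytically; here a single small realization is solved exactly instead.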

Since *G* and *G*_{ext} are independent, we can average separately the factors involving the two matrices; a simple calculation gives the average over *G*_{ext}. Furthermore, using equations B.30 and B.31, we obtain the average covariance at zero time lag. This corresponds to equation 3.4 in the main text. The prefactor appearing in that expression is never smaller than one, and it is very close to one for a wide range of parameters, including for large *K*. However, it diverges near the critical point.

Again, since *G* and *G*_{ext} are independent, we can average separately the factors involving the two matrices, using the average over *G*_{ext}. Furthermore, using equations B.30 to B.32 and neglecting all terms of order *N*^{−1}, we obtain the expression used in the main text in equation 3.5.

We now consider the covariance at finite time lag, and we denote *Q* = *Q*_{x}(0). The covariance at finite time lag is equal to (Gardiner, 1985)

*Q*_{x}(*t*) = e^{(*G*−*I*)*t*} *Q*  for *t* ≥ 0.

We define the temporally integrated activity **y** as a linear convolution of the activity, namely,

**y**(*t*) = ∫ d*t*′ *h*(*t* − *t*′) **x**(*t*′),

where *h*(*t*) is a given convolution kernel. Using the Wiener-Khinchin theorem and the convolution theorem, it is straightforward to calculate the covariance of the integrated activity, in which the second-order kernel *h*_{2} is the autocorrelation of *h*. If time integration is slow enough, such that the kernels *h* and *h*_{2} are approximately constant in a time interval in which the covariance *Q*_{x} is sensibly different from zero, the above expression simplifies. In the last two equalities we have, respectively, integrated equation A.16 and used equation A.10, which we have multiplied by (*I* − *G*)^{−1} on the left side and by its transpose on the right side. The latter expression can be averaged over the network disorder to compute the mean correlation of the integrated activity. We will consider only the case of slow integration, since the opposite case is straightforward and is not our focus, and we denote *Q*_{y} = *Q*_{y}(0). In addition, we substitute *h*_{2}(0) = *T*^{−1}, where *T* is defined as the characteristic integration time of the kernel.

The diagonal elements of *Q*_{y} are the temporal variances of the integrated activity of different neurons. As in the computation of the variance of **x**, we take the sample mean across neurons and average over the spatial disorder. Again, we applied the trace operator to select the diagonal elements, we used its cyclic invariance, and since *G* and *G*_{ext} are independent, we can average separately the factors involving the two matrices. We use the average over *G*_{ext} and equations B.27 and B.28 to obtain the mean variance of the integrated activity. Note that the first term in square brackets is small, of order *K*^{−1}, and could be neglected.

We finally consider the off-diagonal elements of *Q*_{y}. The off-diagonal elements are the pairwise covariances, and the sample mean across neuron pairs can be averaged over the spatial disorder to obtain the average covariance. As in the computation of the covariance of **x**, we use the matrix that selects the off-diagonal elements. Again, we used the cyclic invariance of the trace operator, and since *G* and *G*_{ext} are independent, we can average separately the factors involving the two matrices. We use the average over *G*_{ext} and equations B.27 to B.29. Because the leading term is of order *K*^{−1}, here we keep terms of order *N*^{−1}, and we obtain the mean covariance of the integrated activity.

The mean correlation follows by dividing the mean covariance by the mean variance, neglecting terms of order *K*^{−1} when using equation A.20. This expression shows that correlations of integrated activity can be negative and are small, of order *K*^{−1}.

## Appendix B: Traces of Random Matrix Products

In this appendix, we introduce the diagrammatic notation to calculate the quenched averages of random matrix products (see, e.g., Gudowska-Nowak et al., 2003). In the context of neural networks, a diagrammatic notation has also been implemented recently by Rangan (2009), Pernice, Staude, Cardanobile, and Rotter (2011), and Trousdale et al. (2012). Theoretical results are obtained for the gaussian distribution, although numerical simulations suggest that they generalize to other distributions with the same mean and variance (e.g., Bernoulli). We consider first the case in which the mean of the matrix elements is of order *N*^{−1} and then recover the scaling studied in the main text by analytical continuation and a substitution of parameters. We conclude by studying the case of a nonhomogeneous mean (e.g., interconnected excitatory and inhibitory neurons).

We first calculate the average trace of powers of a random matrix *R* in the limit of large *N* (where the size of the matrix is *N* × *N*). The matrix *R* is characterized by independent and normally distributed elements, each element having zero mean and variance *N*^{−1}:

⟨*R*_{ab}⟩ = 0,  ⟨*R*_{ab}*R*_{cd}⟩ = *N*^{−1} δ_{ac}δ_{bd}.

We start by calculating the second order, that is, the average trace of the square of *R*. For convenience of notation, we omit the sum over the indices (in this case, the sum over the indices *a*, *b*, *c*, *d*):

⟨Tr(*R*^{2})⟩ = ⟨*R*_{ab}*R*_{cd}⟩ δ_{bc}δ_{da} = *N*^{−1} δ_{ac}δ_{bd}δ_{bc}δ_{da}.

The diagram corresponding to this expression is shown in Figure 9a. The diagram is obtained by drawing one node for each one of the four indices *a*, *b*, *c*, *d* and by drawing an edge for each delta function in the expression, where the two nodes connected by the edge correspond to the two indices of the delta function. Horizontal edges are due to the operations of trace (base edge) and matrix multiplication (middle edge), while arc-shaped edges are due to averaging. The multiple edges determine different paths, and each pair of nodes connected by a path (even if not linked by an edge) corresponds to a pair of indices that must be equal, since they are connected by a sequence of delta functions. Therefore, for each closed loop in the diagram, there is one redundant delta function, which can be eliminated without performing the sum over the corresponding indices. This implies that each closed loop contributes a factor *N*, due to a free sum over *N* elements. Since the diagram for the second order has one loop, we have ⟨Tr(*R*^{2})⟩ = 1.

The average trace of an odd power of *R* vanishes, since the gaussian average pairs the factors of *R* and is zero for odd *k*. The next order is therefore the fourth order, which is equal to (again we omit the sum over all indices)

⟨Tr(*R*^{4})⟩ = ⟨*R*_{ab}*R*_{bc}⟩⟨*R*_{cd}*R*_{da}⟩ + ⟨*R*_{ab}*R*_{cd}⟩⟨*R*_{bc}*R*_{da}⟩ + ⟨*R*_{ab}*R*_{da}⟩⟨*R*_{bc}*R*_{cd}⟩.

The fourth order has three diagrams, one for each term in the sum, shown in Figure 9b. The middle diagram has two closed loops, and the other two have only one loop. Therefore, the other two terms can be neglected; the middle term contributes a factor *N*^{2}, and the fourth order gives ⟨Tr(*R*^{4})⟩ = 1 at leading order. The contribution of the fourth-order moment of the matrix elements can be neglected because the corresponding terms in the sum have quartets of indices with the same value. The number of those terms is smaller by a factor of *N*^{2} with respect to the number of second-order terms. Similar arguments apply for higher-order moments.

Iterating the procedure, we find that order 2*k* has (2*k* − 1)!! diagrams, of which only one has *k* loops; therefore ⟨Tr(*R*^{2*k*})⟩ = 1 for all values of *k*.

The matrices *R* have zero mean, while the matrices considered in the main text (*G* and *G*_{ext}) have nonzero mean. As explained below, in order to calculate the average trace of matrix powers with nonzero mean, we need to compute averages where *R* is interleaved by the matrix of ones. We denote by **1** the column vector of *N* components all equal to one, by **1**^{⊤} the row vector, and by **11**^{⊤} the matrix with all elements equal to one (we denote by ⊤ the transpose operation). We consider the two second-order terms Tr(*R***11**^{⊤}*R*) and Tr(*R*^{2}**11**^{⊤}). It is not surprising that these two expressions are equal, since the trace is cyclic invariant. The only difference between these expressions and equation B.2 is the absence of the factor δ_{bc} in the former expression and of δ_{da} in the latter. This corresponds to cutting, respectively, the middle and the base horizontal edges in the diagram of Figure 9a. In general, inserting a matrix of ones at a given point of the sequence of *R* products is equivalent to cutting the horizontal edge at that point in the corresponding diagram. If the edge belongs to a closed loop, the cut has the effect only of removing a redundant delta function; there is no change in the contribution of that diagram to the sum. Conversely, if the edge belongs to an open path, the cut determines an additional *N* factor, because the delta function removed was not redundant. Since all diagrams have at least one closed loop, inserting a single matrix of ones has no effect at all orders; the average trace remains unchanged. Unless more loops are available to cut, inserting more matrices of ones may cut open paths; therefore, the trace may be multiplied by *N*. An additional *N* factor is obtained also by multiplying the matrix of ones with itself, which occurs whenever additional matrices are inserted at the same point in the sequence (we have **1**^{⊤}**1** = *N* and (**11**^{⊤})^{*k*} = *N*^{*k*−1}**11**^{⊤} if *k* > 0).

We now consider matrices with nonzero mean, of order *N*^{−1}. We consider the matrix *G* equal to

*G* = *g* *N*^{−1} **11**^{⊤} + *R*.

Note that the mean of this matrix has a different scaling with respect to that considered in the main text, but we will recover the latter by a substitution of the parameter *g*. A power of *G* is calculated by multiplying *G* by itself, and this determines an ordered product of powers of the matrices *R* and *g* *N*^{−1}**11**^{⊤}. Note that these two matrices do not commute; therefore, the binomial theorem cannot be applied. We consider the average trace ⟨Tr(*G*^{*k*})⟩, where the trace on the right-hand side is applied to an ordered product of matrices *g* *N*^{−1}**11**^{⊤} and matrices *R*, and the sum runs over all the ordered products for a given *k* and a given number of *R* factors. Using the above results, we find that the contribution of any of those traces is zero if the number of *R* factors is odd, equal to one if all factors are equal to *R* (provided that *k* is even), equal to *g*^{*k*} if no factor is equal to *R*, and of smaller order in the mixed cases. Therefore, the leading-order terms are the pure product of the means (for any value of *k*) and the pure power of *R* (for *k* even); all other terms can be neglected. If the matrix *G* is further multiplied by a matrix of ones, the term ⟨Tr(*R*^{*k*})⟩ can also be neglected, and we find that ⟨Tr(*G*^{*k*}**11**^{⊤})⟩ = *N* *g*^{*k*} for all values of *k*. Note that if the mean of *G* has a higher order in *N*, the result still holds. This expression is particularly useful to compute the average of bracket expressions. Because **y**^{⊤} *A* **x** = Tr(*A* **x** **y**^{⊤}) for any matrix *A* and vectors **x**, **y**, the expression can be rewritten as ⟨**1**^{⊤} *G*^{*k*} **1**⟩ = *N* *g*^{*k*} for all values of *k*. Since any infinitely differentiable function *f* can be expanded in Taylor series, the above result implies that ⟨**1**^{⊤} *f*(*G*) **1**⟩ = *N* *f*(*g*). Therefore, the following expression can be calculated and used to compute the mean activity in the main text:

⟨**1**^{⊤} (*I* − *G*)^{−1} **1**⟩ = *N*/(1 − *g*).

Note that the substitution of *g* must be applied to recover the scaling studied in the main text.

We next consider products of *R* and its transpose. The second-order diagram (Figure 10a) has a loop whose free sum contributes a factor *N*^{2}, and the second order is ⟨Tr(*R**R*^{⊤})⟩ = *N*. The fourth order is equal to

⟨Tr(*R*^{2}(*R*^{⊤})^{2})⟩ = ⟨*R*_{ab}*R*_{bc}⟩⟨*R*_{dc}*R*_{ad}⟩ + ⟨*R*_{ab}*R*_{dc}⟩⟨*R*_{bc}*R*_{ad}⟩ + ⟨*R*_{ab}*R*_{ad}⟩⟨*R*_{bc}*R*_{dc}⟩.

The three diagrams are shown in Figure 10b. The first two diagrams have one loop, and the third has three. Therefore, that diagram contributes a factor *N*^{3}, and the fourth order is equal to ⟨Tr(*R*^{2}(*R*^{⊤})^{2})⟩ = *N* at leading order. The diagrams for the sixth order are shown in Figure 10c: only one diagram has four loops, and no diagram has three; therefore, ⟨Tr(*R*^{3}(*R*^{⊤})^{3})⟩ = *N*. Iterating the procedure, we find that order 2*k* has (2*k* − 1)!! diagrams of which only one has *k* + 1 loops; therefore ⟨Tr(*R*^{*k*}(*R*^{⊤})^{*k*})⟩ = *N* for all values of *k*. Other combinations of powers of *R* and its transpose, with different powers of the two factors, give a smaller contribution, at most of order one.
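The ordered-product traces above can be checked by Monte Carlo (a sketch; the size and number of trials are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 200, 200   # arbitrary choices for the check

t1 = np.empty(trials)
t2 = np.empty(trials)
for t in range(trials):
    R = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    t1[t] = np.trace(R @ R.T)              # k = 1
    t2[t] = np.trace(R @ R @ R.T @ R.T)    # k = 2, ordered product R^2 (R^T)^2

print(t1.mean() / N, t2.mean() / N)        # both ratios close to 1
```

Note that the ordering of the factors matters: the diagrams counted here apply to the ordered product *R*^{*k*}(*R*^{⊤})^{*k*}, not to arbitrary interleavings of *R* and *R*^{⊤}.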

Inserting a single matrix of ones cuts one loop in a product of *R* and *R*^{⊤}, and the trace remains of order one. Adding more matrices increases the trace by an order *N* for each matrix, provided that no further loops are cut.

We finally consider the average trace of products of *G* and its transpose, where the trace on the right-hand side is applied to an ordered product of matrices *g* *N*^{−1}**11**^{⊤}, matrices *R*, and matrices *R*^{⊤}. If no factor is equal to *R* or *R*^{⊤}, the trace of the matrices of ones is equal to *N*^{*k*+*l*}, and the term is of order one. If all factors are equal to *R* or *R*^{⊤}, the trace contributes with an order *N*, provided that *k* = *l*. In the mixed cases, the traces contribute at most with an order that leaves the term of order one, while for unequal powers of *R* and *R*^{⊤}, the term is of smaller order. Therefore, the leading order is *N*. In the case in which matrices of ones are inserted, the pure product of *R* and *R*^{⊤} is no longer leading, and many other terms have to be considered: those are the terms for which the additional inserted matrices cut the same loop. Since the leading diagrams at all orders have one two-node loop in the middle and one at the boundaries, if a matrix of ones is inserted in the middle or at the boundaries, additional matrices must continue to be inserted at the same place in order to cut the same loop. We eliminate one sum and use the single index *m* in place of the two original indices; both resulting expressions are equal. Furthermore, we calculate the average trace with two inserted matrices. In that case, the leading term is the one for which both insertions cut the same loop (or *m* = 0), and we obtain the corresponding average.

To recover the scaling of the main text, in which *K* is proportional to *N*, a substitution of *g* must be applied. This substitution may change the order of magnitude of various terms in the sums considered above, possibly modifying the leading terms in each sum. Note that all series converge only for |*g*| < 1, but their sum can be evaluated by analytical continuation. Then, approximating the sums by the leading terms described above is accurate under the assumption that all series involving lower-order terms converge to bounded functions of *g*.

We conclude this appendix by studying the case of nonhomogeneous mean and variance. So far we have assumed that the mean and variance are homogeneous: they take the same value for different matrix elements. However, the same methods can be used to analyze the more general case in which the mean and variance are inhomogeneous. In fact, as long as the mean and variances do not depend on *N*, they do not change the order of different terms in the sums considered above. Therefore, the calculation would consist of taking only the leading terms and recalculating their value according to the new matrices of means and variances. For example, even in the inhomogeneous case, the sums resulting in equations B.15, B.29, and B.32 would still be determined uniquely by the mean, and the sums resulting in equations B.27 and B.30 would still be determined uniquely by the variance. Sums affected by both the mean and the variance, such as those resulting in equations B.28 and B.31, would still be calculated by using only the leading terms determined above.

An important special case is a mean that depends on the presynaptic neuron *j*, where *g*_{j} is positive for excitatory neurons and negative for inhibitory neurons. In that case, all results above still hold, with simple substitutions: the parameters for the mean and the variance now measure the mean connection strength and the mean variance across presynaptic neurons. Simulations suggest that a similar substitution, a mean across all matrix entries, works well even for general nonhomogeneous parameters.

## Acknowledgments

This study was supported by the U.S. National Institutes of Health grant R01 MH062349 and the Swartz Foundation.

## References

## Notes

^{1}

The mean correlation must be larger than −1/(*N* − 1); therefore, it must be nonnegative in infinitely large populations. Proof: Any covariance matrix *Q* is positive semidefinite; therefore **h**^{⊤}*Q***h** ≥ 0 for any vector **h**. If we choose **h** = **1** and define the correlation matrix *C* by normalizing *Q* by the variances, we have **1**^{⊤}*C***1** = *N* + *N*(*N* − 1)*c̄* ≥ 0, where *c̄* is the mean correlation. Therefore the mean correlation must be *c̄* ≥ −1/(*N* − 1). Tighter bounds on the mean correlation can be obtained by using λ_{min} ≤ **h**^{⊤}*Q***h**/**h**^{⊤}**h** ≤ λ_{max}, where λ_{min} and λ_{max} are, respectively, the minimum and maximum eigenvalues of *Q*. This implies corresponding upper and lower bounds on the mean correlation.
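The lower bound in this footnote is easy to verify numerically; the random covariance used below is an arbitrary example, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
A = rng.normal(size=(N, 2 * N))
Q = A @ A.T                                   # an arbitrary covariance matrix (PSD)
C = Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
mean_corr = (C.sum() - N) / (N * (N - 1))     # mean of the off-diagonal correlations
bound = -1.0 / (N - 1)
print(mean_corr, bound)                       # mean_corr never falls below the bound
```

The bound follows from **1**^{⊤}*C***1** ≥ 0 alone, so it holds for every valid correlation matrix, not just this example.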

^{2}

Since the dynamics is linear (see equation A.1), the timescales of the network are determined by the eigenvalues of the matrix (*I* − *G*). A random matrix with gaussian and independent elements has eigenvalues distributed uniformly in a circle in the complex plane centered at 0 and of radius √(*N*σ²), where σ² is the variance of its elements (see Gudowska-Nowak, Janik, Jurkiewicz, & Nowak, 2003). One isolated eigenvalue is found approximately at *mN*, where *m* is the mean of the elements of the matrix and *N* is its dimension. Finally, the identity matrix translates all eigenvalues by one. Therefore, the eigenvalues of (*I* − *G*) are contained in a circle centered at 1 and of radius √(*N*σ²), with one additional eigenvalue approximately equal to 1 − *mN*. Timescales, in units of the single-neuron time constant, are equal to the inverse of those eigenvalues; therefore, the fastest timescale is equal to (1 − *mN*)^{−1}, while the remaining timescales are between (1 + √(*N*σ²))^{−1} and (1 − √(*N*σ²))^{−1}.
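The spectrum described in this footnote can be reproduced numerically (a sketch; the values of *m*, the element variance, and *N* are arbitrary illustrative choices): the bulk fills a circle around 0, and one outlier sits near *mN*.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500
m, s2 = -2.0 / N, 0.25 / N        # element mean m and variance s2 (illustrative)

G = m + rng.normal(0.0, np.sqrt(s2), (N, N))
eigs = np.linalg.eigvals(G)

i = np.argmin(eigs.real)
outlier = eigs[i]                  # isolated eigenvalue, near mN = -2
bulk = np.delete(eigs, i)
radius = np.sqrt(N * s2)           # circular-law radius, here 0.5
print(outlier.real, np.abs(bulk).max(), radius)
```

Shifting the spectrum by one (for *I* − *G*) then gives a bulk around 1 and an outlier near 1 − *mN*, as stated in the footnote.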