## Abstract

A neuronal population is a computational unit that receives a multivariate, time-varying input signal and creates a related multivariate output. These neural signals are modeled as stochastic processes that transmit information in real time, subject to stochastic noise. In a stationary environment, where the input signals can be characterized by constant statistical properties, the systematic relationship between its input and output processes determines the computation carried out by a population. When these statistical characteristics unexpectedly change, the population needs to adapt to its new environment if it is to maintain stable operation. Based on the general concept of homeostatic plasticity, we propose a simple compositional model of adaptive networks that achieve invariance with regard to undesired changes in the statistical properties of their input signals and maintain outputs with well-defined joint statistics. To achieve such invariance, the network model combines two functionally distinct types of plasticity. An abstract stochastic process neuron model implements a generalized form of intrinsic plasticity that adapts marginal statistics, relying only on mechanisms locally confined within each neuron and operating continuously in time, while a simple form of Hebbian synaptic plasticity operates on synaptic connections, thus shaping the interrelation between neurons as captured by a copula function. The combined effect of both mechanisms allows a neuron population to discover invariant representations of its inputs that remain stable under a wide range of transformations (e.g., shifting, scaling and (affine linear) mixing). The probabilistic model of homeostatic adaptation on a population level as presented here allows us to isolate and study the individual and the interaction dynamics of both mechanisms of plasticity and could guide the future search for computationally beneficial types of adaptation.

## 1 Introduction

When we talk about adaptation, we take it to mean what W. Ross Ashby had in mind when writing his seminal book *Design for a Brain*, where he argued that in a volatile environment, “‘adaptive’ behavior is equivalent to the behavior of a stable system” (Ashby, 1954, p. 64). Active dynamical mechanisms that stabilize the activity of neural populations in spite of sudden changes in sensory inputs, lesions, or rewiring of synaptic connections have been studied extensively under the general term *homeostatic plasticity*. Some of them are confined within individual neurons, generally referred to as *intrinsic plasticity*; others operate within synapses and are thus grouped under the umbrella term of *synaptic plasticity*.

Here we aim to unify such forms of plasticity in a single mathematically simple framework of continuous-time stochastic processes that enables us to analyze their distinct functional roles and interactions and allows us to extend the notion of homeostatic plasticity from the level of individual neurons to that of populations.

A wide variety of specific candidate mechanisms of synaptic and intrinsic plasticity has been studied extensively, from both biological and information-theoretic perspectives: Bienenstock, Cooper, and Munro (1982) proposed a form of synaptic plasticity that determines the growth or decay of synaptic connections under the constraint of maintaining a fixed mean firing rate. Turrigiano and Nelson (2004) conjectured a role of homeostatic plasticity in stabilizing the transmission of information in feedforward networks by fine-tuning the balance between excitatory and inhibitory connections. Both approaches assume a self-regulating form of synaptic adaptation that renders a neuron population invariant to additive shifts in its inputs, ensuring that resulting mean firing rates remain well within physiological bounds. In abstract dynamical-systems models of biological spiking neurons, spike rate adaptation effects were incorporated through slowly changing adaptation variables (Izhikevich, 2003) and spike-induced responses (Jolivet, Lewis, & Gerstner, 2004) to closely match experimental measurements. In these models, a neuron's adaptation to its spiking output, rather than its input, is the driving force of homeostatic plasticity.

Scale invariance was proposed via a form of plasticity referred to as synaptic scaling (Turrigiano, 2008) or gain control (Burrone & Murthy, 2003), where synaptic connection strength or neural excitability is regulated, such that the variance of membrane potentials remains fixed. Multiple timescales of such adaptation were observed, potentially serving different purposes, that jointly improve information transmission (Fairhall, Lewen, Bialek, & de Ruyter Van Steveninck, 2000). Synaptic depression (Abbott, Varela, Sen, & Nelson, 1997) and diffusion of neurotransmitters (Sweeney, Kotaleski, & Hennig, 2015) were suggested as two fast-acting candidate mechanisms that could achieve scale invariance in order to increase the dynamic range of a neuron's output and thus improve its ability to transmit information.

The concept of a neuron as a bottleneck of information transmission (Bell & Sejnowski, 1995; Stemmler & Koch, 1999) expanded this notion and offered an information-theoretical explanation for the utility of homeostatic adaptation. In this framework, the maintenance of stable characteristics of a neuron's output, in spite of changes in the characteristics of its input, allows a neuron to discover and encode information about its inputs in a stable representation, thus making a population more robust to environmental changes, noise, or spike timing variability (Buesing & Maass, 2010). Intrinsic plasticity mechanisms were proposed to tune the nonlinear response function of neurons to optimize properties of their outputs, such as the maximization of information transmission (Toyoizumi, Pfister, Aihara, & Gerstner, 2005) or the minimization of divergence from a desired stationary distribution (Savin, Joshi, & Triesch, 2010; Triesch, 2007). Their interaction with synaptic plasticity and synaptic scaling was analyzed (Toyoizumi, Kaneko, Stryker, & Miller, 2014) and shown to yield emerging properties, such as the ability to implement blind source separation on prewhitened inputs in simple model neurons (Buesing & Maass, 2010; Savin et al., 2010; Triesch, 2007), a feat observable in vitro as well (Isomura, Kotani, & Jimbo, 2015). Similar results were obtained by Hyvärinen and Oja (1998), who used synaptic scaling in combination with simple nonlinear Hebbian learning rules to discover independent components.

These results provide crucial insight into the capabilities and limitations of neural plasticity and serve as the basis of this article. We contribute to this field of theoretical research by unifying the information-theoretical concept of intrinsic plasticity, which enforces a stable, fixed distribution of activations in the face of changing input statistics, and Hebbian synaptic plasticity within an abstract but simple, probabilistic, and compositional model of adaptive neuron populations that avoids several limitations of the approaches discussed above. The model's activation function is directly parameterized by the membrane potential statistics that neurons should adapt to. Thus, the often complex update rules proposed in models of intrinsic plasticity are replaced by causal (nonlinear) filters of the membrane potential, such that adaptation dynamics can be analyzed and convergence ensured. Since our neuron model is fully determined by the stochastic properties of the neuron's membrane potential and activation processes, it could be easily adjusted to experimental data observed in vivo.

The continuous-time nature of our model facilitates studying the interaction between neural dynamics, synaptic dynamics, and plasticity and makes it easier to reconcile with its biological counterpart. By restricting plasticity to intrinsic and Hebbian synaptic plasticity while excluding global mechanisms such as a detailed balancing of excitation and inhibition or synaptic scaling, our model makes use only of information locally available to neurons and synapses, respectively. In light of Ashby (1954), we view many of the suggested benefits of plasticity as instances of the more general principle of homeostatic self-regulation, such that, for example, the purpose of blind source separation becomes, first and foremost, to find an informative, transformation-invariant representation of a population's input. We thus aim to elevate the notion of homeostatic adaptation from the level of individual neurons to the level of populations through the interplay of intrinsic and synaptic plasticity.

To illustrate the capabilities of the model, we reproduce results by Savin et al. (2010) and theoretically analyze the complementary role that intrinsic plasticity plays in stabilizing Hebbian learning, thus allowing individual neurons to discover informative components of their input signals. We demonstrate the generality of these results in a network trained on image patches, where lateral decorrelation drives neurons to learn a transformation-invariant representation of their multivariate inputs by implementing a form of principal or independent component analysis.

## 2 An Adaptive Network Model

The adaptive network model comprises adaptive neurons and adaptive synapses. The intrinsic plasticity mechanism implemented within each neuron uses locally available information about statistical properties of the neuron's membrane potential to adapt its behavior, such that its output remains stationary with predetermined statistics. Hebbian synaptic plasticity uses the product of (a function of) activations of a synapse's presynaptic source and postsynaptic target to update its connection strength. Correlated activity between source and target thus drives synaptic growth. By combining the adaptation of the marginal statistics via intrinsic plasticity with the adaptation of the copula via synaptic plasticity, the adaptive network realizes plasticity of its multivariate joint outputs and can become invariant to a large range of changes in its input statistics.

A population of neurons can thus be seen as a computational unit that receives a multivariate stochastic process as its input and linearly transforms and projects it on a set of adaptive neurons via adaptive synaptic connections. Each neuron then nonlinearly transforms its marginal input, and the joint activations of these neurons are taken to be the multivariate output of the population that subsequently becomes the input to the same or another population. (See Figure 1 for an illustration of the adaptive network model.)

We adhere to a simplistic yet powerful class of linear-nonlinear models (Ostojic & Brunel, 2011) that separates the dynamics of a neuron into two components: a linear, spatiotemporal filtering of inputs and a nonlinear transformation thereof, which yields the instantaneous firing intensity of the neuron. Although beyond the scope of this work, the model can be further extended using a spike train point process that samples spikes according to the neuron's time-varying firing intensity. Multiple input signals are linearly combined within the neuron's dendritic tree through weighted adaptive synaptic connections and integrated into the neuron's time-varying membrane potential, a real-valued stochastic process. The statistical properties of this membrane potential process are assumed to change rarely or slowly, such that the process can be locally well approximated by a (piecewise) stationary process. The neuron's nonlinear activation function, parameterized by a vector of adaptation variables, maps its membrane potential to an intensity or instantaneous firing rate, also referred to as the neuron's activation or output. It follows that the activation is a stationary stochastic process as well, and stationary distributions can be derived for both the membrane potential and the activation (see lemma 7 in the supplementary text A.1). For the special case of an exponential activation function, this corresponds to a continuous-time generalized linear model as proposed by Truccolo, Eden, Fellows, Donoghue, and Brown (2005).

Conversely, given both a stationary distribution of membrane potentials and a desired distribution of firing rates, a nonlinearity can be derived that maps the former onto the latter (see lemma 8). This nonlinear activation function can be parameterized by some sufficient statistics of the membrane potential distribution, the estimation of which is the principal task of intrinsic plasticity. By continuously estimating these statistics, such an adaptation mechanism can maintain the desired output distribution in spite of gradual changes in the statistics of its membrane potential and thus generates a (nearly) stationary output process. In this work, spike generation is not modeled; instead, the continuous activations are used directly in a rate-coding paradigm to derive theoretical results.
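This inverse-CDF construction can be sketched concretely. The snippet below is a minimal illustration rather than the article's implementation: it assumes a gaussian membrane potential and a log-gaussian (log-normal) target distribution, for which the composed mapping collapses to a closed form; all names and parameter values are illustrative.

```python
import math
from statistics import NormalDist

def activation(u, mu, sigma, m=0.0, s=1.0):
    """Inverse-CDF construction: map u ~ N(mu, sigma^2) onto a
    log-normal output via f = F_out^{-1}(F_u(u))."""
    p = NormalDist(mu, sigma).cdf(u)      # probability integral transform
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # numerical safety at the tails
    z = NormalDist().inv_cdf(p)           # standard normal quantile
    return math.exp(m + s * z)            # log-normal quantile function

# For this pair of distributions, the composition collapses to the
# closed form f(u) = exp(m + s * (u - mu) / sigma):
u, mu, sigma = 1.3, 0.5, 2.0
assert abs(activation(u, mu, sigma) - math.exp((u - mu) / sigma)) < 1e-9
```

The same construction applies to any pair of continuous distributions with tractable cumulative distribution functions; only in the gaussian-to-log-gaussian case does it reduce to the simple exponential-affine form verified above.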

### 2.1 Model Components

The network model described is composed of four components to be chosen independently. First, a class of stochastic processes can be chosen to model the dynamics of individual membrane potentials and thus determine their marginal membrane potential distributions. Second, the desired marginal distribution of the neurons' activation can be defined to match theoretical considerations (e.g., the exponential distribution—Triesch, 2007) or biological data (e.g., the log-normal distribution—Hromádka, DeWeese, & Zador, 2008). Third, codependency between the signals projected onto the neurons can be introduced or modified by an appropriate connectivity structure of synaptic connections. Finally, the precise mechanisms of intrinsic and synaptic plasticity can be chosen to implement invariance with respect to certain changes in the population's input statistics. Each of these modeling choices is briefly discussed next.

#### 2.1.1 Membrane Potential Processes

Membrane potentials are modeled as leaky integrators driven by stochastic input processes. This choice of stochastic process is commonly used as a candidate model of membrane potentials (Ricciardi & Lánský, 2006). While this choice of membrane potential process is mathematically convenient, questions have been raised about the applicability of such a model to biological data (Shinomoto, Sakai, & Funahashi, 1999), and a more sophisticated choice could be made here if required. The stationary membrane potential distribution family of choice should be general enough that any relevant changes in the distribution (e.g., due to effects of learning) are captured in its parameters. In the examples presented here, input processes exhibit either stationary gaussian, Laplacian, or beta distributions.

Note that the stochastic input term in the process makes no distinction between an unknown “signal” and “noise” present in the input; only their combination is modeled. This makes it possible to match the statistical properties to biological observations without knowing what constitutes signal and what constitutes noise in light of the neuron's computational role.

#### 2.1.2 Activation Distributions

The stationary distribution of firing rates affects the neuron's capacity to transmit information about its membrane potential and has a considerable impact on the neuron's computational role. For a fixed mean firing rate, for example, the exponential distribution maximizes the entropy of the neuron's intensity (Triesch, 2007), whereas a narrowly peaked bimodal distribution could alternate between periods of high and low firing rates, leading to more precisely timed bursts of spikes. Distributions with heavier tail probabilities turn the neuron into a coincidence detector, while others may match biological observations for certain neuron types best. When linearly decorrelated, marginally uniform activations become maximally informative (see also section 2.1.4). For a more detailed discussion of the effects that the choice of an activation distribution can have on the neuron model, we refer readers to the supplementary text A.6. In the following, a continuous distribution is assumed for mathematical convenience, but this is not strictly necessary.

When modeling connected neural populations, the distribution should be chosen in accordance with the stationary distribution of the membrane potential process, such that a linear combination of multiple neurons' outputs, filtered by synaptic responses and integrated, results in the assumed stationary distribution of the membrane potential. However, when a gaussian process is chosen for the membrane potential, this may be neglected for a large number of synaptic inputs due to the central limit theorem and the combined smoothing effect of the synaptic and membrane potential spike response. For the examples presented here, either log-gaussian or log-Laplacian stationary activation distributions are used.

#### 2.1.3 Intrinsic Plasticity

Consider a model neuron as described above with a known membrane potential process whose stationary distribution belongs to an exponential family and is parameterized by sufficient statistics, given as expectations of continuous functions of the membrane potential. By filtering the process with a causal exponential filter, we construct an exponentially weighted running estimate of the membrane potential's sufficient statistics. The dynamic properties of this estimation process are derived in the supplementary text A.4. As outlined in section 2, the neuron's activation function is chosen to map the membrane potential distribution, via its cumulative distribution function, onto the desired output distribution. By using the running estimate in place of the expected sufficient statistics, the activation function thus always approximates the mapping from the current membrane potential distribution to the desired firing intensity distribution, making the neuron invariant to any (slow) changes in the parameters of the membrane potential distribution. It should be stressed that this functional notion of intrinsic plasticity models the combined effect of all forms of homeostatic mechanisms within the neuron, such as spike-rate adaptation; effects due to depletion, diffusion, and aggregation of neurotransmitters; gene expression; and more. If desired, multiple adaptation variables, changing on different timescales, can be included as long as they serve as sufficient statistics of the membrane potential distribution.
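As a minimal sketch of this estimation scheme, the following discretizes the causal exponential filter into exponentially weighted running estimates of the sufficient statistics of a gaussian membrane potential (here the first two moments); the time constant and input statistics are illustrative choices, not values from the article.

```python
import random

def track_sufficient_statistics(samples, tau=1000.0):
    """Exponentially weighted running estimates of E[u] and E[u^2],
    the sufficient statistics of a gaussian membrane potential."""
    m1, m2 = 0.0, 1.0          # initial guesses for E[u], E[u^2]
    eta = 1.0 / tau            # filter rate, slow relative to the u-dynamics
    for u in samples:
        m1 += eta * (u - m1)   # causal exponential filter
        m2 += eta * (u * u - m2)
    return m1, m2 - m1 * m1    # running mean and variance

random.seed(1)
stationary = [random.gauss(0.0, 1.0) for _ in range(20000)]
shifted = [random.gauss(3.0, 0.5) for _ in range(20000)]  # sudden change
m1, var = track_sufficient_statistics(stationary + shifted)
# The estimates now track the post-change statistics (mean 3, variance 0.25),
# with the old regime forgotten at rate exp(-t/tau).
```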

#### 2.1.4 Connectivity and Copulas

Neurons within a population may be related through shared inputs or lateral connections. To model the relationship between the signals processed by each neuron, we assume that a population of neurons receives inputs from a multivariate stochastic process, which is subsequently linearly transformed by synaptic connections, projected onto the neurons, and integrated into their membrane potentials. With an appropriate choice of weights, a sufficiently large number of synaptic connections can induce an arbitrary covariance structure. Consequently, any change in the covariance structure of the input process could be reversed by an appropriate choice of synaptic connections (see the supplementary text B.3 for more information).
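The claim that synaptic weights can induce an arbitrary covariance structure can be illustrated with a Cholesky factor: if the inputs are white with unit variance, a weight matrix W satisfying W·Wᵀ = C yields membrane potentials with covariance C. A small sketch under these assumptions, with an arbitrary target covariance:

```python
import random

def cholesky_2x2(c11, c12, c22):
    """Hand-written Cholesky factor L of a 2x2 covariance matrix,
    so that L L^T = C."""
    l11 = c11 ** 0.5
    l21 = c12 / l11
    l22 = (c22 - l21 * l21) ** 0.5
    return ((l11, 0.0), (l21, l22))

target = (1.0, 0.6, 1.0)    # c11, c12 (= c21), c22
W = cholesky_2x2(*target)

# Project white, zero-mean, unit-variance inputs through W:
random.seed(0)
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50000)]
us = [(W[0][0] * a, W[1][0] * a + W[1][1] * b) for a, b in xs]

# The empirical cross-covariance approaches the target value 0.6:
c12 = sum(u * v for u, v in us) / len(us)
assert abs(c12 - 0.6) < 0.05
```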

By Sklar's theorem, the joint distribution of several continuous random variables can be decomposed into their marginal distributions and a *copula*, a distribution with uniform marginals that captures all codependencies, with an associated copula density (Embrechts, Lindskog, & McNeil, 2001). For a population of neurons, this allows us to express the joint densities of the membrane potentials and of the activations as products of the copula density, evaluated at the marginal cumulative distribution functions, and the corresponding marginal densities (see lemma 10 in the supplementary text B.1).

Under the effect of intrinsic plasticity, all activations are identically distributed, and the joint distribution of activations is fully determined by the desired marginal stationary activation distribution and the copula. Note that the copula, which captures the relation between the neurons' joint outputs, also reflects the structure already found between their membrane potentials, which is in turn determined by the structure between the input signals driving the neurons. The copula thus captures the interrelation between the neurons independent of their activation functions or any invertible transformation thereof (see the supplementary text B.1). The copula itself is a probability distribution, the negative entropy of which equals the mutual information between the random variables modeled by it (Ma & Sun, 2011). Mutual independence is thus achieved when the entropy of the copula is maximized, resulting in a jointly uniform distribution, also referred to as the independence copula.

Specifically for stationary membrane potential processes with gaussian or Laplacian distributions, as used in the following, the copula is fully determined by the covariance structure of the inputs and the synaptic connection weights. A direct consequence of this is that an appropriate choice of synaptic connection weights can be used to render the neurons fully statistically independent (see the supplementary text B.2). In the special case of marginally uniform firing rate distributions, the copula is identical to the population's joint firing rate distribution, and mere uncorrelatedness of the neurons corresponds to jointly uniform, and thus statistically independent, outputs. The choice of the marginal membrane potential distribution thus defines what aspect of the copula synaptic weights can influence, whereas the marginal output distribution defines what can be revealed about it through (linear) correlation coefficients or Hebbian learning. This conceptual separation of the marginal distributions, on which the adaptive neurons operate, and the copula, which captures the full interrelation of neurons due to their related inputs and synaptic connections, allows us to better analyze the effects of intrinsic and synaptic plasticity and their interaction.
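The invariance of the copula under the neurons' (strictly increasing) activation functions can be checked empirically with rank statistics, since rank correlations depend only on the copula. A minimal sketch with illustrative signals:

```python
import math
import random

def spearman(xs, ys):
    """Spearman rank correlation, a copula-only dependence measure
    (assumes no ties, which holds for continuous samples)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx = (len(xs) - 1) / 2
    cov = sum((a - mx) * (b - mx) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx)
    return cov / var

random.seed(2)
u1 = [random.gauss(0, 1) for _ in range(4000)]                 # membrane potentials
u2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in u1]          # correlated partner
lam1 = [math.exp(a) for a in u1]                               # log-normal activations
lam2 = [math.exp(b) for b in u2]

# exp() is strictly increasing, so ranks (and hence the copula) are unchanged:
assert spearman(u1, u2) == spearman(lam1, lam2)
```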

#### 2.1.5 Hebbian Plasticity
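In its simplest form, the Hebbian rule described in section 2 increments each synaptic weight in proportion to the product of presynaptic and postsynaptic activity, so that correlated activity drives synaptic growth. A minimal sketch (the learning rate and signals are illustrative, not the article's update rule verbatim):

```python
def hebbian_step(w, x, y, eta=0.01):
    """One Hebbian update: each weight grows with the product of its
    presynaptic input x[i] and the postsynaptic activation y."""
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

w = hebbian_step([0.1, -0.2], x=[1.0, 0.5], y=2.0)
assert abs(w[0] - 0.12) < 1e-12 and abs(w[1] - (-0.19)) < 1e-12
```

Left unconstrained, this rule grows weights without bound; in the full model, stability comes from its interaction with intrinsic plasticity, as analyzed in section 3.2.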

### 2.2 The Full Network Model

Table 1: Combinations of stationary membrane potential and activation distributions used in the examples (gaussian → log-gaussian, Laplacian → log-Laplacian, beta → uniform, and Laplacian → log-gaussian), each listed with its corresponding activation function and the sufficient statistics estimated by intrinsic plasticity.


In some simulations, to ensure that multiple neurons within a population represent different aspects of their inputs, lateral inhibition is employed, such that one neuron inhibits all others, the next neuron inhibits all but the first, and so on. These inhibitory synaptic weights are trained via simple anti-Hebbian learning (Földiák, 1990) according to rule 2.15 with a negative scaling constant and are constrained to remain negative. This connectivity structure introduces a strict ordering of decorrelated neurons without imposing recurrent connectivity on the population.
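A minimal sketch of this ordered lateral inhibition: the inhibitory weight matrix is strictly lower triangular (neuron j inhibits all neurons later in the ordering), updated by an anti-Hebbian rule and clipped to remain non-positive. All rates and signals here are illustrative.

```python
import random

def anti_hebbian_step(V, y, eta=0.05):
    """Anti-Hebbian update of strictly lower-triangular inhibitory
    weights: correlated activity strengthens (negative) inhibition."""
    n = len(V)
    for k in range(n):          # postsynaptic neuron
        for j in range(k):      # presynaptic neurons earlier in the order
            V[k][j] = min(0.0, V[k][j] - eta * y[j] * y[k])
    return V

n = 3
V = [[0.0] * n for _ in range(n)]
random.seed(0)
for _ in range(200):
    c = random.gauss(0, 1)                              # shared input
    y = [c + 0.3 * random.gauss(0, 1) for _ in range(n)]  # correlated activity
    V = anti_hebbian_step(V, y)

# Only the strictly lower triangle is plastic, and all weights stay inhibitory:
assert all(V[k][j] == 0.0 for k in range(n) for j in range(k, n))
assert all(V[k][j] <= 0.0 for k in range(n) for j in range(k))
```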

## 3 Results

To illustrate the properties of the model described above, we present three simulation examples.

### 3.1 Intrinsic Plasticity in Isolation

First, consider a gaussian model neuron with a leaky integrating membrane potential as given in equation 2.14, driven by a single gaussian process as given by equation 2.16. During simulation, the mean and standard deviation of the gaussian input signal are changed at three points in time, resulting in four time intervals in each of which the membrane potential exhibits a different stationary gaussian distribution. We estimate the sufficient statistics of the respective membrane potential distributions through the neuron's two adaptation variables (see Table 1, gaussian log-gaussian), both of which change on a timescale much slower than the autocorrelation in the membrane potential process.

During simulation, the sufficient statistics, as well as the Kullback-Leibler divergence between the neuron's estimated and real membrane potential distribution, are tracked and aggregated over 100 trials. This equivalently measures the divergence between the desired, log-gaussian output distribution and the one actually achieved by the neuron (see supplementary text A.5 for a derivation).

The results are summarized in Figure 2. Evidently, the model neurons reliably adapt to the various input statistics on a timescale determined by the dynamics of the adaptation variables, as well as the dynamics of the membrane potential. For an analytical derivation of the adaptation dynamics, we again refer readers to the supplementary text A.4. The result illustrates that using the estimated membrane potential statistics in the parameterized activation function allows the neuron to compensate for sudden changes in the input distribution and maintain its desired output distribution.

### 3.2 Interaction between Intrinsic and Synaptic Plasticity

While the ability of individual neurons to adapt to changes in their environment as illustrated in the previous experiment is arguably a very useful feature by itself, our primary interest lies in the interaction of such plastic neurons with plastic synapses. We present two variations of an experiment inspired by Hyvärinen and Oja (1998) and Savin et al. (2010) to illustrate how the interaction of both plasticity mechanisms can drive individual neurons to become selective to a transformation-invariant representation of their multivariate input by discovering a principal or independent component. We are able to reproduce these results using only intrinsic plasticity and Hebbian synaptic plasticity as discussed above and need not rely on a third, distinct mechanism of synaptic scaling as suggested there. For both conditions, we provide a detailed fixed-point analysis of the synapse dynamics in the presence and absence of intrinsic plasticity and show theoretically how independent component analysis emerges (only) from the complementary contributions of intrinsic and synaptic plasticity. To this end, we use plasticity operating on two distinct timescales: a fast-acting form of intrinsic firing rate adaptation that renders the neurons invariant to changes in the scale of their membrane potentials and much slower-acting Hebbian plasticity that adapts the weight vectors projecting into the neuron.

In both simulations, consider two independent source signals modeled by stationary stochastic processes. The source signals are mixed into two different linear combinations by multiplying the two-dimensional signal vector with a mixing matrix, here chosen to be a rotation matrix. The transformed signal vector is then projected onto two adaptive neurons through a matrix of adaptive synaptic weights, which thus drives the neurons' membrane potentials. Recovering the two original source signals requires undoing the mixing without knowing the mixing matrix. This problem is accordingly referred to as blind source separation (Keziou, Fenniri, Messou, & Moreau, 2013). Since the sources here represent statistically independent signals, this can be realized through independent component analysis (ICA; Hyvärinen & Oja, 2000). In both simulations, the covariance matrix of both neurons' membrane potentials is determined by the constant mixing matrix and the time-varying weight matrix. In particular, the variance of each neuron's membrane potential is given as a function of the corresponding row of the weight matrix, that is, the vector of weights projecting into that neuron. At each point in time, the effect of the fast-acting intrinsic plasticity, which renders the neurons invariant to changes in their marginal membrane potential variances, can thus be modeled as a normalization of each membrane potential to unit variance. This effect of intrinsic plasticity is equivalent to synaptic scaling (Turrigiano, 2008), albeit implemented within the neuron without explicit knowledge of the weight vector, rather than by changing the weight vector itself. For each weight vector, this makes it possible to theoretically derive the vector field of expected weight changes from the Hebbian weight update equation 2.7, identify fixed points, and prove convergence. A corresponding vector field in the absence of intrinsic plasticity can be derived in strict analogy, only replacing the normalization of the membrane potential by a scaling with a constant that is independent of the weight vector. An analytical derivation of the attractor landscapes can be found in the supplementary text C.2.
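Under these gaussian assumptions, the vector field of expected weight changes can be sketched in closed form: for zero-mean gaussian inputs with covariance C and a log-normal activation exp(û), where û is the membrane potential normalized to unit variance by intrinsic plasticity, the expected Hebbian update is proportional to C·w/√(wᵀCw). The deterministic iteration below is an illustration rather than the article's simulation; it adds a linear decay term as an assumed stand-in for the weight-limiting effect of the fixed output distribution and converges to the first principal component.

```python
import math

def step(w, C, eta=0.05, rho=1.0):
    """One step along the expected Hebbian vector field with fast
    intrinsic plasticity: E[x exp(u/sigma)] = C (w/sigma) exp(1/2)
    for zero-mean gaussian x, plus an assumed decay -rho*w."""
    sigma = math.sqrt(sum(w[i] * C[i][j] * w[j]
                          for i in range(2) for j in range(2)))
    Cw = [sum(C[i][j] * w[j] for j in range(2)) for i in range(2)]
    return [w[i] + eta * (math.e ** 0.5 * Cw[i] / sigma - rho * w[i])
            for i in range(2)]

# Covariance of two gaussian sources (variances 4 and 0.25) mixed by a
# 45-degree rotation; its top eigenvector is (1, 1)/sqrt(2).
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
C = [[4 * c * c + 0.25 * s * s, (4 - 0.25) * c * s],
     [(4 - 0.25) * c * s, 4 * s * s + 0.25 * c * c]]

w = [1.0, 0.0]
for _ in range(2000):
    w = step(w, C)

norm = math.hypot(w[0], w[1])
cos_angle = (w[0] + w[1]) / (norm * math.sqrt(2))
assert cos_angle > 0.999   # aligned with the first principal component
```

The decay makes the weight norm settle at a finite value (here e^{1/2}·√λ₁), mirroring the claim that the desired output distribution indirectly fixes the stable length of the weight vector.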

We show that the basic Hebbian plasticity rule, which updates synaptic weights according to equation 2.15 based on the pre- and postsynaptic activities, can achieve blind source separation if and only if it is paired with intrinsic plasticity. By recovering the original independent sources, the neurons become invariant to the mixing effect of an (orthogonal) transformation, illustrating a network-level form of invariance due to plasticity. Thus, multiple neurons can develop a nonredundant, information-theoretically efficient representation of their multidimensional inputs (Isomura et al., 2015).

#### 3.2.1 PCA in Individual Neurons

For the first experiment, assume the sources to be gaussian with different variances. The model neurons are chosen to map gaussian membrane potentials to log-gaussian activations. Over time, we trace the evolution of the synaptic weights under the influence of synaptic and intrinsic plasticity and observe that both weight vectors rapidly align themselves with the largest eigenvector of the covariance matrix of the joint membrane potential distribution. As a consequence, both neurons become tuned to the first principal component of the two-dimensional input, which corresponds to the recovered original source with the largest variance. The magnitude of the resulting weight vectors is equal for both neurons. The top-left panel of Figure 3 summarizes these results, and the two panels on the top right confirm that due to plasticity, the empirical distributions of the membrane potential and activation approach the desired distributions at the end of the simulation.

To understand these results, the respective roles of intrinsic and synaptic plasticity must be analyzed. Hebbian synaptic plasticity as employed here drives the synaptic connections to maximize postsynaptic activations. Due to the nonlinear relationship, here exponential, between membrane potential and activation, more dispersed stimuli are more effective at driving the neuron to fire at high rates, since large, positive deviations of the membrane potential are amplified, whereas similar negative deviations are attenuated. (For a precise derivation of how changes in higher-order moments of the membrane potential distribution affect the neuron's mean firing rate due to its nonlinear activation function, see the supplementary text C.1.) Hebbian plasticity thus rotates the weight vectors toward those directions in input space where dispersion is maximized. For the gaussian signals used here, where all cumulants beyond the second vanish, and due to the activation function's strong sensitivity to the second moment, these directions correspond to the principal components, ordered by their associated eigenvalues. In the absence of intrinsic plasticity, synaptic scaling, or other stabilizing mechanisms such as the BCM rule (Bienenstock et al., 1982), such a rule invariably leads to instability. Here, however, fast-acting intrinsic plasticity stabilizes Hebbian plasticity by keeping the postsynaptic neuron's output distribution constant, independent of the magnitude of the current weight vector, thus constraining synaptic growth. The choice of the desired activation distribution of the neuron thus indirectly determines the stable length of the weight vector resulting from Hebbian plasticity, as well as the speed and reliability of convergence to the principal components.

The bottom-left panel of Figure 3 shows the attractor landscape of the Hebbian learning procedure, which exhibits two stable solutions, each corresponding to a weight vector aligned with the first principal component, and an unstable fixed point at the origin, pushing weight vectors toward nontrivial solutions. As indicated by the domains of attraction, a single neuron with Hebbian synapses and intrinsic plasticity converges to either of two representations of the first principal component of the input, recovering the source signal with the larger variance. The temporal dynamics and stability of the adaptation procedure are illustrated in the second-row panel on the right side of Figure 3, where the angles between each weight vector and the direction of the first principal component are plotted as a function of time. After quickly rotating toward the principal component, the weight vectors remain stable at an angle close to 0, demonstrating the long-term stability of the solution. To illustrate the crucial role of intrinsic plasticity in stabilizing Hebbian learning, the bottom-right panel of the same figure shows the phase space and weight vector trajectories from identical starting positions in the absence of intrinsic plasticity. For the parameters chosen here, both initial conditions now lie within the domain of attraction of a stable fixed point at the trivial solution, the zero vector.

#### 3.2.2 ICA in Individual Neurons

The perspective gained from the previous example can be transferred to a more interesting experiment presented by Savin et al. (2010). Consider now that the two sources are identically distributed, independent, nongaussian (here Laplacian) processes, which are again mixed by the same matrix and projected onto adaptive neurons through a matrix of adaptive synaptic weights . Applying the same reasoning as in the previous example, Hebbian learning paired with intrinsic plasticity leads the neuron to discover those projections of the input signal that result in the most dispersed membrane potentials. However, because the covariance matrix of the inputs is now proportional to the identity, any normed linear combination of the inputs has the same variance, and thus no unique principal components can be defined. As a consequence, the original source signals cannot be recovered by any method, such as PCA, that takes only the second moment, that is, the covariance matrix, into account. However, since the input signals are no longer gaussian, higher-order moments beyond the variance, such as the kurtosis, can be used to define unique directions of maximum dispersion. In the example presented here, the source signals' marginal Laplace distributions are leptokurtic, showing a higher kurtosis than any normalized linear combination of the two. The independent components (ICs), which maximize kurtosis, thus correspond to the demixed original source signals.

Since the nonlinear activation function is also sensitive to higher moments beyond the variance (see the supplementary text C.1), the neural adaptation procedure in this case selects the directions that maximize kurtosis and thus discovers the most prominent independent component, which corresponds to one demixed source signal. (See Figure 4 for a summary of the results.) To demonstrate that both neurons, due to different initial weight vectors, truly discover independent components, the top-right panel of Figure 4 shows the copula function of both neurons at the beginning and end of the simulation, converging to a uniform distribution that indicates statistical independence (rather than mere uncorrelatedness).
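A minimal sketch of why kurtosis singles out the demixing directions (using an assumed 45-degree rotation as the mixing matrix; this is not the paper's simulation):

```python
import numpy as np

rng = np.random.default_rng(2)

def excess_kurtosis(y):
    """Fourth standardized central moment minus 3 (0 for a gaussian)."""
    y = y - y.mean()
    return (y**4).mean() / (y**2).mean() ** 2 - 3.0

# Two independent, leptokurtic (Laplacian) sources of equal variance,
# mixed by a rotation, so the covariance of x is proportional to the identity.
s = rng.laplace(size=(2, 100_000))
theta = np.pi / 4
mix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
x = mix @ s

# Every unit-norm projection has the same variance, but the excess kurtosis
# peaks exactly at the demixing directions that recover a single source.
angles = np.linspace(0.0, np.pi, 181)
kurt = [excess_kurtosis(np.cos(a) * x[0] + np.sin(a) * x[1]) for a in angles]
best = np.degrees(angles[int(np.argmax(kurt))])
print(best)  # near 45 or 135 degrees: the directions of the original sources
```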

#### 3.2.3 PCA and ICA in Populations

In the two experiments above, we consider the extreme cases where either no unique independent components can be defined, because the kurtosis of the gaussian inputs is 0, or no unique principal components can be defined, as the covariance matrix of the Laplacian inputs is the identity. In each case, a neuron then recovers the appropriately defined dominant component, a principal component or an independent component, respectively. Since the nonlinear activation functions chosen here depend on second- and higher-order moments, this raises the question of whether, in the presence of both principal and independent components, a model neuron would select the former, the latter, or neither, instead converging to a compromise between the two. For an exponential activation function, the optimal solution maximizes a weighted combination of all moments of the neuron's membrane potential, with lower moments being the most influential (see supplementary text C.1). While dominant principal components are thus strongly favored by the adaptation procedure, independent components or yet other directions could coexist as stable solutions.

In order for a population of adaptive neurons to implement PCA, ICA, or any other transformation-invariant representation of the population's input signals, it is crucial that individual neurons reliably become selective to different components. In the presence of strongly dominant components, as shown in the first example, different initial conditions alone are insufficient to ensure that different principal or independent components are discovered.^{1} To mitigate this problem, strictly inhibitory lateral synapses with scale parameter are placed between the adaptive neurons and updated via anti-Hebbian learning as outlined in section 2.2.

For two Laplacian source signals as in the previous example, we vary the mixing coefficients to realize three different scenarios in which the directions of maximum dispersion change from favoring principal components to favoring independent components. In all three cases, we train adaptive neurons mapping the Laplacian inputs to log-gaussian outputs and trace the weight vectors under the effect of Hebbian and intrinsic plasticity. (See Figure 5 for a summary of the results.) In accordance with expectations, the lateral decorrelation forces neurons to discover different components of the inputs in a predictable manner. While one adaptation variable emulates synaptic scaling by controlling the membrane potential's variance, the other compensates for the imbalance between feedforward activation and the combined effect of feedforward and lateral inhibition by controlling the mean. The identities of the components extracted by each neuron may change under nonorthogonal transformations, but the resulting representation of the input learned by the population is invariant with respect to further orthogonal transformations of the input.
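The decorrelating effect of an anti-Hebbian lateral synapse can be sketched in a reduced setting with a single lateral weight between two linear neurons (the input covariance and learning rate are arbitrary choices, and feedforward learning is omitted):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two correlated gaussian input channels feeding two linear neurons one-to-one.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)

# A single lateral weight v from neuron 1 onto neuron 2, adapted anti-Hebbianly:
# it grows with the residual output correlation it is meant to remove.
v, eta = 0.0, 0.01
for x1, x2 in x:
    y1 = x1
    y2 = x2 - v * y1      # lateral inhibition of neuron 2 by neuron 1
    v += eta * y1 * y2    # anti-Hebbian update

y1 = x[:, 0]
y2 = x[:, 1] - v * y1
print(v)                          # settles near 0.8, the regression coefficient
print(np.corrcoef(y1, y2)[0, 1])  # near 0: the outputs are decorrelated
```

The fixed point of the update is exactly where the expected output correlation vanishes, which is what forces the two neurons toward different components.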

### 3.3 Learning Representations of Image Patches

Finally, to demonstrate the generality of the results discussed thus far on high-dimensional data, we present a network of the same structure as above, evaluated on two different data sets of image patches of size 28 × 28 pixels. For each pixel, an adaptive “sensory neuron” is trained to map its input, the corresponding pixel's intensity (plus a small noise term) scaled to the range from 0 to 1, which we assumed to be distributed according to a beta distribution, onto a uniformly distributed activation in the range from −0.5 to 0.5. These marginally uniform activations are subsequently used as stimuli and projected through Hebbian synapses onto five adaptive neurons that are laterally decorrelated via anti-Hebbian synapses in the same manner as in the previous example with a relative weight scale of . The neurons are chosen to map gaussian inputs to log-gaussian activations and thus exhibit the same exponential activation function as discussed previously. Each image patch is presented for a simulated time of 50 ms, and the total simulation time is 150,000 s at discretization steps of 1 ms each. Time constants of the neurons' membrane potentials are set to 5 ms, and adaptation rates for intrinsic, anti-Hebbian, and Hebbian plasticity are set to 10 s, 100 s, and 200 s, respectively.

First, a freely available collection of 500,000 natural image patches (Winder & Impressive Machines LLC, N.d.) is used, where the model neurons are shown to become selective to principal components, rather than independent components as extracted using an implementation of the FastICA algorithm (Hyvärinen & Oja, 2000; Pedregosa et al., 2011). For a summary of the results, see Figure 6. Due to the unstructured nature and translation-invariant statistics of the natural image patches, this result is in line with our expectations. Since the inputs to the adaptive neurons violate the assumption of gaussianity, intrinsic plasticity fails to achieve identical variance in the activation of all neurons but still succeeds in stabilizing Hebbian plasticity. The discrepancy between the assumed gaussian membrane potential distribution and the sparser observed distributions resulting from the neurons' discovery of nongaussian components could be alleviated by using a more general membrane potential distribution family such as the generalized normal distribution, which contains both gaussian and Laplacian distributions as special cases. Here, however, we restrict ourselves to the much simpler assumption of gaussian membrane potentials, for which closed-form expressions of the sufficient statistics exist.

Second, an identical network with the same parameters as in the previous case is used with 60,000 image patches of the same size from the MNIST database of handwritten digits (LeCun, Cortes, & Burges, N.d.). (For a summary of the results, see Figure 7.) Due to the structured nature of the sample images, components representing the class averages, rather than principal components, emerge as the learned weights. In an unsupervised fashion, the population thus learns a sparse representation of its inputs. This result is in line with a prediction made by Triesch (2007), who proposes that in a high-dimensional learning problem with clustered data, the relative contribution of each cluster to a neuron's activation can be approximated by the expected activation in a corresponding percentile of the activation distribution.^{2} In this case, a neuron optimally tuned to respond strongly to a single cluster out of the 10 available can be heuristically expected to exhibit its highest 10% of firing rates in response to inputs drawn from its favored cluster, resulting in an expected activation of in response to that class. Assuming that intrinsic plasticity successfully enforces the desired activation distribution with mean , the ratio could be viewed as a measure of selectivity toward a dominant class, which can be predetermined by an appropriate choice of activation distribution. As the neuron's mean activation drives Hebbian learning of its input synapses, this selectivity in the neuron's response to specific input classes is proportionally reflected in its synaptic weights. For a log-gaussian distribution with parameters and as used here, a well-tuned neuron could thus be expected to show an overall mean activation of with a mean activation of in response to its favored class input.
For comparison, a neuron with an enforced exponential activation distribution of equal mean could be expected to respond with a mean activation of to its favored cluster and would thus be slightly less selective for a unique class. For the first neuron in the numerical simulations above, the empirical results at and even exceed the heuristic predictions, implying that the neuron is indeed highly selective to a single cluster mean, in this case corresponding to the digit 0. The discrepancy between theoretical prediction and numerical results may be attributed to several compounding effects, such as the temporal dynamics of the stochastic processes used here, high intraclass variability, or the correlation between the cluster centers of the MNIST digits, all of which make a direct quantitative comparison between numerical results and theoretical predictions difficult.
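The percentile heuristic can be checked numerically for the two activation distributions compared above. Since the paper's distribution parameters are not reproduced here, a log-gaussian with sigma = 1 is used as an assumed example, matched in mean to the exponential:

```python
import numpy as np

rng = np.random.default_rng(4)

def top_decile_selectivity(a):
    """Mean activation within the top 10% of responses, relative to the overall mean."""
    q90 = np.quantile(a, 0.9)
    return a[a > q90].mean() / a.mean()

n = 1_000_000
# Log-gaussian activations (sigma = 1 is an assumed value, not the paper's
# parameter) vs. exponential activations of equal mean exp(1/2).
log_gaussian = rng.lognormal(mean=0.0, sigma=1.0, size=n)
exponential = rng.exponential(scale=np.exp(0.5), size=n)

print(top_decile_selectivity(log_gaussian))  # ≈ 3.89
print(top_decile_selectivity(exponential))   # ≈ 3.30 = 1 + ln(10), slightly lower
```

Consistent with the argument in the text, the heavier-tailed log-gaussian concentrates more of its mean activation in the top decile and is therefore the more selective choice.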

In both cases discussed, convergence to the final solution is quick, with components converging one by one in the sequence determined by the lateral inhibition structure, and the discovered representation remains stable over the remainder of the simulation. As the learned components reflect the structure present in the stimuli, the population's representation of its input is invariant to (orthogonal) transformations in the 784-dimensional input space, as any such transformation of the input space is counteracted by a corresponding adjustment of synaptic weights and intrinsic excitability.

## 4 Discussion

Our numerical results confirm the theoretical conclusion that a population of adaptive neurons, implementing only local mechanisms of intrinsic and synaptic plasticity, can realize homeostatic self-regulation that renders it invariant with respect to affine-linear/orthogonal transformations of its multivariate input. Here, this is achieved by the recovery of independent or principal components, but it could be similarly realized by finding any other arbitrary, yet uniquely defined, representation of the population's input. For clustered inputs, the activation distribution could be chosen in accordance with the heuristic outlined by Triesch (2007) so as to achieve a certain specificity in the neurons' response to specific clusters and thus enforce a sparse code.

The generality of the model allows the implementation and combination of various other forms of invariance, even where gradient-based methods might become prohibitively complex. For example, the stochastic membrane potential process with its parametric stationary distribution can be chosen freely, so that its sufficient statistics resemble the properties we wish the neuron to become invariant to, such as higher-order moments.

Nonlinear forms of Hebbian plasticity as discussed by Brito and Gerstner (2016) could be used to learn sparse input representations other than independent or principal components. Additionally, a concave nonlinear dependency on the postsynaptic activation could act to further stabilize Hebbian learning by slowing synaptic growth at high firing rates, thus allowing for much slower intrinsic adaptation to compensate.

By including interaction delays in the synapse model, synaptic plasticity could be used to influence not just the instantaneous correlation structure between neurons but the full auto- and cross-correlation structure in time. By choosing different time constants for different adaptation variables, multiple timescales of adaptation can be modeled, a phenomenon observed in vivo (Fairhall et al., 2000; Turrigiano, 2008). Here, only a combination of slow Hebbian plasticity with fast intrinsic plasticity, as could be realized by rapidly acting homeostatic mechanisms such as spike rate adaptation or the accumulation or depletion of neurotransmitters, is discussed. However, biological evidence suggests that in particular, slow intrinsic adaptation (Turrigiano, 2008) and fast-changing weights (Zucker & Regehr, 2002) should be studied. Instead of the artificially constrained topology of lateral connectivity, which is used here to demonstrate the recovery of (only) dominant components, random sparse connectivity could be used in a sufficiently large network to find a similar, overcomplete, and invariant representation of the population's input.

Despite the generality of our results, we make several assumptions that should be made explicit. First, intrinsic plasticity here operates on membrane potentials only. While rate-based or spike-triggered adaptation could be included, we abstain from doing so due to the complexity of the resulting model. Second, the neuron model is rate based; spiking effects could be included only via point or renewal processes with time-varying rate functions. This constrains the scope of spike timing effects, and thus of spike timing–dependent plasticity mechanisms, that can be incorporated into the model. Third, intrinsic adaptation effects are not modeled as part of the membrane potential itself but rather as separate adaptation variables, similar to, for example, the adaptive exponential model (Gerstner & Brette, 2009), and can thus be observed only through their effect on the neurons' output behavior. Fourth, for the derivation of adaptation dynamics, the sufficient statistics that the neurons become invariant to are assumed to be twice differentiable for mathematical convenience, although this is not a necessary condition. Fifth, the restriction of membrane potentials to (locally) stationary diffusion processes, albeit still very general, may be overly restrictive for some applications where synaptic inputs exhibit more structure, such as neurons driven by a low number of incoming spikes. In such cases, a different class of processes should be used. Finally, the model is stated here in the limiting case where changes to the parameters of the neurons' inputs occur so slowly or rarely that the resulting membrane potentials can be assumed to be locally or piecewise stationary. When adaptation variables and input statistics evolve on similar timescales, this assumption is violated; the separation between input processes and their (slowly changing) parameters breaks down, and some of the arguments presented here can no longer be directly applied.
In a machine learning context, this problem is commonly referred to as concept drift (Tsymbal, 2004), and adaptive networks might be well capable of coping with it. In this regime of very fast adaptation, intrinsic plasticity might no longer stabilize neural activations but could instead endow neurons with a local estimate of membrane potential statistics, potentially increasing their dynamic range (Fairhall et al., 2000; Fairhall, Lewen, Bialek, & de Ruyter Van Steveninck, 2001). This remains to be studied in future work.

In ongoing work, we explore whether a simple choice of intrinsic plasticity, resulting in scale invariance, can be used to stabilize activity in sensory neuron populations, deep feedforward and recurrently connected networks, thus improving information transmission as suggested by Turrigiano and Nelson (2004). This effect could potentially complement or replace specifically crafted inherently stable (deep) feedforward networks (see Klambauer, Unterthiner, Mayr, & Hochreiter, 2017, for example) by addressing the problem of vanishing gradients, a pressing issue in the field of deep learning.

Conversely, despite the hypothesized stabilizing effect of intrinsic plasticity in feedforward networks, pathological recurrent connectivity could potentially lead intrinsic plasticity to actively destabilize a network. Such pathological network behavior needs to be studied to discover limitations and possible undesired side effects of plasticity and could potentially provide some theoretical insight into neurological conditions such as epilepsy.

Finally, in order to establish the biological plausibility of the abstract mathematical model presented here, the biophysical mechanisms underlying plasticity need to be tied to adaptation variables, and a reasonable class of membrane potential processes and instantaneous firing rate distributions needs to be determined from biological evidence to guide modeling choices.

## Appendix A: Mapping Stochastic Processes

### A.1 Preliminaries

*(Itô's lemma, theorem 4.2.1 of Øksendal, 2003).* For a function and an -dimensional stochastic Itô process , the transformation is a -dimensional Itô process that can be expressed as

For linear transformations , we see that stochastic differentials preserve linearity. A special case of this lemma is:

Using corollary 2 and lemma 3, the properties of a specific process, the Ornstein-Uhlenbeck process, can be derived.

*(Ornstein-Uhlenbeck process).* Let be a one-dimensional Itô process of the form where . We call an Ornstein-Uhlenbeck process (Stepanov, 2013, section 2.6). For an initial condition , its solution is given by For constant , , it is a stationary process with stationary gaussian distribution with mean and autocovariance .
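The stationary statistics stated above can be verified with a short Euler-Maruyama simulation (the parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Euler-Maruyama simulation of dX = theta * (mu - X) dt + sigma dW.
theta, mu, sigma = 1.0, 2.0, 0.5
dt, n_steps = 0.01, 500_000
noise = sigma * np.sqrt(dt) * rng.normal(size=n_steps - 1)

x = np.empty(n_steps)
x[0] = mu
for t in range(n_steps - 1):
    x[t + 1] = x[t] + theta * (mu - x[t]) * dt + noise[t]

# Stationary distribution: gaussian with mean mu and variance sigma^2 / (2 theta).
tail = x[50_000:]
print(tail.mean())  # ≈ 2.0
print(tail.var())   # ≈ 0.125
```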

When the above process is driven not by white noise but an integrable time-varying function (i.e., the solution of another stochastic process), a solution of can be equivalently derived where integration is not performed with respect to the differential but instead with respect to .

### A.2 Deriving a Nonlinear Activation Function

Let denote the cumulative distribution function (CDF) of , and let denote the CDF of an arbitrary distribution . Define . If is continuous with density , then the distribution from lemma 5 is equal to with density .
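This construction of the activation function as a composition of CDFs can be sketched directly: for an assumed standard gaussian membrane potential and an assumed unit-rate exponential target distribution, the transformed samples follow the target distribution.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)

def activation(u, mean=0.0, std=1.0, rate=1.0):
    """Composition of the membrane potential's gaussian CDF with the
    exponential quantile function: maps N(mean, std^2) onto Exp(rate)."""
    p = 0.5 * (1.0 + erf((u - mean) / (std * sqrt(2.0))))  # gaussian CDF
    return -np.log1p(-p) / rate                            # exponential inverse CDF

u = rng.normal(0.0, 1.0, size=200_000)      # gaussian membrane potentials
a = np.array([activation(ui) for ui in u])  # resulting activations

print(a.mean())             # ≈ 1.0: mean of a unit-rate exponential
print(np.quantile(a, 0.5))  # ≈ ln 2 ≈ 0.693: its median
```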

### A.3 Autocorrelation of Filtered OU Process

Let the stochastic input be an Ornstein-Uhlenbeck process (lemma 4) with autocorrelation function . The resulting autocorrelation function for with is then determined by the stochastic input and can be derived as follows (we assume , ): By considering the limit where both and are far from 0 and the influence of the initial condition vanishes, this can be simplified to a stationary autocovariance function: A direct result of equation A.29 is that the variance of the membrane potential is then given by .

### A.4 Estimating Sufficient Statistics

Let be an Ornstein-Uhlenbeck process (lemma 4) with time-varying mean and standard deviation and . We consider an adaptation variable . For and for , we can use the deterministic part of the solution as defined in lemma 4 to derive the expectation. The causal filter here is a valid probability density with mean . We thus see that the adaptation variable approximates the membrane potential's true mean value, lagging behind with a delay on the scale of . A similar derivation can be carried out for the second adaptation variable, . The larger the two time constants and are chosen, the more smoothly, yet more slowly, the resulting adaptation variables approximate the true sufficient statistics. The noise in the adaptation variables can be estimated by considering their variance, which, calculated as in section A.3, is proportional to and approaches 0 for a sufficiently slow adaptation time constant . We conclude that the proposed mechanism of estimating sufficient statistics via filtering is valid and asymptotically correct and follows the dynamics derived here. See Figure 2 for the match between the derived and simulated adaptation parameter dynamics.
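The filtering mechanism can be sketched as a discretized leaky integrator tracking the mean of a noisy signal (the time constants, noise level, and step change are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

# An adaptation variable tracking the membrane potential's mean as a leaky
# integrator: tau_a * da/dt = u(t) - a(t), discretized with time step dt.
dt, tau_a = 0.001, 0.5
t = np.arange(0.0, 20.0, dt)
u = np.where(t < 10.0, 1.0, 3.0) + 0.2 * rng.normal(size=t.size)  # mean jumps at t = 10

a = np.empty_like(u)
a[0] = 0.0
for i in range(1, u.size):
    a[i] = a[i - 1] + dt / tau_a * (u[i] - a[i - 1])

# After a transient of a few tau_a the estimate settles near the true mean and
# re-converges after the step change, lagging on the timescale of tau_a.
print(a[int(9.0 / dt)])  # ≈ 1.0
print(a[-1])             # ≈ 3.0
```

Larger values of `tau_a` suppress the estimator's noise further but lengthen the lag after a change in the input statistics, the trade-off derived above.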

### A.5 Approximation Quality of Adaptive Neurons

To measure the quality of an adaptive neuron's temporary estimate of the current stationary membrane potential distribution , the Kullback-Leibler divergence between the two can be used. This measure is very informative, since it simultaneously represents the mismatch between the neuron's produced and desired activation distributions:

How well the desired distribution of activation can be achieved thus crucially depends on how well the neuron's adaptation parameters can approximate the sufficient statistics of the membrane potential distribution. Since the divergence on both membrane potential and activation side of the neuron is identical, we just refer to the divergence between neurons.

Using lemma 7, the divergence between two gaussian neurons can be calculated based on the membrane potential distribution's sufficient statistics and the corresponding approximation by the adaptation variables. The Kullback-Leibler divergence is then given as follows: We see that this converges to 0 for and .
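The closed-form divergence between two gaussian distributions, as used here to quantify the neuron's approximation error, can be written down directly (the numerical arguments below are arbitrary examples):

```python
import numpy as np

def kl_gaussians(mu1, s1, mu2, s2):
    """KL divergence D(N(mu1, s1^2) || N(mu2, s2^2)) in closed form."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2.0 * s2**2) - 0.5

# The divergence vanishes exactly when the adaptation variables match the
# true sufficient statistics and grows with the estimation error.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))  # 0.0: perfect estimate
print(kl_gaussians(0.0, 1.0, 0.2, 1.1))  # small positive mismatch
```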

### A.6 Effects of the Choice of Activation Distribution

The choice of membrane potential distribution and activation distribution together determines the resulting activation function. While the computational role of the neuron is often explained based on the nonlinearity used, the distributions offer a different perspective. Consider first a discrete, bimodal probability distribution of activations, where the activation is either at a high, fixed value with probability or at 0 with probability . The expected activation or firing rate then is , and the neuron switches between periods of high firing rates (bursts) and periods of low firing rates.

Due to the continuous nature of the membrane potential process, the periods of bursting can be expected to be dense, measurable intervals, which allows us to understand the neuron as a bursting neuron. Depending on the time constants of the membrane potential dynamics, these intervals could represent longer or shorter bursts of spikes. For sufficiently high rates and low with a constant mean firing rate , the timing of spikes can be made precise, in particular if renewal processes with refractoriness are used to sample spikes, limiting the number of spikes per interval. The shape of the distribution of firing rates thus allows influencing the temporal precision of spiking and controls the amount of noise introduced into spike trains sampled by an inhomogeneous Poisson process. For this example, the corresponding cumulative distribution function, and thus the nonlinearity, is a step function, and the neuron switches between stochastic bursting and silence. For continuous activation distributions, similar observations can be made. Highly kurtotic distributions, where the mean is much larger than the median, are dominated by rare yet disproportionately high firing rates and can thus best be viewed as encoding such rare events that elicit high membrane potentials (coincidences). Depending on the spike sampling mechanism used, different distributions might yield optimal information transmission through spiking output.

## Appendix B: Copulas and Joint Distributions

### B.1 Preliminaries

*(Copulas, Sklar's theorem; see Embrechts et al., 2001).* For an -variate continuous random variable with joint cumulative distribution function and invertible marginal cumulative distribution functions , the copula function can be defined as follows:

Apart from the information contained in the marginals, the copula thus captures all remaining information about the joint distribution.

*(marginal invariance of copulas).* Let be an -variate continuous random variable with copula and let be monotone functions. Then and have the same copula .
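The marginal invariance can be illustrated empirically: any statistic that depends only on ranks, such as the Spearman correlation, is a functional of the copula alone and is therefore unchanged by strictly increasing marginal transforms (the correlation value and the transforms below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)

def spearman(a, b):
    """Spearman rank correlation: a statistic that depends on the copula only."""
    ra = a.argsort().argsort()
    rb = b.argsort().argsort()
    return np.corrcoef(ra, rb)[0, 1]

# A correlated gaussian pair, then strictly increasing marginal transforms
# (exponential and cubic): the marginals change completely, but the copula,
# and hence any rank statistic, is untouched.
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=100_000)
s_raw = spearman(x[:, 0], x[:, 1])
s_transformed = spearman(np.exp(x[:, 0]), x[:, 1] ** 3)

print(s_raw)
print(s_transformed)  # identical: monotone marginal transforms preserve ranks
```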

### B.2 Dependency of the Copula of the Joint Distribution of Activation on the Covariance of Gaussian Inputs

By lemma 10, the stationary distribution of the neurons' joint activation thus has the same copula as the joint membrane potential distribution and the joint input distribution. The stationary, multivariate gaussian input process with covariance matrix has a gaussian copula (Embrechts et al., 2001) of the following form, where denotes the standard gaussian CDF: Given , which depends only on the covariance matrix of the stationary joint distribution of the signals driving a population of neurons and the marginal probability densities of the neurons' activations, the joint distribution of activations in a population is thus fully specified.

### B.3 Inducing Structure through Input Weights

The same reasoning can be applied across time, rather than neurons, when considering the autocovariance of a single neuron instead of the covariance between different neurons. If the autocovariance structure of a gaussian membrane potential process across different points in time is modeled by a multivariate gaussian distribution, it can again be decomposed into a (gaussian) copula and marginal distributions. The copula is preserved under the nonlinear transformation through the neuron's activation function and thus captures the autocovariance structure of its activation process as well. By adding multiple weighted synaptic connections with different delays for a single source signal, the autocovariance can be controlled in the same way as the covariance is controlled above.

Combining these two approaches, a set of weighted, delayed synaptic connections can be used to induce cross-covariance structures in the neurons' inputs and thus membrane potentials and activations.

## Appendix C: PCA and ICA in Single Neurons

### C.1 Sensitivity of Nonlinear Neurons to Higher Moments

For a linear function, higher-order coefficients are , and the expected value of the output depends only on the zeroth- and first-order moments. For polynomials of order , all moments up to order influence the expected output of the function. For a general exponential function with parameters and , all coefficients are positive and form a decreasing sequence that converges to 0. In this case, the expected value of the output is a linear combination of all centered moments of with rapidly decreasing weights. For a gaussian random variable , where the mean and all cumulants beyond the second are zero, the expected value of the output is thus dominated by the second moment, the variance, whereas for a zero-mean, unit-covariance Laplacian distribution, the third and higher moments also influence the expected value of . The relative dependence of a function's expected output on the moments of is determined by the coefficients of its Taylor expansion (if it exists).
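The moment expansion for an exponential nonlinearity can be verified for the standard gaussian case, where the series of 1/k!-weighted raw moments sums exactly to the closed-form expression exp(sigma^2 / 2):

```python
import numpy as np
from math import exp, factorial

# For f(x) = exp(x), E[f(X)] = sum_k E[X^k] / k!: every raw moment contributes,
# with positive, rapidly decreasing weight 1/k!.
def gaussian_moment(k):
    """Raw moment E[X^k] of a standard gaussian: (k - 1)!! for even k, 0 for odd k."""
    if k % 2 == 1:
        return 0.0
    return float(np.prod(np.arange(k - 1, 0, -2))) if k > 0 else 1.0

series = sum(gaussian_moment(k) / factorial(k) for k in range(30))
print(series)    # ≈ 1.6487
print(exp(0.5))  # = exp(sigma^2 / 2): the closed form for E[exp(X)], X ~ N(0, 1)
```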

### C.2 Fixed-Point Analysis of the PCA and ICA Neurons

Since the stationary joint distribution of is explicitly given in both experiments (a multivariate gaussian or a product distribution of two independent Laplace distributions, rotated by ), these expectations can be analytically calculated for the two neuron types (adaptive and nonadaptive) in both conditions (gaussian and Laplacian inputs). Fixed points are located where the expected weight changes vanish. The bottom panels of Figures 3 and 4 show the calculated vector fields for both neuron types in the two experimental setups, respectively.

## Notes

1. If it were the case that an individual neuron, depending on its initial weight vector, would select a nondominant component, this would be considered a flaw of the algorithm, as in that case, it could not be ensured that the most relevant components of the input are discovered.

2. Assuming that cluster centers are linearly independent and equally frequent.

3. The conditions are quite general yet very technical and can be found in Capasso and Bakstein (2015, theorem 4.56).

## Acknowledgments

We thank our anonymous referees for their constructive feedback, which greatly helped to improve this article.