## Abstract

Supplementing a differential equation with delays results in an infinite-dimensional dynamical system. This property provides the basis for a reservoir computing architecture, where the recurrent neural network is replaced by a single nonlinear node, delay-coupled to itself. Instead of the spatial topology of a network, subunits in the delay-coupled reservoir are multiplexed in time along one delay span of the system. The computational power of the reservoir is contingent on this temporal multiplexing. Here, we learn optimal temporal multiplexing by means of a biologically inspired homeostatic plasticity mechanism. Plasticity acts locally and changes the distances between the subunits along the delay, depending on how responsive these subunits are to the input. After analytically deriving the learning mechanism, we illustrate its role in improving the reservoir’s computational power. To this end, we investigate, first, the increase of the reservoir’s memory capacity. Second, we predict a NARMA-10 time series, showing that plasticity reduces the normalized root-mean-square error by more than 20%. Third, we discuss plasticity’s influence on the reservoir’s input-information capacity, the coupling strength between subunits, and the distribution of the readout coefficients.

## 1 Introduction

Reservoir computing (RC) (Jaeger, 2001; Maass, Natschläger, & Markram, 2002; Buonomano & Maass, 2009; Lukoševičius & Jaeger, 2009) is a computational paradigm that provides both a model for neural information processing (Häusler & Maass, 2007; Karmarkar & Buonomano, 2007; Yamazaki & Tanaka, 2007; Nikolić, Häusler, Singer, & Maass, 2009) and powerful tools for carrying out a variety of spatiotemporal computations. This includes time series forecasting (Jaeger & Haas, 2004), signal generation (Jaeger, Lukoševičius, Popovici, & Siewert, 2007), pattern recognition (Verstraeten, Schrauwen, & Stroobandt, 2006), and information storage (Pascanu & Jaeger, 2011). RC also affords a framework for advancing and refining our understanding of neuronal plasticity and self-organization in recurrent neural networks (Lazar, Pipa, & Triesch, 2007, 2009; Toutounji & Pipa, 2014).

This article presents a biologically inspired neuronal plasticity rule to boost the computational power of a novel RC architecture that is called a single node delay-coupled reservoir (DCR). The DCR realizes the same RC concepts using a single nonlinear node with delayed feedback (Appeltant et al., 2011). This simplicity makes the DCR particularly appealing for physical implementations, as has already been demonstrated on electronic (Appeltant et al., 2011), optoelectronic (Larger et al., 2012; Paquot et al., 2012), and all-optical hardware (Brunner, Soriano, Mirasso, & Fischer, 2013). The optoelectronic and all-optical implementations use a semiconductor laser diode as the nonlinear node and an optical fiber as a delay line, allowing them to maintain high sampling rates. They have also been shown to be comparable in performance to standard RC architectures on benchmark computational tasks.

The DCR operates as follows. Different nonlinear transformations and mixing of stimuli from the past and the present are achieved by sampling the DCR’s activity at virtual nodes (v-nodes) along the delay line. While neurons of a recurrent network mix stimuli via their synaptic coupling, which forms a network topology, the v-nodes of a DCR mix signals via their (nonlinear) temporal interdependence. To this end, the v-nodes’ temporal distances from one another, henceforth termed v-delays, are made shorter than the characteristic timescale of the nonlinear node. Thus, v-nodes become analogous to the connections of a recurrent network, providing the DCR with a certain network-like topology. In analogy to the spatial distribution of input in a classical reservoir, stimuli in a DCR are temporally multiplexed (see Figure 1). To process information, the external stimuli are applied to the dynamical system, thereby perturbing the reservoir dynamics. Here, we operate the DCR in an asymptotically stable fixed point regime. To render the response of the DCR transient (i.e., reflecting nonlinear combinations of past and present inputs), the reservoir dynamics must not converge to the fixed point, where it would become dominated by the current stimulus. To ensure this, a random piecewise constant masking sequence is applied to the stimulus before injecting the latter into the reservoir (Appeltant et al., 2011). The positions where this mask may switch value match the positions of the v-nodes, which are initially chosen equidistant. However, given that the v-delays directly influence the interdependence of the corresponding v-node states, and therefore the nonlinear mixing of the stimuli, it is immediately evident that v-delays are important parameters that may significantly influence the performance of the DCR.

To optimize the computational properties of the DCR, we employ neuroscientific principles using biologically inspired homeostatic plasticity (Davis & Goodman, 1998; Zhang & Linden, 2003; Turrigiano & Nelson, 2004) for adjusting the v-delays. Biologically speaking, homeostatic plasticity does not refer to a single particular process. It is, rather, a generic term for a family of adaptation mechanisms that regulate different components of the neural machinery, bringing these components to a functionally desirable operating regime. The choice of the operating regime depends on the functionality that a model of homeostatic plasticity aims to achieve. This resulted in many flavors of homeostatic plasticity for regulating recurrent neural networks in computational neuroscience (Somers, Nelson, & Sur, 1995; Soto-Treviño, Thoroughman, Marder, & Abbott, 2001; Renart, Song, & Wang, 2003; Lazar et al., 2007, 2009; Marković and Gros, 2012; Remme & Wadman, 2012; Naudé, Cessac, Berry, & Delord, 2013; Zheng, Dimitrakakis, & Triesch, 2013; Toutounji & Pipa, 2014), neurorobotics (Williams & Noble, 2007; Vargas, Moioli, Von Zuben, & Husbands, 2009; Hoinville, Siles, & Hénaff, 2011; Dasgupta, Wörgötter, & Manoonpong, 2013; Toutounji & Pasemann, 2014), and reservoir computing (Schrauwen, Wardermann, Verstraeten, Steil, & Stroobandt, 2008; Dasgupta, Wörgötter, & Manoonpong, 2013). Here, we use a homeostatic plasticity mechanism to regulate the v-delays so as to balance responsiveness to the input and its history on the one hand, against optimal expansion of its informational features into the high-dimensional phase space of the system, on the other hand. Furthermore, we show that this process can be understood as a competition between the v-nodes’ sensitivity and their entropy, resulting in a functional specialization of the v-nodes. 
This leads to a substantial increase in the DCR’s memory capacity and a significant improvement in its ability to carry out nonlinear spatiotemporal computations. We discuss the implications of the plasticity mechanism with respect to the DCR’s entropy, the virtual network topology, and the resulting regression coefficients.

## 2 Model

In this section, we describe the RC architecture that is based on a single nonlinear node with delayed feedback. We then formulate this architecture using concepts from neural networks.

### 2.1 Single Node Delay-Coupled Reservoir

Generally, RC comprises a set of models where a large dynamical system called a reservoir (e.g., a recurrent neural network) nonlinearly maps a set of varying stimuli to a high-dimensional space (Jaeger, 2001; Maass et al., 2002). The recurrency allows a damped trace of the stimuli to travel within the reservoir for a certain period of time. This phenomenon is termed *fading memory* (Boyd & Chua, 1985). Random nonlinear motifs within the reservoir then mix past and present inputs, allowing a desired output to be combined linearly from the activity of the reservoir, using a linear regression operation. As the desired output is usually a particular transformation of the temporal and spatial aspects of the stimuli, the operations that an RC architecture is trained to carry out are termed *spatiotemporal computations*.

Classically, the reservoir is a recurrent neural network (RNN) of *n* nonlinear units, and the spatial distribution of the input over these units is a mapping from the input space into the network’s *n*-dimensional state space. The dynamics are modeled by a difference equation for discrete time, or an ordinary differential equation (ODE) for continuous time, relating the network activity to the activity’s time derivative. In the DCR, by contrast, the reservoir is a single node whose dynamics follow a delay differential equation (DDE), system 2.3, in which the time derivative of the activity depends on the activity one full delay in the past.

Solving system 2.3 requires specifying an appropriate initial value function over one delay interval. As is already suggested by this form of initial condition, the phase space of system 2.3 is an infinite-dimensional Banach space (Guo & Wu, 2013). Using a DDE as a reservoir, this phase space thus provides a high-dimensional feature expansion for the input signal, which is usually achieved by using an RNN with more neurons than input channels.

To inject a signal into the reservoir, it is multiplexed in time. The DCR receives a single constant input in each reservoir time step, corresponding to one cycle of the system’s full delay. During each such cycle, the input is linearly transformed by a mask that is piecewise constant over short periods representing the temporal spacing, or v-delays, between the sampling points of virtual nodes, or v-nodes, along the delay line. Accordingly, the v-delays sum to the full delay, and their number *n* is the effective dimensionality of the DCR. Here, the mask *M* is chosen to be binary with random mask bits, so that each v-node *i* receives a weighted input. In order to ensure that the DCR possesses fading memory of the input, system 2.3 is set to operate in a regime governed by a single fixed point for constant input. The masking procedure then effectively prevents the driven dynamics of the underlying system from saturating to the fixed point.
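As a concrete illustration, the temporal multiplexing described above can be sketched in a few lines. The v-delay values, mask alphabet, and sampling step below are illustrative choices, not the paper’s settings.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8                                   # number of v-nodes (illustrative)
tau = 1.0                               # full delay span (arbitrary units)
theta = np.full(n, tau / n)             # v-delays; equidistant here, summing to tau
mask = rng.choice([-1.0, 1.0], size=n)  # random binary mask, one bit per v-node

def multiplex(u, theta, mask, dt=1e-3):
    """Stretch one scalar input u over a full delay cycle as a piecewise-constant
    signal: v-node i receives mask[i] * u for a stretch of length theta[i]."""
    steps = np.round(theta / dt).astype(int)
    return np.concatenate([np.full(s, m * u) for s, m in zip(steps, mask)])

signal = multiplex(0.7, theta, mask)
```

Making the v-delays unequal only changes the `theta` vector; the mask bits still switch exactly at the v-node positions.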

A sample is read out at the end of each v-delay, yielding *n* predictor variables (v-nodes) per time step. Computations are performed on the predictors using a linear regression model for some scalar target time series *y*, where *x _{i}* denote the DCR’s v-nodes (see equation 2.6) and the coefficients are determined by regression, for example, using the least squares solution minimizing the sum of squared errors.
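The readout step can be sketched as ordinary least squares on synthetic stand-in data; the state matrix and target series here are hypothetical, not DCR output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for DCR activity: T time steps, n v-node samples each.
T, n = 500, 20
X = rng.standard_normal((T, n))                  # predictor matrix (v-node states)
w_true = rng.standard_normal(n)
y = X @ w_true + 0.01 * rng.standard_normal(T)   # scalar target series

# Least squares readout with an intercept column.
A = np.column_stack([np.ones(T), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

nrmse = np.sqrt(np.mean((y - y_hat) ** 2) / np.var(y))
```

Only this readout is trained; the reservoir itself stays fixed, which is the defining simplification of RC.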

### 2.2 The DCR as a Virtual Network

The goal is to optimize the computational properties of the DCR as a network, given the vector of v-delays of its *n* v-nodes. In the case of equidistant v-delays, approximate v-node equations were already derived by Appeltant et al. (2011), who also conceptualized the DCR with equidistant v-delays as a network. We extend this result to account for arbitrary v-node spacings on which our plasticity rule can operate. To that end, we first define the activity of the DCR at the sampling point of each v-node *i* as a function of the preceding v-nodes’ activity (equation 2.6).

Equation 2.6 shows that the activity of v-node *i* is a weighted sum of the nonlinear component of the preceding v-nodes’ activity, down to the last v-node *n* in the cyclic network, whose activity is carried over from the previous reservoir time step. The resulting directed network topology is shown as a virtual weight matrix for equidistant v-nodes in Figure 3A.
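A minimal sketch of such a virtual weight matrix follows, assuming a cascaded exponential-filter form for the v-node equations; the exact constants are an assumption made for illustration, chosen to be consistent with the exponential decay of influence along the delay line described in the text.

```python
import numpy as np

def virtual_weights(theta):
    """Virtual weight matrix of a DCR under an assumed cascaded
    exponential-filter form of equation 2.6:
        x_i = exp(-theta_i) * x_{i-1} + (1 - exp(-theta_i)) * f_i.
    Unrolling gives, for j <= i,
        w[i, j] = (1 - e^{-theta_j}) * e^{-(theta_{j+1} + ... + theta_i)}:
    self-weights grow with theta_i, and afferent weights decay exponentially
    with the temporal distance between v-nodes."""
    n = len(theta)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            decay = np.exp(-theta[j + 1 : i + 1].sum())
            W[i, j] = (1.0 - np.exp(-theta[j])) * decay
    return W

theta = np.full(8, 0.5)  # equidistant v-delays (illustrative)
W = virtual_weights(theta)
```

Under this form, shrinking one v-delay lowers that node’s self-weight and raises its afferent weight, matching the qualitative topology changes discussed in section 5.2.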

## 3 Plasticity

An important role of the randomly alternating mask *M* is to prevent the DCR dynamics from saturating and thus losing history dependence and sensitivity to input. However, the random choice of the mask values and the equal v-delays do not guarantee an optimal choice of masking. A simple example that illustrates this point is the occurrence of sequences of equal-valued mask bits, as shown in Figure 2A, which leads to unwanted saturation. In general, many more factors determine optimal computation in the reservoir and need balancing.

Our goal in this section therefore is to develop a plasticity mechanism that optimizes the resulting v-delays with respect to sensitivity, while retaining a suitable nonlinear feature expansion into the DCR’s phase space. As we show in section 5.1, this results in a trade-off between sensitivity and entropy of the v-nodes. Entropy and sensitivity counteract each other, thus forcing v-nodes to specialize. In a first step, set out in section 3.1, we develop a partial plasticity mechanism that maximizes solely the sensitivity of individual v-nodes. In a second step, in section 3.2, the mechanism is augmented by a counteracting regulatory term that aims to retain a diverse feature expansion of the input. The full delay, together with the number *n* of v-nodes, the mask *M*, and the parameters of the delayed nonlinearity, are given hyperparameters that are kept constant; they determine the particular DCR that is subjected to the optimization process.

### 3.1 Sensitivity Maximization

The sensitivity of a v-node *i* depends on the v-delays of all the preceding v-nodes, up to a term independent of them. However, since this term decays exponentially the farther the v-node *i* is from the v-node *j*, one can ignore the contribution of distant v-nodes to the sensitivity of the v-node *i*. This simplifies the element-wise gradient considerably.

### 3.2 Homeostatic Plasticity

The optimization problem 3.5 maximizes the sensitivity of a v-node *i* by decreasing its v-delay, its temporal distance from the previous v-node, as is suggested by the element-wise gradient, equation 3.8. As a result, the v-node becomes more sensitive to the input history delivered from its predecessor. This, however, leads to a loss of diversity in expanding informational features of the input, since the smaller the time allotted to a v-node is, the less excitable by input it becomes. In addition, the optimization objective prefers small v-delays, many of which may even go to 0 despite the constraint on their sum, which leads to a reduction of the reservoir’s effective dimensionality.

We hypothesize that good spatiotemporal computational performance is achieved when diversity and sensitivity are balanced. To this end, we introduce a regulatory term into the sensitivity measure that punishes small v-delays, thus counteracting sensitivity by enforcing an increase in a v-node’s distance from its predecessor. The choice of the regulatory term is motivated by favorable analytical properties (mentioned later in this section) and by allowing flexibility in the choice of regulation between diversity and sensitivity. As entropy is a natural measure of informational diversity, we later support the intuitions behind our choice of the regulatory term with a rigorous mathematical argument. Namely, we show in section 5.1 how a plasticity mechanism that solely maximizes entropy of the v-nodes leads to an unbounded increase of v-delays and therefore presents a proper counterforce to sensitivity.

Since the contribution of distant v-nodes to the sensitivity of the v-node *i* is negligible, following the argument from equation 3.7, the element-wise gradient is again simplified. The resulting homeostatic update for the v-delay of v-node *i* then balances the v-delay’s increase and decrease, depending on the choice of the regulating term. The update is constrained to the simplex *V* defined by the constant full delay (see appendix B for details of the constraint satisfaction). The global maximum of the objective belongs to *V* only in a special case, which leads to convergence to equidistant v-nodes. Otherwise, the constrained gradient leads to the point on *V* closest to the global maximum.
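The constrained update can be sketched as plain gradient ascent followed by projection back onto the simplex of admissible v-delays. The gradient below is a placeholder for illustration, since the homeostatic gradient of equation 3.12 itself depends on the DCR’s state; the projection routine is a generic shift-and-clip scheme, not necessarily the paper’s.

```python
import numpy as np

def project_to_simplex(theta, tau, theta_min=0.0):
    """Project v-delays back onto the constraint set V:
    sum(theta) == tau with theta >= theta_min (shift-and-clip iteration)."""
    theta = np.maximum(theta, theta_min)
    free = np.ones(len(theta), dtype=bool)
    for _ in range(len(theta)):
        if not free.any():
            break
        theta[free] += (tau - theta.sum()) / free.sum()
        clipped = theta < theta_min
        if not clipped.any():
            break
        theta[clipped] = theta_min
        free &= ~clipped
    return theta

def plasticity_step(theta, grad, tau, lr=0.01):
    """One constrained ascent step: follow the homeostatic gradient,
    then re-project onto the simplex V."""
    return project_to_simplex(theta + lr * grad, tau)

tau, n = 4.0, 8
theta = np.full(n, tau / n)
grad = np.linspace(-1.0, 1.0, n)  # placeholder gradient for illustration
theta = plasticity_step(theta, grad, tau)
```

V-delays whose gradient is positive grow, the rest shrink, and the projection keeps the full delay constant throughout.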

## 4 Computational Performance

We next test the effect of the homeostatic plasticity mechanism, equation 3.12, on the performance of the DCR. Simulations are carried out on 100 DCRs; the activity of each is sampled at v-nodes that are initially equidistant. Each DCR is distinguished from the others by its binary mask *M*, whose 600 mask values are randomly chosen. Each simulation starts with a short initial period for stabilizing the dynamics, followed by a plasticity phase in which each time step corresponds to one full delay cycle, with fixed learning rate and regulating parameter. Afterward, readouts are trained for both the original and the modified v-delays and validated on held-out samples. The model parameters of the Mackey-Glass nonlinearity (see equation 2.4) are kept fixed. The DCR is subject to uniformly distributed, positive scalar input. At this positive input range, the DCR dynamics resulting from the Mackey-Glass nonlinearity are saturating, as illustrated in Figure 2A. This condition ensures that the approximation, equation 2.6, is accurate enough that a decrease in a v-delay does increase a v-node’s sensitivity.

### 4.1 Memory Capacity

The memory capacity of a reservoir is a measure of its ability to retain, in its activity, a trace of its input history. Optimal linear readouts are trained to reconstruct the uniformly distributed scalar input at different time lags. Figure 4 compares the memory capacity of DCRs before and after plasticity. For time lags at which the ability to reconstruct the input history starts to diverge from optimal (see Figure 4A), the increase in the DCR’s memory capacity can reach up to 70%. The improvement is measured as the relative change in reconstruction performance at each time lag due to plasticity. Only 1 of the 100 DCRs showed a deterioration in memory capacity after plasticity, and only for the largest time lag (see the inset in Figure 4B).
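The memory-capacity measure can be sketched as follows, with a simple bank of leaky input traces standing in for the DCR; the state model, sizes, and lag range are illustrative assumptions. Each lag contributes the squared correlation between the delayed input and its optimal linear reconstruction from the state.

```python
import numpy as np

rng = np.random.default_rng(2)

def memory_capacity(X, u, max_lag=20, washout=50):
    """m(k) = squared correlation between u(t-k) and its optimal linear
    reconstruction from the state X(t); memory capacity is sum_k m(k)."""
    T = len(u)
    A = np.column_stack([np.ones(T), X])
    m = []
    for k in range(1, max_lag + 1):
        target = u[washout - k : T - k]          # u(t-k) aligned with X(t)
        coef, *_ = np.linalg.lstsq(A[washout:], target, rcond=None)
        recon = A[washout:] @ coef
        r = np.corrcoef(recon, target)[0, 1]
        m.append(r ** 2)
    return np.array(m)

# Toy stand-in state: a bank of leaky traces of the input (not the DCR itself).
T = 2000
u = rng.uniform(0.0, 0.5, T)
leaks = np.linspace(0.1, 0.9, 10)
X = np.zeros((T, len(leaks)))
for t in range(1, T):
    X[t] = leaks * X[t - 1] + (1 - leaks) * u[t - 1]

m = memory_capacity(X, u)
```

Recent lags are reconstructed almost perfectly while distant lags degrade, which is the fading-memory profile that Figure 4 quantifies for the DCR.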

### 4.2 Nonlinear Spatiotemporal Computations

NARMA-10 requires modeling quadratic nonlinearities, and it shows a strong history dependence that challenges the DCR’s memory capacity. Figure 5 compares the performance of DCRs before and after plasticity for different time lags. Even with no time lag, the task still requires the DCR to retain fading memory in order to account for the dependence on inputs and outputs up to 10 time steps in the past. The plasticity mechanism achieves a substantial improvement in performance on average, surpassing state-of-the-art values in both classical (Verstraeten et al., 2006) and delay-coupled reservoirs (Appeltant et al., 2011). Only in five trials did the performance deteriorate (see the inset in Figure 5B). The improvement decreases for larger time lags, due to the deterioration in the DCR’s memory capacity observed in Figure 4, but remains significant across the tested lags.
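For reference, the widely used formulation of the NARMA-10 benchmark and the normalized root-mean-square error can be sketched as follows; the input range is the customary uniform [0, 0.5], and the series length is illustrative.

```python
import numpy as np

def narma10(u):
    """Standard NARMA-10 benchmark series driven by input u:
    y(t+1) = 0.3 y(t) + 0.05 y(t) sum_{i=0}^{9} y(t-i)
             + 1.5 u(t-9) u(t) + 0.1."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9 : t + 1].sum()
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return y

def nrmse(y, y_hat):
    """Normalized root-mean-square error of a prediction y_hat."""
    return np.sqrt(np.mean((y - y_hat) ** 2) / np.var(y))

rng = np.random.default_rng(3)
u = rng.uniform(0.0, 0.5, 2000)
y = narma10(u)
```

The second-order product term `0.05 * y[t] * sum(...)` is the quadratic nonlinearity, and the `u(t-9)` dependence is what stresses the reservoir’s memory.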

## 5 Discussion: Effects of Plasticity

In order to explain the observed results, we analyze and discuss the effects of the homeostatic plasticity mechanism, equation 3.12, on the system’s entropy, on the virtual network topology, and on the distribution of the readout weights. We also discuss the role of the regulating parameter.

### 5.1 Entropy

To quantify the input-information capacity of a v-node *i*, we maximize the mutual information between the v-node’s response and the input, that is, the difference between the entropy of the v-node’s response and the entropy of the response conditioned on the input. In other words, the conditional entropy is the portion of the response entropy that does not result from the input. Bell and Sejnowski (1995) argued that maximizing equation 5.1 with respect to some parameter is equivalent to maximizing the response entropy alone, since the conditional entropy does not depend on that parameter; that is, maximizing a v-node’s input-information capacity is equivalent to maximizing its self-information capacity, or entropy.

The entropy of the response *x _{i}* is defined in terms of the probability density function (PDF) of the v-node’s response. Since *x _{i}* is an invertible function of the Mackey-Glass nonlinearity *f _{i}* (see equation 3.1), which is itself an invertible function of the input *u _{i}* (if the nonlinearity is chosen appropriately, such as in equation 2.4), the PDF of *x _{i}* can be written as a function of the PDF of *f _{i}* via the standard change-of-variables formula.

Given the above, the homeostatic plasticity mechanism, equation 3.12, improves a particular DCR’s spatiotemporal computations by leading v-nodes to specialize in function. This specialization is mediated by a competition between the v-nodes’ sensitivity and their entropy: some v-nodes become more sensitive to small fluctuations in input history, while others are brought closer to saturation, where their entropy, and with it their ability to expand informational features, is higher.
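The link between a v-node’s relaxation interval and its response entropy can be illustrated numerically. The first-order relaxation model and the histogram entropy estimator below are simplifying assumptions for illustration, not the paper’s equations: a longer v-delay lets the node relax further toward its driven value, broadening its response distribution and raising its entropy.

```python
import numpy as np

rng = np.random.default_rng(4)

def hist_entropy(x, bins=64):
    """Histogram estimate of differential entropy (in nats)."""
    p, edges = np.histogram(x, bins=bins, density=True)
    w = np.diff(edges)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]) * w[nz])

def vnode_response(u, theta):
    """Assumed first-order relaxation of a v-node toward its driven value
    over its v-delay theta: x_t = e^{-theta} x_{t-1} + (1 - e^{-theta}) u_t."""
    x = np.zeros(len(u))
    for t in range(1, len(u)):
        x[t] = np.exp(-theta) * x[t - 1] + (1 - np.exp(-theta)) * u[t]
    return x

u = rng.uniform(0.0, 1.0, 50_000)
h_short = hist_entropy(vnode_response(u, theta=0.05))  # small v-delay
h_long = hist_entropy(vnode_response(u, theta=1.0))    # large v-delay
```

In this toy model, the short-delay node acts as a heavy low-pass filter whose compressed output has low differential entropy, mirroring the sensitivity-versus-entropy competition discussed above.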

### 5.2 Virtual Network Topology

The effects of the homeostatic plasticity mechanism, equation 3.12, on the DCR’s network topology can be deduced from equation 2.6, which specifies the self-weights *w _{ii}* and the weights *w _{ij}* that the v-node *i* receives from the preceding v-node *j*.

When a v-delay decreases, so does the v-node’s self-excitation *w _{ii}*, which is consistent with less saturation of the v-node’s activity. In addition, the choice of the regulating parameter describes the tendency of the v-node *i* to converge toward a particular self-excitation level. This entails that for higher regulation, the v-node’s target activity level increases, which also corresponds to higher entropy, as discussed in section 5.1.

The decrease in a v-delay also leads the corresponding v-node’s afferent *w _{ij}* to increase. This in turn increases the v-node *j*’s influence on the activity of the v-node *i*, which results in a higher correlation between the two (or higher anti-correlation, depending on the signs of the corresponding mask values *M _{i}* and *M _{j}*). The increase of correlation is in agreement with simulation results and in accord with the decrease of the v-node’s entropy as its v-delay decreases: the influence of the current input is overshadowed by information from the input history delivered from the preceding v-node *j*, which now drives the v-node *i*. Figure 3B shows an exemplary virtual weight matrix following plasticity, illustrating these changes in network topology due to the repositioning of v-nodes on the delay line.

### 5.3 Homeostatic Regulation Level

Introducing the regulating parameter is necessary for balancing the trade-off between sensitivity and entropy, as discussed analytically in section 5.1. It is also the defining factor in a v-node’s tendency to collapse, as is evident from the form of the plasticity function, equation 5.6. The collapse of v-nodes is tantamount to a reduction in the DCR’s dimensionality, which may be unfavorable with regard to the DCR’s computational performance.

We test the latter hypothesis and the choice of the regulating parameter by running NARMA-10 trials for parameter values that range between 0 and 2. Each trial shares the same mask *M* and the same NARMA-10 time series. As shown in Figure 6, the average improvement in performance in comparison to the reference equidistant case increases for smaller parameter values but drops again for the smallest ones. An increase in the regulating parameter also increases the improvement in performance, but this increase saturates. This is the case since the increase in v-delays favored by high parameter values makes the collapse of other v-delays inevitable in order to preserve the DCR’s constant full delay.

In a more detailed analysis, for each of the trials, we ranked the different parameter values according to the resulting improvement of performance relative to the equidistant case. We then calculated the percentage of trials in which a given parameter value achieved the highest improvement in performance (first rank), compared to all other values, and carried out the same procedure for the second and third ranks. Figure 7 confirms the previous results: extreme parameter values can still achieve the best improvement in performance, but they are less likely to do so than other values. Figure 7 also illustrates a striking result. In none of the trials was the equidistant case, where no plasticity took place, the best choice regarding the computational power of the DCR, and only in a small fraction of the trials did the nonplastic equidistant case rank third. As a result, for a given DCR setup, there always exists a choice of the regulating parameter that results in nonequidistant v-nodes where spatiotemporal computations are enhanced. This is also summarized in Figure 7B, which shows the improvement in performance given the best choice of the regulating parameter for each trial. The nrmse is substantially reduced, with an average performance that reaches an unprecedentedly low value.

We point out that the homeostatic plasticity mechanism, equation 3.12, also reduces the average absolute values of the readout coefficients (see Figure 6B), which is similar in effect to an *L*_{2}-regularized model fit. This is not only advantageous with respect to numerical stability, but *L*_{2}-regularization also allows for a lower mean-square error on the validation set, as compared to an unregularized fit (Hoerl & Kennard, 1970).
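The analogy to an L2-regularized fit can be illustrated directly: on correlated predictors (mimicking strongly coupled v-nodes), a ridge readout shrinks the coefficient norm relative to plain least squares. All data here are synthetic, and the regularization strength is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated predictors: a low-rank signal plus a little noise.
T, n = 300, 40
base = rng.standard_normal((T, 5))
X = base @ rng.standard_normal((5, n)) + 0.1 * rng.standard_normal((T, n))
y = X[:, 0] + 0.1 * rng.standard_normal(T)

def ridge(X, y, lam):
    """Ridge (L2-regularized) readout: (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 1e-10)  # numerically stabilized least squares
w_l2 = ridge(X, y, 1.0)     # explicit L2 penalty shrinks the coefficients
```

Because the ridge coefficient norm is nonincreasing in the penalty, the regularized readout is better conditioned, which parallels the smaller readout coefficients observed after plasticity.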

We now briefly discuss the effects of the homeostatic regulation level on the virtual network topology. As expected, and due to the simplex constraint, both smaller and larger values of the regulating parameter lead to a more uniform distribution of v-delays. However, most of the distribution’s mass remains concentrated at the initial equidistant v-delay; that is, most v-delays remain unchanged or change only slightly. This has no effect on the qualitative features of the virtual network topology as outlined in section 5.2, but quantitatively, more weights approach the extremes of their range.

## 6 Commentary on Physical Realizability

We demonstrated that the suggested plasticity mechanism, equation 3.12, leads to spatiotemporal computational performance that surpasses state-of-the-art results. An intuitive alternative to the plasticity mechanism would be to increase the number *n* of v-nodes within the constant full delay of the DCR. This solution, however, suffers from major drawbacks, particularly in regard to its realizability on physical hardware. Namely, there exists a physical constraint on the sampling rate of the DCR’s activity, below which the speed and the feasibility of physical implementation are jeopardized. This imposes a minimal admissible v-delay within the full delay line and thus an upper bound on the number of equidistant v-nodes. This constraint is accounted for in the current approach by restricting the updates of v-delays due to plasticity to discrete step sizes, where the step size corresponds to the minimal admissible v-delay (a value different from 0, which would result in pruning the DCR). The step size is chosen such that an integer number of minimal v-delays fits in one equidistant v-delay. In the current results, the step size was chosen small enough for the discretization to present a good approximation of continuous v-delay values. Nevertheless, simulations show that improved computational power persists even for much coarser steps, an order of magnitude larger than the minimal experimentally viable v-delay. A stringent comparison between the results for different step sizes is problematic, since rougher quantization of v-delays leads to less predictable effects on the behavior of the optimization problem, equation 3.10, particularly on how the discretized v-delay grid relates to the global maximum, which itself depends on the choice of the regulating term.
Nevertheless, the persistent improvement in performance stands in favor of the method’s applicability in physical realizations.
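One way to realize such a discretization, sketched under the assumption that quantized v-delays must be integer multiples of the minimal admissible step while preserving the constant full delay, is largest-remainder rounding; the routine and values below are illustrative, not the paper’s procedure.

```python
import numpy as np

def quantize_vdelays(theta, step):
    """Round v-delays to integer multiples of a minimal step while preserving
    their total sum (largest-remainder rounding). A quantized v-delay of 0
    amounts to pruning the corresponding v-node."""
    units = theta / step
    floors = np.floor(units).astype(int)
    leftover = int(round(units.sum())) - floors.sum()
    # Hand the remaining units to the entries with the largest remainders.
    order = np.argsort(units - floors)[::-1]
    floors[order[:leftover]] += 1
    return floors * step

theta = np.array([0.33, 0.27, 0.94, 0.46])  # sums to 2.0 (illustrative)
q = quantize_vdelays(theta, step=0.1)
```

Coarser steps correspond to a harsher quantization grid, which is exactly the trade-off between hardware feasibility and fidelity to the continuous optimum discussed above.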

Furthermore, increasing the number of v-nodes poses a practical limitation, even when v-delays remain within the constraints of physical implementation. As expected, the average computational performance does increase for larger numbers of v-nodes, but it saturates at some point. The plasticity mechanism improves the computational performance and, most important, reaches the saturation point of performance with a smaller number of v-nodes than the equidistant case. Beyond the performance saturation point, plasticity becomes ineffective on average; that is, it leads to improvement in some trials and deterioration in others.

However, the redundancy resulting from increasing the number of v-nodes, even within the constraint of physical implementation, is disadvantageous with regard to the computational resources of the DCR: the linear readout mechanism remains a bottleneck, since increasing the number of regressors by sampling more v-nodes demands storing and inverting larger matrices, a serious challenge for both simulations and physical implementations. Again, a comparison between the results for different numbers of v-nodes is problematic, since changing the number of v-nodes modifies the statistics of the mask pattern, which may affect the proper choice of the regulating term. With these considerations in mind, the plasticity mechanism is suitable for physical realization: it saves resources by keeping the number of v-nodes small (possibly pruning the reservoir by leading some v-delays to collapse), and it is computationally beneficial within the constraints of physical implementation, since it approaches the saturation point of computational performance using a smaller number of virtual nodes. Nevertheless, further detailed investigation remains necessary for addressing boundary conditions and the applicability of the suggested plasticity mechanism to physical implementations.

## 7 Conclusion

We have introduced a plasticity mechanism for improving the computational capabilities of a DCR, a novel RC architecture where a single nonlinear node is delay-coupled to itself. The homeostatic nature of the derived plasticity mechanism, equation 3.12, relates directly to the information processing properties of the DCR in that it balances between sensitivity and informational expansion of input (see section 5.1). While the role of homeostasis in information processing and computation has only been discussed more recently, its function as a stabilization process of neural dynamics attracted attention much earlier (von der Malsburg, 1973; Bienenstock, Cooper, & Munro, 1982). From the perspective of the nervous system, pure Hebbian potentiation or anti-Hebbian depression would lead to destabilization of synaptic efficacies by generating amplifying feedback loops (Miller, 1996; Song, Miller, & Abbott, 2000), necessitating a homeostatic mechanism for stabilization (Davis & Goodman, 1998; Zhang & Linden, 2003; Turrigiano & Nelson, 2004). Similarly, as suggested by the effects of the plasticity mechanism (see equation 5.6) on the virtual network topology (see section 5.2), the facilitating sensitivity term is counteracted by the depressive entropy term, which prevents synaptic efficacies from overpotentiating or collapsing.

In addition, rewriting equation 3.12 in the form of a rate-normalization rule strongly relates the derived plasticity mechanism to normalization models of neural homeostatic plasticity. Normalization models consider plasticity rules that regulate the activity of the neuron toward a target firing rate. They usually drive some quantity *q* of relevance for learning, such as synaptic weights or the neuron’s intrinsic excitability, in proportion to the difference between a target firing rate and an estimate *r* of the neuron’s output firing rate (Kempter, Gerstner, & Van Hemmen, 2001; Renart et al., 2003; Vogels, Sprekeler, Zenke, Clopath, & Gerstner, 2011; Lazar et al., 2007, 2009; Zheng et al., 2013; Toutounji & Pipa, 2014). In analogy, the v-delay estimates the v-node’s activity, since a larger v-delay results in higher self-excitation *w _{ii}*, while the regulating parameter defines the target activity of the v-node (see section 5.2). Furthermore, the entropy of a neuron’s output increases with its firing rate. As such, the increase of a v-delay in response to a higher regulatory term also increases the v-node’s entropy, as confirmed analytically in section 5.1.

Currently, and similar to the target firing rate, which is usually chosen according to biological constraints, the regulating parameter is left as a free parameter, and its optimal choice for a particular DCR configuration is decided by brute force (see section 5.3). However, the statistics in Figures 6 and 7 conclusively show that any choice within the tested range leads, on average and in the dominant share of trials, to improved computational performance in comparison to the equidistant case. Nevertheless, it is reasonable to assume that heuristics exist for the optimal choice of the regulating parameter, given a particular mask structure *M*, since alterations in the mask values influence a v-node’s sensitivity and entropy. A possible heuristic may relate its value to properties of maximum length sequences, by which Appeltant, Van der Sande, Danckaert, and Fischer (2014) constructed mask sequences with equidistant v-nodes. Similarly, we speculate that the direction and amplitude of a v-delay’s change that are computationally advantageous depend on the corresponding and preceding v-nodes’ mask values. The main difficulty arises from the fact that within the current formulation of the DCR in equations 2.5 and 2.6, no terms exist for relating different mask values to one another and to the corresponding v-delays. This is also the main obstacle facing the derivation of plasticity mechanisms for updating the mask *M* beyond a binary pattern. The appropriate choice of the regulating parameter is complicated further by its dependence on the demands of the executed task in terms of memory, nonlinear computations, and entropy. Finding criteria that connect these aspects to the optimal choice requires extensive research, which is a subject of our current endeavors.

Enhancing the temporal multiplexing of input to the nonlinear node was the main goal of this article. We speculate that similar multiplexing may suggest a further important functionality of the extensive dendritic trees in some neuron types. On the one hand, Izhikevich (2006) discussed the infinite dimensionality that dendritic propagation delays offer to recurrent neural networks. On the other hand, several studies investigated the computational role of the spatial distribution of active dendrites (Rumsey & Abbott, 2006; Gollo, Kinouchi, & Copelli, 2009; Graupner & Brunel, 2012). In this article, we advocate a unified computational account that may integrate both the temporal and spatial aspects of dendritic computations. In particular, the spatial location of dendritic arbors may be optimized to achieve computationally favorable temporal multiplexing of the soma’s input, in the fashion suggested by the DCR architecture. Consolidating this speculation will be the subject of future studies.

## Appendix A: Solving and Simulating the DCR

A particular solution to the inhomogeneous equation is then given by variation of constants, with the solution to the associated homogeneous differential equation serving as integrating factor. Consequently, subject to the initial condition, the solution on each interval is given in closed form, combining the homogeneous decay with an integral over the inhomogeneity.

This expression can be used right away in a numerical solution scheme, where the integral is solved using the cumulative trapezoidal rule. The resulting simulation of the DCR has been shown to be comparable in its accuracy and computational capabilities to adaptive numerical solutions, while saving considerable computation time (Schumacher, Toutounji, & Pipa, 2013).
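A minimal sketch of such a scheme for the linear-leak case x′(t) = −x(t) + f(t), assuming the inhomogeneity f has already been evaluated on the time grid; the unit leak rate and the grid are illustrative:

```python
import numpy as np

def solve_step(x0, f_vals, t):
    """Closed-form solution of x'(s) = -x(s) + f(s) at t[-1], given
    x(t[0]) = x0 and f sampled on the grid t; the integral term is
    evaluated with the trapezoidal rule."""
    kernel = np.exp(t - t[-1])                      # e^{-(t[-1] - s)}
    g = kernel * f_vals
    integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))
    return x0 * np.exp(-(t[-1] - t[0])) + integral

# Constant drive f = 1 relaxes toward 1: x(5) is close to 1 - e^{-5}.
x_end = solve_step(0.0, np.ones(1001), np.linspace(0.0, 5.0, 1001))
```

Because only the smooth integrand is discretized while the exponential decay is treated exactly, the scheme remains accurate at step sizes where a naive Euler discretization would not.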

Recall that the v-delays are the temporal distances between consecutive virtual nodes. To arrive at a manageable analytical expression of the above solution at the sampling point *t _{i}* of virtual node *i* during a given cycle of the delay, we make the following approximation.

## Appendix B: Constraint Satisfaction

Due to the simple linear structure of *V*, this strategy will converge onto the constrained optimum. The projection onto *V* is computed from an orthogonal basis, which can be constructed by simple geometrical considerations from the simplex corner point vectors, with the unit matrix of matching dimension entering the construction. It is easily verified that this basis spans *V* and is indeed orthogonal. In conjunction with the inhomogeneity *n _{V}*, a normal vector with respect to *V*, any point on *V* can be expressed via the basis vectors *v _{i}*. For some *x* being the result of an unconstrained sensitivity update step, the constraint can be met by projecting *x* orthogonally onto *V* via the corresponding projection mapping.

The addition and subtraction of *n _{V}* take care of the fact that *V*, being a hyperplane, is translated out of the origin by the inhomogeneity *n _{V}*. If the *V*-plane were centered in the origin, each term would denote the orthogonal projection of *x* onto the *i*th orthonormal basis vector. Accordingly, the linear combination of these projections yields the representation of *x* with respect to the basis.
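For the case in which the constraint fixes the sum of the v-delays to the total delay of the loop (an assumption in this sketch), the projection described above reduces to a simple closed form, since the normal direction of *V* is then the constant vector:

```python
import numpy as np

def project_onto_constraint(x, total_delay):
    """Orthogonally project x onto the hyperplane
    V = {x : sum(x) = total_delay} by removing the offending
    component along the (constant) normal direction."""
    n = len(x)
    return x - (np.sum(x) - total_delay) / n * np.ones(n)

# An unconstrained update overshoots the total delay by 0.3;
# the projection distributes the correction evenly.
delta = project_onto_constraint(np.array([0.5, 0.7, 0.9]), 1.8)
```

The projected point is the closest point on *V* in the Euclidean sense, so the unconstrained sensitivity update is altered as little as possible while restoring the constraint.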

## Acknowledgments

The contributions of Marcel Nonnenmacher and Anna-Birga Ostendorf to early stages of this work are gratefully acknowledged, as are the fruitful discussions with the members of the PHOCUS consortium. We acknowledge the financial support of the State of Lower Saxony, Germany, via the University of Osnabrück, and the European project PHOCUS in the Framework Information and Communication Technologies (FP7-ICT-2009-C/proposal 240763).

## References


Schumacher, J., Toutounji, H., & Pipa, G. (2013). An analytical approach to single node delay-coupled reservoir computing. In *Artificial Neural Networks and Machine Learning – ICANN 2013*. Berlin: Springer.