Free-running recurrent neural networks (RNNs), especially probabilistic models, generate an ongoing information flux that can be quantified with the mutual information I[x(t),x(t+1)] between subsequent system states x. Although previous studies have shown that I depends on the statistics of the network’s connection weights, it is unclear how to maximize I systematically and how to quantify the flux in large systems where computing the mutual information becomes intractable. Here, we address these questions using Boltzmann machines as model systems. We find that in networks with moderately strong connections, the mutual information I is approximately a monotonic transformation of the root-mean-square averaged Pearson correlations between neuron pairs, a quantity that can be efficiently computed even in large systems. Furthermore, evolutionary maximization of I[x(t),x(t+1)] reveals a general design principle for the weight matrices enabling the systematic construction of systems with a high spontaneous information flux. Finally, we simultaneously maximize information flux and the mean period length of cyclic attractors in the state-space of these dynamical networks. Our results are potentially useful for the construction of RNNs that serve as short-time memories or pattern generators.

Artificial neural networks form the central part of many current machine learning methods, and in particular, deep learning systems (LeCun et al., 2015) have found numerous industrial and scientific applications over the past decades (Alzubaidi et al., 2021). The neural networks in machine learning systems are typically structured as stacks of neural layers, and information usually passes unidirectionally from the input to the output layer.

By contrast, recurrent neural networks (RNNs) have feedback loops among their neuronal connections so that information can continuously circulate within the system (Maheswaranathan et al., 2019). RNNs are therefore autonomous dynamical systems in which the neurons show ongoing dynamical activity even without external input; moreover, they can be considered as universal approximators (Schäfer & Zimmermann, 2006). These and other intriguing properties have stimulated a recent boost in the research field of artificial RNNs, producing both new developments and interesting unsolved problems: Due to their recurrent connectivity, RNNs are ideally suited to process time series data (Jaeger, 2001), and to store sequential input over time (Büsing et al., 2010; Dambre et al., 2012; Gonon & Ortega, 2021; Schuecker et al., 2018; Wallace et al., 2013). For instance, it has been shown that RNNs learn robust representations by dynamically balancing compression and expansion (Farrell et al., 2022). In particular, a dynamical regime called the edge of chaos at the transition from periodic to chaotic behavior (Kadmon & Sompolinsky, 2015), has been extensively studied and demonstrated to be important for computation (Bertschinger & Natschläger, 2004; Boedecker et al., 2012; Kaneko & Suzuki, 1994; Langton, 1990; Legenstein & Maass, 2007; Natschläger et al., 2005; Schrauwen et al., 2009; Solé & Miramontes, 1995; Toyoizumi & Abbott, 2011; Wang et al., 2011), and short-term memory (Haruna & Nakajima, 2019; Ichikawa & Kaneko, 2021). Furthermore, numerous studies address the issue of how to control the dynamics of RNNs (Haviv et al., 2019; Jaeger, 2014; Rajan et al., 2010), in particular with external or internal noise (Bönsel et al., 2021; Ikemoto et al., 2018; Krauss, Prebeck et al., 2019; Metzner & Krauss, 2022; Molgedey et al., 1992). Finally, RNNs have been proposed to be a versatile tool in neuroscience research (Barak, 2017). 
In particular, very sparse RNNs, as they occur in the human brain (Song et al., 2005), have some remarkable properties (Folli et al., 2018; Gerum et al., 2020; Narang et al., 2017), such as superior information storage capacities (Brunel, 2016).

In previous studies, we systematically analyzed the structural and dynamical properties of very small RNNs, that is, three-neuron motifs (Krauss, Zankl et al., 2019), as well as large RNNs (Krauss, Schuster et al., 2019). Furthermore, we investigated resonance phenomena in RNNs. For instance, we discovered recurrence resonance (Krauss, Prebeck et al., 2019), where a suitable amount of added external noise maximizes information flux in the network. In addition, we investigated coherent oscillations (Bönsel et al., 2021) and import resonance (Metzner & Krauss, 2021, 2022), where noise maximizes information uptake in RNNs.

Here, we focus on the Boltzmann machine as a simple model of probabilistic RNNs with spiking neurons, in the sense that each neuron is either off or on in any given time step t. The momentary global state of such a network, assuming N neurons, can then be described by a vector x(t) = (x_1(t), x_2(t), ..., x_N(t)), where each component x_n(t) ∈ {0, 1} is a binary number.

In order to quantify the ongoing information flux in such a system, an important quantity is the mutual information (MI) between subsequent global system states, here denoted by I[x(t), x(t+1)]. It can take the minimal value of zero if the neurons are mutually uncoupled and produce statistically independent and temporally uncorrelated random walks with p(0) = p(1). By contrast, the MI takes on the maximal possible value of N bits, where N is the number of binary neurons, if the next global system state can be perfectly predicted from the present state (deterministic behavior) and if, moreover, all possible states occur equally often (entropy of states H = N). An example of the latter extreme case would be a fully deterministic network that periodically counts through all 2^N possible binary system states in a fixed order, that is, a 2^N-cycle.

In an actual RNN with N neurons, the MI will have some intermediate value between 0 and N bits, and much can be learned by studying how this state-to-state memory depends on various system parameters. For example, it has been investigated (Metzner & Krauss, 2022) in a simple deterministic RNN model how the information flux depends on the statistical properties of the weight matrix elements w_ij, which describe the coupling strength from the output of neuron j to the input of neuron i. It was found that in the two-dimensional parameter space spanned by the density d of nonzero connections and the balance b between excitatory and inhibitory connections, RNNs reproducibly show distinct dynamical phases, such as periodic, chaotic, and fixed-point attractors. All these dynamical phases are linked to characteristic changes of the information flux and thus can be detected using the MI between subsequent states.

In order to compute this MI numerically, one first has to estimate the joint probability distribution p(x_t, x_{t+1}) for a pair of subsequent states. Unfortunately, the number of possible state pairs in an N-neuron system is 2^N × 2^N, and this exponential growth of state space prevents the computation of the MI for systems much larger than N ≈ 10. One goal of our study presented here is therefore to test if the full MI can be approximately replaced by numerically more efficient measures of state-to-state memory, in particular measures based on pair-wise neuron-to-neuron correlations. For the future investigation of RNN phase diagrams, we are not primarily interested in the numerical value of the measure, but mainly in whether it rises or falls as a function of certain system control parameters. For this reason, we test in the following to what extent the alternative measures are monotonic transformations of the full MI.

Furthermore, we use evolutionary optimization to find out which kinds of RNN weight matrices lead to a maximum state-to-state memory, that is, a maximum spontaneous flux of information. Related to this question, we also test if those fundamentally probabilistic networks can actually produce quasi-deterministic n-cycles with a period length that is comparable to the total number of states 2^N.

2.1  Symmetrized Boltzmann Machine (SBM)

In the following, we consider a Boltzmann machine (BM) with N probabilistic logistic neurons that are fully connected to each other. In order to symmetrize the system, we set all biases to zero and convert the binary neuron states x ∈ {0, 1} to zero-mean states y ∈ {-1, +1} before each network update.

Let neuron j at time step t be in the binary state x_j(t). The zero-mean state is computed as

y_j(t) = 2 x_j(t) - 1.    (2.1)
The weighted sum of all input signals to neuron i is given by
z_i(t) = Σ_{j=1}^{N} w_ij y_j(t),    (2.2)
where w_ij is the real-valued positive or negative connection strength (weight) between the output of neuron j and the input of neuron i. The N×N matrix W of all connection strengths is called the weight matrix.
The on-probability p_i(t+1) = prob(x_i(t+1) = 1) that neuron i will be in the binary 1-state in the next time step is given by a logistic function of the weighted sum:

p_i(t+1) = 1 / (1 + exp(-z_i(t))).    (2.3)
All N neurons update simultaneously to their new binary states, and the global system state at time step t ∈ {0, 1, 2, ...} is denoted as the vector:

x(t) = (x_1(t), x_2(t), ..., x_N(t)).    (2.4)
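Equations 2.1 to 2.4 amount to a single vectorized update rule. The sketch below is our own minimal NumPy implementation (function and variable names are ours, not from the article):

```python
import numpy as np

def sbm_step(x, W, rng):
    """One synchronous update of a symmetrized Boltzmann machine.
    x: binary state vector in {0,1}^N, W: N x N weight matrix."""
    y = 2 * x - 1                        # zero-mean states, eq. 2.1
    z = W @ y                            # weighted input sums, eq. 2.2
    p = 1.0 / (1.0 + np.exp(-z))         # on-probabilities, eq. 2.3
    return (rng.random(x.size) < p).astype(int)   # sample new binary states

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=5)           # random initial global state, eq. 2.4
W = rng.normal(0.0, 0.5, size=(5, 5))
x_next = sbm_step(x, W, rng)
```

Iterating `sbm_step` yields the state history x(t) used by all measures below.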

2.2  Mutual Information and Pearson Correlations

Assuming two signal sources U and V that emit discrete states with the joint probability P(u, v), the mutual information (MI) between the two sources is defined as

I[U, V] = Σ_u Σ_v P(u, v) log [ P(u, v) / (P(u) · P(v)) ].    (2.5)
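Equation 2.5 translates directly into code operating on a joint probability matrix; the following sketch is our own (using the base-2 logarithm, so the MI is in bits):

```python
import numpy as np

def mutual_information(P):
    """Mutual information in bits from a joint probability matrix P(u, v), eq. 2.5."""
    P = np.asarray(P, dtype=float)
    Pu = P.sum(axis=1, keepdims=True)    # marginal P(u)
    Pv = P.sum(axis=0, keepdims=True)    # marginal P(v)
    mask = P > 0                         # convention: 0 log 0 = 0
    return float((P[mask] * np.log2(P[mask] / (Pu @ Pv)[mask])).sum())

# independent binary sources give zero MI; perfectly coupled sources give 1 bit
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # -> 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # -> 1.0
```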

This formula is applied in two different ways to compute the MI between successive states of an SBM. One way is to directly compute the MI between the vectorial global states, so that u ≡ x(t) and v ≡ x(t+1). The resulting quantity is denoted as I[x(t), x(t+1)], and in an SBM with N neurons, it can range between 0 and N bits.

In the other application of equation 2.5, we focus on a particular pair of neurons (m, n) and first compute the MI between the successive binary output values of these two neurons, so that u ≡ x_m(t) and v ≡ x_n(t+1). The resulting quantity is denoted as I[x_m(t), x_n(t+1)], and it can range between 0 and 1 bit.

After repeating this computation for all N^2 pairs of neurons, we aggregate the results into a single number by calculating the root-mean-square average RMS[I[x_m(t), x_n(t+1)]]_mn, where

RMS[A_mn]_mn = √( (1/(M·N)) Σ_{m=1}^{M} Σ_{n=1}^{N} A_mn^2 ).    (2.6)
A simpler way to quantify the state-to-state memory in SBMs is by using linear correlation coefficients. Here too we focus on particular pairs of neurons (m, n) and compute their normalized Pearson correlation coefficient,

C[x_m(t), x_n(t+1)] = ⟨ (x_m(t) - μ_m) · (x_n(t+1) - μ_n) ⟩_t / (σ_m σ_n),    (2.7)

where μ_i and σ_i denote the mean and the standard deviation of the neuron states x_i(t). The symbol ⟨a(t)⟩_t denotes the temporal average of a time series a(t). In the cases σ_m = 0 or σ_n = 0, the correlation coefficient is set to zero.

The N^2 resulting pairwise correlation coefficients are again aggregated into a single number using the RMS average as defined in formula 2.6.
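This pairwise measure can be computed in a single vectorized pass over the state history. The sketch below is our own (shape conventions and names are ours) and handles the σ = 0 case as specified above:

```python
import numpy as np

def rms_pairwise_correlation(X):
    """RMS average of lagged Pearson correlations C[x_m(t), x_n(t+1)], eq. 2.7,
    over all N^2 neuron pairs; X is the binary state history with shape (T, N)."""
    A = X[:-1].astype(float)               # x(t)
    B = X[1:].astype(float)                # x(t+1)
    A -= A.mean(axis=0)
    B -= B.mean(axis=0)
    sa, sb = A.std(axis=0), B.std(axis=0)
    cov = (A.T @ B) / len(A)               # <(x_m - mu_m)(x_n - mu_n)>_t
    denom = np.outer(sa, sb)
    # set the coefficient to zero wherever sigma_m = 0 or sigma_n = 0
    C = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return float(np.sqrt((C ** 2).mean()))
```

For a single perfectly antipersistent neuron (states 0, 1, 0, 1, ...), the lagged correlation is -1 and the RMS average equals 1.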

Summing up, our three quantitative measures of state-to-state memory are the full mutual information I[x(t), x(t+1)], the average pair-wise mutual information RMS[I[x_m(t), x_n(t+1)]]_mn, and the average pair-wise Pearson correlation RMS[C[x_m(t), x_n(t+1)]]_mn.

2.3  Numerical Calculation of the MI

In a numerical calculation of the full mutual information I[x(t), x(t+1)], first a sufficiently long time series x(t) of SBM states is generated. From this simulated time series, the joint probability P(x(t), x(t+1)) is estimated by counting how often the different combinations of subsequent states occur. Next, from the joint probability, the marginal probabilities P(x(t)) and P(x(t+1)) are computed. Finally, the MI can be calculated using formula 2.5.

Unfortunately, even in systems with a moderate number of neurons, the time series of SBM states needs to be extremely long to obtain sufficient statistical accuracy. State histories that are too short result in large errors in the estimated mutual information.
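The counting procedure can be sketched as follows (a sketch of ours; the function name and the encoding of global states as tuples of bits are our choices):

```python
from collections import Counter
import numpy as np

def mi_from_history(states):
    """Estimate I[x(t), x(t+1)] in bits by counting subsequent state pairs.
    `states`: sequence of hashable global states, e.g. tuples of bits."""
    pairs = Counter(zip(states[:-1], states[1:]))  # joint counts of (x(t), x(t+1))
    pu = Counter(states[:-1])                      # marginal counts of x(t)
    pv = Counter(states[1:])                       # marginal counts of x(t+1)
    T = len(states) - 1                            # number of observed transitions
    mi = 0.0
    for (u, v), c in pairs.items():
        mi += (c / T) * np.log2(c * T / (pu[u] * pv[v]))
    return mi
```

Because the number of possible state pairs grows as 2^N × 2^N, the history length required for a given accuracy grows accordingly, which is exactly the problem described above.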

It turns out that some MI-optimized networks considered in this article spend an extremely long time within specific periodic state sequences and only occasionally jump to another periodic cycle. Even though the system visits all states equally often in the long run, this ergodic behavior shows up only after a huge number of time steps. It is not practical to compute the MI for those systems numerically.

For this reason, we have applied a semianalytical method for most of our simulations. This method computes the 2^N × 2^N joint probability matrix directly from the weight matrix, without any simulation of state sequences, so that the accuracy problem is resolved. However, as N is increased, the joint probability matrix eventually becomes too large to be held in memory.

2.4  Semianalytical Calculation of the MI

In a semianalytical calculation of the full mutual information I[x(t), x(t+1)], we first compute the conditional state transition probabilities P(x(t+1) | x(t)). This can be done analytically because, for each given initial state x(t), formula 2.3 gives the on-probability of each neuron in the next time step. Using these on-probabilities, it is straightforward to compute the probability of each possible successor state x(t+1).

The conditional state transition probabilities P(x(t+1) | x(t)) define the transition matrix M of a discrete Markov process. It can therefore be used to numerically compute the stationary probabilities P(x)_fin of the 2^N global system states, which can also be written as a probability vector p_fin. For this purpose, we start with a uniform distribution P(x)_ini = 1/2^N and then iteratively multiply this probability vector by the Markov transition matrix M, until the change of P(x) becomes negligible, that is, until p_fin ≈ M p_fin.

Once we have the stationary state probabilities P(x)_fin, we can compute the joint probability of successive states as P(x(t+1), x(t)) = P(x(t+1) | x(t)) · P(x)_fin. After this, the MI is computed using formula 2.5.
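Under the stated assumptions, the whole semianalytical pipeline fits in a few lines. The following is our own sketch (state i is the i-th binary pattern in lexicographic order):

```python
import itertools
import numpy as np

def joint_from_weights(W, tol=1e-12):
    """Exact joint probability P(x(t), x(t+1)) of an SBM with weight matrix W.
    Rows and columns index the 2^N binary states in lexicographic order."""
    N = W.shape[0]
    states = np.array(list(itertools.product([0, 1], repeat=N)))   # (2^N, N)
    p_on = 1.0 / (1.0 + np.exp(-(2 * states - 1) @ W.T))           # eq. 2.3 per state
    # Transition matrix: M[i, j] = P(next state j | current state i)
    M = np.prod(np.where(states[None, :, :] == 1,
                         p_on[:, None, :], 1.0 - p_on[:, None, :]), axis=2)
    p = np.full(2 ** N, 1.0 / 2 ** N)      # start from the uniform distribution
    while True:                            # iterate p <- p M to the stationary vector
        p_new = p @ M
        if np.abs(p_new - p).max() < tol:
            break
        p = p_new
    return p[:, None] * M                  # P(i, j) = P_fin(i) * M[i, j]
```

Applying equation 2.5 to the returned matrix yields the full MI. The intermediate arrays scale as 2^N × 2^N, reproducing the memory limitation discussed above.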

2.5  Weight Matrix with Limited Connection Strength

In order to generate an N×N weight matrix where the modulus of the individual connection strength is restricted to the range |w_mn| < W_max, we first draw the matrix elements independently from a uniform distribution in the range [0, W_max]. Then each of the N^2 matrix elements is flipped in sign with a probability of 1/2.
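As a sketch (names are ours), this generator is:

```python
import numpy as np

def random_weight_matrix(N, w_max, rng):
    """N x N weights with |w_mn| uniform in [0, w_max] and random signs (sec. 2.5)."""
    magnitudes = rng.uniform(0.0, w_max, size=(N, N))
    signs = rng.choice([-1.0, 1.0], size=(N, N))   # flip each sign with prob. 1/2
    return magnitudes * signs

W = random_weight_matrix(5, 0.1, np.random.default_rng(0))
```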

2.6  Comparing the Signum-of-Change (SOC)

Two real-valued functions f(x) and g(x) of a real-valued parameter x are called monotonic transformations of each other if the sign of their derivative is the same for all x in their domain:
sgn( df/dx (x) ) = sgn( dg/dx (x) )   for all x.    (2.8)
In a plot of f(x) and g(x), the two functions will then always rise and fall simultaneously, albeit to a different degree. Accordingly, possible local maxima and minima will occur at the same x-positions, so that both f(x) and g(x) can be used equally well as objective functions for optimizing x.
In this work, we consider cases where two functions are only approximately monotonic transformations of each other, and our goal is to quantify the degree of this monotonic relation. For this purpose, we numerically evaluate the functions for an arbitrary sequence of M arguments x_1, x_2, ..., x_M in the domain of interest, yielding two series f_1, f_2, ..., f_M and g_1, g_2, ..., g_M. We then compute the signum of the changes between subsequent function values,
S_n(f) = sgn(f_n - f_{n-1})   and   S_n(g) = sgn(g_n - g_{n-1}),    (2.9)

which yields two discrete series S_n ∈ {-1, 0, +1} of length M-1. Next, we count the number N_c of corresponding entries, that is, the number of cases where S_n(f) = S_n(g). We finally compute the ratio r_SOC = N_c/(M-1), which can range between r_SOC = 0, indicating anticorrelated behavior where minima of f(x) correspond to maxima of g(x), and r_SOC = 1, indicating that the two functions are perfect monotonic transformations of each other. A value r_SOC = 0.5 indicates that the two functions are not at all monotonically related. In the text, we refer to r_SOC simply as the "SOC measure."
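The SOC measure translates directly into code. The following sketch is ours, where `f` and `g` are the two sampled series:

```python
import numpy as np

def soc_fraction(f, g):
    """Fraction r_SOC of matching signs of change between two series (eq. 2.9)."""
    sf = np.sign(np.diff(f))     # S_n(f)
    sg = np.sign(np.diff(g))     # S_n(g)
    return float((sf == sg).mean())

x = np.linspace(0.0, 1.0, 50)
print(soc_fraction(x ** 2, x ** 3))   # monotonic transforms -> 1.0
print(soc_fraction(x, -x))            # anticorrelated -> 0.0
```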

2.7  Evolutionary Optimization of Weight Matrices

In order to maximize some objective function f(W) that characterizes the fitness of a weight matrix, we start with a matrix W_0 in which all elements are zero. (The neurons of the corresponding SBM then produce independent and nonpersistent random walks, so that the MI between successive states is zero.) The fitness f_0 = f(W_0) of this starting matrix is computed:

  1. The algorithm is now initialized with W:=W0 and f:=f0.

  2. We then generate a mutation of the present weight matrix W by adding independent random numbers Δw_mn to the N^2 matrix elements. In our case, we draw these random fluctuations Δw_mn from a normal distribution with zero mean and a standard deviation of 0.1. The fitness f_mut = f(W + ΔW) of the mutant is computed.

  3. If f_mut > f, we set W := W + ΔW and f := f_mut. Otherwise the last matrix is retained. The algorithm then loops back to step 2.

We iterate the evolutionary loop until the fitness no longer increases significantly.
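The loop above can be sketched as a simple greedy hill climber (our own minimal implementation; the objective `fitness` could be any of the memory measures defined in section 2.2):

```python
import numpy as np

def evolve(fitness, N, n_steps=1000, sigma=0.1, seed=0):
    """Greedy evolutionary optimization of an N x N weight matrix (sec. 2.7)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((N, N))                   # step 1: start from the all-zero matrix
    f = fitness(W)
    for _ in range(n_steps):
        dW = rng.normal(0.0, sigma, size=(N, N))   # step 2: mutate all elements
        f_mut = fitness(W + dW)
        if f_mut > f:                              # step 3: keep only improvements
            W, f = W + dW, f_mut
    return W, f

# toy objective (ours, for illustration only): pull all weights toward 1
W_opt, f_opt = evolve(lambda W: -((W - 1.0) ** 2).sum(), N=3, n_steps=500)
```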

2.8  Visualization with Multidimensional Scaling (MDS)

A frequently used method to generate low-dimensional embeddings of high-dimensional data is t-distributed stochastic neighbor embedding (t-SNE; Van der Maaten & Hinton, 2008). However, in t-SNE, the resulting low-dimensional projections can be highly dependent on the detailed parameter settings (Wattenberg et al., 2016), sensitive to noise, and may not preserve, but rather often scramble, the global structure in the data (Moon et al., 2019; Vallejos, 2019). By contrast, multidimensional scaling (MDS; Cox & Cox, 2008; Kruskal, 1964, 1978; Torgerson, 1952) is an efficient embedding technique that visualizes high-dimensional point clouds by projecting them onto a two-dimensional plane. Furthermore, MDS has the decisive advantage that it is essentially parameter free and preserves all mutual distances between the points, thereby conserving both the global and the local structure of the underlying data.

When interpreting patterns as points in high-dimensional space and dissimilarities between patterns as distances between corresponding points, MDS is an elegant method to visualize high-dimensional data. By color-coding each projected data point of a data set according to its label, the representation of the data can be visualized as a set of point clusters. For instance, MDS has already been applied to visualize word class distributions of different linguistic corpora (Schilling, Tomasello et al., 2021), hidden-layer representations (embeddings) of artificial neural networks (Krauss et al., 2021; Schilling, Maier et al., 2021), the structure and dynamics of recurrent neural networks (Krauss, Prebeck et al., 2019; Krauss, Schuster et al., 2019; Krauss, Zankl et al., 2019), and brain activity patterns assessed during, for example, pure tone or speech perception (Krauss, Metzner et al., 2018; Schilling, Tomasello et al., 2021), or even during sleep (Krauss, Schilling et al., 2018; Metzner et al., 2023; Traxdorf et al., 2019). In all these cases, the apparent compactness and mutual overlap of the point clusters permit a qualitative assessment of how well the different classes separate.

In this article, we use MDS to visualize sets of high-dimensional weight matrices. In particular, we apply the metric MDS (from the Python package sklearn.manifold), based on Euclidean distances, using the standard settings (n_init=4, max_iter=300, eps=1e-3).
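With the stated settings, such a projection of a set of weight matrices might look like the following sketch (the input data here is random and purely illustrative):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
matrices = rng.normal(size=(30, 5, 5))      # 30 hypothetical 5x5 weight matrices
X = matrices.reshape(30, -1)                # one high-dimensional point per matrix

# metric MDS with the settings given in the text
mds = MDS(n_components=2, n_init=4, max_iter=300, eps=1e-3,
          dissimilarity="euclidean", random_state=0)
coords = mds.fit_transform(X)               # 2D embedding, shape (30, 2)
```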

2.9  Finding the Cyclic Attractors of an SBM

Given a weight matrix W, we first calculate analytically the conditional state transition probabilities P(x(t+1) | x(t)). We then find, for each possible initial state x(t), the successor state x(t+1)_max with the maximal transition probability. We thus obtain a map SUCC(x(t)) that yields the most probable successor of each given state.

This map now describes a finite, deterministic, discrete dynamical system. The dynamics of such a system can be described by a "state flux graph," which has 2^N nodes corresponding to the possible global states and links between those nodes that indicate the deterministic state-to-state transitions. Nodes can have self-links (corresponding to 1-cycles, i.e., fixed points), but each node can have only one outgoing link in a deterministic system. All states (nodes) are either part of a periodic n-cycle, or they are transient states that lead into an n-cycle. Of course, the maximum possible period length is n_max = 2^N.

In order to find the n-cycles from the successor map, we start with the first system state and follow its path through the state flux graph, until we arrive at a state that was already in that path (indicating that a complete cycle was run through). We then cut out from this path only the states that are part of the cycle, discarding possible transient states. The cycle is stored in a list.

The same procedure is repeated for all initial system states, and all the resulting cycles are stored in the list. We finally remove from the list all extra copies of cycles that have been stored multiple times. This leads to the complete set of all n-cycles, which can then be further evaluated. In particular, we compute the individual period lengths n for each cycle in the set, and from this, the mean cycle length MCL of the weight matrix W.
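The cycle search over the successor map can be sketched as follows (our own implementation; `succ` encodes the SUCC map as a list of state indices):

```python
def find_cycles(succ):
    """All n-cycles of a deterministic successor map (sec. 2.9).
    `succ[i]` is the unique successor of state i."""
    cycles = set()
    for start in range(len(succ)):
        path, seen = [], {}
        s = start
        while s not in seen:              # follow the path until a state repeats
            seen[s] = len(path)
            path.append(s)
            s = succ[s]
        cycle = tuple(path[seen[s]:])     # cut off the transient prefix
        k = cycle.index(min(cycle))       # canonical rotation removes duplicates
        cycles.add(cycle[k:] + cycle[:k])
    return sorted(cycles)

# two fixed points (1-cycles) and one 2-cycle on four states
cycles = find_cycles([0, 1, 3, 2])        # -> [(0,), (1,), (2, 3)]
mcl = sum(len(c) for c in cycles) / len(cycles)   # mean cycle length MCL
```

Storing cycles in a canonical rotation replaces the explicit duplicate-removal step described above.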

We note that, strictly speaking, the cycle lengths and their mean value MCL are well defined only in a deterministic system where each state has only one definite successor. However, in a quasi-deterministic system (with sufficiently large matrix elements), each state has one strongly preferred successor. As a result, the number of time steps the system spends in each of its n-cycles is finite, but it can be much longer than the cycle’s period length n, corresponding to several “round trips.” In this sense, cycle lengths are meaningful also in quasi-deterministic systems.

3.1  State-to-State Memory in a Single Neuron System

Mutual information (MI), in the following denoted by the mathematical symbol I, is a general way to quantify the relatedness of two random variables x and y. As illustrated in a well-known diagram (see Figure 1a), it is linked to the conditional and unconditional entropies by
H(x) - H(x|y) = I(x, y) = H(y) - H(y|x).    (3.1)
Because the MI depends only on the joint probability P(x, y) of the various value combinations, it can capture arbitrary nonlinear dependencies between the two variables. Indeed, the MI would remain invariant even if x and y were replaced by two arbitrary injective functions, f(x) and g(y).
Figure 1.

(a) Information-theoretic quantities related to the input x and output y of an information processing system. In our case, x corresponds to an initial RNN state at time t, and y to the successor state at time t+1. The quantities H(x) and H(y) describe the entropy of RNN states, I(x, y) the mutual information between successive states, H(x|y) a lossy compression to a smaller number of states, and H(y|x) a random expansion to a larger number of states. (b) Mutual information and correlation in a single-neuron SBM. A symmetrized Boltzmann machine with only one neuron is defined by the self-interaction weight W11 (sketch at the top). The binary output series of the system is antipersistent for W11 < 0, nonpersistent for W11 = 0, and persistent for W11 > 0 (printed example sequences on gray background). The mutual information between subsequent system states I[x(t), x(t+1)] (orange curve) is close to its maximal possible value of one for W11 = -5. As the weight is increased, the MI falls and reaches the minimal possible value of zero for W11 = 0. It then rises again toward the maximum in a symmetric way. By contrast, the Pearson correlation coefficient between subsequent system states C[x(t), x(t+1)] (dashed black curve) increases monotonically from about -1, through zero, to about +1. However, the absolute value of the correlation C[x(t), x(t+1)] (blue curve) resembles the mutual information in that it shares the asymptotic limits and the minimum. Note that the absolute value is equivalent to the RMS average in this one-neuron system.


By contrast, correlation coefficients are essentially averages of the product x·y of the momentary variable values, and therefore they change when the values x and y are replaced by injective functions of themselves (such as x^3 and y^3). It is also well known that correlation coefficients can capture only linear relations.

Nevertheless, there are certain special cases where the MI and suitably modified correlation coefficients show a quite similar behavior. As a simple example, we consider an SBM that consists only of a single neuron with self-connection strength w11 (see Figure 1b, sketch on top).

For w11 = 0, the weighted sum of the neuron's input is zero and the on-probability is 1/2. The neuron therefore produces a sequence of statistically independent binary states x ∈ {0, 1} with balanced probabilities prob(x=0) = prob(x=1) = 1/2. This stochastic Markov process corresponds to an unbiased and uncorrelated random walk.

For w11 ≠ 0, the on- and off-probabilities are still balanced, but now subsequent neuron states are statistically dependent. In particular, a positive self-connection generates a persistent random walk in which the same binary states tend to follow each other (such as 00011100111000011). Analogously, a negative self-connection generates an antipersistent random walk in which the system tends to flip between the two binary states (such as 10101001011010101). This stochastic Markov process corresponds to an unbiased but correlated random walk.

The degree of state-to-state persistence κ in the single-neuron SBM can be quantified by the conditional probability
κ = prob( x(t+1) = 1 | x(t) = 1 ) = prob( x(t+1) = 0 | x(t) = 0 ),    (3.2)
and it is obviously determined by the self-connection strength via
κ = 1 / (1 + exp(-w11)).    (3.3)
Here, κ > 1/2 (due to w11 > 0) indicates persistence and κ < 1/2 (due to w11 < 0) antipersistence.
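Equation 3.3 can be checked empirically by simulating the one-neuron SBM (a sketch of ours; the variable names and the simulation length are our choices):

```python
import math
import numpy as np

def kappa(w11):
    """Persistence of the one-neuron SBM, eq. 3.3."""
    return 1.0 / (1.0 + math.exp(-w11))

# estimate prob(x(t+1) = 1 | x(t) = 1) from a long simulated run
rng = np.random.default_rng(1)
w11, T = 2.0, 100_000
x, stay, total = 1, 0, 0
for _ in range(T):
    p_on = 1.0 / (1.0 + math.exp(-w11 * (2 * x - 1)))   # eqs. 2.1 to 2.3
    x_new = int(rng.random() < p_on)
    if x == 1:                     # count transitions that start in state 1
        total += 1
        stay += (x_new == 1)
    x = x_new
# stay / total should be close to the analytic kappa(w11)
```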

Summing up, the one-neuron system produces a random walk of binary (0, 1) neuron states that changes from temporally antipersistent to persistent behavior as w11 is tuned from negative to positive values (see the time series at the top of Figure 1).

As both antipersistence and persistence are predictable behaviors, the MI between subsequent states, denoted by I[x(t), x(t+1)], approaches the maximum possible value of 1 bit in the limit of both strongly negative and strongly positive connection weights w11 (see Figure 1, orange curve). It reaches the minimum possible value of 0 bit exactly for w11 = 0.

We next compute, again as a function of the control parameter w11, the Pearson correlation coefficient between subsequent states, denoted by C[x(t), x(t+1)], and defined in section 2. For strongly negative w11, it approaches its lowest possible value of -1, passes through zero for w11 = 0, and finally approaches its highest possible value of +1 for strongly positive w11 (see Figure 1, black dashed curve).

It is possible to make the two functions more similar by taking the modulus of the Pearson correlation coefficient, ABS[C[x(t), x(t+1)]], which in this case is equivalent to the root-mean-square average RMS[C[x(t), x(t+1)]]. This RMS-averaged correlation indeed shares with the MI the minimum and the two asymptotic maxima (see Figure 1, blue curve).

Based on this simple example and several past observations (Krauss et al., 2017; Krauss, Schuster et al., 2019; Metzner & Krauss, 2021, 2022), there is legitimate hope that the computationally expensive mutual information I can be replaced by the numerically efficient RMSC, at least in a certain subclass of systems. It is clear that when plotted as a function of some control parameter, the two functions will not have exactly the same shape, but they might at least be monotonic transformations of each other, so that local minima and maxima will appear at the same position on the parameter axis. If such a monotonic relation exists, RMSC could be used, in particular, as an I-equivalent objective function for the optimization of state-to-state memory in RNNs.

3.2  Comparing Global MI with Pairwise Measures

In multineuron networks, the momentary system states are vectors x(t). In principle, the standard definition of the correlation coefficient can be applied to this case as well (compare Krauss et al., 2021), but it effectively reduces to the mean of pair-wise neuron-to-neuron correlations. To see this, consider the average over products of states, which represents the essence of a correlation coefficient. For vectorial states, the multiplication can be interpreted as a dot product,
⟨ x(t) · x(t+1) ⟩_t = Σ_{n=1}^{N} ⟨ x_n(t) · x_n(t+1) ⟩_t = N · MEAN[ ⟨ x_n(t) · x_n(t+1) ⟩_t ]_n,    (3.4)
which leads to a mean of single-neuron autocorrelations.
In order to also include cross-correlations between different neurons, the multiplication is better interpreted as an outer product. Moreover, to make the measure more compatible with the MI, the mean can be replaced by a root-mean-square average. We thus replace
MEAN[ ⟨ x_n(t) · x_n(t+1) ⟩_t ]_n  →  RMS[ ⟨ x_m(t) · x_n(t+1) ⟩_t ]_mn.    (3.5)
More precisely, we use the quantity RMS[C[x_m(t), x_n(t+1)]]_mn, an average of normalized cross-correlation coefficients, which is defined in section 2. In a similar way, we also define an average over pair-wise mutual information values, denoted by RMS[I[x_m(t), x_n(t+1)]]_mn. We thus arrive at three different measures for the state-to-state memory, which are abbreviated as I, RMSC, and RMSI. Our next goal is to test how well these measures match when applied to the spontaneous flux of states in SBMs.

The best we can expect is that these measures are monotonic transformations of each other and thus share the locations of local maxima and minima as functions of some control parameter. In the single-neuron case, we used the matrix element w11 as such a continuous parameter. In the case of N-neuron networks, all N^2 connection strengths can be simultaneously varied in a gradual way (see Figure 1 of the supplemental material for an example).

Note that the subsequent weight matrices used for the comparison of the three measures need not be correlated with each other. Indeed, using a time series of statistically independent matrices is even beneficial, as a larger part of matrix space can be sampled in this way. In this case, the subsequent values in each of the resulting time series of I, RMSC, and RMSI are also statistically independent, but we can still test whether the different measures rise and fall synchronously.

For this purpose we count, for a given time series of matrices, how often the signum of change (SOC, introduced in section 2) agrees among the three measures. The fraction r_SOC of matching SOC values is a quantitative measure for how well the three quantities are monotonically related. For two unrelated measures, one would expect r_SOC = 0.5. Values r_SOC > 0.5 indicate that the two measures are monotonically related to each other (with r_SOC = 1 corresponding to one being a perfect monotonic transform of the other). Values r_SOC < 0.5 indicate that one measure tends to have minima where the other has maxima.

Since rSOC is a statistical quantity, it fluctuates with each considered time series of weight matrices. In the following numerical experiments, we therefore generate NS=100 different time series and then estimate the probability density distribution of the resulting 100 values of rSOC, using Gaussian kernel density approximation. Each time series consists of NM=100 random weight matrices with uniformly distributed elements wmn. For each weight matrix, we simulate the dynamics of the corresponding Boltzmann machine (SBM) for NT=10,000 time steps t, and then compute the three measures of state-to-state memory based on these subsequent system states x(t).
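
As an illustration of how rSOC might be computed from two measure time series (a minimal sketch; function and variable names are ours, not from the paper):

```python
import numpy as np

def fraction_matching_soc(a, b):
    """Fraction of time steps at which two measure series change in the
    same direction (matching signum of change, SOC)."""
    return float(np.mean(np.sign(np.diff(a)) == np.sign(np.diff(b))))

rng = np.random.default_rng(42)
x = rng.random(101)

# A perfect monotonic transform matches on every step (rSOC = 1) ...
assert fraction_matching_soc(x, x ** 3) == 1.0

# ... while an unrelated series matches only about half the time (rSOC near 0.5).
y = rng.random(101)
r = fraction_matching_soc(x, y)
```

The two extreme cases bracket the behavior reported below: monotonically related measures yield rSOC near 1, independent ones near 0.5.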

We first consider 5×5 weight matrices where the magnitudes of the matrix elements are small (−0.1 ≤ wmn ≤ +0.1). Since the neurons in such systems are nearly uncoupled, they produce random walks of binary states that are almost statistically independent and temporally uncorrelated (a random regime). As a result, the neurons' global distribution of on-probabilities pi(t), in the following also called fire probabilities, is sharply peaked around 0.5 (see Figure 2a, black dashed curve). The mutual information and pairwise correlation between subsequent states is very small here (and it therefore requires many time steps to estimate those measures precisely), but we find that the pairwise measures RMSC and RMSI have a clear monotonic relation with the full mutual information I (orange and blue solid curves), with RMSC (orange) working slightly better.

Figure 2.

Comparing the pairwise averaged correlation (RCo) and mutual information (RMI) with the full mutual information (MI), using networks (or subnetworks) consisting of five neurons. (a) In the random regime, due to small weight matrix elements, the distribution of the neuron’s fire probabilities is sharply peaked around 0.5 (black dashed curve). The full MI is here monotonically related to both RCo and RMI, indicated by fractions of matching signs of change (SOC) that are distributed around 0.75 (blue and orange curves). (b) In the moderate coupling regime, fire probabilities have a flat distribution in the full range from zero to one. The monotonic relation of RCo with the full MI (orange curve) is even stronger in this case. (c) In the deterministic regime, the distribution of fire probabilities has two peaks around zero and around one. Due to the presence of higher-order correlations between subsequent system states, the pairwise measures are not suitable to measure the state-to-state memory in this regime. This is indicated by SOC fractions smaller than 0.5. (d) In large networks (here with 100 neurons), the global state-to-state memory can be estimated by measuring the three quantities RCo, RMI, and MI within small subunits of neurons (here with five neurons each), provided the system is not in the deterministic regime. Note that the RCo and RMI are always monotonically related to each other (gray curves peaked around 0.9).


Next we consider the moderate coupling regime by choosing −1 ≤ wmn ≤ +1. The fire probabilities in this regime are distributed approximately uniformly between zero and one, and the system is still behaving in a rather stochastic way, yet with clear correlations between subsequent states. (Note that we have already analyzed this moderate coupling regime in some of our former papers on RNN dynamics; Krauss, Schuster et al., 2019; Metzner & Krauss, 2022.) Here we find, again, that the pairwise measures have a clear monotonic relation with the full mutual information (see the blue and orange curves in Figure 2b).

In the case of large matrix elements, with −10 ≤ wmn ≤ +10, we are in the deterministic regime: the neurons are driven far into the saturation of their logistic activation function and, correspondingly, the distribution of the fire probabilities now peaks at zero and one. Here we find that the pairwise measures cannot be used to approximate the full mutual information, not even in the sense of a monotonic transformation (see Figure 2c). This result was to be expected, as in a deterministic system (such as in a network that is counting up through all 2^N binary numbers), the correlations between subsequent states are of higher (nonlinear) order and can only be captured by the full mutual information.

Finally, we consider a larger SBM with N=100 neurons and weight matrix elements uniformly distributed with −0.3 ≤ wmn ≤ +0.3. It is, of course, completely out of the question to compute the full mutual information for a system with 2^100 global states. However, in order to find out how the system's state-to-state memory is changing as a function of some control parameter, we can consider small subgroups of Nsub ≪ N neurons and compute the three measures within these neuron subgroups, assuming that their changes represent the global trend (we shall call this method "subgroup sampling"). The method is tested using NS=100 different, random, predefined subgroups of tractable size Nsub=5, applied to a single time series of NM=100 large 100×100 weight matrices with the above stated properties. As before, NT=10,000 time steps are simulated for each weight matrix. The flat distribution of fire probabilities indicates that the systems are in the moderate coupling regime (black dashed curve in Figure 2d). We find that in this subgroup sampling scenario, the pairwise measures, in particular RMSC (orange), have a clear monotonic relation to the full mutual information.
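
Subgroup sampling might be sketched as follows; here a random binary series stands in for the simulated SBM states, and the RMSC implementation follows the root-mean-square definition of section 2 only in outline (all names are ours):

```python
import numpy as np

def rmsc(X, neurons):
    """Root-mean-square of the Pearson correlations between x_m(t) and
    x_n(t+1) over all pairs (m, n) of the given neuron subgroup.
    X is a binary state time series of shape (T, N)."""
    A = X[:-1][:, neurons].astype(float)    # subgroup states at time t
    B = X[1:][:, neurons].astype(float)     # subgroup states at time t+1
    cs = []
    for m in range(A.shape[1]):
        for n in range(B.shape[1]):
            c = np.corrcoef(A[:, m], B[:, n])[0, 1]
            if np.isfinite(c):              # guard against constant columns
                cs.append(c ** 2)
    return float(np.sqrt(np.mean(cs)))

# Subgroup sampling: estimate the global trend from small random subgroups.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10_000, 100))  # stand-in for a simulated state series
groups = [rng.choice(100, size=5, replace=False) for _ in range(20)]
estimates = [rmsc(X, g) for g in groups]
```

Averaging the subgroup estimates (or tracking how their distribution shifts with a control parameter) then serves as a tractable proxy for the global state-to-state memory.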

3.3  Preliminary Considerations on MI Optimization

Next, we turn to the optimization of the mutual information in RNNs. We start with some fundamental principles based on the well-known diagram in Figure 1a.

In the present context, x corresponds to an initial RNN state at time t and y to the successor state at time t+1 (note that due to the recurrence, the output y is fed back as input in each update step). The quantity I(x,y) is the mutual information between successive states. The state entropies H(x) and H(y) are limited by the size of the RNN and can maximally reach the upper limit of N bits in an N-neuron binary network with its 2^N possible states.

The conditional entropy H(x|y) describes the lossy compression that takes place whenever the same successor state y can be reached from different initial states x. Such a convergence of global system states happens necessarily in standard RNNs, where each neuron r is receiving inputs from several supplier neurons s. Since the output state of neuron r depends only on the weighted sum of the inputs, several input combinations can cause the same output.

Vice versa, H(y|x) describes the random expansion that occurs when a given initial state x can lead to different possible successor states y. Obviously, such a divergence of global system states can happen only in probabilistic systems.

According to Figure 1a, the mutual state-to-state information I(x,y) is maximized under the following conditions:

  1. The state entropies H(x) and H(y) should approach the upper limit, so that ideally, all 2^N possible system states are visited with equal probability.

  2. The lossy compression H(x|y) should be minimized, so that ideally, each successor state y is reachable by only a single initial state x.

  3. The random expansion H(y|x) should be minimized, so that ideally, each initial state x leads deterministically to a single successor state y.

All the above conditions are perfectly fulfilled, and thus the mutual information reaches its upper limit, I(x,y)=H(x)=H(y), in an N-neuron system that is running deterministically through a 2^N-cycle. In the special case when the sequence of these 2^N global system states corresponds to the numerical order of the states' binary representations, we could call such a system a "cyclic counting network."
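
This upper bound can be verified exactly for a small cyclic counting network: if all 2^N states are visited uniformly and each state has a unique deterministic successor, the state-to-state mutual information equals N bits. A minimal numerical check (our own illustration, not code from the paper):

```python
import numpy as np

N = 5
S = 2 ** N
# Deterministic 2^N-cycle visited uniformly: successor of state s is (s + 1) mod 2^N.
p_joint = np.zeros((S, S))
for s in range(S):
    p_joint[s, (s + 1) % S] = 1.0 / S

px, py = p_joint.sum(axis=1), p_joint.sum(axis=0)
nz = p_joint > 0
I = float(np.sum(p_joint[nz] * np.log2(p_joint[nz] / np.outer(px, py)[nz])))
# H(x) = H(y) = N bits and H(x|y) = H(y|x) = 0, so I = N = 5 bits
```

All three conditions hold simultaneously here, which is exactly why the counting network saturates the bound.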

However, a major goal of this article is to discover design principles for probabilistic Boltzmann machines in which the mutual information is suboptimal but close to the upper limit. The above preliminary considerations suggest that cyclic attractors are desirable dynamical features for this purpose, as they avoid both convergence and compression (condition 2) and divergence and expansion (condition 3) in state space. Moreover, all states within a cyclic attractor are visited with equal probability (condition 1), thus helping to maximize the state entropy.

3.4  Evolutionary Optimization of State-to-State Memory

We now turn to our second major research problem of systematically maximizing the spontaneous information flux in free-running SBMs, or, in other words, maximizing state-to-state mutual information I. Obviously, a system optimized for this goal must simultaneously have a rich variety of system states (large state entropy H) and be highly predictable from one state to the next (quasi-deterministic behavior). It is, however, unclear how the connection strengths of the network must be chosen to achieve this goal.

We therefore perform an evolutionary optimization of the weight matrix in order to maximize I, subsequently collect some of the emerging solutions, and finally try to reverse-engineer the resulting weight matrices to extract general design principles. We generate the evolutionary variants simply by adding small random numbers to each of the weight matrix elements wmn (see section 2 for details), but we restrict the absolute values of the entries wmn to below 5 in order to avoid extremely deterministic behavior.
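
The procedure can be sketched as a simple hill-climbing loop (a simplified stand-in for the evolutionary scheme of section 2; the update rule assumes logistic activations without bias terms, and all function names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(W, T=10_000):
    """Run a free-running stochastic Boltzmann machine (logistic
    activations, no biases) and return the visited global states
    encoded as integers."""
    N = W.shape[0]
    x = rng.integers(0, 2, N)
    states = np.empty(T, dtype=int)
    for t in range(T):
        p_on = 1.0 / (1.0 + np.exp(-(W @ x)))       # on-probability of each neuron
        x = (rng.random(N) < p_on).astype(int)
        states[t] = int(x @ (1 << np.arange(N)))    # binary encoding of the state
    return states

def state_to_state_mi(states, n_states=32):
    """Empirical mutual information I[x(t), x(t+1)] in bits."""
    joint = np.zeros((n_states, n_states))
    np.add.at(joint, (states[:-1], states[1:]), 1.0)
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / np.outer(px, py)[nz])))

# Hill climbing: keep a mutated weight matrix whenever I does not decrease,
# clipping entries to |wmn| <= 5 to avoid extreme determinism.
W = np.zeros((5, 5))
best = state_to_state_mi(simulate(W))
for step in range(40):
    W_new = np.clip(W + 0.3 * rng.normal(size=W.shape), -5.0, 5.0)
    mi = state_to_state_mi(simulate(W_new))
    if mi >= best:
        W, best = W_new, mi
```

After many more mutation steps than shown here, such a loop tends to concentrate the weight mass in a few large entries, as described next.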

We start with a 5×5 weight matrix in which all elements are zero. This corresponds to a set of independent neurons without input and results in N=5 uncorrelated random walks (see Figure 3a, state-versus-time plot annotated with t=0). The objective function I, consequently, is zero at evolution time step t=0 (blue curve).

Figure 3.

Evolutionary optimization of state-to-state memory in an SBM with N=5 neurons. (a) Mutual information I[x(t), x(t+1)] grows monotonically with the evolutionary time step t (blue curve). Starting from a weight matrix with all elements zero (bottom inset, left matrix), a pattern emerges with four strongly positive weights (red) and one strongly negative weight (blue). While neurons produce independent random walks for t=0 (top inset), the system shows a multiattractor dynamics at the final time step t=1900 (middle inset). (b) Network representation of the evolved system, ignoring small background matrix elements. (c) Transition probabilities between the 32 global states of the system (left matrix). Each state (matrix rows) has only one dominant successor state (matrix columns), leading to highly predictable dynamics. In particular, pairs of states form 2-cycles as unstable attractors (three of the 2-cycles are marked by colors). A closer look at the smaller matrix elements (right matrix) reveals that each state also has several low-probability successor states, allowing for a switching between the attractors. (d) The 32 global states are visited with comparable probability, thus maximizing the entropy of the system. The inset shows the state probability distributions for three different total numbers of time steps. (e) Four independently optimized weight matrices. As a general rule, there emerge only N=5 matrix elements with large positive (red) or negative (blue) values, arranged such that they do not share the same rows or columns (subsequently called the N-rooks principle). (f, g) Two further examples of N-rooks networks, together with their network representations (or wiring diagrams).


As the evolutionary optimization proceeds, first, several nonzero matrix elements emerge (inset of Figure 3a), but eventually only N=5 elements survive, all with large absolute values close to the allowed limit but with different signs (positive in red, negative in blue). During this development, the objective function I rises monotonically to a final value of about 4.68 (blue curve), which is close to the theoretical maximum of 5. At the final evolution time step, t=1900, the five neurons show a complex and temporally heterogeneous behavior (state-versus-time plot).

Computing the state transition matrix from the final evolved weight matrix (see section 2 for how this is done analytically) reveals that each of the 2^5 = 32 system states has only one clearly dominating successor state, thus enabling quasi-deterministic behavior (see Figure 3c, left panel). However, a plot focusing on the small weights shows that each state also has a few alternative, low-probability successor states (see Figure 3c, right panel). The latter property makes sure that any overly deterministic behavior, such as being trapped in an n-cycle, is eventually broken. Note also that the dominant entries of the state transition matrix are arranged in a very regular way, which is surprising considering the random optimization process by which the system has been created. Finally, a closer inspection of the dominant successor states reveals that the dynamics of this specific SBM contains at least a few 2-cycles (entries marked by colors) as quasi-stable attractors.
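
For small N, the state transition matrix can indeed be computed analytically from the weight matrix. A sketch under the assumption of logistic activations without bias terms (the paper's exact formulation is given in section 2):

```python
import numpy as np

def transition_matrix(W):
    """Exact state transition matrix P[old, new] of a stochastic Boltzmann
    machine with weight matrix W (logistic activations, no biases)."""
    N = W.shape[0]
    S = 2 ** N
    bits = (np.arange(S)[:, None] >> np.arange(N)) & 1      # (S, N) state table
    p_on = 1.0 / (1.0 + np.exp(-(bits @ W.T)))              # P(neuron r on | old state)
    P = np.ones((S, S))
    for r in range(N):
        b_new = bits[:, r]                                  # bit r of each new state
        P *= np.where(b_new[None, :] == 1, p_on[:, [r]], 1.0 - p_on[:, [r]])
    return P

# Example: a 3-neuron loop; each row of P is the successor distribution
# of one of the 2^3 = 8 global states (the kind of matrix plotted in Figure 3c).
W = np.array([[0.0, 0.0, 5.0],
              [5.0, 0.0, 0.0],
              [0.0, -5.0, 0.0]])
P = transition_matrix(W)
```

Because the neurons update independently given the old state, the successor probability factorizes over neurons, which is what the inner loop exploits.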

We next compute for our evolved SBM the stationary probabilities of all 32 system states and find that they all occur about equally often (see Figure 3d). Besides the quasi-deterministic behavior, this large variety of states (high state entropy) represents the second expected property of a system with large state-to-state memory.

We finally repeat the evolutionary optimization with different seeds of the random number generator and thus obtain four additional solutions (see Figure 3e). Remarkably, they are all constructed according to the same design pattern. There are only N=5 large elements in the weight matrix (one for each neuron), which can be of either sign, but which never share a common row or column. Due to the latter property, we call this general design pattern, in analogy to certain chess problems, the "N-rooks principle."

Although all N-rooks networks follow the same simple rule regarding their weight matrix, the resulting connectivity structure of the neural units can be very different. This becomes apparent when the networks are represented in the form of “wiring diagrams:” For example, the network in Figure 3f corresponds to isolated, mutually independent neurons with self-connections, whereas the network in Figure 3g has all its neurons arranged in a loop. There are also more complex cases, like the network in Figure 3b, which consists of one isolated neuron with self-connection and two neuron pairs. As a general rule, each neuron of an N-rooks network is part of exactly one closed linear chain of neurons, or k-neuron-loop, where k can range between 1 and N.
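
Because the dominant entries of an N-rooks matrix define a permutation of the neurons (supplier to consumer), the wiring diagram can be recovered as the cycle decomposition of that permutation. A sketch (the threshold separating dominant from background elements is our own choice):

```python
import numpy as np

def loop_lengths(W, threshold=1.0):
    """Decompose an N-rooks weight matrix into its closed neuron loops.
    The dominant entries define a permutation column -> row (supplier to
    consumer); its cycle lengths are the k-neuron loops of the wiring diagram."""
    rows, cols = np.where(np.abs(W) >= threshold)
    consumer = dict(zip(cols, rows))       # each supplier feeds one consumer
    lengths, seen = [], set()
    for start in consumer:
        if start in seen:
            continue
        k, node = 0, start
        while node not in seen:            # walk the loop until it closes
            seen.add(node)
            node = consumer[node]
            k += 1
        lengths.append(k)
    return sorted(lengths)

# Example: one self-connected neuron plus two 2-neuron loops, as in Figure 3b.
W = np.zeros((5, 5))
W[0, 0] = 5.0                  # isolated neuron with self-connection
W[1, 2], W[2, 1] = 5.0, -5.0   # first 2-neuron loop
W[3, 4], W[4, 3] = -5.0, 5.0   # second 2-neuron loop
assert loop_lengths(W) == [1, 2, 2]
```

The returned cycle lengths directly enumerate the k-neuron loops described above, with k ranging from 1 to N.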

3.5  N-Rooks Matrices Maximize State-to-State Memory

In this section, we establish that N-rooks networks are local optima of state-to-state memory in the N^2-dimensional space of weight matrices.

At the end of the evolutionary optimization, the weight matrix contains N large entries, whereas the remaining N^2 − N elements are very small but not zero. In order to test the relevance of these small background matrix elements, we start with artificially constructed, perfect N-rooks matrices where all background matrix elements are zero (the ones shown in Figures 3f and 3g), and then gradually fill the background elements with normally distributed random numbers of increasing standard deviation. We find that I, on average, decreases under this perturbation (see Figures 4a and 4b).

Figure 4.

N-rooks weight matrices as MI optima. (a,b) Mutual information I as a function of the standard deviation of nondominating matrix elements, with the five dominating elements fixed as in Figures 3f and 3g. On average, I is decreasing when the nondominating matrix elements are filled with nonzero values. (c) Mutual information I as a function of the Euclidean distance d from various N-rooks matrices within 25-dimensional weight matrix space. Large blue dots are averages, and gray dots show the 1000 individual random samples for each distance d. The plot suggests that N-rooks matrices are local optima of I. (d) Mutual information I as a function of the magnitude v of the five nonzero matrix elements. The MI increases monotonically with v, asymptotically approaching the theoretical maximum of Imax=5. (e) Probability density functions of the MI in N-rooks systems with v=5 (orange), compared to random matrices with elements distributed uniformly between -v and v (blue). (f) Like panel e, but for the entropy H of the 25 global system states. The N-rooks systems approach the theoretical maxima of both I and H. (g) To further demonstrate that the N-rooks matrices are local maxima of mutual information in the N2-dimensional weight matrix space, we start from 10 perfect N-rooks matrices (red) and then diffuse away from these points by iteratively adding small random numbers to the matrix elements. The plot shows a two-dimensional MDS visualization of the resulting diffusive random walks, with the mutual information color coded. (h) We next produce a larger test set of weight matrices with high mutual information by generating a cluster of random variants around each perfect central N-rook matrix with a restricted distance to the cluster center (dark red spots). 
If now the positions of the large elements in the 10 N-rook matrices are randomly shuffled and variants are again generated from the resulting non-N-rooks matrices, we obtain a control set of matrices with smaller mutual information (weak red and weak blue spots) but with the same statistical distribution of weights.


Next we start from perfect N-rooks matrices and generate, around each of them, random imperfect matrices at a fixed Euclidean distance within 25-dimensional weight matrix space. We find that the state-to-state memory of these imperfect matrices is fluctuating strongly (in particular, at larger distances from the perfect center), but on average the mutual information is monotonically decreasing with the distance (see Figure 4c).

We now again consider perfect N-rooks networks, but systematically increase the magnitude v of the nonzero matrix elements. We find that the state-to-state memory is increasing with v monotonically from zero to 5 bits, the theoretical maximum in a five-neuron system (see Figure 4d). Note that the sigmoidal shape of this curve can be derived analytically as well (see the supplemental material, section 3).

To demonstrate how strongly the properties of N-rooks networks differ from nonoptimized networks, we compute the probability density distribution of the mutual information I (see Figure 4e) and of the state entropy H (see Figure 4f) for a large ensemble of N-rooks matrices (orange curves), as well as for random matrices with uniformly distributed elements (blue curves). Remarkably, all N-rooks networks have the same large value of I ≈ 4.710 to a very high precision (the visible width of the orange bars corresponds to the bin width of the histogram). Most of the random matrices also have considerable mutual information, but a large gap nevertheless separates them from the performance of the N-rooks networks. The entropy H of the N-rooks networks is close to the theoretical maximum.

To further demonstrate that N-rooks matrices are local maxima of state-to-state memory, we perform a direct visualization of those matrices as points in N^2-dimensional weight matrix space (W-space), using the method of multidimensional scaling (MDS; see section 2 for details) and color-coding of their I-values. We first generate 10 perfect N-rooks matrices as initial points (the red dots in Figure 4g) and then iteratively add to their matrix elements independent, normally distributed random numbers with a fixed, small standard deviation. This corresponds to a random walk in W-space, and it leads with high probability away from the initial points. As expected for local maxima, we observe that I tends to decrease during this diffusion process (compare the color bar). At the same time, the matrices tend to move away from each other in this two-dimensional, distance-preserving MDS visualization. This is due to the fact that the initial perfect N-rooks matrices are all part of an N-dimensional submanifold of W-space. They are relatively close together because their background matrix elements are zero and thus do not contribute to their Euclidean distance. As the matrices diffuse out of the submanifold, their background elements grow, so their mutual Euclidean distance increases as well.

We next produce a larger test set of weight matrices with high mutual information by generating a cluster of random variants around each perfect central N-rook matrix with a small distance to the cluster center (see the dark red spots in Figure 4h). If the positions of the large elements in the 10 N-rook matrices are now randomly shuffled and variants are again generated from the resulting ‘non-N-rooks’ matrices, we obtain a control set of matrices with smaller mutual information (weak red and weak blue spots), but with the same statistical distribution of weights.

Taken together, our numerical results strongly indicate that N-rooks networks are indeed local maxima of state-to-state memory in W-space. We will further corroborate this statement by theoretical analysis of the N-rooks mechanism.

3.6  Maximizing the Cycle Length of Periodic Attractors

It is important to remember that the N-rooks matrices are optima of state-to-state memory under the constraint of a limited magnitude of connection weights. (In our case, the limit was set to |wmn|<5.) Due to this constraint, the value of I=4.710 achieved in five-neuron networks is large but still below the theoretical maximum of I=5.

As we pointed out in section 1, this theoretical maximum could be realized in an N-neuron system only if it ran deterministically through a 2^N-cycle. It is, however, not clear if this is actually possible with an SBM. We therefore now turn to the problem of maximizing the cycle length evolutionarily.

In order to quantify the cycle length for a given weight matrix, we first compute analytically the state transition matrix. By considering only the most probable successor for each of the 2^N states, we obtain a perfectly deterministic flux of states, which corresponds to a directed graph with only one out-link per node (see Figure 5d for some examples). It is straightforward to find all n-cycles in this graph, finally yielding the mean cycle length MCL of the given weight matrix (see section 2 for details).
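
The cycle search over the deterministic successor graph can be done in linear time, since every node has exactly one out-link. A sketch of one possible implementation (function names are ours; the paper's exact procedure is described in section 2):

```python
def mean_cycle_length(successor):
    """Mean length of the cycles of a deterministic successor map.
    successor[s] is the most probable successor of global state s,
    e.g. successor = P.argmax(axis=1) for a state transition matrix P."""
    n = len(successor)
    color = [0] * n            # 0 = unvisited, 1 = on current path, 2 = done
    cycle_lengths = []
    for s in range(n):
        path = []
        while color[s] == 0:   # walk forward until we hit a visited state
            color[s] = 1
            path.append(s)
            s = successor[s]
        if color[s] == 1:      # hit the current path: a new cycle was found
            cycle_lengths.append(len(path) - path.index(s))
        for v in path:         # mark the whole path as processed
            color[v] = 2
    return sum(cycle_lengths) / len(cycle_lengths)

# Example: a single 18-cycle with 14 transient states feeding into it.
succ = [(s + 1) % 18 if s < 18 else 0 for s in range(32)]
assert mean_cycle_length(succ) == 18.0
```

Since each state is visited at most twice, the cost is linear in the number of global states.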

Figure 5.

Simultaneous optimization of state-to-state memory I and mean cycle length (MCL) of periodic attractors, demonstrated in two independent evolutionary runs (left and right columns). (a) The fitness (red), defined as the product I×MCL, increases monotonically. The individual optimization factors (blue and olive) show an overall growth to larger values, but they also drop temporarily. (b) The resulting evolved networks no longer follow the N-rooks principle, and as a result, the state-to-state memory is large but suboptimal (I=4.042 in the left and I=4.138 in the right example). (c) The state transition matrices are not as sparse as in the case of N-rooks networks, so that each global system state can randomly switch between several successor states. (d) Nevertheless, there exists one dominating path through state space in the form of a single long cyclic attractor (consisting of 24 states in the left and 20 states in the right example). Note that states belonging to the cyclic attractor itself are colored in orange and transient states in gray.


When the mean cycle length is maximized evolutionarily, we obtain top values of up to MCL = 18 in a five-neuron system (corresponding to a single 18-cycle with many transient states leading into it). However, the resulting weight matrices are densely populated with relatively small matrix elements (data not shown), and thus each state has multiple successors with comparable transition probabilities. In other words, the optimization has only produced a quasi-stable 18-cycle with a rather short lifetime. It is therefore necessary to not only maximize the mean cycle length (MCL) but to simultaneously make sure the system behaves deterministically to a large degree.

We therefore choose as a new objective function the product of the state-to-state memory I and the MCL. Here we find that, at least over sufficiently many time steps, it is indeed possible to maximize both quantities in parallel (see Figure 5a). For example, one evolutionary run produces a matrix with I=4.042 and MCL = 24 (left figure column), and another run yields I=4.138 and MCL = 20 (right figure column). Due to the now incomplete optimization of I, the resulting weight matrices deviate from the N-rooks principle (see the insets of Figure 5a and their network representations, Figure 5b). Correspondingly, the state transition matrices have more than one nonzero entry in each row; however, each state now has one clearly dominating successor, leading to a relatively long lifetime of the periodic attractor.

3.7  The Mechanism of N-Rooks Systems

In a perfect N-rooks weight matrix, there are only N nonzero matrix elements, either positive or negative, but all with the same large magnitude v ≫ 1, and arranged so that they have no row or column in common. We now theoretically analyze the implications of this design principle:

  1. Each row r of the weight matrix wrc, whose elements describe the input strengths of neuron r, has only one nonzero entry. In other words, each neuron receives input from only one supplier neuron, and so these networks obey a single supplier rule.

  2. This single supplier rule prevents the possibility that different combinations of supplier states lead (via weighted summation) to the same activation of neuron r. It thus prevents the convergence of multiple global system states into one and thereby fulfills the second fundamental condition for a large MI.

  3. The fact that v is large means that the neurons behave almost deterministically: Each neuron r either transmits the state of its single supplier (in the case of excitatory coupling) or inverts it (in the case of inhibitory coupling), with a probability close to one. The resulting deterministic system dynamics prevents the random expansion of states and thus fulfills the third fundamental condition for a large MI.

  4. Since the nonzero matrix elements in w_rc are all positioned in different columns, no two neurons receive input from the same supplier neuron. And because there are N neurons in total, it follows that each of them feeds its output to exactly one consumer neuron, corresponding to a single consumer rule.

  5. Together, the single-supplier and single-consumer rules imply that the system is structured as a set of linear strings of neurons. Since each neuron definitely has a consumer (that is, within the weight matrix it appears in the input row of some other neuron), these strings do not end, and so they must be closed loops. Consequently, the N-rooks weight matrix corresponds to a set of closed neuron loops.

  6. In a closed loop of m neurons with only large excitatory connections between them, an initial bit pattern would simply circulate, creating a trivial periodic sequence of states within the subspace of this m-neuron group. But also with mixed excitatory and inhibitory connections, closed neuron loops produce periodic state sequences within their subspace, and the period length can vary between m and 2m.

  7. On the level of the global state flux, the periodic behavior of each local m-neuron loop creates a set of cyclic attractors of different cycle lengths. Even though all N-rooks matrices look very similar, the combination of cycle lengths can vary greatly.

  8. Because the single-supplier rule prevents convergence of global states, N-rooks systems do not have any transient states. Instead, each of the 2^N possible global system states belongs to one of the cyclic attractors. As transient states would be visited less often than states within cycles, this property helps to make state probabilities more uniform and thus increases the state entropy.

  9. An N-rooks system will spend a very long time in one of its cyclic attractors, as the neurons work almost deterministically. However, eventually an “error” happens in one of the neurons, one bit of the state vector updates the “wrong” way, and consequently the system jumps to a global state that is out of the periodic order, belonging either to the same or another cyclic attractor. These infrequent random jumps are essential to maximize the state entropy.

  10. Each state within an n-cycle is visited only 1/n of the time. One might think that for this reason, states in large n-cycles are visited less often, thus contradicting the desired uniform probability of each state (the first condition for achieving a large MI). However, it turns out that the occasional out-of-order jumps of the system end up proportionally more frequently in large n-cycles. We have actually verified that all 2^N global states in an N-rooks system are visited equally often, leading to the maximum possible state entropy.
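Points 1 to 5 can be made concrete: an N-rooks weight matrix is a signed, scaled permutation matrix, and its decomposition into closed neuron loops follows by iterating the single-supplier pointer. A minimal sketch (our own illustrative code, not the authors' implementation):

```python
import numpy as np

def n_rooks_matrix(N, v, seed=None):
    """Random N-rooks weight matrix: N nonzero entries of magnitude v
    with random signs, no two sharing a row or column."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N)                  # perm[r] = single supplier of neuron r
    W = np.zeros((N, N))
    W[np.arange(N), perm] = rng.choice([-1.0, 1.0], size=N) * v
    return W

def loop_lengths(W):
    """Sizes of the closed neuron loops encoded in an N-rooks matrix."""
    supplier = np.argmax(np.abs(W), axis=1)    # unique input of each neuron
    seen, sizes = set(), []
    for start in range(len(W)):
        if start in seen:
            continue
        n, r = 0, start
        while r not in seen:
            seen.add(r)
            r, n = supplier[r], n + 1
        sizes.append(n)
    return sorted(sizes)
```

Every neuron belongs to exactly one loop, so the loop sizes always sum to N.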

This work focused on the measurement and enhancement of the spontaneous information flux in RNNs. We used the symmetrized Boltzmann machine (SBM) as our probabilistic RNN model system and quantified the information flux by the mutual information (MI) between subsequent global system states, a quantity that we call state-to-state memory.

As we have shown in previous work, it is possible to enhance and optimize the state-to-state memory of an RNN by various methods, for example, by tuning the statistics of the weight matrix elements (Krauss, Schuster et al., 2019), or by adding an appropriate level of noise to the neuron inputs (Krauss, Prebeck et al., 2019). Such optimization methods obviously rely on the measurement of the MI, a task that unfortunately becomes intractable for large systems.

We have therefore considered numerically more efficient measures of the state-to-state memory, such as the root-mean-square (RMS) averaged pair-wise correlation RMSC or the RMS-averaged pair-wise mutual information RMSI.
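As a concrete sketch of RMSC, the following computes the RMS average of the lag-one Pearson correlations between all ordered neuron pairs from a recorded state history; the choice of pair set and lag convention is our assumption, and the paper's exact normalization may differ:

```python
import numpy as np

def rmsc(states, lag=1):
    """Root-mean-square average of the Pearson correlations between
    x_i(t) and x_j(t + lag), taken over all ordered neuron pairs.

    states: array of shape (T, N), one row per time step.
    """
    a = np.asarray(states, float)[:-lag]
    b = np.asarray(states, float)[lag:]
    a = (a - a.mean(0)) / a.std(0)         # z-score each neuron's trace
    b = (b - b.mean(0)) / b.std(0)
    C = a.T @ b / len(a)                   # C[i, j] = corr(x_i(t), x_j(t+lag))
    return float(np.sqrt(np.mean(C ** 2)))
```

Independent random activity yields RMSC near zero, while fully (anti-)correlated successive states yield RMSC near one; the cost is O(T·N²), independent of the 2^N-sized state space.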

These alternative measures yield values that are numerically different from the full MI, but for optimization purposes, the only concern is that all quantities rise or fall synchronously as some system parameters are changed. In order to assess this degree of monotonic relatedness, we compute the fraction of matching signs of change (SOC fraction) between the outputs of two given measures. This novel SOC method can be used to quantify the monotonic relatedness of two arbitrary time series, even in cases with nonstationary statistics, where linear correlation coefficients fail.
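In code, the SOC fraction is simply the frequency with which two measures move in the same direction between successive parameter settings; a minimal sketch (the function name and interface are ours):

```python
import numpy as np

def soc_fraction(u, v):
    """Fraction of matching signs of change between two series.

    u, v: 1-D sequences of measure outputs recorded at the same
    succession of parameter settings. A step counts as matching when
    the signs of the two changes agree (how ties at exactly zero
    change are handled is an assumption of this sketch).
    """
    du = np.diff(np.asarray(u, float))
    dv = np.diff(np.asarray(v, float))
    return float(np.mean(np.sign(du) == np.sign(dv)))
```

Any strictly monotonic transform leaves the SOC fraction at one, even when the relation is highly nonlinear, while unrelated series give a fraction near 1/2.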

Using the SOC method, we have shown that the two pairwise measures are indeed monotonic transforms of the full MI in a wide range of practically relevant cases: As long as the magnitudes of the RNN’s weight matrix elements are not too large, the system operates in a predominantly probabilistic regime, so that the fire probabilities of the neurons are peaked around 1/2. We find that in this regime, for optimization purposes, the full MI can be replaced by the pairwise measures, in particular by RMSC, which can be efficiently computed even in very large RNNs.

By contrast, overly large matrix elements drive the neurons deeply into the nonlinear saturation regime of their activation functions, leading to a bimodal distribution of fire probabilities with peaks at zero and one. In this quasi-deterministic regime, subsequent RNN states are often related in complex ways that cannot be captured by the simpler measures.

While the theoretical maximum of state-to-state memory I in an N-neuron network would correspond to a system that cycles deterministically through a single periodic attractor comprising all 2^N possible states, this extreme case cannot be realized with a probabilistic SBM. Instead, evolutionary maximization of I, while enforcing a limited magnitude for the neural connections, led to weight matrices that are built according to the N-rooks principle: there are only N nonzero matrix elements of large magnitude v, arranged so that they do not have any rows or columns in common.

We have demonstrated that networks built according to this principle have peculiar dynamical properties: all of their 2^N possible states are parts of periodic cycles of different period lengths k. If the magnitude v of the nonzero weights is large, the network dwells for a very long time in one of these cyclic attractors, until it randomly jumps to a new state that may be part of another cycle. In this way, the system visits all states equally often, thus maximizing the state entropy. As they also behave in a highly (though not perfectly) deterministic way, N-rooks networks maximize the state-to-state memory, asymptotically approaching the theoretical limit of I = N bits.
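The uniform-visitation claim can be checked empirically with a small free-running simulation; the ±1 state coding and the logistic fire probability used below are illustrative stand-ins for the paper's exact SBM update convention:

```python
import numpy as np

def simulate(W, steps, seed=None):
    """Free-running probabilistic binary RNN with states in {-1, +1}.
    Neuron r fires (+1) with probability sigmoid(sum_c W[r, c] * x_c),
    all neurons updated synchronously. Returns the empirical
    distribution over all 2^N global states."""
    rng = np.random.default_rng(seed)
    N = len(W)
    x = rng.choice([-1, 1], size=N)
    visits = np.zeros(2 ** N)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(W @ x)))       # fire probabilities
        x = np.where(rng.random(N) < p, 1, -1)   # stochastic synchronous update
        idx = int("".join("1" if s > 0 else "0" for s in x), 2)
        visits[idx] += 1
    return visits / steps

def state_entropy(pr):
    """Entropy (in bits) of an empirical state distribution."""
    pr = pr[pr > 0]
    return float(-(pr * np.log2(pr)).sum())
```

For a four-neuron excitatory loop with moderate weight v = 2 (an N-rooks matrix), all 16 global states are visited and the empirical state entropy comes out close to the maximum of N = 4 bits.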

As N-rooks networks do not necessarily have very long cycles, we have also evolutionarily optimized the product of state-to-state memory I and the MCL. This led to weight matrices that deviate from the strict N-rooks principle but combine large cycle lengths of up to 24 (in a five-neuron system with only 32 possible global system states in total) with a relatively high stability of the periodic attractor.

5.1  Measuring and Controlling Information Flux

As has been discussed extensively in the literature, the computational abilities of RNNs are crucially determined by the dynamical regime they are operating in (Bertschinger & Natschläger, 2004; Langton, 1990; Legenstein & Maass, 2007).

5.1.1  Spectral Properties of the Connectivity Matrix

The dynamical behavior of RNNs, in turn, is intricately linked to the eigenvalues of their connectivity matrix (Ganguli et al., 2008; Hennequin et al., 2014; Rivkind & Barak, 2017). These eigenvalues are critical indicators of the network’s stability, convergence properties, and overall functionality, and examining the spectral properties of the connectivity matrix yields insight into how information flows and evolves within the network over time. Understanding this connection is therefore essential for optimizing network architectures and training procedures and for predicting performance in various tasks, making it a pivotal aspect of both theoretical and practical research on recurrent neural networks.

5.1.2  Fast Control of Network Dynamics

However, there may be situations where an RNN initially has a proper weight matrix and spectral properties but is nevertheless driven into a computationally unfavorable regime by unexpected input signals or, during learning, by gradual changes of its weights. It is then essential to quickly assess the dynamical state of the system in order to apply regulating control signals that bring it back into a healthy dynamical state.

5.1.3  Fast Measurement of State-to-State Memory

For this reason, we focus in this work on the measurement of state-to-state memory, a central indicator of a network’s dynamical regime. The full mutual information between subsequent states is a computationally demanding measure of state-to-state memory, and it requires a long history of states to reach sufficient statistical accuracy. By contrast, the average pair-wise correlation is much simpler and requires much shorter state histories. It may even be possible to train a multilayer perceptron, as a regression task, to estimate the momentary state-to-state memory of an RNN based on just a few subsequent network states. Such a perceptron could then be used as a fast probe for the dynamical state of the RNN, forming part of an automatic control loop that keeps the RNN continuously within the optimal computational regime.

5.2  Optimizing Information Flux

N-rooks networks turned out to achieve the optimal state-to-state memory. The intriguing dynamical properties of these networks, with or without cycle-length optimization, may offer a new perspective on several other neural systems, both artificial and biological.

5.2.1  Edge-of-Chaos Networks

Free-running N-rooks networks appear to share certain features with networks situated at the edge of chaos (EoC; Bertschinger & Natschläger, 2004; Langton, 1990; Legenstein & Maass, 2007). Both delicately balance stability and unpredictability, a hallmark of optimal computation and adaptability. In N-rooks networks, while the deterministic cycles provide a semblance of order, the random transitions between states echo the spontaneous changes often seen at the EoC. This juxtaposition might offer new insights into the emergent properties of systems teetering on the brink of chaos and their potential for sophisticated information processing.

5.2.2  Amari-Hopfield Networks

When comparing N-rooks networks with the deterministic dynamics of Amari-Hopfield networks (Amari, 1977; Amari & Maginu, 1988), intriguing contrasts and parallels emerge. Both are founded on principles of energy landscapes and attractor states, though their approaches to memory retrieval and storage diverge. While Amari-Hopfield networks are renowned for their associative memory capabilities and stable attractor states, the k-cycle sequences in N-rooks networks suggest a more transient yet extended memory retention mechanism. Exploring the interplay between these models could yield novel hybrid architectures that combine the robustness of associative memory with the dynamic flexibility of transient state sequences.

5.2.3  Central Pattern Generators

The cyclic behaviors inherent in N-rooks networks show intriguing parallels with the rhythmic activities orchestrated by central pattern generators (CPGs) in biological organisms (Grillner, 2006; Harris-Warrick, 2010; Marder & Calabrese, 1996). At their core, both systems autonomously generate rhythmic patterns. CPGs, specialized neural circuits, underpin many essential rhythmic behaviors in animals—most notably, orchestrating locomotion patterns in walking or swimming and managing the rhythmic contractions and expansions of breathing. Similarly, N-rooks networks, through their structured connections, consistently produce deterministic k-cycles, emphasizing a form of artificial rhythmicity. The pronounced persistence within these cycles in N-rooks networks bears resemblance to the sustained rhythmic outputs manifested by CPGs, such as the continuous stride pattern in walking or the consistent tempo of breathing. However, an added layer of complexity in the N-rooks networks is their sporadic jumps between cycles. This feature can be likened to the adaptive and transitional behaviors seen in biological systems, where CPGs might adjust or shift rhythms in response to external stimuli or changing conditions. While the origins and implementations of their rhythmic patterns may differ, an in-depth exploration into the shared principles, transitional behaviors, and disparities of N-rooks networks and CPGs could yield valuable insights. Such an understanding may pave the way for the development of artificial neural systems that not only replicate but also enhance rhythmic behaviors, combining the robustness of biological systems with the adaptability and versatility inherent in artificial networks.

5.2.4  Place and Grid Cells

The spatiotemporal dynamics of N-rooks networks may have parallels with the functioning of place and grid cells (Eichenbaum, 2015; Moser et al., 2008; Rowland et al., 2016), which are vital for spatial navigation and representation in mammals. Just as place cells fire in specific locations and grid cells display a hexagonal firing pattern across spaces, N-rooks networks, with their distinct state transitions, might encode specific locations or patterns in their state space. Drawing inspiration from these biological counterparts, N-rooks networks could potentially be harnessed for tasks related to spatial representation and navigation in artificial systems, merging principles from both domains for enhanced performance.

5.2.5  Practical Applications and Limitations of N-Rooks Networks

N-rooks networks, with their unique dynamical properties, open up many practical applications. Their inherent high mutual information between successive states positions them as candidates for advanced memory storage systems, potentially offering more robust retention and recall mechanisms than traditional architectures. However, the property that, given unbounded connectivity strength, a network spends a disproportionate amount of time in cyclic attractors and seldom switches between them poses some intriguing challenges and insights from both an information processing and memory storage perspective. On the one hand, prolonged dwelling in cyclic attractors can be seen as an effective way to preserve information. In biological systems, certain memories or behaviors might need to be stable and resistant to noise or external perturbations, and such a mechanism could be beneficial. On the other hand, a network that gets trapped in specific cyclic attractors for an extended period can be less responsive to new inputs. This rigidity can be problematic for real-time information processing where adaptability and rapid response to changing conditions are crucial. Moreover, the long dwelling time in cyclic attractors means that the network takes practically infinite time to explore its entire state space. This is inefficient for some applications, as the richness of a network’s behavior often lies in its capacity to explore and traverse a vast range of states. If a network gets trapped in specific cycles, it becomes less versatile and may fail to recognize or generate novel patterns or responses. It is therefore advantageous that the cycle dwelling time in N-rooks networks can be adjusted by the magnitude v of the nonzero matrix elements.

5.2.6  Large N-Rooks Networks are Sparse

Very large systems (those with a large number of neurons N) constructed according to the N-rooks principle would have a vanishingly small density of nonzero connections, given by d = N/N^2 = 1/N → 0. Interestingly, we found in one of our former studies on deterministic RNNs a highly unusual behavior at the very edge of the density-balance phase diagram, precisely in the regime that corresponds to these extremely sparsely connected networks (Metzner & Krauss, 2022). It might therefore be worthwhile to study this low-density regime more comprehensively in future work, considering that the brain also has a very low connection density (Miner & Triesch, 2016; Song et al., 2005; Sporns, 2011).

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) with grants KR 5148/2-1 (project number 436456810), KR 5148/3-1 (project number 510395418), and GRK 2839 (project number 468527017) to P.K., as well as grant SCHI 1482/3-1 (project number 451810794) to A.S.

The study was conceived, designed, and supervised by C.M. and P.K. Numerical experiments were performed by C.M. and D.V. Results were discussed and interpreted by C.M., M.Y., A.S., and P.K. The article was written by C.M. and P.K. All authors critically read the manuscript before submission.

The authors declare no competing interests. Data and analysis programs will be made available on reasonable request.

References

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., . . . Farhan, L. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8(1), 1–74.
Amari, S.-I. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27(2), 77–87.
Amari, S.-I., & Maginu, K. (1988). Statistical neurodynamics of associative memory. Neural Networks, 1(1), 63–73.
Barak, O. (2017). Recurrent neural networks as versatile tools of neuroscience research. Current Opinion in Neurobiology, 46, 1–6.
Bertschinger, N., & Natschläger, T. (2004). Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation, 16(7), 1413–1436.
Boedecker, J., Obst, O., Lizier, J. T., Mayer, N. M., & Asada, M. (2012). Information processing in echo state networks at the edge of chaos. Theory in Biosciences, 131(3), 205–213.
Bönsel, F., Krauss, P., Metzner, C., & Yamakou, M. E. (2021). Control of noise-induced coherent oscillations in time-delayed neural motifs. arXiv:2106.11361.
Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749–755.
Büsing, L., Schrauwen, B., & Legenstein, R. (2010). Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. Neural Computation, 22(5), 1272–1311.
Cox, M. A., & Cox, T. F. (2008). Multidimensional scaling. In C. Chen, W. Härdle, & A. Unwin (Eds.), Handbook of data visualization (pp. 315–347). Springer.
Dambre, J., Verstraeten, D., Schrauwen, B., & Massar, S. (2012). Information processing capacity of dynamical systems. Scientific Reports, 2(1), 1–7.
Eichenbaum, H. (2015). The hippocampus as a cognitive map . . . of social space. Neuron, 87(1), 9–11.
Farrell, M., Recanatesi, S., Moore, T., Lajoie, G., & Shea-Brown, E. (2022). Gradient-based learning drives robust representations in recurrent neural networks by balancing compression and expansion. Nature Machine Intelligence, 4(6), 564–573.
Folli, V., Gosti, G., Leonetti, M., & Ruocco, G. (2018). Effect of dilution in asymmetric recurrent neural networks. Neural Networks, 104, 50–59.
Ganguli, S., Huh, D., & Sompolinsky, H. (2008). Memory traces in dynamical systems. Proceedings of the National Academy of Sciences, 105(48), 18970–18975.
Gerum, R. C., Erpenbeck, A., Krauss, P., & Schilling, A. (2020). Sparsity through evolutionary pruning prevents neuronal networks from overfitting. Neural Networks, 128, 305–312.
Gonon, L., & Ortega, J.-P. (2021). Fading memory echo state networks are universal. Neural Networks, 138, 10–13.
Grillner, S. (2006). Biological pattern generation: The cellular and computational logic of networks in motion. Neuron, 52(5), 751–766.
Harris-Warrick, R. M. (2010). General principles of rhythmogenesis in central pattern generator networks. Progress in Brain Research, 187, 213–222.
Haruna, T., & Nakajima, K. (2019). Optimal short-term memory before the edge of chaos in driven random recurrent networks. Physical Review E, 100(6), 062312.
Haviv, D., Rivkind, A., & Barak, O. (2019). Understanding and controlling memory in recurrent neural networks. In Proceedings of the International Conference on Machine Learning (pp. 2663–2671).
Hennequin, G., Vogels, T. P., & Gerstner, W. (2014). Optimal control of transient dynamics in balanced networks supports generation of complex movements. Neuron, 82(6), 1394–1406.
Ichikawa, K., & Kaneko, K. (2021). Short-term memory by transient oscillatory dynamics in recurrent neural networks. Physical Review Research, 3(3), 033193.
Ikemoto, S., DallaLibera, F., & Hosoda, K. (2018). Noise-modulated neural networks as an application of stochastic resonance. Neurocomputing, 277, 29–37.
Jaeger, H. (2001). The “echo state” approach to analysing and training recurrent neural networks—with an erratum note. German National Research Center for Information Technology, GMD Technical Report, 148(34), 13.
Jaeger, H. (2014). Controlling recurrent neural networks by conceptors. arXiv:1403.3369.
Kadmon, J., & Sompolinsky, H. (2015). Transition to chaos in random neuronal networks. Physical Review X, 5(4), 041030.
Kaneko, K., & Suzuki, J. (1994). Evolution to the edge of chaos in an imitation game. arXiv.
Krauss, P., Metzner, C., Joshi, N., Schulze, H., Traxdorf, M., Maier, A., & Schilling, A. (2021). Analysis and visualization of sleep stages based on deep neural networks. Neurobiology of Sleep and Circadian Rhythms, 10, 100064.
Krauss, P., Metzner, C., Schilling, A., Schütz, C., Tziridis, K., Fabry, B., & Schulze, H. (2017). Adaptive stochastic resonance for unknown and variable input signals. Scientific Reports, 7(1), 1–8.
Krauss, P., Metzner, C., Schilling, A., Tziridis, K., Traxdorf, M., Wollbrink, A., . . . Schulze, H. (2018). A statistical method for analyzing and comparing spatiotemporal cortical activation patterns. Scientific Reports, 8(1), 1–9.
Krauss, P., Prebeck, K., Schilling, A., & Metzner, C. (2019). “Recurrence resonance” in three-neuron motifs. Frontiers in Computational Neuroscience, 13.
Krauss, P., Schilling, A., Bauer, J., Tziridis, K., Metzner, C., Schulze, H., & Traxdorf, M. (2018). Analysis of multichannel EEG patterns during human sleep: A novel approach. Frontiers in Human Neuroscience, 12, 121.
Krauss, P., Schuster, M., Dietrich, V., Schilling, A., Schulze, H., & Metzner, C. (2019). Weight statistics controls dynamics in recurrent neural networks. PLOS One, 14(4), e0214541.
Krauss, P., Zankl, A., Schilling, A., Schulze, H., & Metzner, C. (2019). Analysis of structure and dynamics in three-neuron motifs. Frontiers in Computational Neuroscience, 13, 5.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29(2), 115–129.
Kruskal, J. B. (1978). Multidimensional scaling. Sage.
Langton, C. G. (1990). Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1–3), 12–37.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Legenstein, R., & Maass, W. (2007). Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 20(3), 323–334.
Maheswaranathan, N., Williams, A. H., Golub, M. D., Ganguli, S., & Sussillo, D. (2019). Universality and individuality in neural dynamics across large populations of recurrent networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in neural information processing systems, 32 (pp. 15629–15641). Curran.
Marder, E., & Calabrese, R. L. (1996). Principles of rhythmic motor pattern generation. Physiological Reviews, 76(3), 687–717.
Metzner, C., & Krauss, P. (2021). Dynamical phases and resonance phenomena in information-processing recurrent neural networks. arXiv:2108.02545.
Metzner, C., & Krauss, P. (2022). Dynamics and information import in recurrent neural networks. Frontiers in Computational Neuroscience, 16.
Metzner, C., Schilling, A., Traxdorf, M., Schulze, H., Tziridis, K., & Krauss, P. (2023). Extracting continuous sleep depth from EEG data without machine learning. arXiv:2301.06755.
Miner, D., & Triesch, J. (2016). Plasticity-driven self-organization under topological constraints accounts for non-random features of cortical synaptic wiring. PLOS Computational Biology, 12(2), e1004759.
Molgedey, L., Schuchhardt, J., & Schuster, H. G. (1992). Suppressing chaos in neural networks by noise. Physical Review Letters, 69(26), 3717.
Moon, K. R., van Dijk, D., Wang, Z., Gigante, S., Burkhardt, D. B., Chen, W. S., . . . Krishnaswamy, S. (2019). Visualizing structure and transitions in high-dimensional biological data. Nature Biotechnology, 37(12), 1482–1492.
Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells, and the brain’s spatial representation system. Annual Review of Neuroscience, 31, 69–89.
Narang, S., Elsen, E., Diamos, G., & Sengupta, S. (2017). Exploring sparsity in recurrent neural networks. arXiv:1704.05119.
Natschläger, T., Bertschinger, N., & Legenstein, R. (2005). At the edge of chaos: Real-time computations and self-organized criticality in recurrent neural networks. In I. Guyon, Y. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems, 17 (pp. 145–152). Curran.
Rajan, K., Abbott, L., & Sompolinsky, H. (2010). Stimulus-dependent suppression of chaos in recurrent neural networks. Physical Review E, 82(1), 011903.
Rivkind, A., & Barak, O. (2017). Local dynamics in trained recurrent neural networks. Physical Review Letters, 118(25), 258101.
Rowland, D. C., Roudi, Y., Moser, M.-B., & Moser, E. I. (2016). Ten years of grid cells. Annual Review of Neuroscience, 39, 19–40.
Schäfer, A. M., & Zimmermann, H. G. (2006). Recurrent neural networks are universal approximators. In Proceedings of the International Conference on Artificial Neural Networks (pp. 632–640).
Schilling, A., Maier, A., Gerum, R., Metzner, C., & Krauss, P. (2021). Quantifying the separability of data classes in neural networks. Neural Networks, 139, 278–293.
Schilling, A., Tomasello, R., Henningsen-Schomers, M. R., Zankl, A., Surendra, K., Haller, M., . . . Krauss, P. (2021). Analysis of continuous neuronal activity evoked by natural speech with computational corpus linguistics methods. Language, Cognition and Neuroscience, 36(2), 167–186.
Schrauwen, B., Buesing, L., & Legenstein, R. (2009). On computational power and the order-chaos phase transition in reservoir computing. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems, 21 (pp. 1425–1432). Curran.
Schuecker, J., Goedeke, S., & Helias, M. (2018). Optimal sequence memory in driven random networks. Physical Review X, 8(4), 041029.
Solé, R. V., & Miramontes, O. (1995). Information at the edge of chaos in fluid neural networks. Physica D: Nonlinear Phenomena, 80(1–2), 171–180.
Song, S., Sjöstrom, P. J., Reigl, M., Nelson, S., & Chklovskii, D. B. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLOS Biology, 3(3), e68.
Sporns, O. (2011). The non-random brain: Efficiency, economy, and complex dynamics. Frontiers in Computational Neuroscience, 5, 5.
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419.
Toyoizumi, T., & Abbott, L. (2011). Beyond the edge of chaos: Amplification and temporal integration by recurrent networks in the chaotic regime. Physical Review E, 84(5), 051908.
Traxdorf, M., Krauss, P., Schilling, A., Schulze, H., & Tziridis, K. (2019). Microstructure of cortical activity during sleep reflects respiratory events and state of daytime vigilance. Somnologie, 23(2), 72–79.
Vallejos, C. A. (2019). Exploring a world of a thousand dimensions. Nature Biotechnology, 37(12), 1423–1424.
Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).
Wallace, E., Maei, H. R., & Latham, P. E. (2013). Randomly connected networks have short temporal memory. Neural Computation, 25(6), 1408–1439.
Wang, X. R., Lizier, J. T., & Prokopenko, M. (2011). Fisher information at the edge of chaos in random Boolean networks. Artificial Life, 17(4), 315–329.
Wattenberg, M., Viégas, F., & Johnson, I. (2016). How to use t-SNE effectively. Distill, 1(10), e2.
