Recently, multineuronal recording has allowed us to observe patterned firings, synchronization, oscillation, and global state transitions in the recurrent networks of central nervous systems. We propose a learning algorithm based on the process of information maximization in a recurrent network, which we call recurrent infomax (RI). RI maximizes information retention and thereby minimizes information loss through time in a network. We find that feeding external inputs consisting of information obtained from photographs of natural scenes into an RI-based model of a recurrent network results in the appearance of Gabor-like selectivity quite similar to that existing in simple cells of the primary visual cortex. We find that without external input, this network exhibits cell assembly–like and synfire chain–like spontaneous activity as well as a critical neuronal avalanche. In addition, we find that RI embeds externally input temporal firing patterns into the network so that it spontaneously reproduces these patterns after learning. RI provides a simple framework to explain a wide range of phenomena observed in in vivo and in vitro neuronal networks, and it should provide a novel understanding of experimental results for multineuronal activity and plasticity from an information-theoretic point of view.
Recent advances in multineuronal recording have allowed us to observe phenomena in the networks of the central nervous system (CNS) that are much more complex than previously thought to exist. The existence of interesting types of neuronal activity, such as patterned firings, synchronization, oscillation, and global state transitions, has been revealed by multielectrode recording and calcium imaging (Nadasdy, Hirase, Czurko, Csicsvari, & Buzsaki, 1999; Cossart, Aronov, & Yuste, 2003; Ikegaya et al., 2004; Fujisawa, Matsuki, & Ikegaya, 2006; Sakurai & Takahashi, 2006). However, in contrast to the rapidly accumulating experimental data, theoretical work attempting to account for this wide range of data has been slower to materialize. These new data are partly explained by the classical hypotheses proposed purely on theoretical grounds, such as the “cell assembly” of Hebb (1949). However, to explain a wider range of data, we have to extend the classical hypotheses on the basis of mathematics and information sciences.
More specifically, a learning algorithm based on infomax in feedforward networks generates an information-efficient representation of the input in the output neurons of the feedforward network (see Figures 2A1 and 2A2). This algorithm adjusts the connection weights to realize the most efficient information transfer from the input to the output. In this way, a network with small mutual information between input and output, that is, large information loss (see Figure 2A1), evolves through this algorithm into a network that preserves a larger fraction of the information (see Figure 2A2). If the optimization based on infomax is applied to a recurrent network in which the input to the neurons at time t consists only of their own output at time t − 1, the mutual information of two successive states, I(X(t); X(t + 1)), is maximized; that is, the information loss through time is minimized. We call this form of infomax recurrent infomax (RI). An algorithm based on RI readjusts the connection weights of the recurrent network to change a random network with large information loss (see Figure 2B1) into an information-efficient network (see Figure 2B2). RI thus allows a recurrent network to optimize its synaptic connection weights so as to maximize information retention, minimizing information loss by maximizing the mutual information of temporally successive network states.
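The quantity RI maximizes can be made concrete with a small numerical sketch. The function below (our own illustration, not part of the model's learning algorithm) estimates the mutual information of successive states directly from the joint histogram of a T × N binary raster; this brute-force estimate is feasible only for small N, which is precisely why an approximation based on second-order correlations is needed for larger networks.

```python
import numpy as np

def mutual_information(states):
    """Plug-in estimate of I(X(t); X(t+1)) in bits for a T x N binary
    state matrix, computed from the joint histogram of successive states.
    Tractable only for small N, since there are 2^N distinct states."""
    # Encode each binary state vector as a single integer label.
    codes = states.astype(int).dot(1 << np.arange(states.shape[1]))
    x, y = codes[:-1], codes[1:]
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), in counts.
        mi += (c / n) * np.log2(c * n / (px[a] * py[b]))
    return mi
```

A deterministic cycle of states (perfect information retention) yields the full state entropy, while independent random states yield mutual information near zero.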
In this letter, we propose a learning algorithm based on RI and find that feeding external inputs consisting of information obtained from photographs of natural scenes into an RI-based model of a recurrent network results in the appearance of Gabor-like selectivity quite similar to that existing in simple cells of the primary visual cortex (V1). More important, we find that without external input, this network exhibits cell assembly–like and synfire chain–like stereotyped spontaneous activity (Hebb, 1949; Abeles, 1991; Diesmann et al., 1999) and a critical neuronal avalanche (Beggs & Plenz, 2003; Teramae & Fukai, 2007; Abbott & Rohrkemper, 2007). RI provides a simple framework to explain a wide range of phenomena observed in in vivo and in vitro neuronal networks, and it should provide a novel understanding of experimental results for multineuronal activity and plasticity from an information-theoretic point of view.
Input xi(0) to the neurons at the first step t = 1 of the simulation was set to 0, and in the following steps, xi(t) was determined stochastically by equation 2.1. Unless otherwise stated, the neurons in the model network receive no inputs other than their own outputs at the previous step, so the dynamics of the network are completely determined by equations 2.1 and 2.2 (see Figure 3A).
We performed simulations in blocks consisting of 20,000 to 100,000 time steps, updated Wij at the end of each block, and then started the calculation for the next block (see Figure 3B). Outputs of the neurons at the last step of block b − 1 were given as inputs to the neurons at the first step of block b. A simulation consists of 500 to 15,000 blocks.
All models in this letter can be fully characterized by the parameters N (50–432), the mean firing probability p̄ (0.002–0.05), pmax (0.25–0.95), η (0.2–20), ϵ (0.01), and wlimit (100–1000). Parameter values used in the simulations are given in the figure captions. At the beginning of each simulation, Wij was drawn from a uniform distribution on [−0.5, 0.5] and hi was set to 0.
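Because equations 2.1 and 2.2 are referenced but not reproduced in this excerpt, the following sketch of the network dynamics rests on two labeled assumptions: the firing probability is taken to be a logistic function of the summed input scaled by pmax, and the threshold update is taken to be a proportional homeostatic rule that pins each neuron's mean firing rate at the target p̄ (written `p_bar` below). The RI weight update itself is omitted; the weights stay at their random initial values, as at the start of a simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter choices drawn from the ranges quoted in the text.
N, p_bar, p_max, eps = 50, 0.05, 0.95, 0.01

W = rng.uniform(-0.5, 0.5, size=(N, N))  # initial weights, as in the text
h = np.zeros(N)                          # thresholds start at 0
x = np.zeros(N)                          # input at the first step is 0

def step(x, h):
    """One update of the network state.  The logistic firing rule is our
    guess at the form of equation 2.1, and the proportional threshold
    adaptation is our guess at the form of equation 2.2."""
    p = p_max / (1.0 + np.exp(-(W @ x + h)))   # firing probability (assumed form)
    x_new = (rng.random(N) < p).astype(float)  # stochastic binary firing
    h_new = h - eps * (x_new - p_bar)          # homeostatic threshold (assumed form)
    return x_new, h_new

rates = []
for t in range(20000):
    x, h = step(x, h)
    if t >= 10000:                             # discard the transient
        rates.append(x.mean())
mean_rate = float(np.mean(rates))              # settles near p_bar
```

Under these assumptions the thresholds drift until each neuron's long-run firing rate matches p̄, which is the role the text attributes to hi(t).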
We first observed the behavior of this model network under external input. Image patches from a photograph preprocessed by a high-pass filter were used as the external input (see Figure 5A). The neurons in this network were divided into three groups: 144 on-input neurons, 144 off-input neurons, and 144 output neurons, randomly selected from the network (see Figure 5B1). Pixels with positive and negative values in a randomly selected 12 × 12 image patch excited the corresponding on-input and off-input neurons, respectively. The states of the input neurons were stochastically set to 1 or 0 with firing probabilities proportional to the intensities of the corresponding pixels, whereas the states of the output neurons were not set by the external input (see Section 2 for details); instead, the firings of these neurons were determined by equation 2.1 with pmax = 0.95. Initially, the connection weight Wij was a random matrix (see Figure 5C1), and averaging the image patches that evoked firings in each output neuron, we found that the output neurons did not exhibit clear selectivity with respect to the external input from the input neurons (see Figure 5D1). After learning, however, the network self-organized a feedforward structure from the on-input and off-input neurons to the output neurons (see Figures 5B2 and 5C2). The output neurons became highly selective to Gabor function-like stimuli (see Figure 5D2), exhibiting behavior quite similar to the selectivity of simple cells in V1 (Hubel & Wiesel, 1959). Our optimization algorithm based on RI hence caused the model network to become organized into a feedforward network containing simple cell–like output neurons. It has previously been shown that infomax accounts for the selectivity of simple cells (Bell & Sejnowski, 1995, 1997).
Bell and Sejnowski (1997) argued that the natural image patches are composed of independent localized edges such as Gabor functions and that these components can be recovered by maximizing the mutual information of the input and the output. We thus see that this result is consistent with the previous studies based on information theory.
In the simulation described above, the external input was fed into a network with high response reliability (pmax = 0.95). Next, we examined the evolution of the spontaneous activity in a neuronal network without external input. In this network, the approximate mutual information I(1) of two successive states was maximized, and the approximate mutual information I(n) of two states separated by n steps (that is, with n − 1 intervening steps) was larger after learning than before (see Figure 6A). We supposed that this improvement in information retention was a result of the emergence of repeated activity in the network. To identify repeated activity in the model network, we defined a repeated pattern as a spatial pattern of neuronal firings that occurs at least twice in the latter half of a test block (see Figure 6B). Coloring repeated patterns consisting of three or more firing neurons in raster plots of the network (see Figures 6D1 and 6D2), we found that the number of repeated patterns increased after learning. Several patterns were repeated in a sample of 250 steps, as seen in Figure 6D2, where the repeated patterns are indicated by consistently colored circles and connected by lines. Moreover, some patterns appeared to constitute repeated sequences. For example, sequence A, composed of the magenta, orange, and purple patterns, appears three times in Figure 6D2. To quantify the increase in repetition, we tabulated the numbers of occurrences of repeated patterns and sequences and compared these numbers before and after learning (see Figure 6C). We found that both repeated patterns and repeated sequences increased significantly after learning. This indicates that the algorithm embeds not only repeated patterns but also repeated sequences of firings into the network structure as a result of the optimization.
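The repeated-pattern criterion described above can be sketched directly. The helper below (an illustration of the stated definition, not the paper's code) counts exact spatial patterns with at least three firing neurons that occur two or more times in a raster; restricting the raster to the latter half of a test block is left to the caller.

```python
from collections import Counter

import numpy as np

def repeated_patterns(raster, min_neurons=3):
    """Return the spatial firing patterns (rows of a T x N binary raster)
    that contain at least `min_neurons` active neurons and occur at least
    twice, mapped to their occurrence counts."""
    counts = Counter(
        tuple(int(v) for v in row)
        for row in raster
        if row.sum() >= min_neurons
    )
    return {pat: c for pat, c in counts.items() if c >= 2}
```

Patterns with fewer than three firing neurons are ignored, matching the coloring criterion used for the raster plots in Figure 6.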
When a pattern in a sequence is activated at one step, it is highly probable that the next pattern in that sequence will be activated at the next step. This predictability means that the state of the network at one time step shares much information with the state at the next time step. In contrast, when the dynamics of a network are highly stochastic and repeated patterns are therefore rare, we can neither predict which pattern follows a given pattern nor reduce the uncertainty of the next pattern using knowledge of the current one; in this case, the mutual information of two successive states is low. To retain information efficiently in a recurrent network, sequences must be repeatedly activated and the dynamics must be close to deterministic. Hence, we conclude that the repeated activation of an embedded sequence is an efficient way to maximize information retention in a recurrent network. Such repeated patterns and sequences have been observed experimentally in vivo (Skaggs & McNaughton, 1996; Sakurai & Takahashi, 2006; Yao, Shi, Han, Gao, & Dan, 2007) and in vitro (Cossart et al., 2003; Ikegaya et al., 2004), and their existence is suggested by the theory of cell assemblies proposed by Hebb (1949) and the theory of synfire chains proposed by Abeles (1991). We thus see that RI accounts for the appearance of cell assemblies, sequences, and synfire chains in neuronal networks.
In the simulations shown above, a small fraction of connections grew especially strong in the network after learning (see Figure 6E2). So we ask, Is the existence of a small number of strong connections a sufficient condition for efficient information transfer? To answer this, we randomly shuffled the components of the weight matrix of the network after learning shown in Figure 6, and we found that shuffled networks exhibited lower mutual information and a smaller number of occurrences of repeated sequences (see Figures 7A and 7B). Thus, the existence of strong connections does not necessarily imply that the network is efficient in retaining information. RI improves information retention in recurrent networks, while randomly introducing strong connections does not.
We next examined the behavior of the same spontaneous model in the case that the maximal firing probability was small (pmax = 0.5). For small pmax, the number of identically repeated sequences is small, and the network seems to lose structured activity. However, we found characteristic network activity consisting of firing in bursts (see Figure 8A2), which are defined as runs of consecutive firing steps that are immediately preceded and followed by “silent” steps with no firing. We found that after learning, the distribution P(s) of the burst size s, which is the total number of firings in a burst, obeys a power law P(s) ∝ s^γ with γ ≈ −1.5, whereas before learning, we have P(s) ∝ exp(−αs) (see Figure 8C). This result is consistent with experimental results. Beggs and Plenz (2003) recorded the spontaneous activity of an organotypic cortical culture using multielectrode arrays. Defining an avalanche, similarly to our bursts, as a run of activity following a period of inactivity, they found that the size distribution of avalanches is accurately fit by a power law with exponent −1.5. To explain this, they argued that a neuronal network is tuned to minimize information loss and that this is realized when one firing induces an average of one firing at the next step. They showed that this condition yields the universal exponent −3/2, using the self-organized criticality of the sandpile model (Bak, Tang, & Wiesenfeld, 1987; Harris, 1989). This condition also holds for the present network because, after learning, each neuron with pmax = 0.5 had on average two strong input connections and two strong output connections (see Figure 8B2). The universal exponent −3/2 was observed in the network for small pmax (see Figure 8C) but not for pmax = 0.95; in the latter case, the size distribution of bursts P(s) did not exhibit a power law and instead displayed several peaks, reflecting the existence of stereotyped sequences (data not shown).
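The burst definition above translates directly into code. The sketch below (our illustration of the stated definition) splits a raster into maximal runs of consecutive steps containing at least one firing, bounded by silent steps, and returns the total firing count of each run; fitting the resulting size distribution against a power law or an exponential is then a separate step.

```python
import numpy as np

def burst_sizes(raster):
    """Segment a T x N binary raster into bursts: maximal runs of
    consecutive steps with at least one firing, bounded by silent steps.
    Returns the size (total number of firings) of each burst, in order."""
    per_step = raster.sum(axis=1)  # firings at each time step
    sizes, current = [], 0
    for c in per_step:
        if c > 0:
            current += c           # extend the ongoing burst
        elif current > 0:
            sizes.append(current)  # a silent step closes the burst
            current = 0
    if current > 0:                # burst still open at the end of the raster
        sizes.append(current)
    return sizes
```

For example, per-step firing counts 0, 2, 3, 0, 0, 1, 0, 4 yield bursts of size 5, 1, and 4.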
We thus conclude that RI embeds information-efficient structures in which one firing induces on average one firing at the next step in a network with small pmax.
To reveal the essential mechanism responsible for the behavior described above, we returned to the recurrent network with an external input (see Figure 9). It has been observed that hippocampal firing sequences from the awake state are repeated during sleep (Skaggs & McNaughton, 1996; Louie & Wilson, 2001) and that spontaneous spiking activity in the visual cortex mimics the movie-evoked response after repeated exposure to a movie (Yao et al., 2007). We investigated whether the firings presented during the learning period are replayed by the model after learning. In the learning blocks, we repeatedly stimulated neurons 1, 3, and 2 in sequence (see Figures 9A1 and 9B1): the state of neuron 1 was set to 1 (fire) at random intervals ranging from 50 to 99 steps; denoting such a step by t, the state of neuron 3 was set to 1 at t + 2, and the state of neuron 2 at t + 6. In the subsequent test block, in which only neuron 1 was stimulated externally (see Figure 9A2), the firing of neuron 1 was followed by spontaneous firings of neurons 3 and 2 (see Figure 9B2, arrows). In addition, the spontaneous firing of neuron 1 triggered the sequence containing the firings of neurons 3 and 2 (see Figure 9B2, double arrows). The form of the weight matrix after learning reveals that a feedforward structure starting from neuron 1 (1 → 7, 34 → 3, 5 → 49 → 18 → 11, 28 → 2) was embedded in the network (see Figure 9C). This structure self-organizes because, as we saw above, embedding a sequence of firings into the network structure is an efficient way to retain information. RI thus embeds externally input temporal firing patterns into the network by producing feedforward structures, and, as a result, the network can spontaneously reproduce these patterns.
In this study, we have found that infomax in recurrent networks acts to optimize the network structure by maximizing the information retained in the recurrent network. Much previous work on infomax in feedforward networks (Linsker, 1988; Atick, 1992; Bell & Sejnowski, 1995, 1997; Lewicki, 2002) has suggested that the stimulus selectivity of neurons in the CNS is accounted for by infomax in feedforward networks. Infomax in recurrent networks, in contrast, has so far been applied only to small recurrent networks that can be studied by random search (Ay, 2002), because the analysis of recurrent networks is complicated by history-dependent dynamics due to the recurrent connections. In the model presented here, by approximating the mutual information of two successive states with second-order correlations of neuronal firings, we succeeded in deriving an algorithm that maximizes information retention in recurrent networks. The model reproduced the self-organization of simple cell-like selectivity shown in previous models, and we successfully extended these previous results to the spontaneous activity characteristic of recurrent networks. In the context of a simple maze task, for example, these repeated patterns can be regarded as memory traces representing spatial cues and relationships between successive items, and they have been proposed to help an animal solve the maze task (Dragoi & Buzsaki, 2006). An internal representation of the external input is essential for adaptation to environments, and RI constructs this internal representation in the form of feedforward structures.
We have found that infomax in recurrent networks reproduces the self-organization of cell assemblies and neuronal avalanches. In contrast, most previous theoretical studies on cell assemblies, synfire chains, and neuronal avalanches investigated the dynamics of neuronal firings on a network in which a feedforward structure underlying this characteristic type of activity had already been embedded (Diesmann et al., 1999; Beggs & Plenz, 2003; Teramae & Fukai, 2007). Although these models successfully reproduced experimental results, they could not explain how the embedded network structure emerges. A recent theoretical study suggested that neuronal avalanches can be accounted for by a simple model for the growth of dendritic and axonal processes (Abbott & Rohrkemper, 2007). This model appears to self-organize a network structure that maximizes retained information, as our model does.
In our model, the network structure self-organized by the optimization algorithm resulted in simple cell-like activity, repeated sequences, and neuronal avalanches. Through evolution, animals have acquired CNSs, which are extremely efficient information-processing devices that improve an animal's adaptability to various environments. It is thus quite natural that these phenomena can be regarded as a result of the optimization of information retention. In this letter and our model, we have accordingly focused on information retention in a recurrent network, although CNSs should be optimized not only for information retention but also for categorization and generalization. Previous studies have shown that synaptic plasticity rules, both experimentally observed and theoretically proposed, optimize the information transmission of individual synapses (Toyoizumi, Pfister, Aihara, & Gerstner, 2005; Pfister, Toyoizumi, Barber, & Gerstner, 2006). Thus, neuronal networks with local plasticity rules optimized to retain information could reproduce the experimental results of repeated activity patterns and avalanches. However, the learning rule of our model is not local and requires global information. We can optimize the activity of, for example, half of the neurons in the network if we approximate the mutual information of these N/2 neurons using the N/2 × N/2 correlation matrix and update the connection weights among these neurons, leaving the other connection weights unchanged. We then observe that the occurrence of repeated sequences increases after this learning, but not as much as in the simulation shown in Figure 6 (data not shown). Although this learning rule requires information on only half the neurons in the network, it is still not local: it requires global information on the activity of these N/2 neurons.
To overcome this problem, our next goal is to derive a biologically plausible plasticity rule in a bottom-up way employing RI and to compare this rule with experimentally obtained plasticity rules. We believe that RI will help us understand the meaning of in vivo and in vitro experimental results, particularly to characterize the spontaneous activity of neurons in the context of information theory.
Appendix A: Algorithm
Here we describe the algorithm to maximize the mutual information of the present state, X, and the next state, X′, of the network.
N neurons receive as input an output x = [xi(t)] at time t and generate an output x′ = [xi(t + 1)] at time t + 1. Neuron i takes two states: a firing state, xi = 1, and a nonfiring state, xi = 0. The firing probability of neuron i at time t + 1 is given by equation 2.1. We assume that Wij can take positive and negative values, with positive and negative Wij corresponding to excitatory and inhibitory connections, respectively. The threshold hi(t) evolves according to equation 2.2 and fixes the mean firing probability of neuron i to p̄.
Although the distributions of x and x′ are not gaussian because of the discreteness of the neuronal states, this approximation gives a good estimate of the mutual information. We compared the directly computed mutual information of two consecutive steps with this approximation; Figure 10 shows that the mutual information is fit quite well by the form I = log |C| − (1/2) log |C₂|, where C is the covariance matrix of the network state at one step and C₂ is the joint covariance matrix of the states at two successive steps. Because this approximation requires only correlation matrices, it enables us to estimate the mutual information of N neurons, whose calculation in its original form requires the joint probability distribution over the 2^{2N} realizations of the firing states at two successive steps.
In addition, the quantity in equation A.1 is a good index of the information retained in a recurrent network even when it deviates significantly from the value of the mutual information. Maximizing equation A.1 results in the decorrelation of the state x, due to the term log |C|, as well as in an increase of the correlation between the state x and the next state x′, owing to the term −(1/2) log |C₂|, where C₂ is the joint covariance matrix of two successive states. A strong correlation between the states of the network at two successive steps increases the amount of information transmitted over time, and strong decorrelation among the neurons at a given step increases the information capacity of the network. Thus, equation A.1 is an effective measure of the information retained in the recurrent network. Another advantage of using equation A.1 as the value function is that it can be calculated using only the second-order correlations. Although higher-order correlations are useful in estimating the mutual information, calculating them is time-consuming in numerical simulations and complicates the theoretical analysis. In the following derivation of the algorithm, we use equation A.1 and thus employ an approximation of the mutual information in which the contribution of the higher-order correlations is not taken into account.
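This second-order approximation can be evaluated numerically from sampled states. The sketch below uses the standard Gaussian approximation of mutual information, (1/2)(log |C| + log |C′| − log |C₂|), which reduces to log |C| − (1/2) log |C₂| when the state distribution is stationary; whether this matches equation A.1 term by term is an assumption on our part, since the equation itself is not reproduced in this excerpt. A small ridge keeps the determinants finite for discrete binary data.

```python
import numpy as np

def gaussian_mi(states):
    """Second-order (Gaussian) approximation of I(X(t); X(t+1)) in nats
    from a T x N binary state matrix.  C and C' are the covariance
    matrices of the current and next states; C2 is the joint 2N x 2N
    covariance of successive state pairs."""
    x, y = states[:-1], states[1:]
    joint = np.hstack([x, y])
    n = states.shape[1]
    ridge = 1e-6 * np.eye(2 * n)                 # regularize singular covariances
    c2 = np.cov(joint, rowvar=False) + ridge
    c, cp = c2[:n, :n], c2[n:, n:]
    _, ld_c = np.linalg.slogdet(c)
    _, ld_cp = np.linalg.slogdet(cp)
    _, ld_c2 = np.linalg.slogdet(c2)
    return 0.5 * (ld_c + ld_cp - ld_c2)
```

Unlike the histogram-based estimate, this quantity needs only the correlation matrices, so it scales to networks far too large for the 2^{2N} joint distribution to be tabulated.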
Appendix B: Method for Counting the Repeated Patterns and Sequences
In Figure 6, we presented the numbers of repeated patterns and sequences before and after learning. Defining repeated patterns as exact patterns that occur multiple times, we excluded incompletely matched patterns from the definition; this keeps the definition simple and the result clear. Of course, we could instead regard patterns with small differences as a single repeated pattern. For example, if two patterns differing in at most one neuron, such as patterns a and b in Figure 12A, are regarded as the same pattern, patterns b and c would also be regarded as the same pattern. Patterns a and c, however, cannot be regarded as the same pattern, because the states of two of their neurons differ. Thus, in general, even if patterns a and b are considered the same and patterns b and c are considered the same, patterns a and c may not be the same: the relation is not transitive. Classifying slightly different patterns into one repeated pattern would therefore make the definition of repeated patterns less meaningful.
We defined a repeated sequence as an exact series of patterns that occurs more than once in a block. A repeated sequence is thus composed of repeated patterns. Moreover, a repeated sequence is composed of shorter repeated sequences. For example, each repeated sequence of length 4 contains three repeated sequences of length 2 (see Figure 12B). In general, a repeated sequence of length l1 contains l1 − l2 + 1 repeated sequences of length l2 < l1. At first glance, it might seem that this way of counting repeated sequences overestimates the number of occurrences of repeated sequences and should be replaced by some more sophisticated method, such as a definition that does not count the short sequences contained in a longer repeated sequence as a repeated sequence. Such a method of counting, however, underestimates the number of repeated sequences. If a sequence B of length 2 occurs three times, twice in a repeated sequence D of length 4 (B2 in D1 and B3 in D2 of Figure 12C) and once outside longer sequences (B1 in Figure 12C), this modified way of counting fails to count the sequence B1 as an occurrence of the repeated sequence of length 2 even though this sequence is indeed repeated. To avoid this kind of failure, we counted sequences as repeated even when they were contained in longer repeated sequences. Thus, each of the sequences A, B, C, and D occurs twice in Figure 12B, and the sequences B and D occur three times and twice, respectively, in Figure 12C.
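The counting convention above, in which shorter sequences inside a longer repeated sequence are still counted, can be sketched as follows (our illustration of the stated definition, not the paper's code): every window of `length` consecutive patterns is tallied, so a length-2 sequence is counted whether it occurs on its own or inside a repeated length-4 sequence.

```python
from collections import Counter

import numpy as np

def repeated_sequences(raster, length):
    """Return the sequences of `length` consecutive spatial patterns that
    occur more than once in a T x N binary raster, mapped to their
    occurrence counts.  Windows overlap, so occurrences inside longer
    repeated sequences are counted as well."""
    rows = [tuple(int(v) for v in r) for r in raster]
    windows = [tuple(rows[i:i + length]) for i in range(len(rows) - length + 1)]
    counts = Counter(windows)
    return {seq: c for seq, c in counts.items() if c >= 2}
```

In the situation of Figure 12C, a length-2 sequence B occurring once on its own and twice inside a repeated length-4 sequence D is correctly counted three times.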
This work was supported by grants-in-aid from the Ministry of Education, Science, Sports, and Culture of Japan: Grant numbers 16200025, 17022020, 17650100, 18019019, 18047014, and 18300079.