Living organisms must actively maintain themselves in order to continue existing. Autopoiesis is a key concept in the study of living organisms, whereby the boundaries of an organism are not static but dynamically regulated by the system itself. To study the autonomous regulation of a self-boundary, we focus on neural homeodynamic responses to environmental changes, using both biological and artificial neural networks. Previous studies showed that embodied cultured neural networks and spiking neural networks with spike-timing-dependent plasticity (STDP) learn actions that avoid external stimulation. In this article, as a result of our experiments using embodied cultured neurons, we find that there is also a second property allowing the network to avoid stimulation: If the agent cannot learn an action to avoid the external stimuli, it tends to decrease the stimulus-evoked spikes, as if to ignore the uncontrollable input. We also show that such behavior is reproduced by spiking neural networks with asymmetric STDP. We consider that these properties can be regarded as autonomous regulation of self and nonself for the network, in which a controllable neuron is regarded as self and an uncontrollable neuron as nonself. Finally, we introduce neural autopoiesis by proposing the principle of stimulus avoidance.
Learning is an important aspect of neural systems, and it is crucial for animals, as embodied neural systems, to learn adaptive behavior to survive. One of the key concepts in the study of adaptive behavior is homeostasis. Ashby argued that adaptive behavior is an outcome of the homeostatic property of living systems. Ikegami and Suzuki have proposed the concept of homeodynamics, in which an autonomous self-moving state emerges from a homeostatic state, and Iizuka and Di Paolo have reported that adaptive behavior is an indispensable outcome of homeostatic neural dynamics [11, 20]. However, those models are still too abstract to compare with the homeostatic properties of biological neural dynamics. In this article, we study neural homeodynamic responses to environmental changes, using both biological neural cells and a more biologically plausible computational model of neurons.
Biological neural cells cultured in vitro have been used to study neural systems [2, 12, 27, 37], because such cultured neurons are easier to study than neurons in vivo: They are composed of fewer neurons and are kept in a more stable environment. Using cultured neurons is also advantageous because unknown, complex features of neural cells, which are still difficult to implement in artificial neural networks, can potentially be used. Although cultured neural systems are much simpler than real brains, they have the essential properties, including spontaneous activity, various types and distributions of cells, high connectivity, and rich and complex controllability. Homeostatic adaptivity may be one such property.
Artificial neural networks are often used as models of biological neural networks to understand learning mechanisms. Recently, simulating more realistic artificial neural networks has become computationally efficient through the introduction of models of spiking neurons [4, 23, 25] and of synaptic plasticity [6, 9, 44]. These more realistic models can lead to a theoretical understanding of biological neural networks.
By organizing biological experiments and developing a computational model, we propose homeodynamic control of a neural network, based on a new learning principle of synaptic plasticity by which neural networks learn to avoid stimulation from the environment.
1.1 Learning by Stimulation Avoidance
Shahaf and Marom demonstrated that a cultured neural network can learn a desired behavior as if the network avoided stimulation, using the following protocol. First, an electrical stimulation with a fixed low frequency (e.g., 1–2 Hz) was delivered to a predefined input zone of the network. When the desired behavior appeared, the stimulation was removed. After this protocol was repeated, the network learned to produce the expected behavior in response to the stimulation. In practice, the authors showed that the networks learned to produce spikes in predefined output zones, in a predefined time window (40–60 ms after each stimulus), in response to the stimulation applied in the input zone.
Marom and Shahaf explained these results by invoking the stimulus regulation principle (SRP). The SRP is composed of the following two functions at the neural network level: (i) modifiability, by which stimulation drives the network to form different topologies by modifying neuronal connections, and (ii) stability, by which removing the stimulus stabilizes the last configuration of the network. They argued that their preliminary experiments suggested that cultured neural networks possess these two functions. However, the two properties that constitute the SRP are macroscopic, phenomenological explanations, which do not amount to a concrete mechanism. Furthermore, if modifiability held strictly, learned configurations would be destroyed at every stimulation; if stability held strictly, it should not be necessary to repeat the stimulation cycle as in the experiment explained above.
In a previous study, we proposed a mechanism, termed learning by stimulation avoidance (LSA), on the micro scale of neural dynamics, using small simulated networks to explain Shahaf and Marom's results. LSA is based simply on a classical form of spike-timing-dependent plasticity (STDP). When a presynaptic neuron and a postsynaptic neuron fire successively within a certain time window, the synaptic weight increases (long-term potentiation, or LTP). If the firing order of the two neurons is reversed, the weight decreases (long-term depression, or LTD). STDP has been found in both in vivo and in vitro networks.
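The pair-based STDP rule described above can be sketched as follows. The exponential form is the standard pair-based rule; the default amplitudes and time constants are illustrative rather than the exact values used in our model.

```python
import math

def stdp_dw(dt, a_ltp=1.0, a_ltd=1.0, tau_ltp=20.0, tau_ltd=20.0):
    """Weight change for one pre/post spike pair, with
    dt = t_post - t_pre in ms. Positive dt (pre fires before post)
    gives LTP; negative dt gives LTD."""
    if dt > 0:
        return a_ltp * math.exp(-dt / tau_ltp)   # long-term potentiation
    if dt < 0:
        return -a_ltd * math.exp(dt / tau_ltd)   # long-term depression
    return 0.0
```

With these symmetric defaults, LTP and LTD mirror each other; making the LTD branch stronger yields the asymmetric variant discussed later in this article.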
The following two dynamics emerge for the avoidance of stimulation in LSA, based on STDP. The first dynamic is the reinforcement, by LTP, of behavior that decreases the stimulation: If the firing of the postsynaptic neuron terminates the cause of the presynaptic neuron's firing (i.e., the stimulation), then the synaptic weight from the presynaptic neuron to the postsynaptic neuron will increase, so the behavior leading to a decrease in stimulation is reinforced. The second dynamic is the weakening, by LTD, of behavior that increases the stimulation: If the firing of the postsynaptic neuron initiates the cause of the presynaptic neuron's firing, then the synaptic weight from the presynaptic neuron to the postsynaptic neuron will decrease, so the behavior leading to an increase in stimulation is weakened. This is the most basic structure of LSA.
LSA states that the network learns to avoid external stimuli by learning available behaviors. In follow-up studies, we further showed that LSA scales up to larger networks. LSA works as long as the following two conditions are met: (i) the plasticity of the embodied neural network is driven by STDP, and (ii) the network constitutes a closed loop with the environment. We claim that LSA is an emergent property of a spiking network with Hebbian rules and its environment.
Such stimulation avoidance can produce homeostasis: Since the unexpected stimulation of an agent resulting from its environment represents environmental changes, avoiding the stimulation decreases the influence of environmental changes on the internal state of the agent. It is interesting that such homeostatic global behavior directly emerges from simple synaptic dynamics, such as STDP. This property of stimulus avoidance can also be regarded as an intrinsic motivation that emerged from the local dynamics of neurons in a bottom-up manner.
Shahaf and Marom's results are promising in that they showed that behaviors that avoid stimulation will be learned autonomously in cultured neurons. However, there have been no further studies on these learning dynamics. Although Shahaf and Marom used cultures with 10,000–50,000 neurons, a smaller number of cultured neurons should be able to learn in the same manner if the same dynamics as LSA operate in cultured neurons. Thus we first performed learning experiments using fewer cultured neurons than previous works have used, to demonstrate how such a learning mechanism scales from small to large cultures.
In addition, the results of the experiments showed that if the network cannot learn a behavior that removes external stimuli, its response to stimulation is gradually suppressed, as if it isolates the uncontrollable input neurons. This means that a second property, in which the network avoids the effects of stimulation by weakening the connections from uncontrollable input neurons, operates alongside LSA to avoid stimulation.
We also found that this behavior can be reproduced by simulated spiking neural networks with asymmetric STDP, in which the functions of LTP and LTD are rotationally asymmetric (e.g., the working window for LTD is larger than that for LTP). These dynamics can be interpreted as the embodied neural network autonomously regulating the boundary between self and nonself.
Furthermore, it should be noted here that updating the concept of autopoiesis was a broad motivation for this study. Autopoiesis has been proposed as a fundamental principle of living systems: A microscopic level of a system's organization is iteratively compensated for by the emerging macro state; taking cell dynamics as an example, the resultant organization is a cellular boundary, organized in chemical space, that distinguishes between self and nonself. This perspective is commonly applied to both the immune and neural systems of vertebrates. Immune systems can distinguish between the material self and nonself, and neural systems can distinguish between "informational" self and nonself. The neural autopoiesis demonstrated in this article is based on the principle of stimulus avoidance observed in neural networks.
2 Materials and Methods
We used both cultured neurons and spiking neural networks for studying the homeostatic properties in embodied neural networks. Below, we first describe the methods for using cultured neurons, then the methods for using spiking neural networks, and finally each experimental setup.
2.1 Cultured Neurons
2.1.1 Cell Culture
The neural cultures were prepared from the cerebral cortex of E18 Wistar rats, as previously reported [3, 51, 52]. The cortex region was trypsinized with 0.25% trypsin, and the dissociated cells were plated and cultured on a recording device. The surfaces of the electrodes on the device were coated with 0.05% polyethylenimine and laminin to improve plating efficiency. The cells were cultured in neurobasal medium (Life Technologies, California, USA) containing 10% l-glutamine (Life Technologies, California, USA) and 2% B27 supplement (Life Technologies, California, USA) for the first 24 h. After the first 24 h, half the plating medium was replaced with growth medium (Dulbecco's modified Eagle's medium (Life Technologies, California, USA)) that contained 10% horse serum, 0.5 mM GlutaMAX (Life Technologies, California, USA), and 1 mM sodium pyruvate. The cultures were placed in an incubator at 37°C with an H2O-saturated atmosphere that consisted of 95% air and 5% CO2. During cell culturing, half the medium was replaced once a week with the growth medium. All cultures used in our experiments consisted of 47–110 cells and were sufficiently matured to show global burst synchronization.
2.1.2 CMOS-based High-Density Microelectrode Arrays
A high-density microelectrode array based on complementary metal-oxide-semiconductor (CMOS) technology was used to measure the extracellular electrophysiological activity of the cultured neurons (Figure 1). This CMOS-based electrode array is superior to the conventional multi-electrode array (MEA) used previously, in that it has a far higher spatiotemporal resolution. The number of electrodes in conventional MEAs is small (e.g., 64), and the locations of the recording electrodes are predetermined, with a large interelectrode distance (e.g., 200 μm); thus, it is difficult to identify signals from an individual cell. In contrast, the CMOS arrays have 11,011 electrodes; the diameter of each electrode is 7 μm, with an interelectrode distance of 18 μm over an area of 1.8 mm × 1.8 mm. Thus this method can identify signals from an individual cell in a small culture. The device can simultaneously record the electrical activity of 126 electrodes at a sampling rate of 20 kHz.
2.1.3 Estimation of Neuronal Somata Locations
Before recording the neural activities, the 11,011 electrodes were scanned to obtain an electrical activity map used to estimate the locations of the neuronal somata (i.e., to identify the positions of neural cells). The scanning session consisted of 95 recordings. In each recording, the electrical activities of 110–120 electrodes were simultaneously recorded for 60 s. In the recordings, the sampling frequency was set to 20 kHz and the band-pass filter was set to 0.5–20 kHz. An electrical activity map was obtained by averaging the height of the action potentials for each electrode. We applied a Gaussian filter to the map and assumed that the neuronal somata were located near the local peaks on the Gaussian-filtered map. At most, 126 peaks were selected, in descending order of height, as the positions of neural cells, and the electrodes nearest the peaks were used to record the neural activity. If the number of local peaks was smaller than 126, then all the peaks were used. Using this method, one electrode can ideally represent a single neural state.
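The peak-selection step above can be sketched in Python. Assuming the activity map has already been Gaussian-filtered, a minimal local-maximum search might look like the following; the 3×3 peak neighborhood and the function name are illustrative assumptions, as the text only states that somata are assumed to lie near local peaks.

```python
import numpy as np

def soma_candidates(smoothed, max_sites=126):
    """Select up to 126 positions at local peaks of a Gaussian-filtered
    activity map, in descending order of peak height."""
    h, w = smoothed.shape
    # Pad with -inf so border cells compare only against real neighbors
    padded = np.pad(smoothed, 1, constant_values=-np.inf)
    # Maximum over each 3x3 neighborhood, built from shifted views
    neigh = np.max([padded[i:i + h, j:j + w]
                    for i in range(3) for j in range(3)], axis=0)
    peaks = (smoothed == neigh) & (smoothed > 0)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(smoothed[ys, xs])[::-1]  # descending height
    return [(int(ys[k]), int(xs[k])) for k in order[:max_sites]]
```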
2.1.4 Estimation of Excitatory and Inhibitory Synaptic Conductances
It was important to identify each neuron's type in these experiments, as the neural activity was recorded, and stimulation was applied, for individual neurons rather than for groups of neurons. The neuronal cell type (i.e., excitatory or inhibitory) was estimated from spike shapes, which were recorded for 10 min before the main experiment. The action potentials of these two neural types differ in shape: In excitatory neurons, the time between the maximum potential and the minimum potential is longer than in inhibitory neurons. We classified the neuronal cell types by k-means clustering according to this difference in shape, using the average interval between the two peaks of the spike shape to divide the cells into two classes (k = 2).
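A minimal version of this two-class split can be sketched as one-dimensional k-means on the interval between the two peaks of the spike shape. The initialization, iteration count, and function name below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def classify_cells(peak_intervals_ms, iters=50):
    """1-D k-means (k = 2) on the interval between the spike maximum
    and minimum; the longer-interval cluster is taken as putative
    excitatory, the shorter as inhibitory."""
    x = np.asarray(peak_intervals_ms, dtype=float)
    centers = np.array([x.min(), x.max()])  # simple extreme-value init
    for _ in range(iters):
        # Assign each cell to the nearest cluster center
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    excitatory = labels == centers.argmax()  # broader-waveform class
    return excitatory, centers
```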
2.1.5 Recording and Preprocessing of Neural Activity
To detect and record spikes in the cultured neurons, we used the MEABench software. All recordings were performed at a 20-kHz sampling rate using the real-time spike-detection algorithm LimAda in MEABench. Because the LimAda algorithm detects spikes that exceed a threshold without distinguishing between positive and negative values, unexpected double detection of spikes can occur; these double-detected spikes were removed from the data before analysis. Sending electrical stimuli to a neuronal cell through the electrodes can produce artefacts. In our experiments, we needed to detect action potentials and stimulate the neuronal cells at the same time, so the SALPA filter in MEABench was used to remove the artefacts in real time.
2.2 Spiking Neural Networks
2.2.1 Neuron Model
The membrane dynamics of each neuron follow the Izhikevich model: dv/dt = 0.04v² + 5v + 140 − u + I and du/dt = a(bv − u), with the after-spike reset rule that if v ≥ 30 mV, then v is reset to c and u is incremented by d. Here, v represents the membrane potential of the neuron, u represents a variable related to the repolarization of the membrane, I represents the input current from outside the neuron, t is the time, and a, b, c, and d are parameters that control the shape of the spike. The neuron is assumed to be firing when the membrane potential v exceeds 30 mV. The parameters for excitatory neurons (regular-spiking neurons) were set as a = 0.02, b = 0.2, c = −65 mV, and d = 8, and for inhibitory neurons (fast-spiking neurons) as a = 0.1, b = 0.2, c = −65 mV, and d = 2 (Figure 2). The simulation time step Δt is 0.5 ms. The parameter values were chosen to reflect biological relevance.
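The neuron dynamics can be sketched in Python as follows. The Euler scheme with Δt = 0.5 ms and the two parameter sets follow the text; the constant input current and simulation length are illustrative assumptions.

```python
# Izhikevich parameters from the text: regular-spiking (excitatory)
# and fast-spiking (inhibitory) neurons.
RS = dict(a=0.02, b=0.2, c=-65.0, d=8.0)
FS = dict(a=0.1, b=0.2, c=-65.0, d=2.0)

def simulate(p, current=10.0, t_max=200.0, dt=0.5):
    """Euler-integrate one Izhikevich neuron under a constant input
    current (illustrative value); returns spike times in ms."""
    v, u = p["c"], p["b"] * p["c"]
    spikes = []
    for step in range(int(t_max / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * p["a"] * (p["b"] * v - u)
        if v >= 30.0:                   # firing threshold: 30 mV
            spikes.append(step * dt)
            v, u = p["c"], u + p["d"]   # after-spike reset
    return spikes
```

Under the same input, the fast-spiking parameter set fires at a considerably higher rate than the regular-spiking one, consistent with the intended cell types.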
Short-term plasticity (STP) is not necessarily required for LSA, but it is efficient in suppressing strong global burst synchronization in the initialization phase of spiking neural networks, and it stabilizes the firing rate. LSA and the second property of stimulation avoidance studied in this article can be achieved without STP, purely by tuning parameters (e.g., the strength of the noise input, the initial weight values, or the learning rate of STDP). However, they are achieved more easily with STP (experiments without STP show almost the same tendency as the results in this article, but require more simulation time). Moreover, applying STP makes the networks more realistic. We thus used the STP model in this study.
2.2.2 Spike-Timing Dependent Plasticity
The maximum possible weight is fixed to wmax = 20, and if w > wmax, then w is reset to wmax. The minimum possible weight is fixed to wmin = 0, and if w < wmin, then w is reset to wmin.
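As a sketch, the weight bounds above can be combined with the asymmetric STDP parameters used later in the simulation experiments (ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24). The exponential-curve assumption for computing the branch integrals is ours; the text does not spell out the STDP function here.

```python
def clip_weight(w, w_min=0.0, w_max=20.0):
    # Weights exceeding the bounds are reset to the bounds, as in the text.
    return max(w_min, min(w_max, w))

# Asymmetric STDP parameters used in the simulation experiments
A_LTP, TAU_LTP = 1.0, 20.0  # LTP amplitude and time constant (ms)
A_LTD, TAU_LTD = 1.1, 24.0  # LTD peak higher, working window longer

# For an exponential STDP curve, each branch integrates to A * tau,
# so LTD outweighs LTP overall -- the imbalance behind weight selection.
ltp_area = A_LTP * TAU_LTP  # 20.0
ltd_area = A_LTD * TAU_LTD  # 26.4
```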
2.3 Experimental Setup
We performed learning experiments using the neuronal cultures with the same settings as before [29, 30]. Then we repeated the same experiments with the simulation model using the spiking neural network and examined whether the results from the experiments using real neurons could be reproduced by the model.
For the neuronal cultures, we performed two types of experiments. Experiment 1 used a robot in one-dimensional virtual space (Figure 4); experiment 2 used a robot in two-dimensional real space (Figure 5). In both cases, the stimulation was applied as the sensor input when the robot approached the wall, and the sensor input stopped when the robot moved away from the wall. Those input neurons were randomly selected from a population. We explain each experimental setup in detail in the rest of this section.
2.3.1 Embodied Cultured Neurons in One-Dimensional Virtual Space
In experiment 1, a virtual robot was coupled to the neuronal culture via the CMOS arrays, which could detect neural activity and inject electrical stimuli as explained above. Input and output neurons were determined in the following way. Input neurons were randomly chosen from the excitatory neurons; their number (2 or 10) depended on the experiment. Before starting the learning experiments, 20 stimuli at 1 Hz were applied to the input neurons and the neural activity was recorded. Based on the recorded data, 10 output neurons were chosen from the excitatory neurons to satisfy the following requirement: Over the 20 stimuli, the mean number of output neurons that spiked within 20–40 ms after each stimulus was less than 5. With these input and output neurons, the network cannot behave so as to avoid stimulation in the initial state. This procedure was first performed using 10 selected input neurons; if no combination of such output neurons existed, the procedure was repeated using 2 selected input neurons.
The virtual robot moved forward at a constant speed; if the robot approached a wall, the sensors stimulated the input neurons, and if more than 5 out of 10 output neurons fired within the specific time window following the stimulation, then the robot would turn away from the wall, rotating 180 degrees. We call this setup the closed-loop condition. This cycle was repeated 10 times per experiment. We performed six experiments with three cultures (10–43 days in vitro) with these settings. We also performed an experiment with an open-loop condition, where the stimulus input stopped at random, regardless of the network's neural activity (the other settings were the same as under the closed-loop condition).
2.3.2 Embodied Cultured Neurons in Two-Dimensional Real Space
In experiment 2, Elisa-3 (GCtronic, Ticino, Switzerland) was used as the mobile robot in real space (Figure 5). Elisa-3 is a small, circular robot with a 2.5-cm radius and has two independently controllable wheels. The front right and front left distance sensors were used as sensory signals to stimulate the neuronal cells. The refresh rate of the robot was 10 frames per second.
If the sensor value SL,R (range 0–950) was less than the threshold T, then the stimulation probability PL,R was zero; otherwise, PL,R = SL,R/Smax, where Smax denotes the maximum sensor value (950). Whether a stimulus was delivered to the input neurons was determined using this probability every 100 ms. The threshold T was set to 100. In this way, the distance from the robot to the wall was encoded as the stimulation frequency.
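The stimulation-probability rule reduces to a few lines; the parameter values follow the text, and the function name is an assumption.

```python
def stim_probability(sensor, threshold=100, s_max=950):
    """Probability that a stimulus is delivered in one 100-ms bin,
    given a distance-sensor reading in the range 0-950."""
    if sensor < threshold:
        return 0.0
    return sensor / s_max
```

At the maximum reading of 950 the probability is 1.0 per 100-ms bin, i.e., stimulation at the full 10-Hz update rate; below the threshold no stimulus is delivered.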
The positive integers vi were the numbers of spikes of the output neurons over a given time interval (100 ms); each wheel speed was obtained by summing them with a negative constant weight k and adding a positive constant C as the default wheel speed. NL and NR denote the sets of left- and right-output neurons. As k was negative and C positive, the robot moved forward when the output neurons were inactive. k was set to −0.3. The default values of CL,R were 12.5, adjusted individually before the experiments so that the robot would go straight without firing of the output neurons. As the activity of the output neurons increased, the speed of the forward movement decreased, until finally the robot moved backwards. Since the two wheels of the robot were independent, the robot could turn when the wheel speeds differed.
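The wheel-speed computation can be sketched as follows, with k = −0.3 and the default C = 12.5 from the text; the function name is an assumption.

```python
def wheel_speed(spike_counts, k=-0.3, c=12.5):
    """Speed of one wheel from the spike counts of that side's output
    neurons in the last 100 ms: speed = k * sum(counts) + C,
    with k < 0 and C > 0."""
    return k * sum(spike_counts) + c
```

With no output activity the robot moves forward at the default speed; sufficiently strong output activity drives the speed negative, so the robot backs away.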
Because of these settings, as the robot approached the wall, the sensor values became higher and the stimulus input was applied. The number of spikes of the output neurons determined the left and right motor speeds of the robot; thus, to avoid the wall, the robot needed to control the motor speeds.
These conditions made it more difficult for the network to learn an action to avoid the wall than in experiment 1. We performed six experiments with three cultures (26–61 days in vitro), but a technical failure occurred in one of the experiments, so the recorded data for the other five experiments were analyzed.
2.3.3 Embodied Spiking Neural Networks in One-Dimensional Virtual Space
We also performed a simulation experiment similar to experiment 1, using the simulated spiking neural network. The model for spiking neurons, proposed by Izhikevich and explained above, was used to simulate excitatory and inhibitory neurons. The network consisted of 80 excitatory neurons and 20 inhibitory neurons. This ratio of inhibitory neurons is standard in simulations [22, 23] and similar to biological values. The excitatory neurons were divided into three groups: input (10 neurons), output (10 neurons), and hidden (60 neurons). The networks were fully connected; the weights w were randomly initialized from uniform distributions, with 0 < w < 5 for excitatory neurons and −5 < w < 0 for inhibitory neurons. Only connections between excitatory neurons had STDP-based synaptic plasticity; the weights of the other connections did not change. Both the symmetric and the asymmetric STDP explained above were used for the synaptic dynamics.
In the simulation experiment, two types of external stimulation conditions were applied: closed-loop and open-loop. In the closed-loop condition, stimulation was delivered at a fixed frequency (100 Hz) with an amplitude of 10 mV, and if more than 5 out of 10 output neurons fired within 10 ms after the stimulation, then the stimulation was removed for 1,000–2,000 ms (randomly chosen each time). Under these conditions, it was possible for the network to learn a behavior to avoid the stimulation. In the open-loop condition, the stimulation was randomly removed, regardless of the firing of output neurons; all other stimulation settings were the same as in the closed-loop condition. In this condition, the network could not learn any behavior that avoided the stimulation.
3 Results
We first show the results of experiments 1 and 2, then the results of the simulation experiments. In the analysis of experiments 1 and 2, we investigated the neural dynamics in conditions where it was difficult to learn stimulus-avoidance behavior; we focused on the results of the open-loop condition in experiment 1, in which it is impossible for the network to learn that behavior, as well as the results of experiment 2, in which more networks failed to learn the behavior than in experiment 1.
3.1 Cultured Neurons Learn Stimulus-Avoiding Behaviors
3.1.1 Experiment 1
We evaluated the learning results using the reaction time (i.e., the time from the beginning of the stimulation to the time of wall avoidance). Figure 6 shows the learning curves from experiment 1. The vertical axis represents the reaction time; lower values indicate higher learning ability. As shown in Figure 6, in the closed-loop condition, the reaction time rapidly decreased and stabilized, indicating higher learning ability. On the other hand, in the open-loop condition, where the stimulus was randomly applied, the reaction time did not stabilize at lower values, and the variance was higher than that in the closed-loop condition.
These results are similar to the results from previous experiments involving large numbers of cultured neurons (10,000–50,000). This suggests that such learning behaviors scale from small to large numbers of cultured neurons. In addition, we found that the stimulus-evoked firing considerably decreased in some cases of the open-loop condition, where the stimulus was randomly applied (Figure 7), although the stimulus-evoked firing increased in the closed-loop condition. Below, we focus on the data from the open-loop condition.
Figure 8 shows the mean evoked firing rates. For all neurons except the input neurons (Figure 8(a)), the evoked firing rate consisted of spikes within 200 ms after each stimulus. For the input neurons (Figure 8(b)), it consisted of spikes within 50 ms after each stimulus; this time window differs from the one above because we focused on the spikes evoked directly by each stimulus, excluding spikes evoked by feedback from other neurons.
Besides the qualitative results in Figure 7, the statistical results show that the mean evoked firing rates for all neurons, except for input neurons, significantly decreased in the last 5 min of the experiments, relative to the first 5 min (Wilcoxon signed-rank test, n = 6, p = 0.012) (Figure 8(a)). On the other hand, the evoked firing rates of the input neurons did not change significantly (Figure 8(b)). These results imply that the decrease in the number of evoked spikes was not caused by a decrease in the firing rates of the input neurons. This suggests that the cultured neurons tried to learn a behavior to avoid the external stimulation, but if the network could not avoid the stimulus, it tended to ignore the external stimulation.
3.1.2 Experiment 2
In experiment 2, the learning of wall avoidance behavior succeeded in only two out of five experiments. Here, success was defined as the reaction time (i.e., the time from the start of stimulation to the time of wall avoidance) decreasing by 30% or more. The average reaction times from the first 10 min and the last 10 min of the experiments were used to calculate the success rate. We focused on dynamics where LSA failed, examining whether dynamics similar to those from the open-loop conditions in experiment 1 were observed in this more difficult task.
As shown in Figure 9, the stimulus-evoked spikes decreased in all failure cases. These results are similar to the results from the open-loop conditions in experiment 1.
The statistical results showed that the mean evoked firing rate of all neurons, except for input neurons, significantly decreased in the last 5 min of the experiment relative to the first 5 min (Wilcoxon signed-rank test, n = 3, p = 0.023) (Figure 10(a)). On the other hand, the mean evoked firing rate of the input neurons did not change significantly (Figure 10(b)).
Figure 11 shows the time evolution of the evoked firing rates of input neurons as well as other neurons. As shown in Figure 11, the evoked firing rates of other neurons gradually decreased during the experiments, but the evoked firing rates of input neurons did not decrease.
These results imply that the decreased firing rates for all neurons, except for input neurons, were not caused by a decrease in the evoked spikes from the input neurons, but by the weakening of the synaptic connection from the input neurons to the others. Therefore, this suggests that embodied cultured neural networks try to learn an action to avoid external stimulation, but, if the synaptic connections that control the motor output cannot stop the stimulus, the networks tend to ignore the external stimulation, thereby weakening the connection strength from the inputs.
According to the results of the experiments, we found that if the cultured neurons cannot learn a behavior that avoids stimulation, the neural dynamics act to isolate the uncontrollable input neurons, that is, those delivering stimuli that the network cannot learn an action to avoid.
3.2 Spiking Neural Networks Reproduce Stimulus-Avoiding Behaviors
In the simulation experiments, we examined whether simulated networks reproduce the stimulus-avoiding behaviors observed in the experiments above, that is, when the network cannot learn an action to avoid input stimuli, the network tends to ignore the uncontrollable input by weakening the connections from the uncontrollable input neurons.
When applying symmetric STDP, where the dynamics of LTP and LTD are symmetric (ALTP = ALTD = 1.0; τLTP = τLTD = 20 ms), the synaptic weight from the input neurons increased in both the closed-loop and the open-loop conditions (Figure 12(a)).
In the open-loop condition, the stimulation was randomly removed; thus, the network cannot learn the behavior to avoid the stimulation. The results show no dynamics leading to the isolation of the uncontrollable neurons.
On the other hand, when applying asymmetric STDP where the dynamics of LTP and LTD are asymmetric (ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24; the peak of LTD is higher and the tail of LTD is longer than that of LTP), the synaptic weights from the input neurons increased in the closed-loop conditions, but decreased in the open-loop conditions (Figure 12(b)). This tendency is the same as that observed in the experiments using the neuronal cultures explained above. Therefore, we found that spiking neural networks with asymmetric STDP reproduce the dynamics to isolate the uncontrollable input neurons as observed in the experiments using cultured neurons.
The results of the asymmetric STDP with the parameters ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24 (Figure 3(b)) show the maximum value of SI in the parameter space. With ALTP = 1.0, ALTD = 0.95; τLTP = 20, τLTD = 28 (Figure 3(c)), the shape of the STDP function is similar to that of the classical STDP function observed in vitro and in vivo: The peak of LTD is lower than the peak of LTP, and the working time window of LTD is longer than that of LTP. The value of SI in this region was still positive, implying that the networks isolate the uncontrollable input neurons. This suggests that the dynamics can also work in biological neural networks.
Figure 13(b) shows the integral values of the STDP function with the same parameters as in Figure 13(a). In the blue regions, LTD is stronger than LTP; thus, in theory, random spikes in presynaptic neurons should decrease the weights from those neurons, and indeed the weight selection occurred in the blue regions. However, if such a decrease was too strong (e.g., with the parameters ALTP = 1.0, ALTD = 1.4; τLTP = 20, τLTD = 30; Figure 3(d)), both Wci and Woi decreased almost to zero. In that case, weight selection did not occur: LTD was too strong relative to LTP, almost all the weights of connections between neurons decreased, and the networks could not learn anything. Therefore, we found that weight selection occurred in balanced regions where the integral value of LTD is stronger, but not much stronger, than that of LTP.
We examined what kind of connections were weakened by this weight selection dynamics in the spiking neural networks. To simplify, we considered a minimal case with two neurons and one connection (Figure 14) to compare asymmetric STDP with symmetric STDP. In the asymmetric STDP case, if Δtp ≈ Δtd and Δtp < τLTP, Δtd < τLTD, then the connection weight decreases, because with the asymmetric STDP, LTD has a stronger effect than LTP (Figure 14(a)). In the symmetric STDP case, the connection does not change much, because LTP and LTD then affect the connection equally (Figure 14(b)). Thus, with the asymmetric STDP, if the mean value of the spike intervals that cause LTP (Δtp) and the mean value of the spike intervals that cause LTD (Δtd) are close, i.e., Δtp ≈ Δtd, the connection disappears.
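The minimal two-neuron argument can be made concrete with a short sketch. Assuming the usual exponential STDP window (a modeling assumption; the article's exact rule is not reproduced here), the net weight change for one LTP-causing interval Δtp and one LTD-causing interval Δtd is:

```python
import math

def net_weight_change(dtp, dtd, a_ltp, tau_ltp, a_ltd, tau_ltd):
    """Net synaptic change for one LTP event (pre->post interval dtp > 0)
    and one LTD event (post->pre interval dtd > 0), using exponential
    STDP windows. Positive: net potentiation; negative: net depression."""
    ltp = a_ltp * math.exp(-dtp / tau_ltp)
    ltd = a_ltd * math.exp(-dtd / tau_ltd)
    return ltp - ltd

dt = 10.0  # ms; dtp ≈ dtd, both well inside the STDP time windows

# Symmetric STDP: LTP and LTD cancel, so the weight barely changes.
print(net_weight_change(dt, dt, 1.0, 20, 1.0, 20))   # 0.0

# Asymmetric STDP (A_LTD = 1.1, tau_LTD = 24): LTD wins, so repeated
# pairings with dtp ≈ dtd drive the weight toward zero.
print(net_weight_change(dt, dt, 1.0, 20, 1.1, 24))   # ≈ -0.119
```

Iterating the asymmetric case over many spike pairs accumulates a negative drift, which is the weight-selection effect: the connection from a neuron whose spikes straddle the postsynaptic spikes symmetrically is gradually removed.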
In large networks, such situations occur if a presynaptic neuron fires independently of the other neurons at a high frequency. The connections from the input neurons, which are stimulated at a high frequency (e.g., more than 20 Hz when τLTP + τLTD = 50 ms), should decrease according to the dynamics of the asymmetric STDP. The stimulation frequency SF [Hz] required for weight selection satisfies SF ≥ k, where k ≈ 10³/(τLTP + τLTD) for a minimal case with two neurons. This is inconsistent with our results regarding neuronal cultures, in which the networks showed weight-selection behavior at a low frequency (1 Hz). This suggests that this condition on the stimulation frequency should be modified for larger networks.
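The frequency condition can be evaluated directly (time constants in milliseconds, so the factor 10³ converts to Hz); a minimal sketch:

```python
def weight_selection_threshold_hz(tau_ltp_ms, tau_ltd_ms):
    """Approximate minimum stimulation frequency SF (Hz) for weight
    selection in the minimal two-neuron case: k ≈ 10^3 / (tau_LTP + tau_LTD),
    with both time constants given in milliseconds."""
    return 1e3 / (tau_ltp_ms + tau_ltd_ms)

print(weight_selection_threshold_hz(20, 30))  # 20.0 Hz (tau sum = 50 ms)
print(weight_selection_threshold_hz(20, 24))  # ≈ 22.7 Hz, max-SI parameters
```

Both values are far above the 1 Hz stimulation at which the cultures showed weight selection, making the mismatch between the two-neuron estimate and the biological result explicit.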
In our experiments, we focused on the weight selection for input neurons; however, these dynamics should also work inside the networks. A synaptic weight from an active neuron with a high firing rate should decrease, as the conditions are consistent with those of input neurons under the high-frequency stimulation explained above. However, in the case of hidden neurons, since there should be a feedback loop, this decrease leads to a decrease in the firing rate of the active neuron itself; if the firing rate falls below a certain threshold, the weight depression should end. Thus, the synaptic weight should decrease but remain above zero, as in the case of the input neurons observed in our simulation experiments. This effect should stabilize the network by pruning connections from neurons with a high firing rate.
Our previous studies showed that embodied spiking neural networks with STDP learn a behavior as if they avoid stimulation, and that the learning dynamics of the stimulus-avoiding behaviors scale from small networks to large networks (from two cells to approximately 50,000 cells). If LSA also works in cultured neurons, a smaller number of cultured neurons than used in previous studies should be able to learn an action to avoid stimulation. We found that a small number of cultured neurons (42–110 cells) can indeed do so. This suggests that the learning dynamics are based on LSA. In addition, we found that if the network could not learn the behavior to avoid the stimulus, plasticity worked to suppress the influence of the uncontrollable stimulus on the network by weakening the connections from the input neurons. We also demonstrated that spiking neural networks with asymmetric STDP reproduced the stimulus-avoiding behaviors observed in the cultured neurons. Below, we further discuss the stimulus-avoiding behaviors.
In this study, we found that if a network cannot learn the behavior needed to avoid a stimulus, plasticity works to suppress the influence of the uncontrollable stimulus on the network by weakening the connection strengths from the input neurons. In neuroscience, this kind of phenomenon, in which constant sensory inputs are ignored, is known as neural adaptation, sensory adaptation, or stimulus-specific adaptation. These phenomena are observed in many regions of the brain (e.g., in the auditory system [34, 36, 41, 46]) and also in vitro. Such adaptation can be divided into two types: fast adaptation (less than one second) and slow adaptation (more than a few minutes). The mechanism of slow adaptation is considered to be synaptic plasticity, such as LTD. One possible mechanism of fast adaptation is synaptic fatigue, in which repeated stimulation depletes the neurotransmitters in the synapse, weakening the response of neurons to the stimulation. Some studies suggest that synaptic fatigue is caused by high-frequency stimulation and occurs in presynaptic neurons.
In our experiment on neuronal cultures, low-frequency stimulation (1 Hz) was used, and the results showed that the evoked spikes of the presynaptic neurons (the input neurons) did not decrease, while the evoked spikes of all the other neurons decreased for more than 20 min. Therefore, our results with neuronal cultures suggest that the observed behavior resembles slow adaptation caused by LTD. The spiking neural networks with asymmetric STDP reproduced this slow adaptation (although the networks include STP, whose dynamics resemble those of synaptic fatigue, STP stabilizes the firing rate within 1 s; thus, the slow adaptation cannot be caused by STP). Moreover, we found that the phenomena occurred in parameter regions where the shape of the STDP function was similar to the shape broadly observed in vitro. Therefore, we argue that this mechanism based on asymmetric STDP should also work in biological neural networks.
In the field of artificial life, adaptive behavior has been and remains a major theme to be explored further (e.g., see [10, 11, 20]). Many researchers have studied the emergence of adaptive behavior as an outcome of neural homeostasis. Although their models are very insightful, they are too abstract to explain homeostatic properties in biological neural networks. In our previous studies, we found that bio-inspired spiking neural networks with STDP have homeostatic properties that allow networks to learn a behavior to avoid external stimulation from the environment (e.g., wall-avoidance behavior). Our previous and present studies also showed that LSA can work in biological neural networks in vitro [29, 30], suggesting that LSA can explain homeostatic adaptation in vivo.
Recently, Friston proposed the free-energy principle (FEP) [14, 16], based on Bayesian inference and extending the predictive coding model. In FEP, reconfiguring the internal model to minimize free energy (i.e., surprise) is called perceptual inference, whereas reducing surprise through action is called active inference. Active inference has attracted much attention from the viewpoint of autonomous behavior; indeed, Friston discussed the relationship between the homeostatic adaptive behavior of animals and active inference [15, 17, 18]. In our framework of stimulus avoidance in neural networks, if external stimulation is regarded as surprise, the neural dynamics that learn actions to avoid it resemble an intuitive interpretation of active inference, in which an agent behaves so as to minimize surprise. To examine this insight, future research will need to calculate the free energy within our framework.
In addition, in this article, we proposed a new principle: if embodied neural networks cannot learn actions to avoid a stimulus, they can instead isolate the neurons that receive uncontrollable stimulation from the environment. This two-layered homeostatic principle is similar to Ashby's theory of ultrastability, in which a system has two types of homeostasis: if the first, regular homeostasis becomes unstable and its essential variables exceed their limits, a second homeostasis works to rearrange the system dramatically. The system reconstructs itself by trial and error until a stable homeostasis is acquired. Ashby suggested that biological systems are ultrastable with these two types of homeostasis. Our results suggest that such behavior can emerge from the local dynamics of neurons in both biological and bio-inspired artificial neural networks.
The simulation results showed that ignoring an uncontrollable constant stimulus is a robust feature of spiking neural networks with asymmetric STDP. Almost all the connections from input neurons receiving uncontrollable stimuli decreased to zero, whereas the synaptic weights from input neurons with controllable inputs (i.e., sensory inputs that the agent can learn to avoid) increased. This suggests that the networks can isolate input neurons with uncontrollable stimulus inputs, which can be regarded as dynamics that regulate self and nonself. A closed loop of sensor and motor, in which the motor outputs control the sensory stimulation as in sensorimotor contingency, is regarded as self, while an open loop of sensor and motor is regarded as nonself; the open loop collapses after the sensor neuron is isolated. Thus, the self-boundary is not limited to the network but extends into the environment through the body. Notably, these dynamics emerge from the simple local dynamics of neurons alone.
This discrimination of self from nonself recalls the theory of autopoiesis proposed by Maturana and Varela. In addition to the structural viewpoint of regulating the self-boundary explained above, here we discuss the results of self-regulating behavior from an autopoietic point of view.
In autopoiesis, discrimination comes with the boundary between self and nonself. This is not a rigid physical boundary but a dynamic one: It must be constantly produced and maintained by the system's own processes.
Varela proposed a simple mathematical model of autopoiesis using artificial chemistry. Two metabolite particles (S) generate one boundary particle (L), catalyzed by a catalytic particle (C). The boundary particles link to form a connected boundary, which encloses C and L. The boundary constantly decays and is repaired by free boundary particles L. This self-organizing process of encapsulating C and L constitutes self-discrimination: no single particle defines the self-boundary; rather, a self-entity emerges only at the collective level.
This picture becomes much clearer when taking the immune system as an example. Vertebrates establish self-nonself discrimination by forming an idiotype network, in which, according to Jerne's hypothesis, an antibody-antigen chain reaction exists among antibodies. The current understanding of self-nonself discrimination has a molecular-biological basis; however, acquired immunity still needs to be explained, and the autopoietic picture is a candidate explanation. Each antibody can adaptively change the self-boundary. By self-organizing an idiotype network, self-nonself discrimination emerges as a result of the network reactions: The antigen-antibody reaction is suppressed locally for self antigens, but percolates through the network for nonself antigens. The reaction network thus determines self-nonself discrimination, much as in Varela's simple artificial chemistry.
Coming back to the present study, no single neural response determines the self-nonself boundary; self-nonself boundaries emerge only in neural networks of a certain size. A neural network determines the self-nonself boundary in the same way the immune system does: In both cases, the boundary is processed dynamically. It is not explicit in the network whether a certain firing pattern of neurons originates from outside or inside the network. As in the immune system, a pattern that strengthens the network's response and causes structural changes in the network is regarded here as nonself.
For example, the controllable input above is initially regarded as a pattern from outside the network (nonself). However, as the network changes and learns the behavior to control the input, the input no longer causes large changes in the network and comes to be regarded as a pattern from inside the network (self). The uncontrollable input is likewise initially regarded as a pattern from the outside (nonself); however, by weakening the connections from the sensor according to the dynamics proposed in this article, explicit boundaries like Varela's cellular boundary can be created. Thereby, the inputs are explicitly isolated from the inside and no longer affect the internal network (self).
Further, in our previous work, spiking neural networks consisting of 100 neurons, using almost the same model as in this article, could predict a simple causal sequence of stimuli. In that situation, like a controllable input, a predictable input is initially regarded as an external pattern (nonself) that causes structural changes in the network; however, once the network learns to predict the input, the input no longer affects the network and is regarded as a pattern from the inside (self).
Although we have only discussed patterns from the environment, these dynamics should also work inside the network (although regulation by action does not occur inside the network, as it requires coupling with the environment). Inside the network, the dynamics of isolating uncontrollable patterns and predicting predictable patterns regulate the boundaries, and the network converges to stable states in which it shows transitions among several patterns. In this way, the neural network can also be regarded as a system that acquires its own stability through an autopoietic process by means of action, prediction, and selection.
We have presented neural homeodynamic responses to environmental changes in experiments using both biological neural networks and artificial neural networks. As a result of the experiments, we found that the embodied neural networks show two kinds of stimulus-avoiding behaviors: (i) when input stimuli are controllable via actions, the embodied networks learn an action to avoid the stimulation, and (ii) when input stimuli are uncontrollable, the connections from neurons with the uncontrollable input are weakened to avoid the influence of the stimulation on other neurons. We argued that these stimulus-avoiding behaviors can be regarded as the dynamics of an autonomous regulation of self and nonself, in which controllable neurons are regarded as self, and uncontrollable neurons are regarded as nonself. This article has introduced neural autopoiesis by proposing the principle of stimulus avoidance. We have thus extended the notion of autopoiesis to neural networks.
The high-density CMOS array used in this study was provided by Professor Andreas Hierlemann, ETH Zürich. This work was supported by a Grant-in-Aid for JSPS Fellows (16J09357), KAKENHI (17K20090), AMED (JP18dm0307009), the Asahi Glass Foundation, and the Kayamori Foundation of Information and Science Advancement. This work is partially supported by the MEXT project “Studying a Brain Model based on Self-Simulation and Homeostasis” in the Grant-in-Aid for Scientific Research on Innovative Areas “Correspondence and Fusion of Artificial Intelligence and Brain Science” (19H04979).