## Abstract

Living organisms must actively maintain themselves in order to continue existing. Autopoiesis is a key concept in the study of living organisms, where the boundaries of the organism are not static but dynamically regulated by the system itself. To study the autonomous regulation of a self-boundary, we focus on neural homeodynamic responses to environmental changes using both biological and artificial neural networks. Previous studies showed that embodied cultured neural networks and spiking neural networks with spike-timing dependent plasticity (STDP) learn an action as they avoid stimulation from outside. In this article, as a result of our experiments using embodied cultured neurons, we find that there is also a second property allowing the network to avoid stimulation: If the agent cannot learn an action to avoid the external stimuli, it tends to decrease the stimulus-evoked spikes, as if to ignore the uncontrollable input. We also show such a behavior is reproduced by spiking neural networks with asymmetric STDP. We consider that these properties are to be regarded as autonomous regulation of self and nonself for the network, in which a controllable neuron is regarded as self, and an uncontrollable neuron is regarded as nonself. Finally, we introduce neural autopoiesis by proposing the principle of stimulus avoidance.

## 1 Introduction

Learning is an important aspect of neural systems, and it is crucial for animals, as embodied neural systems, to learn adaptive behavior to survive. One of the key concepts in the study of adaptive behavior is homeostasis. Ashby argued that adaptive behavior is an outcome of the homeostatic property of living systems [1]. Ikegami and Suzuki have proposed the concept of homeodynamics, where an autonomous self-moving state emerges from a homeostatic state [21], and Iizuka and Di Paolo have reported that adaptive behavior is an indispensable outcome of homeostatic neural dynamics [11, 20]. However, those models are still too abstract to compare with the homeostatic property of biological neural dynamics. In this article, we study neural homeodynamic responses to environmental changes, using both biological neural cells and a more biologically plausible computational model of neurons.

Biological neural cells cultured in vitro have been used to study neural systems [2, 12, 27, 37]. Such cultures are easier to study than neurons in vivo, because they contain fewer neurons and are kept in a more stable environment. Using cultured neurons is also advantageous because complex, still poorly understood features of neural cells, which are difficult to implement in artificial neural networks, can potentially be exploited. Although cultured neural systems are much simpler than real brains, they retain essential properties, including spontaneous activity, various types and distributions of cells, high connectivity, and rich and complex controllability [5]. Homeostatic adaptivity may be one such property.

Artificial neural networks are often used as models of biological neural networks to understand learning mechanisms [39]. Recently, simulating more realistic artificial neural networks has become computationally efficient through the introduction of models of spiking neurons [4, 23, 25] and of synaptic plasticity [6, 9, 44]. These more realistic models can lead to a theoretical understanding of biological neural networks.

Following the organization of biological experiments and the development of a computational model, we propose the homeodynamic control of a neural network, based on the new learning principle of synaptic plasticity, by which neural networks learn to avoid stimulation from the environment.

### 1.1 Learning by Stimulation Avoidance

Shahaf and Marom demonstrated that a cultured neural network can learn a desired behavior as if the network avoided stimulation using the following protocols [40]. First, an electrical stimulation with a fixed low frequency (e.g., 1–2 Hz) was delivered to a predefined input zone of the network. When the desired behavior appeared, the stimulation was removed. After this protocol was repeated, the network learned to produce the expected behavior in response to the stimulation. In practice, the authors showed that the networks learned to produce spikes in predefined output zones, in a predefined time window (40–60 ms after each stimulus), in response to the stimulation applied in the input zone.

Marom and Shahaf explained these results by invoking the stimulus regulation principle (SRP) [28]. The SRP is composed of the following two functions at the neural network level: (i) modifiability, by which stimulation drives the network to try to form different topologies by modifying neuronal connections, and (ii) stability, by which removing the stimulus stabilizes the last configuration of the network. They argued that their preliminary experiments suggested that cultured neural networks had these two functions in the network [40]. However, the two properties that constitute the SRP are macroscopic phenomenological explanations, which do not form a concrete mechanism. Furthermore, assuming that modifiability is correct, learned configurations are destroyed at every stimulation; if stability is correct, it should not be necessary to repeat such a cycle as in the experiment explained above.

In a previous study, we proposed a mechanism, termed learning by stimulation avoidance (LSA), on the micro scale of neural dynamics, using small simulated networks to explain Shahaf and Marom's results [43]. LSA is based simply on a classical form of spike-timing-dependent plasticity (STDP) [44]. When a presynaptic neuron and a postsynaptic neuron fire successively within a certain time window, the synaptic weight increases (long-term potentiation, or LTP). If the timing of the two neurons' firing is reversed, the weight decreases (long-term depression, or LTD). STDP has been found in both in vivo and in vitro networks [6].

Two dynamics underlie the avoidance of stimulation in LSA, both based on STDP. The first is the reinforcement, by LTP, of behavior that decreases the stimulation. If the firing of the postsynaptic neuron terminates the cause of the presynaptic neuron's firing, then the synaptic weight from the presynaptic neuron to the postsynaptic neuron increases. Thus the behavior leading to a decrease in stimulation is reinforced. The second is the weakening, by LTD, of behavior that increases the stimulation. If the firing of the postsynaptic neuron initiates the cause of the presynaptic neuron's firing, then the synaptic weight from the presynaptic neuron to the postsynaptic neuron decreases. Thus the behavior leading to an increase in stimulation is weakened. This is the most basic structure of LSA.

LSA states that the network learns to avoid external stimuli by learning available behaviors. In the follow-up studies, we further showed that LSA scales up to larger networks [31]. LSA works as long as the following two conditions are met: (i) the plasticity of the embodied neural network is driven by STDP, and (ii) the network constitutes a closed loop with the environment. We claim that LSA is an emergent property of a spiking network with Hebbian rules [19] and its environment.

Such stimulation avoidance can produce homeostasis: Since the unexpected stimulation of an agent resulting from its environment represents environmental changes, avoiding the stimulation decreases the influence of environmental changes on the internal state of the agent. It is interesting that such homeostatic global behavior directly emerges from simple synaptic dynamics, such as STDP. This property of stimulus avoidance can also be regarded as an intrinsic motivation that emerged from the local dynamics of neurons in a bottom-up manner.

### 1.2 Motivation

Shahaf and Marom's results are promising in that they showed that behaviors that avoid stimulation are learned autonomously in cultured neurons [40]. However, there have been no further studies on these learning dynamics. Although Shahaf and Marom used cultures with 10,000–50,000 neurons, a smaller number of cultured neurons should be able to learn in the same manner if the same dynamics as LSA operate in cultured neurons. Thus we first performed learning experiments using fewer cultured neurons than previous works have used, to demonstrate how such a learning mechanism scales from small to large cultures.

In addition, the results of the experiments showed that if the network cannot learn a behavior that removes external stimuli, its response to stimulation is gradually suppressed, as if it isolates the uncontrollable input neurons. This means that the second property, where the network avoids the effects of stimulation through weakening the connection from uncontrollable input neurons, is used alongside LSA to avoid stimulation.

We also found that such a behavior can be reproduced by simulated spiking neural networks with asymmetric STDP, where the functions of LTP and LTD are rotationally asymmetric (e.g., the working window for LTD is larger than the window for LTP). These dynamics can be interpreted as the embodied neural network autonomously regulating the boundary between self and nonself.

Furthermore, it should be noted here that updating the concept of autopoiesis was a broad motivation for this study. Autopoiesis has been proposed by [32] as a fundamental principle of living systems. A microscopic level of a system's organization is iteratively compensated for by the emerging macro state; taking cell dynamics as an example, the resultant organization is a cellular boundary organized in a chemical space that distinguishes between self and nonself. This perspective is commonly applied to both the immune and neural systems of vertebrates. Immune systems can distinguish between the material self and nonself, and neural systems can distinguish between “informational” self and nonself. The neural autopoiesis demonstrated in this article is based on the principle of stimulus avoidance observed in neural networks.

## 2 Materials and Methods

We used both cultured neurons and spiking neural networks for studying the homeostatic properties in embodied neural networks. Below, we first describe the methods for using cultured neurons, then the methods for using spiking neural networks, and finally each experimental setup.

### 2.1 Cultured Neurons

#### 2.1.1 Cell Culture

The neural cultures were prepared from the cerebral cortex of E18 Wistar rats, as previously reported [3, 51, 52]. The cortex region was trypsinized with 0.25% trypsin, and the dissociated cells were plated and cultured on a recording device. The surfaces of the electrodes on the device were coated with 0.05% polyethylenimine and laminin to improve plating efficiency. The cells were cultured in neurobasal medium (Life Technologies, California, USA) containing 10% L-glutamine (Life Technologies, California, USA) and 2% B27 supplement (Life Technologies, California, USA) for the first 24 h. After the first 24 h, half the plating medium was replaced with growth medium (Dulbecco's modified Eagle's medium (Life Technologies, California, USA)) that contained 10% horse serum, 0.5 mM GlutaMAX (Life Technologies, California, USA), and 1 mM sodium pyruvate. The cultures were placed in an incubator at 37°C with an H2O-saturated atmosphere that consisted of 95% air and 5% CO2. During cell culturing, half the medium was replaced once a week with the growth medium. All cultures used in our experiments consisted of 47–110 cells and were sufficiently matured to show global burst synchronization.

#### 2.1.2 CMOS-based High-Density Microelectrode Arrays

A high-density microelectrode array based on complementary metal-oxide semiconductor (CMOS) technology [13] was used to measure the extracellular electrophysiological activity of the cultured neurons (Figure 1). This CMOS-based electrode array is superior to the conventional multi-electrode array (MEA) used previously [37], in that it has a far higher spatiotemporal resolution. The number of electrodes in conventional MEAs is small (e.g., 64), and the locations of the recording electrodes are predetermined using a large interelectrode distance (e.g., 200 μm). Thus, it is difficult to identify signals from an individual cell. In contrast, the CMOS arrays have 11,011 electrodes; the diameter of the electrode is 7 μm, with an interelectrode distance of 18 μm over an area of 1.8 mm × 1.8 mm. Thus this method can identify signals from an individual cell in a small culture. The device can simultaneously record the electrical activity of 126 electrodes at a sampling rate of 20 kHz.

Figure 1.

The high-density CMOS electrode array used in this experiment. This recording device has 11,011 recording sites, each with a diameter of 7 μm, and an inter-electrode distance of 18 μm.

#### 2.1.3 Estimation of Neuronal Somata Locations

Before recording the neural activities, the 11,011 electrodes were scanned to obtain an electrical activity mapping to estimate the locations of the neuronal somata (i.e., identify the positions of neural cells). The scanning session consisted of 95 recordings. In each recording, the electrical activities for 110–120 electrodes were simultaneously recorded for 60 s. In the recordings, the sampling frequency was set to 20 kHz and the band-pass filter was set to 0.5–20 kHz. An electrical activity map was obtained by averaging the height of the action potentials for each electrode. We applied a Gaussian filter to the map and assumed that the neuronal somata were located near the local peaks on the Gaussian-filtered map. At most, 126 peaks were selected, in descending order, as the positions of neural cells, and the electrodes nearest the peaks were used to record the neural activity. If the number of local peaks was smaller than 126, then all the peaks were used. Using this method, one electrode can ideally represent a single neural state.
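The peak-selection procedure can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the kernel width `sigma`, the kernel radius, the edge handling, and the 8-neighbour definition of a local peak are all assumptions, since the text specifies only Gaussian filtering, local peaks, and a cap of 126 sites.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(values, kernel):
    """Convolve one row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_filter_2d(amp_map, sigma=1.0, radius=2):
    """Separable Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [smooth_1d(row, kernel) for row in amp_map]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def select_soma_sites(amp_map, max_sites=126, sigma=1.0):
    """Return up to max_sites (row, col) local peaks, largest first."""
    smoothed = gaussian_filter_2d(amp_map, sigma)
    height, width = len(smoothed), len(smoothed[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            centre = smoothed[r][c]
            neighbours = [smoothed[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, height))
                          for cc in range(max(c - 1, 0), min(c + 2, width))
                          if (rr, cc) != (r, c)]
            # A peak must be strictly larger than all of its neighbours.
            if centre > 0 and all(centre > v for v in neighbours):
                peaks.append((centre, r, c))
    peaks.sort(reverse=True)  # descending order of smoothed amplitude
    return [(r, c) for _, r, c in peaks[:max_sites]]
```

Here `amp_map` stands for the mean spike height per electrode arranged on the array grid. If fewer than `max_sites` peaks exist, all of them are returned, matching the rule stated above.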

#### 2.1.4 Estimation of Excitatory and Inhibitory Synaptic Conductances

It was important that we were able to identify the neuron's type for these experiments, as the neural activity was recorded, and stimulation was applied, for each neuron, not for a group of neurons. The neuronal cell type (i.e., excitatory or inhibitory) was estimated using spike shapes, which were recorded for 10 min before the main experiment [45]. The shapes of the action potential of these two neural types differ: In the action potential of excitatory neurons, the distance between a maximum potential and a minimum potential is longer than that of inhibitory neurons. We classified the type of neuronal cell by using k-means clustering [26] according to the difference in shapes. Here, we used the average length between the two peaks of the spike shape to classify those groups into two classes (k = 2).
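A minimal sketch of the two-class split, assuming each neuron is summarized by a single number (the mean peak-to-trough width of its spike shape). The function name, the initialization at the extreme values, and the example widths are illustrative assumptions, not details from the original analysis.

```python
def kmeans_two_classes(widths_ms, n_iter=100):
    """Cluster 1-D spike widths into two groups with Lloyd's algorithm (k = 2).

    Returns (labels, centroids); label 1 marks the wider cluster, which the
    text associates with excitatory neurons, and label 0 the narrower one.
    """
    # Assumption: initialize the two centroids at the extreme values.
    lo, hi = min(widths_ms), max(widths_ms)
    labels = None
    for _ in range(n_iter):
        new_labels = [0 if abs(w - lo) <= abs(w - hi) else 1 for w in widths_ms]
        narrow = [w for w, lab in zip(widths_ms, new_labels) if lab == 0]
        wide = [w for w, lab in zip(widths_ms, new_labels) if lab == 1]
        if narrow:
            lo = sum(narrow) / len(narrow)
        if wide:
            hi = sum(wide) / len(wide)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, (lo, hi)

# Hypothetical widths (ms) for six neurons.
labels, centroids = kmeans_two_classes([0.2, 0.25, 0.3, 0.7, 0.8, 0.75])
```

With one feature and k = 2, the result amounts to a single width threshold halfway between the two centroids.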

#### 2.1.5 Recording and Preprocessing of Neural Activity

To detect and record spikes in cultured neurons, we used the MEABench software developed in [48]. All recordings were performed at a 20-kHz sampling rate using the real-time spike detection algorithm LimAda in MEABench. As the LimAda algorithm detects spikes that exceed the threshold without distinguishing between positive and negative values, unexpected double detection of spikes can occur. These double-detected spikes were removed from the data before analysis. Sending electrical stimuli to a neuronal cell through the electrodes can produce artefacts. Because our experiments required detecting action potentials while stimulating the neuronal cells, the Salpa filter in MEABench was used to remove the artefacts in real time [49].
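The text does not state how double detections were identified. One plausible approach, sketched here purely as an assumption, is to discard any spike on an electrode that follows the previously kept spike within a short dead time. The function name and the 1 ms value are hypothetical.

```python
def remove_double_detections(spike_times_ms, dead_time_ms=1.0):
    """Keep a spike only if it is at least dead_time_ms after the last kept one.

    Assumption: a positive and a negative threshold crossing of the same
    action potential fall within dead_time_ms of each other.
    """
    kept = []
    for t in sorted(spike_times_ms):
        if not kept or t - kept[-1] >= dead_time_ms:
            kept.append(t)
    return kept

# Hypothetical spike times (ms) on one electrode.
cleaned = remove_double_detections([10.0, 10.4, 25.0, 25.3, 40.0])
```

The dead time trades off missed double detections against discarding genuine high-frequency spikes, so it would need to be checked against the recorded waveforms.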

### 2.2 Spiking Neural Networks

#### 2.2.1 Neuron Model

The model for spiking neurons proposed in [23] was used to simulate excitatory neurons and inhibitory neurons. This model is well known, as it can be regulated to reproduce the dynamics of many variations of cortical neurons, and it is computationally efficient. The equations of the neural model are defined as follows:
$\frac{dv}{dt} = 0.04v^2 + 5v + 140 - u + I,$
(1)
$\frac{du}{dt} = a(bv - u),$
(2)
$\text{if } v \geq 30\ \text{mV, then } v \leftarrow c,\ u \leftarrow u + d.$
(3)

Here, v represents the membrane potential of the neuron, u represents a variable related to the repolarization of the membrane, I represents the input current from outside the neuron, t is the time, and a, b, c, and d are other parameters that control the shape of the spike [22]. The neuron is assumed to be firing when the membrane potential v exceeds 30 mV. The parameters for excitatory neurons (regular-spiking neurons) were set as a = 0.02, b = 0.2, c = −65 mV, and d = 8, and for inhibitory neurons (fast-spiking neurons) as a = 0.1, b = 0.2, c = −65 mV, and d = 2 (Figure 2). The simulation time step Δt is 0.5 ms. The parameter values were chosen to reflect biological relevance [22].
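As an illustration of Equations 1 to 3, the following sketch integrates a single neuron with the forward Euler method at the 0.5 ms time step given above. The integration scheme, the initial conditions (v = c, u = bv), the constant input current, and the function name are assumptions made for the example.

```python
def simulate_izhikevich(a, b, c, d, input_current, duration_ms=1000.0, dt=0.5):
    """Return spike times (ms) of one Izhikevich neuron under constant input."""
    v = c          # assumed initial membrane potential
    u = b * v      # assumed initial recovery variable
    spike_times = []
    for step in range(int(duration_ms / dt)):
        if v >= 30.0:                 # Equation 3: spike, then reset
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + input_current  # Equation 1
        du = a * (b * v - u)                                      # Equation 2
        v += dt * dv
        u += dt * du
    return spike_times

# Parameter sets from the text: regular spiking (excitatory), fast spiking (inhibitory).
rs_spikes = simulate_izhikevich(0.02, 0.2, -65.0, 8.0, input_current=10.0)
fs_spikes = simulate_izhikevich(0.1, 0.2, -65.0, 2.0, input_current=10.0)
```

Under the same constant input, the fast-spiking parameters produce more spikes than the regular-spiking ones, because the recovery variable u grows less per spike (d = 2 versus d = 8) and relaxes faster (a = 0.1 versus a = 0.02).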

Figure 2.

Dynamics of regular-spiking and fast-spiking neurons simulated using the Izhikevich model. Regular-spiking neurons were used as excitatory neurons, and fast-spiking neurons were used as inhibitory neurons.

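The input current $I_i$ was added for each neuron $n_i$ at each time step t as
$I_i = I_i^* + e_i + m_i, \quad I_i^* = \sum_{j=0}^{n} f_j w_{ji} s_j, \quad f_j = \begin{cases} 1 & \text{if neuron } j \text{ is firing}, \\ 0 & \text{otherwise}. \end{cases}$
(4)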
Here, m represents zero-mean Gaussian noise with standard deviation σ = 3 mV delivered to each neuron at each time step as internal noise input; e represents external stimulation (conditions, frequency, and strength of the external stimulation are described in a later section); s represents short-term plasticity variables. A phenomenological model of short-term plasticity (STP) [33] was used here. STP is a reversible plasticity rule that decreases the intensity of neuronal spikes if they are too close in time. As in the original article, we applied STP to the output weights from excitatory neurons to both excitatory and inhibitory neurons. Practically, s varies for each neuron $n_j$ as
$s_j = u_j x_j, \quad \frac{dx_j}{dt} = \frac{1 - x_j}{\tau_d} - u_j x_j f_j, \quad \frac{du_j}{dt} = \frac{U - u_j}{\tau_f} + U(1 - u_j) f_j.$
(5)
Here, x represents the amount of available resources, and u represents the resource used by each spike [33]. The parameters were set to $\tau_d = 200$ ms, $\tau_f = 600$ ms, and $U = 0.2$ mV.
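A minimal sketch of how Equations 4 and 5 might be discretized. Several choices here are assumptions rather than details from the source: the continuous terms are integrated with a forward Euler step of 0.5 ms, the spike-triggered terms are applied as instantaneous jumps using the pre-update values of x and u, and the function and argument names are hypothetical.

```python
import random

def stp_step(x, u, fired, dt=0.5, tau_d=200.0, tau_f=600.0, U=0.2):
    """Advance the STP variables of one presynaptic neuron by one time step.

    Returns (x, u, s) with s = u * x, the factor that scales the neuron's
    outgoing weights in Equation 4.
    """
    dx = (1.0 - x) / tau_d * dt        # recovery of available resources
    du = (U - u) / tau_f * dt          # relaxation of utilization toward U
    if fired:                          # spike terms of Equation 5 (f_j = 1)
        dx -= u * x
        du += U * (1.0 - u)
    x, u = x + dx, u + du
    return x, u, u * x

def input_current(i, weights, fired, s, external, rng, noise_sd=3.0):
    """Equation 4 for neuron i, where weights[j][i] is the weight from j to i."""
    recurrent = sum(weights[j][i] * s[j] for j in range(len(fired)) if fired[j])
    return recurrent + external[i] + rng.gauss(0.0, noise_sd)

# One spike from the resting state (x = 1, u = U) scales the next output.
x, u, s = stp_step(1.0, 0.2, fired=True)
```

For connections that are not subject to STP (outputs of inhibitory neurons), the corresponding entry of `s` would simply be 1 in this sketch.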

STP is not necessarily required for LSA, but it is efficient in suppressing strong global burst synchronization in an initialization phase of spiking neural networks [43], and it stabilizes the firing rate [29]. LSA and the second property of stimulation avoidance studied in this article can be achieved only by tuning parameters (e.g., strength of noise input, initial values of weights, or learning rate of STDP) without STP. However, it can be easily achieved with STP (e.g., results of experiments without STP can show almost the same tendency as the results in this article, but require more simulation time). Moreover, applying STP also makes the networks more realistic. We thus used the STP model in this study.

#### 2.2.2 Spike-Timing Dependent Plasticity

A computational model of the classical form of STDP was used as a model for synaptic plasticity. It updates the synaptic weight between two connected neurons depending on the relative timing of their spikes; when the presynaptic neuron fires immediately before the postsynaptic neuron, the synaptic weight increases, and in the opposite case the synaptic weight decreases. The amount of weight change Δw obeys the following dynamics:
$\Delta w = \begin{cases} A_{LTP} \left(1 - \frac{1}{\tau_{LTP}}\right)^{\Delta t} & \text{if } \Delta t > 0, \\ -A_{LTD} \left(1 - \frac{1}{\tau_{LTD}}\right)^{-\Delta t} & \text{if } \Delta t < 0. \end{cases}$
(6)
Here, Δt represents the relative spike timing between the presynaptic neuron a and the postsynaptic neuron b: $\Delta t = t_b - t_a$ ($t_a$ represents the time of the spike of presynaptic neuron a, and $t_b$ represents the time of the spike of postsynaptic neuron b). $A_{LTP}$ and $A_{LTD}$ are the parameters for the strength of the effects of LTP and LTD in STDP. $\tau_{LTP}$ and $\tau_{LTD}$ are the parameters for the working windows of LTP and LTD in STDP. The parameters $A_{LTP}$, $A_{LTD}$, $\tau_{LTP}$, and $\tau_{LTD}$ were varied with the experiment, to use various forms of STDP. For symmetric STDP, parameters that make LTP and LTD rotationally symmetrical ($A_{LTP} = A_{LTD} = 1.0$; $\tau_{LTP} = \tau_{LTD} = 20$) were used; for asymmetric STDP, parameters that make LTP and LTD rotationally asymmetrical ($A_{LTP} = 1.0$, $A_{LTD} = 0.8$–$1.5$; $\tau_{LTP} = 20$, $\tau_{LTD} = 20$–$30$) were used. Figure 3 shows the variation of Δw with Δt in the symmetric STDP, along with three examples of asymmetric STDP.

Figure 3.

Parametric variations of the STDP curve. (a) Symmetric STDP: ALTP, ALTD = 1.0; τLTP, τLTD = 20. (b) Example of asymmetric STDP: ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24; the peak of LTD is higher and the tail of LTD is longer than that of LTP. (c) Example of asymmetric STDP: ALTP = 1.0, ALTD = 0.95; τLTP = 20, τLTD = 28; the peak of LTD is smaller and the tail of LTD is longer than that of LTP. (d) Example of asymmetric STDP: ALTP = 1.0, ALTD = 1.4; τLTP = 20, τLTD = 30; the peak of LTD is higher and the tail of LTD is longer than that of LTP.
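The STDP curves in Figure 3 can be reproduced with a short function. This sketch assumes that Equation 6 has the form Δw = A(1 − 1/τ)^|Δt|, positive for Δt > 0 and negative for Δt < 0, and that Δw = 0 when Δt = 0. The function name and the unit of Δt are assumptions.

```python
def stdp_delta_w(delta_t, a_ltp=1.0, a_ltd=1.0, tau_ltp=20.0, tau_ltd=20.0):
    """Weight change for a spike pair with delta_t = t_post - t_pre."""
    if delta_t > 0:    # pre before post: potentiation (LTP)
        return a_ltp * (1.0 - 1.0 / tau_ltp) ** delta_t
    if delta_t < 0:    # post before pre: depression (LTD)
        return -a_ltd * (1.0 - 1.0 / tau_ltd) ** (-delta_t)
    return 0.0         # assumption: simultaneous spikes leave the weight unchanged

# Symmetric STDP (Figure 3(a)) versus an asymmetric variant (Figure 3(d)).
symmetric = [stdp_delta_w(dt) for dt in (-5, 5)]
asymmetric = [stdp_delta_w(dt, a_ltd=1.4, tau_ltd=30.0) for dt in (-5, 5)]
```

In the symmetric case the two values have equal magnitude and opposite sign. In the asymmetric case the LTD value is larger in magnitude, so uncorrelated pre and post firing produces a net decrease in weight.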

STDP was applied only between excitatory neurons; thus, the weights of other connections did not change from their initial values for all experiments. The weight value w between excitatory neurons varies as
$w_t = w_{t-1} + \Delta w.$
(7)

The maximum possible weight is fixed to wmax = 20, and if w > wmax, then w is reset to wmax. The minimum possible weight is fixed to wmin = 0, and if w < wmin, then w is reset to wmin.

In addition to STDP, a weight decay function was applied to the weights between excitatory neurons. The decay function is defined as follows:
$w_{t+1} = (1 - \mu) w_t.$
(8)
The parameter μ was fixed as $\mu = 5 \times 10^{-7}$.
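Equations 7 and 8, together with the weight bounds, can be combined into one update. The order of operations (STDP change, then clipping, then decay) and the function name are assumptions. The text does not state how often the decay is applied.

```python
W_MIN, W_MAX = 0.0, 20.0   # weight bounds from the text
MU = 5e-7                  # decay parameter from the text

def update_weight(w, delta_w):
    """Apply an STDP change, clip to [W_MIN, W_MAX], then apply weight decay."""
    w = w + delta_w                    # Equation 7
    w = min(max(w, W_MIN), W_MAX)      # reset to the bound if exceeded
    return (1.0 - MU) * w              # Equation 8

w = update_weight(19.9, 1.0)   # potentiation saturates at the upper bound
```

If the decay were applied once per 0.5 ms time step (an assumption), a weight receiving no STDP updates would halve after about ln 2 / μ ≈ 1.4 × 10^6 steps, or roughly 690 s of simulated time.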

### 2.3 Experimental Setup

We performed learning experiments using the neuronal cultures with the same settings as before [29, 30]. Then we repeated the same experiments with the simulation model using the spiking neural network and examined whether the results from the experiments using real neurons could be reproduced by the model.

For the neuronal cultures, we performed two types of experiments. Experiment 1 used a robot in one-dimensional virtual space (Figure 4); experiment 2 used a robot in two-dimensional real space (Figure 5). In both cases, the stimulation was applied as the sensor input when the robot approached the wall, and the sensor input stopped when the robot moved away from the wall. Those input neurons were randomly selected from a population. We explain each experimental setup in detail in the rest of this section.

Figure 4.

Experimental environment of the robot experiment in one-dimensional virtual space. Either end of the space is a wall; when the robot contacts the wall, the robot receives stimulation. The robot moves from one side to the other side for 5 min.

Figure 5.

Experimental environment of the robot experiment in two-dimensional real space. The robot was placed in a flat square arena (60 cm × 60 cm).

#### 2.3.1 Embodied Cultured Neurons in One-Dimensional Virtual Space

In experiment 1, a virtual robot was coupled to the neuronal culture via the CMOS arrays, which could detect neural activity and inject electrical stimuli as explained above. Input and output neurons were determined in the following way. Input neurons were randomly chosen from excitatory neurons. The number of input neurons (2 or 10) depended on the experiment. Before starting the learning experiments, 20 stimuli at 1 Hz were applied to the input neurons and the neural activity was recorded. Based on the recorded data, 10 output neurons were chosen from excitatory neurons to satisfy the following requirement: Across the 20 stimuli, the mean number of neurons that spiked within 20 to 40 ms following each stimulus is less than 5. With these input and output neurons, the network cannot behave so as to avoid stimulation in the initial state. This procedure was performed using 10 selected input neurons, and if there was no combination of such output neurons, then this procedure was performed using the 2 selected input neurons.

The virtual robot moved forward at a constant speed; if the robot approached a wall, the sensors stimulated the input neurons, and if more than 5 out of 10 output neurons fired within the specific time window following the stimulation, then the robot would turn away from the wall, rotating by 180 degrees [29]. We call this setup the closed-loop condition. This cycle was repeated 10 times per experiment. We performed six experiments with three cultures (10–43 days in vitro) with these settings. We also performed an experiment with an open-loop condition, where the stimulus input stopped at random, regardless of the network's neural activity (the other settings were the same as under the closed-loop conditions).
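The closed-loop decision rule can be sketched as below. The 20–40 ms window is borrowed from the output-neuron selection step and is an assumption here, since the text refers only to "the specific time window". The function name and the data layout (a dictionary from neuron id to spike times) are hypothetical.

```python
def should_turn(stim_time_ms, output_spikes, window_ms=(20.0, 40.0), min_active=5):
    """Return True if more than min_active output neurons fire in the window.

    output_spikes maps each output neuron id to a list of spike times (ms).
    """
    start = stim_time_ms + window_ms[0]
    end = stim_time_ms + window_ms[1]
    active = sum(
        1 for times in output_spikes.values()
        if any(start <= t <= end for t in times)
    )
    return active > min_active

# Hypothetical response: 6 of 10 output neurons fire 30 ms after a stimulus at t = 0.
spikes = {n: ([30.0] if n < 6 else []) for n in range(10)}
turn = should_turn(0.0, spikes)
```

Each neuron is counted once per window regardless of how many spikes it emits, which follows the wording "more than 5 out of 10 output neurons fired".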

#### 2.3.2 Embodied Cultured Neurons in Two-Dimensional Real Space

In experiment 2, Elisa-3 (GCtronic, Ticino, Switzerland) was used as the mobile robot in real space (Figure 5). Elisa-3 is a small, circular robot with a 2.5-cm radius and has two independently controllable wheels. The front right and front left distance sensors were used as sensory signals to stimulate the neuronal cells. The refresh rate of the robot was 10 frames per second.

A simple sensorimotor mapping was applied to the robot and the cultured neurons. We randomly selected 2 neurons, estimated to be excitatory neurons, as the left- and right-input neurons. At given time intervals (100 ms), the probability PL,R of sending electrical stimulation to the input neuron was controlled by the sensory value of the mobile robot. Specifically, the probability was calculated as
$P_{L,R} = \begin{cases} 0, & S_{L,R} < T, \\ S_{L,R} / S_{max}, & \text{otherwise}. \end{cases}$
(9)

If sensor value SL,R (0–950) was less than the threshold T, then PL,R became zero. Otherwise, PL,R was calculated using SL,R/Smax. Here Smax denotes the maximum value of the sensor input (950). Whether the stimulus would be delivered to the input neuron or not was determined using this probability every 100 ms. The threshold T was set to 100. According to this form, the distance from the robot to the wall was encoded as the stimulation frequency.
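The sensor-to-stimulation mapping described above is small enough to write out directly. The function names are hypothetical, and the random draw uses Python's standard generator as a stand-in for whatever the experimental software used.

```python
import random

def stimulation_probability(sensor_value, threshold=100, s_max=950):
    """Equation 9: zero below the threshold T, otherwise S / S_max."""
    if sensor_value < threshold:
        return 0.0
    return sensor_value / s_max

def deliver_stimulus(sensor_value, rng):
    """Decide, once per 100 ms interval, whether to stimulate the input neuron."""
    return rng.random() < stimulation_probability(sensor_value)

rng = random.Random(0)
p = stimulation_probability(475)       # half of the maximum sensor value
stim_now = deliver_stimulus(475, rng)
```

Because the decision is made every 100 ms, a probability P corresponds to a mean stimulation rate of 10P Hz. A sensor value of 475 gives P = 0.5, or about 5 stimuli per second on average.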

We also randomly selected 20 neurons, estimated to be excitatory neurons, as output neurons: 10 neurons were used for calculating the speeds of the left and right wheels, respectively. The wheel speeds were calculated based on the number of spikes of the output neurons, which were summed every 100 ms. The left and right wheel speeds VL,R were calculated as
$V_{L,R} = k \sum_{i \in N_{L,R}} v_i + C_{L,R}.$
(10)

The positive integers vi were equal to the numbers of spikes of the output neurons over a given time interval (100 ms), and we summed them with the negative constant weight k, and added a positive constant C as a default wheel speed. NL and NR were the sets of left- and right-output neurons. Here, as k was negative and C was positive, the robot moved forward when the output neurons were not active. k was set to −0.3. The default values of CL,R were 12.5, and the values were adjusted individually before the experiments so the robot would go straight without the firing of output neurons. As the activity of the output neurons increased, the speed of the forward movement decreased, until finally the robot moved backwards. Since the two wheels of the robot were independent, the robot could turn when the wheel speeds were different.
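A worked instance of Equation 10, using the values of k and the default C given above. The function name and the example spike counts are illustrative.

```python
def wheel_speed(spike_counts, k=-0.3, c=12.5):
    """Equation 10: wheel speed from the spike counts of one output group.

    spike_counts holds the number of spikes of each of the 10 output neurons
    in the last 100 ms interval.
    """
    return k * sum(spike_counts) + c

silent = wheel_speed([0] * 10)   # no output activity: default forward speed
busy = wheel_speed([5] * 10)     # 50 spikes in 100 ms: the wheel reverses
```

With these values a wheel reverses once its output group emits more than 12.5 / 0.3 ≈ 41.7 spikes in a 100 ms interval, that is, 42 spikes or more. A difference in activity between the left and right output groups produces a turn.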

Because of these settings, as the robot approached the wall, the sensor values became higher and the stimulus input was applied. The number of spikes of the output neurons determined the left and right motor speeds of the robot; thus, to avoid the wall, the robot needed to control the motor speeds.

These conditions made it more difficult for the network to learn an action to avoid the wall than in experiment 1. We performed six experiments with three cultures (26–61 days in vitro), but a technical failure occurred in one of the experiments, so the recorded data for the other five experiments were analyzed.

#### 2.3.3 Embodied Spiking Neural Networks in One-Dimensional Virtual Space

We also performed a simulation experiment similar to experiment 1, which used the simulated spiking neural network. The model for spiking neurons, proposed by Izhikevich [23] and explained above, was used to simulate excitatory neurons and inhibitory neurons. The network consisted of 80 excitatory neurons and 20 inhibitory neurons. This ratio of inhibitory neurons is standard in simulations [22, 23] and similar to biological values [7]. The excitatory neurons were divided into three groups: input (10 neurons), output (10 neurons), hidden (60 neurons). The networks were fully connected; the weight values w for neurons were randomly initialized with uniform distributions as 0 < w < 5 for excitatory neurons, −5 < w < 0 for inhibitory neurons. Only connections between excitatory neurons had synaptic plasticity based on STDP, and the weight values of other connections did not change. Both symmetric and asymmetric STDP, explained above, were used for the synaptic dynamics.

In the simulation experiment, two types of external stimulation conditions were applied: closed-loop and open-loop. In the closed-loop condition, stimulation was delivered at a fixed frequency (100 Hz) with 10 mV, and if more than 5 out of 10 output neurons fired within 10 ms after the stimulation, then the stimulation was removed for 1,000–2,000 ms (randomly chosen each time). Under these conditions, it was possible for the network to learn a behavior to avoid the stimulation. In the open-loop condition, the stimulation was randomly removed, regardless of the firing of output neurons, and any other settings of the stimulation were the same as in the closed-loop condition. In this condition, the network could not learn any behavior that avoided the stimulation.
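The closed-loop stimulation protocol can be sketched as a small state machine. The class name, its interface, and the choice to count distinct output neurons accumulated over the 10 ms window are assumptions. The stimulation amplitude (10 mV) would be applied by the caller whenever `step` returns True.

```python
import random

class ClosedLoopStimulator:
    """Deliver 100 Hz stimulation and pause it when the outputs respond."""

    def __init__(self, rng, period_ms=10.0, window_ms=10.0, min_active=5,
                 pause_range_ms=(1000.0, 2000.0)):
        self.rng = rng
        self.period_ms = period_ms          # 100 Hz stimulation
        self.window_ms = window_ms          # response window after a stimulus
        self.min_active = min_active        # "more than 5 out of 10"
        self.pause_range_ms = pause_range_ms
        self.next_stim_ms = 0.0
        self.last_stim_ms = None
        self.active = set()

    def step(self, t_ms, fired_outputs):
        """Return True if a stimulus is delivered at time t_ms.

        fired_outputs is the set of output neuron ids firing at this step.
        """
        in_window = (self.last_stim_ms is not None
                     and t_ms - self.last_stim_ms <= self.window_ms)
        if in_window:
            self.active.update(fired_outputs)
            if len(self.active) > self.min_active:
                # Desired response: remove stimulation for 1,000-2,000 ms.
                self.next_stim_ms = t_ms + self.rng.uniform(*self.pause_range_ms)
                self.last_stim_ms = None
                self.active.clear()
        if t_ms >= self.next_stim_ms:
            self.last_stim_ms = t_ms
            self.active.clear()
            self.next_stim_ms = t_ms + self.period_ms
            return True
        return False

stim = ClosedLoopStimulator(random.Random(0))
delivered = [t * 0.5 for t in range(200) if stim.step(t * 0.5, set())]
```

An open-loop variant would draw the pauses at random times, independent of `fired_outputs`, so that no output pattern can influence the stimulation.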

## 3 Results

We first show the results of experiments 1 and 2, then the results of the simulation experiments. In the analysis of experiments 1 and 2, we investigated the neural dynamics in conditions where it was difficult to learn stimulus-avoidance behavior; we focused on the results of the open-loop condition in experiment 1, in which it is impossible for the network to learn that behavior, as well as the results of experiment 2, in which more networks failed to learn the behavior than in experiment 1.

### 3.1 Cultured Neurons Learn Stimulus-Avoiding Behaviors

#### 3.1.1 Experiment 1

We evaluated the learning results using the reaction time (i.e., the time from the beginning of the stimulation to the time of wall avoidance). Figure 6 shows the learning curves from experiment 1. The vertical axis represents the reaction time; lower values indicate higher learning ability. As shown in Figure 6, in the closed-loop condition, the reaction time rapidly decreased and stabilized, indicating higher learning ability. On the other hand, in the open-loop condition, where the stimulus was randomly applied, the reaction time did not stabilize at lower values, and the variance was higher than that in the closed-loop condition.

Figure 6.

Learning curve for both the closed-loop condition and the open-loop condition, where the stimulus is randomly applied. Statistical results for n = 6 with standard error.

These results are similar to the results from previous experiments involving large numbers of cultured neurons (10,000–50,000) [40]. This suggests that such learning behaviors scale from small to large numbers of cultured neurons. In addition, we found that the stimulus-evoked firing considerably decreased in some cases of the open-loop conditions where the stimulus was randomly applied (Figure 7), although the stimulus-evoked firing increased in closed-loop conditions [29]. Below, we focus on the data from the open-loop conditions.

Figure 7.

Stimulus-evoked spikes of all neurons in the open-loop conditions where the stimulus was randomly applied. The “pre” columns display the first 5 min of the experiment, and the “post” the last 5 min. The evoked spikes at the end of the experiment (i.e., according to “post”) had decreased.

Figure 8 shows mean evoked firing rates. The evoked firing rate, except for input neurons (Figure 8(a)), consisted of spikes within 200 ms after each stimulus. The evoked firing rate of input neurons (Figure 8(b)) consisted of spikes 50 ms after each stimulus; this time window differs from the window mentioned above because we focused on the spikes evoked by each stimulus, excluding the spikes evoked by feedback from other neurons.
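
The windowed rate described above can be sketched as follows. This is a hedged illustration for a single spike train; the function name and the convention of counting spikes in the half-open window after each stimulus are assumptions, and the article does not state how rates were pooled across neurons.

```python
def evoked_rate_hz(spike_times_ms, stim_times_ms, window_ms):
    """Mean firing rate (Hz) counted within window_ms after each stimulus.

    window_ms would be 200 for non-input neurons and 50 for input
    neurons, following the windows given in the text.
    """
    if not stim_times_ms:
        raise ValueError("at least one stimulus time is required")
    n_spikes = 0
    for s in stim_times_ms:
        # Count spikes that fall in (s, s + window_ms].
        n_spikes += sum(1 for t in spike_times_ms if s < t <= s + window_ms)
    total_window_s = len(stim_times_ms) * window_ms / 1000.0
    return n_spikes / total_window_s
```

Using the shorter window for input neurons keeps only the spikes that follow the stimulus directly, so later spikes driven by recurrent feedback do not inflate the estimate.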

Figure 8.

Mean evoked firing rates with the standard errors in the open-loop conditions where the stimuli were randomly applied (n = 6). Here “pre” is the first 5 min in the experiment; “post” is the last 5 min. (a) Mean evoked firing rates of all neurons, except for input neurons (p < 0.03). (b) Mean evoked firing rates of input neurons.

Besides the qualitative results in Figure 7, the statistical results show that the mean evoked firing rates for all neurons, except for input neurons, significantly decreased in the last 5 min of the experiments, relative to the first 5 min (Wilcoxon signed-rank test, n = 6, p = 0.012) (Figure 8(a)). On the other hand, the evoked firing rates of input neurons did not change significantly (Figure 8(b)). These results imply that the decrease in the number of evoked spikes was not caused by a decrease in the firing rates of the input neurons. This suggests that the cultured neurons tried to learn a behavior to avoid an external stimulation, but if the network could not avoid the stimulus, it tended to ignore the external stimulation.

#### 3.1.2 Experiment 2

In experiment 2, the learning of wall avoidance behavior succeeded in only two out of five experiments. Here, success was defined as the reaction time (i.e., the time from the start of stimulation to the time of wall avoidance) decreasing by 30% or more. The average reaction times from the first 10 min and the last 10 min of the experiments were used to calculate the success rate. We focused on dynamics where LSA failed, examining whether dynamics similar to those from the open-loop conditions in experiment 1 were observed in this more difficult task.
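
The success criterion can be written as a short check. This is a sketch under the definition given above; the function name and argument layout are hypothetical.

```python
def learning_succeeded(first_10min_rts, last_10min_rts, min_decrease=0.30):
    """Return True if the mean reaction time fell by at least 30%.

    first_10min_rts / last_10min_rts: reaction times observed in the
    first and last 10 min of an experiment.
    """
    mean_first = sum(first_10min_rts) / len(first_10min_rts)
    mean_last = sum(last_10min_rts) / len(last_10min_rts)
    relative_decrease = (mean_first - mean_last) / mean_first
    return relative_decrease >= min_decrease
```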

As shown in Figure 9, the stimulus-evoked spikes decreased in all failure cases. These results are similar to the results from the open-loop conditions in experiment 1.

Figure 9.

Stimulus-evoked spikes for all neurons in failure cases from experiment 2; “pre” shows the first 5 min of the experiment, and “post” shows the last 5 min. In all examples, the evoked spikes at the end of the experiment (i.e., according to “post”) decreased.

The statistical results showed that the mean evoked firing rate of all neurons, except for input neurons, significantly decreased in the last 5 min of the experiment relative to the first 5 min (Wilcoxon signed-rank test, n = 3, p = 0.023) (Figure 10(a)). On the other hand, the mean evoked firing rate of the input neurons did not change significantly (Figure 10(b)).

Figure 10.

Mean evoked firing rates with standard errors in the failure cases (n = 3) from experiment 2. Here “pre” shows the first 5 min of the experiment, and “post” shows the last 5 min. (a) Mean evoked firing rate of all neurons, except for input neurons (p < 0.03). (b) Mean evoked firing rates of input neurons.

Figure 11 shows the time evolution of the evoked firing rates of input neurons as well as other neurons. As shown in Figure 11, the evoked firing rates of other neurons gradually decreased during the experiments, but the evoked firing rates of input neurons did not decrease.

Figure 11.

Time series of evoked firing rates in the failure cases from experiment 2. The values represent the mean firing rate with standard errors in the specific time window after delivering a stimulus (50 ms for input neurons and 200 ms for other neurons).

These results imply that the decreased firing rates for all neurons, except for input neurons, were not caused by a decrease in the evoked spikes from the input neurons, but by the weakening of the synaptic connection from the input neurons to the others. Therefore, this suggests that embodied cultured neural networks try to learn an action to avoid external stimulation, but, if the synaptic connections that control the motor output cannot stop the stimulus, the networks tend to ignore the external stimulation, thereby weakening the connection strength from the inputs.

Taken together, the experiments show that when cultured neurons cannot learn a behavior that avoids stimulation, the neural dynamics act to isolate the uncontrollable neurons, that is, the neurons receiving stimuli that no learned action can stop.

### 3.2 Spiking Neural Networks Reproduce Stimulus-Avoiding Behaviors

In the simulation experiments, we examined whether simulated networks reproduce the stimulus-avoiding behaviors observed in the experiments above, that is, when the network cannot learn an action to avoid input stimuli, the network tends to ignore the uncontrollable input by weakening the connections from the uncontrollable input neurons.

When applying symmetric STDP, where the dynamics of LTP and LTD are symmetric (ALTP = ALTD = 1.0; τLTP = τLTD = 20), the synaptic weight from the input neurons increased in both closed-loop and open-loop conditions (Figure 12(a)).

Figure 12.

Time evolution of the mean connection strength from the input neurons to the other neurons with standard error. (a) Symmetric STDP (n = 20). (b) Asymmetric STDP: ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24 (n = 20).

In the open-loop condition, the stimulation was randomly removed; thus, the network cannot learn the behavior to avoid the stimulation. With symmetric STDP, however, the results show no dynamics leading to the isolation of the uncontrollable neurons.

On the other hand, when applying asymmetric STDP where the dynamics of LTP and LTD are asymmetric (ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24; the peak of LTD is higher and the tail of LTD is longer than that of LTP), the synaptic weights from the input neurons increased in the closed-loop conditions, but decreased in the open-loop conditions (Figure 12(b)). This tendency is the same as that observed in the experiments using the neuronal cultures explained above. Therefore, we found that spiking neural networks with asymmetric STDP reproduce the dynamics to isolate the uncontrollable input neurons as observed in the experiments using cultured neurons.

To examine the conditions where this behavior works, we explored the parameter space, changing ALTD and τLTD in Equation 6. ALTD is the parameter for the strength of LTD. τLTD represents the working time window of LTD. In Figure 13(a), the color represents the value of the selection indicator (SI), which is defined as follows:
$$SI = W_{ci} - W_{oi}. \tag{11}$$

Here, Wci denotes the average weight of the connections from input neurons to other neurons in the closed-loop condition and Woi denotes that in the open-loop condition. SI indicates the weight selection: a higher value indicates a higher selection tendency. The selection behavior here refers to the weights of the connections from input neurons with controllable inputs (i.e., those in the closed-loop condition) being reinforced, and weights from input neurons with uncontrollable inputs (i.e., those in the open-loop condition) being depressed.

Figure 13.

(a) Dependence of the performance of stimulus avoidance by weight selection on ALTD and τLTD. The color represents the mean selection indicator (SI) for each parameter set (n = 20). (b) Dependence of the STDP curve on the parameters ALTD and τLTD. The color represents the integral value of the STDP function for each parameter set.

The results of the asymmetric STDP with the parameters ALTP = 1.0, ALTD = 1.1; τLTP = 20, τLTD = 24 (Figure 3(b)) show the maximum value of SI in the parameter space. With ALTP = 1.0, ALTD = 0.95; τLTP = 20, τLTD = 28 (Figure 3(c)), the shape of the STDP function is similar to that of the classical STDP function observed in vitro and in vivo [6]. The peak of LTD is lower than the peak of LTP, and the working time window of LTD is longer than that of LTP. The value of SI in this region was still positive, implying the networks isolate the uncontrollable input neurons. This suggests that the dynamics can also work in biological neural networks.

Figure 13(b) shows the integral values of the STDP function with the same parameters as in Figure 13(a). In the blue regions, LTD is stronger than LTP; thus, in theory, random spikes in presynaptic neurons should decrease the weight from those neurons. The weight selection occurred in the blue regions. However, if such a decrease was too strong (e.g., with the parameters ALTP = 1.0, ALTD = 1.4; τLTP = 20, τLTD = 30; Figure 3(d)), both Wci and Woi decreased almost to zero. In this case weight selection did not occur: LTD was too strong compared to LTP, so almost all the weights of connections between neurons decreased, and the networks could not learn anything. Therefore, we found that weight selection occurred in balanced regions where the integral value of LTD is stronger, but not much stronger, than that of LTP.
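
The sign of the integral for the parameter sets named in the text can be checked with a short sketch. Equation 6 is not reproduced in this section, so the exponential STDP window used here (LTP decaying with τLTP for post-after-pre spike pairs, LTD decaying with τLTD for pre-after-post pairs) is an assumption, as are the function names and the dictionary labels.

```python
import math


def stdp(dt_ms, a_ltp, a_ltd, tau_ltp, tau_ltd):
    """Assumed exponential STDP window; dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        return a_ltp * math.exp(-dt_ms / tau_ltp)   # potentiation
    if dt_ms < 0:
        return -a_ltd * math.exp(dt_ms / tau_ltd)   # depression
    return 0.0


def stdp_integral(a_ltp, a_ltd, tau_ltp, tau_ltd):
    """Closed-form integral of the assumed window over all dt."""
    return a_ltp * tau_ltp - a_ltd * tau_ltd


# Parameter sets quoted in the text (labels are for this sketch only).
PARAMS = {
    "symmetric": (1.0, 1.0, 20.0, 20.0),
    "asymmetric": (1.0, 1.1, 20.0, 24.0),
    "classical_like": (1.0, 0.95, 20.0, 28.0),
    "too_strong_ltd": (1.0, 1.4, 20.0, 30.0),
}

INTEGRALS = {name: stdp_integral(*p) for name, p in PARAMS.items()}
```

Under this assumed form, the symmetric window integrates to zero, the two parameter sets that showed weight selection have a moderately negative integral, and the set with ALTD = 1.4 and τLTD = 30 is considerably more negative, in line with the "stronger, but not much stronger" reading of Figure 13(b).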

We examined what kind of connections were weakened by these weight selection dynamics in the spiking neural networks. To simplify, we considered a minimal case with two neurons and one connection (Figure 14) to compare asymmetric STDP with symmetric STDP. In the asymmetric STDP case, if Δtp ≈ Δtd and Δtp < τLTP, Δtd < τLTD, then the connection weight decreases because, with asymmetric STDP, LTD has a stronger effect than LTP (Figure 14(a)). In the symmetric STDP case, the connection does not change much, because LTP and LTD affect the connection equally (Figure 14(b)). Thus, with asymmetric STDP, if the mean value of the spike intervals that cause LTP (Δtp) and the mean value of the spike intervals that cause LTD (Δtd) are close, as in $\frac{1}{N}\sum_{i=1}^{N} \Delta t_p^{i} \approx \frac{1}{N}\sum_{i=1}^{N} \Delta t_d^{i}$, the connection disappears.

Figure 14.

Dynamics of the weight selection of 2 neurons: a presynaptic neuron and a postsynaptic neuron. (a) Asymmetric STDP. If Δtp ≈ Δtd and Δtp < τLTP, Δtd < τLTD, the connection decreases because with asymmetric STDP, LTD has a greater effect than LTP (τLTP and τLTD are the working time windows of LTP and LTD in the STDP function, respectively). (b) Symmetric STDP. The connection between the neurons does not change much, even if the conditions above are satisfied, as LTP and LTD equally affect the connection with the symmetric STDP.
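
The two-neuron argument can be illustrated numerically. The sketch below assumes an exponential STDP window (the article's Equation 6 is not shown in this section) and a hypothetical helper that sums one LTP event and one LTD event for the same synapse.

```python
import math


def net_weight_change(dt_p_ms, dt_d_ms, a_ltp, a_ltd, tau_ltp, tau_ltd):
    """Net update from one LTP pairing and one LTD pairing.

    dt_p_ms: pre-before-post interval that triggers LTP.
    dt_d_ms: post-before-pre interval that triggers LTD.
    The exponential form is an assumption made for this sketch.
    """
    ltp = a_ltp * math.exp(-dt_p_ms / tau_ltp)
    ltd = a_ltd * math.exp(-dt_d_ms / tau_ltd)
    return ltp - ltd


# Equal intervals (10 ms each), as in the condition dt_p ~ dt_d.
symmetric_change = net_weight_change(10.0, 10.0, 1.0, 1.0, 20.0, 20.0)
asymmetric_change = net_weight_change(10.0, 10.0, 1.0, 1.1, 20.0, 24.0)
```

With equal intervals the symmetric rule cancels exactly, while the asymmetric rule leaves a negative remainder on every such pairing, so repeated pairings drive the weight down.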

In large networks, such situations occur if a presynaptic neuron fires independently from the other neurons at a high frequency. The connections from the input neurons, which are stimulated at a high frequency (e.g., more than 20 Hz when τLTP + τLTD = 50 ms), should decrease according to the dynamics of the asymmetric STDP. The stimulation frequency SF [Hz] for the weight selection is as follows: $SF > k$, where $k \approx 10^{3}/(\tau_{LTP} + \tau_{LTD})$ for a minimal case with 2 neurons. This is inconsistent with our results regarding neuronal cultures, in which the networks showed a weight selection behavior at a low frequency (1 Hz). This suggests that this condition on the stimulation frequency should be modified for larger networks.
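
As a worked check of the frequency condition, the threshold for the two-neuron case can be computed directly. The function names are hypothetical, and the condition is the minimal-case estimate only, not a result for large networks.

```python
def selection_frequency_threshold_hz(tau_ltp_ms, tau_ltd_ms):
    """k ~ 10^3 / (tau_LTP + tau_LTD), in Hz, for time constants in ms."""
    return 1000.0 / (tau_ltp_ms + tau_ltd_ms)


def meets_two_neuron_condition(stim_freq_hz, tau_ltp_ms, tau_ltd_ms):
    """True if the stimulation frequency exceeds the minimal-case threshold."""
    return stim_freq_hz > selection_frequency_threshold_hz(tau_ltp_ms, tau_ltd_ms)
```

For τLTP + τLTD = 50 ms the threshold is 20 Hz. The 100 Hz stimulation used in the simulations lies above it, whereas the 1 Hz stimulation used with the cultures lies far below it, which is the inconsistency noted in the text.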

In our experiments, we focused on the weight selection for input neurons; however, these dynamics should also work inside the networks. A synaptic weight from an active neuron with a high firing rate should decrease, as the conditions are consistent with that of input neurons with the high-frequency stimulation explained above. However, in the case of hidden neurons, since there should be a feedback loop, this decrease leads to a decrease in the firing rate of the active neuron itself; if the firing rate becomes lower than a certain threshold, this weight depression should end. Thus, the synaptic weight should decrease but remain over zero, as in the case of the input neurons observed in our simulation experiments. This effect should stabilize the network through pruning connections from neurons with a high firing rate.

## 4 Discussion

Our previous studies showed that embodied spiking neural networks with STDP learn a behavior as if they avoid stimulation, and that the learning dynamics of the stimulus-avoiding behaviors scale from small networks to large networks (from two cells [43] to approximately 50,000 cells [29]). If LSA also works in cultured neurons, then cultures with fewer neurons than those used in [40] should also be able to learn an action to avoid stimulation. We found that a small number of cultured neurons (42–110 cells) can do so. This suggests that the learning dynamics are based on LSA. In addition, we found that if the network could not learn the behavior to avoid the stimulus, plasticity worked to suppress the influence of the uncontrollable stimulus on the network by weakening the connection from the input neurons. We also demonstrated that spiking neural networks with asymmetric STDP reproduced the stimulus-avoiding behaviors observed in the cultured neurons. Below, we further discuss the stimulus-avoiding behaviors.

In this study, we found that if a network cannot learn the behavior needed to avoid a stimulus, plasticity works to suppress the influence of an uncontrollable stimulus on the network by weakening the connection strength from the input neurons. In neuroscience, this kind of phenomenon, in which constant sensor inputs are ignored, is known as neural adaptation, sensory adaptation, or stimulus-specific adaptation. These phenomena are observed in many regions in the brain (e.g., in the auditory system [34, 36, 41, 46]) and also in vitro [50]. Such adaptation can be divided into two types: fast adaptation (less than one second) and slow adaptation (more than a few minutes) [8]. The mechanism of slow adaptation is considered to be synaptic plasticity, such as LTD. One of the possible mechanisms of fast adaptation is synaptic fatigue [42], in which repeated stimulation depletes the neurotransmitters in the synapse, weakening the response of neurons to the stimulation. Some studies suggest that synaptic fatigue is caused by high-frequency stimulation, and occurs in presynaptic neurons [42].

In our experiment on neuronal cultures, low-frequency stimulation (1 Hz) was used, and the results showed that the evoked spikes of the presynaptic neuron (input neurons) did not decrease, and the evoked spikes of all the other neurons decreased for more than 20 min. Therefore, our results with neuronal cultures suggest that the observed behavior resembles slow adaptation caused by LTD. The spiking neural networks with asymmetric STDP reproduced the observed slow adaptation (although the networks have STP that has similar dynamics to that of synaptic fatigue, STP stabilizes the firing rate within 1 s; thus, slow adaptation cannot be caused by STP). Moreover, we found that the phenomena were observed in parameter spaces where the shape of the STDP function was similar to the shape broadly observed in vitro. Therefore, we argue that the mechanism based on asymmetric STDP should work in biological neural networks.

In the field of artificial life, adaptive behavior is and has been a major theme and should be explored further (e.g., see [10, 11, 20]). Many researchers have studied the emergence of adaptive behavior as an outcome of neural homeostasis. Although their models are very insightful, they are too abstract to explain homeostatic properties in biological neural networks. In our previous studies, we found that bio-inspired spiking neural networks with STDP have homeostatic properties that allow networks to learn a behavior to avoid external stimulation from the environment (e.g., wall avoidance behavior) [43]. Our previous and present studies also showed that LSA can work in biological neural networks in vitro [29, 30], suggesting that LSA can explain homeostatic adaptation in vivo.

Recently, Friston proposed the free-energy principle (FEP) [14, 16] based on Bayesian inference by extending the predictive coding model [38]. To minimize free energy (i.e., surprise) in FEP, the way of reconfiguring the internal model is called perceptual inference; on the other hand, reducing surprises using actions is called active inference. Active inference has attracted much attention from the viewpoint of autonomous behavior. Indeed, Friston discussed the relationship between the homeostatic adaptive behavior of animals and active inference [15, 17, 18]. In our framework of stimulus avoidance in neural networks, regarding external stimulation as a surprise, the neural dynamics used to learn actions in order to avoid surprise is similar to an intuitive interpretation of active inference, where an agent behaves to minimize surprise. To examine this insight, we need to calculate the free energy in our framework in future research.

In addition, in this article, we proposed a new principle, saying that if embodied neural networks cannot learn actions to avoid a stimulation, the networks can work to isolate the neurons that are receiving uncontrollable stimulation from the environment. This two-layered homeostatic principle is similar to Ashby's theory of ultrastability, in which the system has two types of homeostasis; if the first regular homeostasis is unstable and its essential variables exceed the limits, then the second homeostasis works to rearrange the system dramatically [1]. The system will reconstruct itself by trial and error until a stable homeostasis is acquired. Ashby suggests that biological systems are ultrastable with these two types of homeostasis [1]. Our results suggest that such a behavior can emerge thanks to the local dynamics of neurons in both biological and bio-inspired artificial neural networks.

The simulation results showed that ignoring an uncontrollable constant stimulus is a strong feature of spiking neural networks with asymmetric STDP. Almost all the connections from the input neurons with uncontrollable stimuli decreased to zero. However, the synaptic weights from the input neurons with controllable inputs (i.e., sensor inputs that the agent can learn to avoid) increased. This suggests that the networks can isolate the input neurons with uncontrollable stimulus inputs, and it can be regarded as dynamics trying to regulate self and nonself. A closed loop of the sensor and motor, in which the motor outputs control the sensor stimulation as in sensorimotor contingency [35], is regarded as self, while an open loop of the sensor and motor is regarded as nonself. The open loop collapses after isolating the sensor neuron. Thus the self-boundary is not limited to the network, but extended to the environment through its body. It is interesting that the dynamics emerge from just the simple local dynamics of neurons.

How to discriminate self from nonself reminds us of a theory of autopoiesis proposed by Maturana and Varela [32]. In addition to the structural viewpoint of regulating the self-boundary explained above, here we discuss the results of self-regulating behavior from an autopoietic point of view.

In autopoiesis, discrimination comes with the boundary between self and nonself. It is not a physical rigid boundary, but a dynamic one: It should be constantly produced and maintained by its system's own processes.

Varela reported a simple mathematical model using artificial chemistry featuring autopoiesis [47]. Two metabolite particles (S) generate one boundary particle (L) catalyzed by a catalytic particle (C). Those boundary particles connect to form a connected boundary, which encloses C and L. The boundary constantly decays and is repaired by the free boundary particles L. This self-organizing process of encapsulating C and L defines self-discrimination. No single particle defines the self-boundary; rather, self-entity only emerges at a certain collective level.

This picture becomes much clearer on taking the immune systems as an example. Vertebrates establish self-nonself discrimination by forming an idiotype network, in which an antibody-antigen chain reaction exists among antibodies according to Jerne's hypothesis [24]. The current understanding of self-nonself discrimination has a molecular-biological basis; however, the acquired immunity still needs to be exploited. A candidate for the explanation is the autopoietic picture. Each antibody can adaptively change the self-boundary. By self-organizing an idiotype network, self-nonself discrimination emerges as a result of the network reactions: The antigen-antibody reaction is suppressed locally for the self antigens, but the reaction is percolated for the nonself antigens. The reaction network determines self-nonself discrimination similarly to how Varela's simple artificial chemistry [47] is determined.

Coming back to the present study, no single neural response determines the self-nonself boundary. Self-nonself boundaries emerge only in neural networks of a certain size. A neural network determines the self-nonself boundary in the same way the immune system does. The boundaries for immune systems and for neural systems are processed dynamically. It is not explicit in the network whether a certain firing pattern of neurons depends on what comes from the outside or from the inside of the network. As in the immune system, a pattern that makes the network's response stronger and causes structural changes in the network is regarded here as nonself.

For example, the controllable input above is initially regarded as a pattern from the outside of the network (nonself). However, as the change of the network progresses and the network learns the behavior to control it, the input will no longer cause large changes in the network and will be regarded as a pattern from the inside of the network (self). The uncontrollable input is initially regarded as a pattern from the outside (nonself) as in the case of controllable inputs; however, by weakening the connections from the sensor according to the dynamics proposed in this article, explicit boundaries like Varela's cellular boundary [47] can be created. Thereby, the inputs are explicitly isolated from the inside and no longer affect the internal network (self).

Further, our previous results, with spiking neural networks consisting of 100 neurons with almost the same models as in this article, could predict a simple causal sequence of stimuli [29]. In that situation, like controllable input, predictable input is initially regarded as an external pattern (nonself) that causes structural changes in the network; however, when the network learns to predict the input, the input no longer affects the network and is regarded as a pattern from the inside (self).

Although we have only discussed the patterns from the environment, this dynamics should also work inside the network (although regulation by action does not occur inside the network and requires coupling with the environment). Inside the network, the dynamics of isolation of uncontrollable patterns and prediction of predictable patterns regulate the boundaries, and the network converges to stable states in which it shows transitions of several patterns. In this way, the neural network can also be regarded as a system that acquires its own stability through the autopoietic process by means of action, prediction, and selection.

## 5 Conclusion

We have presented neural homeodynamic responses to environmental changes in experiments using both biological neural networks and artificial neural networks. As a result of the experiments, we found that the embodied neural networks show two kinds of stimulus-avoiding behaviors: (i) when input stimuli are controllable via actions, the embodied networks learn an action to avoid the stimulation, and (ii) when input stimuli are uncontrollable, the connections from neurons with the uncontrollable input are weakened to avoid influences of the stimulation on other neurons. We argued that these stimulus-avoiding behaviors can be regarded as the dynamics of an autonomous regulation of self and nonself, in which controllable neurons are regarded as self, and uncontrollable neurons are regarded as nonself. This article has introduced neural autopoiesis by proposing the principle of stimulus avoidance. We have thus extended the notion of autopoiesis to neural networks.

## Acknowledgments

The high-density CMOS array used in this study was provided by Professor Andreas Hierlemann, ETH Zürich. This work was supported by a Grant-in-Aid for JSPS Fellows (16J09357), KAKENHI (17K20090), AMED (JP18dm0307009), the Asahi Glass Foundation, and the Kayamori Foundation of Information and Science Advancement. This work is partially supported by the MEXT project “Studying a Brain Model based on Self-Simulation and Homeostasis” in the Grant-in-Aid for Scientific Research on Innovative Areas “Correspondence and Fusion of Artificial Intelligence and Brain Science” (19H04979).

## References

1. Ashby, W. R. (1960). Design for a brain: The origin of adaptive behavior. New York: Wiley.
2. Bakkum, D. J., Chao, Z. C., & Potter, S. M. (2008). Spatio-temporal electrical stimuli shape behavior of an embodied cortical network in a goal-directed learning task. Journal of Neural Engineering, 5(3), 310–323.
3. Bakkum, D. J., Frey, U., , M., Russell, T. L., Müller, J., Fiscella, M., Takahashi, H., & Hierlemann, A. (2013). Tracking axonal action potential propagation on a high-density microelectrode array across hundreds of sites. Nature Communications, 4, 2181.
4
Brette
,
R.
,
Rudolph
,
M.
,
Carnevale
,
T.
,
Hines
,
M.
,
Beeman
,
D.
,
Bower
,
J. M.
,
Diesmann
,
M.
,
Morrison
,
A.
,
Goodman
,
P. H.
,
Harris
,
F. C.
,
Zirpe
,
M.
,
Natschläger
,
T.
,
Pecevski
,
D.
,
Ermentrout
,
B.
,
Djurfeldt
,
M.
,
Lansner
,
A.
,
Rochel
,
O.
,
Vieville
,
T.
,
Muller
,
E.
,
Davison
,
A. P.
,
El Boustani
,
S.
, &
Destexhe
,
A.
(
2007
).
Simulation of networks of spiking neurons: A review of tools and strategies
.
Journal of Computational Neuroscience
,
23
(
3
),
349
398
.
5
Canepari
,
M.
,
Bove
,
M.
,
Maeda
,
E.
,
Cappello
,
M.
, &
Kawana
,
A.
(
1997
).
Experimental analysis of neuronal dynamics in cultured cortical networks and transitions between different patterns of activity
.
Biological Cybernetics
,
77
(
2
),
153
162
.
6
Caporale
,
N.
, &
Dan
,
Y.
(
2008
).
Spike timing-dependent plasticity: A Hebbian learning rule
.
Annual Review of Neuroscience
,
31
,
25
46
.
7
Cassenaer
,
S.
, &
Laurent
,
G.
(
2007
).
Hebbian STDP in mushroom bodies facilitates the synchronous flow of olfactory information in locusts
.
Nature
,
448
(
7154
),
709
713
.
8
Chung
,
S.
,
Li
,
X.
, &
Nelson
,
S. B.
(
2002
).
Short-term depression at thalamocortical synapses contributes to rapid adaptation of cortical sensory responses in vivo
.
Neuron
,
34
(
3
),
437
446
.
9
Dan
,
Y.
, &
Poo
,
M.-M.
(
2006
).
Spike timing-dependent plasticity: From synapse to perception
.
Physiological Reviews
,
86
(
3
),
1033
1048
.
10
Di Paolo
,
E. A.
(
2000
).
Homeostatic adaptation to inversion of the visual field and other sensorimotor disruptions
. In
J. A.
Meyer
,
A.
Berthoz
,
D.
Floreano
,
H. L.
Roitblat
, &
S. W.
Wilson
(Eds.),
From animals to animats VI: Proceedings of the 6th International Conference on Simulation of Adaptive Behavior
(pp.
440
449
).
Cambridge, MA
:
MIT Press
.
11
Di Paolo
,
E. A.
, &
Iizuka
,
H.
(
2008
).
How (not) to model autonomous behaviour
.
BioSystems
,
91
,
409
423
.
12
Eytan
,
D.
, &
Marom
,
S.
(
2006
).
Dynamics and effective topology underlying synchronization in networks of cortical neurons
.
The Journal of Neuroscience
,
26
(
33
),
8465
8476
.
13
Frey
,
U.
,
Sedivy
,
J.
,
Heer
,
F.
,
Pedron
,
R.
,
Ballini
,
M.
,
Mueller
,
J.
,
Bakkum
,
D.
,
Hafizovic
,
S.
,
Faraci
,
F. D.
,
Greve
,
F.
,
Kirstein
,
K.-U.
, &
Hierlemann
,
A.
(
2010
).
Switch-matrix-based high-density microelectrode array in CMOS technology
.
IEEE Journal of Solid-State Circuits
,
45
(
2
),
467
482
.
14
Friston
,
K.
(
2010
).
The free-energy principle: A unified brain theory?
Nature Reviews Neuroscience
,
11
,
127
.
15. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O'Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience and Biobehavioral Reviews, 68, 862–879.
16. Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology Paris, 100(1–3), 70–87.
17. Friston, K., Mattout, J., & Kilner, J. (2011). Action understanding and active inference. Biological Cybernetics, 104(1–2), 137–160.
18. Friston, K., Samothrakis, S., & Montague, R. (2012). Active inference and agency: Optimal control without cost functions. Biological Cybernetics, 106(8), 523–541.
19. Hebb, D. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
20. Iizuka, H., & Di Paolo, E. A. (2007). Toward Spinozist robotics: Exploring the minimal dynamics of behavioral preference. Adaptive Behavior, 15(4), 359–376.
21. Ikegami, T., & Suzuki, K. (2008). From a homeostatic to a homeodynamic self. Biosystems, 91(2), 388–400.
22. Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572.
23. Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070.
24. Jerne, N. K. (1974). Towards a network theory of the immune system. Annales d'Immunologie, 125C, 373–389.
25. Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9), 1659–1671.
26. MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Oakland, CA: University of California Press.
27. Madhavan, R., Chao, Z. C., & Potter, S. M. (2007). Plasticity of recurring spatiotemporal activity patterns in cortical networks. Physical Biology, 4(3), 181–193.
28. Marom, S., & Shahaf, G. (2002). Development, learning and memory in large random networks of cortical neurons: Lessons beyond anatomy. Quarterly Reviews of Biophysics, 35, 63–87.
29. Masumori, A. (2019). Homeostasis by action, prediction, selection in embodied neural networks. Ph.D. thesis, The University of Tokyo.
30. Masumori, A., Maruyama, N., Sinapayen, L., Mita, T., Frey, U., Bakkum, D., Takahashi, H., & Ikegami, T. (2015). Emergence of sense-making behavior by the stimulus avoidance principle: Experiments on a robot behavior controlled by cultured neuronal cells. In P. S. Andrews, L. S. D. Caves, R. Doursat, S. J. Hickinbotham, F. A. C. Polack, S. Stepney, T. Taylor, & J. Timmis (Eds.), Proceedings of the 13th European Conference on Artificial Life (ECAL 2015) (pp. 373–380). Cambridge, MA: MIT Press.
31. Masumori, A., Sinapayen, L., & Ikegami, T. (2017). Learning by stimulation avoidance scales to large neural networks. In C. Knibbe, G. Beslon, D. P. Parsons, D. Misevic, J. Rouzaud-Cornabas, N. Bredeche, S. Hassas, O. Simonin, & H. Soula (Eds.), Proceedings of the 14th European Conference on Artificial Life (ECAL 2017) (pp. 275–282). Cambridge, MA: MIT Press.
32. Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition: The realization of the living. Dordrecht, Boston: D. Reidel.
33. Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543–1546.
34. Noda, T., Kanzaki, R., & Takahashi, H. (2014). Amplitude and phase-locking adaptation of neural oscillation in the rat auditory cortex in response to tone sequence. Neuroscience Research, 79, 52–60.
35. O'Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5), 939–973.
36. Parras, G. G., Nieto-Diego, J., Carbajal, G. V., Valdés-Baizabal, C., Escera, C., & Malmierca, M. S. (2017). Neurons along the auditory pathway exhibit a hierarchical organization of prediction error. Nature Communications, 8(1), 2148.
37. Potter, S. M., & DeMarse, T. B. (2001). A new approach to neural cell culture for long-term studies. Journal of Neuroscience Methods, 110(1–2), 17–24.
38. Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
39. Sejnowski, T. J., Koch, C., & Churchland, P. S. (1988). Computational neuroscience. Science, New Series, 241(4871), 1299–1306.
40. Shahaf, G., & Marom, S. (2001). Learning in networks of cortical neurons. The Journal of Neuroscience, 21(22), 8782–8788.
41. Shiramatsu, T. I., Kanzaki, R., & Takahashi, H. (2013). Cortical mapping of mismatch negativity with deviance detection property in rat. PLoS ONE, 8(12), e82663.
42. Simons-Weidenmaier, N. S., Weber, M., Plappert, C. F., Pilz, P. K., & Schmid, S. (2006). Synaptic depression and short-term habituation are located in the sensory part of the mammalian startle pathway. BMC Neuroscience, 7(1), 38.
43. Sinapayen, L., Masumori, A., & Ikegami, T. (2017). Learning by stimulation avoidance: A principle to control spiking neural networks dynamics. PLoS ONE, 12(2), e0170388.
44. Song, S., Miller, K. D., & Abbott, L. F. (2000). Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience, 3(9), 919–926.
45. Tajima, S., Mita, T., Bakkum, D. J., Takahashi, H., & Toyoizumi, T. (2017). Locally embedded presages of global network bursts. Proceedings of the National Academy of Sciences of the U.S.A., 114(36), 9517–9522.
46. Ulanovsky, N., Las, L., & Nelken, I. (2003). Processing of low-probability sounds by cortical neurons. Nature Neuroscience, 6, 391.
47. Varela, F., Maturana, H., & Uribe, R. (1974). Autopoiesis: The organization of living systems, its characterization and a model. Biosystems, 5(4), 187–196.
48. Wagenaar, D., DeMarse, T. B., & Potter, S. M. (2005). Meabench: A toolset for multi-electrode data acquisition and on-line analysis. In 2nd International IEEE EMBS Conference on Neural Engineering, 2005 (pp. 518–521). New York: IEEE.
49. Wagenaar, D. A., & Potter, S. M. (2002). Real-time multi-channel stimulus artifact suppression by local curve fitting. Journal of Neuroscience Methods, 120(2), 113–120.
50. Whitmire, C. J., & Stanley, G. B. (2016). Rapid sensory adaptation redux: A circuit perspective. Neuron, 92(2), 298–315.
51. Yada, Y., Kanzaki, R., & Takahashi, H. (2016). State-dependent propagation of neuronal sub-population in spontaneous synchronized bursts. Frontiers in Systems Neuroscience, 10, 28.
52. Yada, Y., Mita, T., Sanada, A., Yano, R., Kanzaki, R., Bakkum, D. J., Hierlemann, A., & Takahashi, H. (2017). Development of neural population activity toward self-organized criticality. Neuroscience, 343, 55–65.