## Abstract

The design and implementation of adaptive chemical reaction networks, capable of adjusting their behavior over time in response to experience, is a key goal for the fields of molecular computing and DNA nanotechnology. Mainstream machine learning research offers powerful tools for implementing learning behavior that could one day be realized in a wet chemistry system. Here we develop an abstract chemical reaction network model that implements the backpropagation learning algorithm for a feedforward neural network whose nodes employ the nonlinear “leaky rectified linear unit” transfer function. Our network directly implements the mathematics behind this well-studied learning algorithm, and we demonstrate its capabilities by training the system to learn a linearly inseparable decision surface, specifically, the XOR logic function. We show that this simulation quantitatively follows the definition of the underlying algorithm. To implement this system, we also report ProBioSim, a simulator that enables arbitrary training protocols for simulated chemical reaction networks to be straightforwardly defined using constructs from the host programming language. This work thus provides new insight into the capabilities of learning chemical reaction networks and also develops new computational tools to simulate their behavior, which could be applied in the design and implementation of adaptive artificial life.

## 1 Introduction

The field of molecular computing has made great strides in the implementation of sophisticated information processing systems at the nanoscale, including logic circuits (Qian & Winfree, 2011a; Seelig et al., 2006), distributed algorithms (Chen et al., 2013), and artificial neural networks (Cherry & Qian, 2018; Qian et al., 2011). Such systems often use DNA molecules to implement their computational processes because the sequence-specific nature of DNA chemistry makes it possible to encode structures and interactions based solely on the sequences of the constituent molecules that make up these systems. The ability to process information at the nanoscale has a range of enticing applications in biomedical diagnostics and therapeutics (Chatterjee et al., 2018; Chen et al., 2015; Groves et al., 2016; C. Zhang et al., 2020) and in the control of nanoscale robotic devices (Thubagere, Li, et al., 2017).

The fact that molecular computing devices can mimic the computational behavior of artificial neural networks is of particular interest because it hints at the possibility of implementing adaptive behavior in engineered biochemical systems, if those molecular systems could be made to learn. To date, this has not been accomplished in a molecular implementation of an artificial neural network. However, living systems exhibit a range of adaptive behaviors, and even single-celled organisms have been shown to adapt to changes in their environment (Dexter et al., 2019), with claims made in the literature that even these simple organisms are capable of learning (Hennessey et al., 1979; Wood, 1988). This implies that there may be a molecular basis for learning behaviors in biological systems and, furthermore, that engineered molecular computing systems might be able to replicate some of that behavioral complexity. The design and implementation of learning algorithms for molecular computing systems is of great interest and would enable the development of engineered nanoscale control systems capable of intelligently responding to changes in their environment. For example, smart biomedical devices endowed with molecular learning circuits could be programmed to learn and adapt to the specifics of a particular patient’s physiology.

In mainstream machine learning research, artificial neural networks have become established as a powerful mechanism for training models. Given that previous work has demonstrated the implementation of neural networks using DNA strand displacement reactions (Cherry & Qian, 2018; Qian et al., 2011), neural networks are a promising framework for implementing machine learning algorithms in a molecular computing system. Furthermore, mathematically well-defined learning algorithms exist for training, such as the well-established backpropagation algorithm (Rumelhart et al., 1986). Therefore, in this article, we show that a simple neural network learning algorithm based on backpropagation can be realized as an abstract chemical reaction network (CRN). CRNs are the underlying “programming language” of molecular computing circuits (Cook et al., 2009), and it has been shown that any such network can be compiled into a corresponding network in real wet chemistry using DNA strand displacement reactions (Soloveichik et al., 2010). This provides a potential route to a future laboratory implementation of our circuit designs. Furthermore, flexible and efficient simulation of CRNs is itself of great interest for molecular programming.

Here we outline the design of an abstract CRN that can implement backpropagation learning in a multilayer artificial neural network architecture. The neurons in our system utilize the leaky rectified linear unit (leaky ReLU) nonlinearity, which is widely used in modern machine learning models. We demonstrate the use of our system to train a simple neural network to learn a logic function that is not linearly separable, by simulating a deterministic ordinary differential equation (ODE) model of the CRN. We illustrate the correctness of this CRN design by comparing its output to that from a reference implementation of the underlying backpropagation algorithm. We develop a flexible, general-purpose simulator to enable these simulations to be carried out straightforwardly. Our work therefore demonstrates the first implementation of backpropagation learning in a multilayer nonlinear chemical neural network design, thereby opening the door to future development of chemical reaction network designs with advanced learning capabilities.

## 2 Related Work

Previous work by ourselves and others has developed and simulated designs for neural network–like chemical reaction systems capable of learning, from which we draw inspiration for the current work. In early work, we developed designs for systems capable of supervised learning in networks of DNAzyme-based reactions (Lakin et al., 2014) and toehold-mediated DNA strand displacement reactions (Lakin & Stefanovic, 2016). Blount et al. (2017) reported a feedforward chemical network that uses a backpropagation-style algorithm for learning; however, the system design does not implement a rigorous mathematical specification of the algorithm, and thus its behavior is somewhat hard to predict a priori. Other approaches to learning in chemical systems have also been investigated, including training via reinforcement learning (Banda et al., 2014) and learning via weight perturbation in a multilayer network of neurons using the tanh nonlinearity (Arredondo & Lakin, 2022). The crucial goal in all of this work has been to establish systems in which the learning process is carried out solely within the simulated chemistry and not via external computational mechanisms. Such chemical learning systems could, at least in principle, function autonomously and be trained by an experimenter to implement specific computational functions. However, there remains a lack of circuit designs that can implement the standard and well-known algorithm of backpropagation for supervised learning in a well-understood and clearly specified manner; the current article aims to address this.

Theoretical work on chemical neural networks that can compute outputs but cannot learn has yielded promising approaches to using deep learning to program molecular circuits (Vasić et al., 2020). Previous work has discovered rate-independent chemical reaction systems for computing the nonleaky version of the rectified linear unit (ReLU) nonlinearity (Vasić, Soloveichik, & Khurshid, 2020b) and has also explored the potential for DNA-based implementations of binarized neural networks (Linder et al., 2021). However, while using neurons with binarized weights greatly simplifies the forward computation, training such neurons via gradient descent is challenging because gradient descent works by making many small changes to the weight values, which cannot be done if the weights are binarized. Thus an additional continuous representation of the weights must be maintained for training purposes, which is then binarized for carrying out the forward computations (Simons & Lee, 2019).

Much of the work discussed has focused on high-level abstract specifications of CRNs, in which the system design is expressed in terms of transformation rules between abstract chemical reactants and products. While abstract, such specifications can nevertheless be related to experimental science via translation into networks of toehold-mediated DNA strand displacement reactions (D. Y. Zhang & Seelig, 2011). This correspondence was established by Soloveichik et al. (2010) for arbitrary abstract CRNs, and previous work has studied a range of distinct translation schemes (Cardelli, 2010, 2013). Thus abstract CRNs are conveniently high level while also linked to low-level molecular implementations; therefore a range of computational tools to design and test these translations has been developed. Such tools specialized for DNA nanotechnology include the Nuskell and Peppercorn tools developed in the Winfree group (Badelt et al., 2017, 2020), which are also embedded within Python. The Visual DSD system for DNA strand displacement modeling (Lakin et al., 2011) has limited facilities for perturbing the system (Yordanov et al., 2014). With regard to tools more suited for simulations of adaptive and trainable chemical systems, other, more general simulators, such as StochKit2 (Sanft et al., 2011) and GillesPy (Abel et al., 2016), use both time-based and state-based event triggers for system perturbations that can be used to model external interference from an experimenter or other feedback from the chemical environment. The Kappa system includes perturbation events expressed in the Kappa DSL itself (Boutillier et al., 2018), along with alarms that check these periodically. The COEL cloud-based simulation framework (Banda & Teuscher, 2016) uses perturbation definitions based on the Java Math Expression Parser. 
To our knowledge, these related systems do not permit the use of arbitrary computation in a general-purpose host language to specify arbitrarily complex interventions in the system when those triggers are activated, as is possible with the simulator on which we report herein.

Finally, some experimental work on computational DNA strand displacement circuits has yielded exciting results in terms of neural network implementations. Most notably, Qian et al. (2011) implemented a recurrent Hopfield neural network using catalytic “seesaw” DNA strand displacement reactions (Qian & Winfree, 2011b). That work was subsequently scaled up to much larger networks that could carry out pattern recognition tasks on a version of the MNIST data set of handwritten digits (Cherry & Qian, 2018). These results hint at exciting future developments in molecular learning circuits. However, these systems are still very small and very simple by the standards of modern-day silicon-based computing hardware, and the amount of additional circuitry required to go from a one-shot neural network implementation to a version with built-in learning capabilities is significant. Thus computational work still has a part to play in laying out future paths for this work.

## 3 Chemical Reaction Networks

An abstract CRN consists of a finite set of *species* {*X*_{1}, …, *X*_{n}} and a finite set of *reactions* {*r*_{1}, …, *r*_{q}} over those species. Each reaction *r*_{i} has the form

$$r_i : \quad \ell_{i1} X_1 + \cdots + \ell_{in} X_n \;\xrightarrow{\;k_i\;}\; m_{i1} X_1 + \cdots + m_{in} X_n$$

where ℓ_{ij} and *m*_{ij} are nonnegative integer-valued stoichiometric coefficients. (Equivalently, the left-hand and right-hand sides of the reaction are both multisets over the set of species.) The left-hand species are said to be the *reactants*, and the right-hand species are said to be the *products*. Note that these coefficients may be zero, meaning that the given species does not appear in that reaction as a reactant or product or both, as appropriate.

The net *stoichiometry* of species *X*_{j} in reaction *r*_{i} is then defined as

$$\mathrm{stoich}(X_j, r_i) = m_{ij} - \ell_{ij}$$

that is, the net change in the number of copies of *X*_{j} caused by a single execution of reaction *r*_{i}. As per the law of mass action, the flux through reaction *r*_{i} is proportional to the product of the concentrations of all of the reactants, with the constant of proportionality being the nonnegative real-valued rate constant, *k*_{i}; we write this as follows:

$$\mathrm{flux}(r_i) = k_i \cdot \prod_{j=1}^{n} [X_j]^{\ell_{ij}}$$

where each concentration [*X*_{j}] is raised to the power of ℓ_{ij} to account for the fact that multiple copies of *X*_{j} could in theory be required to be present as reactants. The deterministic ODE semantics of the CRN, which describes the evolution of the concentration of each species *X*_{j} over time, can then be expressed as follows:

$$\frac{\mathrm{d}[X_j]}{\mathrm{d}t} = \sum_{i=1}^{q} \mathrm{stoich}(X_j, r_i) \cdot \mathrm{flux}(r_i)$$

where stoich(*X*_{j}, *r*_{i}) and flux(*r*_{i}) are as defined earlier. For any given abstract CRN, this set of coupled ODEs can be formed via a mechanical translation, with one differential equation per species. Given an initial set of species concentrations, the CRN can be simulated by numerically integrating these ODEs with respect to time using any suitable ODE solver, thereby solving the corresponding initial value problem. (As a practical matter, one can implement such simulations computationally by forming the corresponding *stoichiometry matrix*, a matrix that records all the stoichiometric coefficients of all species in all reactions in the CRN. In conjunction with a function that calculates the flux through each reaction at each time point based on the current species concentrations, the time derivatives can be calculated via simple matrix multiplication operations.)
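As a concrete illustration, the stoichiometry-matrix approach just described can be sketched in a few lines of Python. The example CRN here (*X*1 + *X*2 → *X*3 and *X*3 → *X*1) and its rate constants are invented purely for illustration and are not part of the network designs in this article:

```python
# Sketch of ODE-based CRN simulation via the stoichiometry matrix.
# Example CRN (illustrative only): X1 + X2 -> X3 (k=1.0), X3 -> X1 (k=0.5).
import numpy as np
from scipy.integrate import solve_ivp

species = ["X1", "X2", "X3"]
# Columns are reactions; entries are the reactant orders l_ij used in fluxes.
reactant_orders = np.array([[1, 0],
                            [1, 0],
                            [0, 1]])
# Net stoichiometry m_ij - l_ij of each species in each reaction.
stoich = np.array([[-1,  1],
                   [-1,  0],
                   [ 1, -1]])
rates = np.array([1.0, 0.5])

def derivatives(t, conc):
    # Law of mass action: flux_i = k_i * prod_j conc_j ** l_ij.
    fluxes = rates * np.prod(conc[:, None] ** reactant_orders, axis=0)
    return stoich @ fluxes  # d[X_j]/dt via matrix multiplication

sol = solve_ivp(derivatives, (0.0, 50.0), [1.0, 0.8, 0.0], rtol=1e-8)
final = dict(zip(species, sol.y[:, -1]))
```

Note that the quantity [*X*1] + [*X*3] is conserved by this particular example CRN, which provides a simple sanity check on the integration.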

## 4 Results

### 4.1 Neural Network Structure and Definition

In this work, we use abstract CRNs to implement trainable neural networks composed of leaky ReLU neurons. This nonlinearity is in widespread use in mainstream machine learning research because of its relative simplicity and good learning performance. Figure 1 outlines the schematics of single neurons and the network structure under study. In a single neuron, the input signals are multiplied by weights before being passed through the leaky ReLU nonlinearity. The whole network under study accepts two input signals, *X*1 and *X*2, that are passed into two hidden layer neurons, *N*_{0} and *N*_{1}. These produce output signals *H*0 and *H*1, respectively. These signals are fed into a single output layer neuron, *N*_{2}, which produces the overall output signal, *Y*. Each neuron also has a third input, a bias signal that is clamped to −1. Importantly, this is a feedforward network, which means that we can straightforwardly apply the backpropagation algorithm to calculate the weight updates, as outlined in what follows.

The leaky ReLU transfer function applies a gradient *β* to nonnegative inputs and a gradient *α* to negative inputs:

$$\mathrm{relu}(x) = \begin{cases} \beta \cdot x & \text{if } x \geq 0 \\ \alpha \cdot x & \text{if } x < 0 \end{cases}$$

We set *β* = 1 and assume that the values of *α* and *β* are identical for all neurons in the network. Given this definition, the output *Y* of our network for a given *α* and *β* can then be defined as follows:

$$Y = \mathrm{relu}(\mathrm{NET}_2), \qquad \mathrm{NET}_2 = W_{20} \cdot \mathrm{BIAS} + W_{21} \cdot H_0 + W_{22} \cdot H_1$$
$$H_j = \mathrm{relu}(\mathrm{NET}_j), \qquad \mathrm{NET}_j = W_{j0} \cdot \mathrm{BIAS} + W_{j1} \cdot X_1 + W_{j2} \cdot X_2 \quad (j \in \{0, 1\})$$

Here we write NET_{i} for the result of the linear weighting calculation carried out within neuron *N*_{i}, before that value is passed through the leaky ReLU nonlinearity.
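The forward computation described above can be captured in a short Python reference implementation. The weight values and the *α* gradient below are illustrative assumptions, not values taken from this article:

```python
# Minimal Python reference for the forward pass of the three-neuron network.
# ALPHA and the example weights are illustrative assumptions.
ALPHA, BETA = 0.1, 1.0

def relu(x):
    # Leaky ReLU: gradient BETA for x >= 0, gradient ALPHA for x < 0.
    return BETA * x if x >= 0 else ALPHA * x

def forward(w, x1, x2, bias=-1.0):
    # w[j] = [W_j0 (bias weight), W_j1, W_j2] for neurons N0, N1, N2;
    # for N2 the non-bias inputs are the hidden outputs H0 and H1.
    net0 = w[0][0] * bias + w[0][1] * x1 + w[0][2] * x2
    net1 = w[1][0] * bias + w[1][1] * x1 + w[1][2] * x2
    h0, h1 = relu(net0), relu(net1)
    net2 = w[2][0] * bias + w[2][1] * h0 + w[2][2] * h1
    return relu(net2)

w = [[0.5, 1.0, 1.0], [1.5, 1.0, 1.0], [0.0, 1.0, -2.0]]  # hypothetical
y = forward(w, 0.0, 0.0)
```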

### 4.2 Derivations of Loss Derivatives

The goal of training is to compute an update for each weight *W*_{ji} in the network. The backpropagation algorithm for supervised learning in a feedforward neural network (such as ours) works by propagating input signals forward through the network to produce an output signal, using this and the target value to compute the loss function. This loss value is then backpropagated through the network to compute the partial derivative of the loss with respect to each weight via repeated application of the chain rule. Here and henceforth, we follow the notational convention of Mitchell (1997).

By the chain rule, the partial derivative of the loss with respect to each weight decomposes as

$$\frac{\partial L}{\partial W_{ji}} = \frac{\partial L}{\partial \mathrm{NET}_j} \cdot \frac{\partial \mathrm{NET}_j}{\partial W_{ji}} = \frac{\partial L}{\partial \mathrm{NET}_j} \cdot X_{ji}$$

where *X*_{ji} is the *i*th input to the *j*th neuron. Thus the main issue will be to calculate $\frac{\partial L}{\partial \mathrm{NET}_j}$ for each neuron in the system. There are two cases to consider: an output neuron (*N*_{2}) and a hidden layer neuron (*N*_{0} and *N*_{1}).

The loss is defined as *L* = $\frac{1}{2}$ · (*Y* − TARGET)^{2}, and for the output neuron *N*_{2}, we calculate $\frac{\partial L}{\partial Y}$ = *Y* − TARGET and $\frac{\partial Y}{\partial \mathrm{NET}_2}$ = *d*_{2}, where *d*_{2} denotes the leaky ReLU gradient applicable in this training round (*β* if NET_{2} ≥ 0 and *α* otherwise), and then use the chain rule to give an expression for $\frac{\partial L}{\partial \mathrm{NET}_2}$. Thus we obtain the following set of loss derivatives with respect to the weights associated with neuron *N*_{2}:

$$\frac{\partial L}{\partial W_{2i}} = \frac{\partial L}{\partial \mathrm{NET}_2} \cdot X_{2i} = (Y - \mathrm{TARGET}) \cdot d_2 \cdot X_{2i}$$

For a hidden layer neuron such as *N*_{1} in general, we would need to sum over the contributions from all downstream neurons, thus

$$\frac{\partial L}{\partial \mathrm{NET}_1} = \sum_{k \in \mathrm{downstream}(1)} \frac{\partial L}{\partial \mathrm{NET}_k} \cdot \frac{\partial \mathrm{NET}_k}{\partial \mathrm{NET}_1}$$

However, in our network there is only one downstream neuron (*N*_{2}), so we can simplify the preceding equation to

$$\frac{\partial L}{\partial \mathrm{NET}_1} = \frac{\partial L}{\partial \mathrm{NET}_2} \cdot \frac{\partial \mathrm{NET}_2}{\partial \mathrm{NET}_1} = \frac{\partial L}{\partial \mathrm{NET}_2} \cdot W_{22} \cdot d_1$$

which yields the following loss derivatives for the weights associated with *N*_{1}:

$$\frac{\partial L}{\partial W_{1i}} = \frac{\partial L}{\partial \mathrm{NET}_2} \cdot W_{22} \cdot d_1 \cdot X_{1i}$$

The loss derivatives for *N*_{0} can be calculated similarly, with *W*_{21} in place of *W*_{22}. We now have expressions for all of the values that our neural network CRN must compute, not just for forward propagation of input signals to produce an output signal and the corresponding loss value but also for the backpropagation of the loss signal through the network to produce loss derivatives and thus the desired weight update values.
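These chain-rule expressions can be checked numerically against a finite-difference approximation of the loss gradient. The following sketch implements the derivatives exactly as derived above; the *α* gradient and example weights are illustrative assumptions:

```python
# Reference implementation of the loss derivatives derived in this section.
# ALPHA and the example weights are illustrative assumptions.
ALPHA, BETA = 0.1, 1.0

def relu(x):
    return BETA * x if x >= 0 else ALPHA * x

def grad(x):
    # Leaky ReLU gradient (the d_i value selected in this training round).
    return BETA if x >= 0 else ALPHA

def forward_trace(w, x1, x2, bias=-1.0):
    net0 = w[0][0] * bias + w[0][1] * x1 + w[0][2] * x2
    net1 = w[1][0] * bias + w[1][1] * x1 + w[1][2] * x2
    h0, h1 = relu(net0), relu(net1)
    net2 = w[2][0] * bias + w[2][1] * h0 + w[2][2] * h1
    return relu(net2), (net0, net1, net2, h0, h1)

def loss_derivatives(w, x1, x2, target, bias=-1.0):
    y, (net0, net1, net2, h0, h1) = forward_trace(w, x1, x2, bias)
    dL_dnet2 = (y - target) * grad(net2)          # output neuron N2
    dL_dnet1 = dL_dnet2 * w[2][2] * grad(net1)    # hidden neuron N1 (via W22)
    dL_dnet0 = dL_dnet2 * w[2][1] * grad(net0)    # hidden neuron N0 (via W21)
    inputs = {0: (bias, x1, x2), 1: (bias, x1, x2), 2: (bias, h0, h1)}
    d_net = {0: dL_dnet0, 1: dL_dnet1, 2: dL_dnet2}
    # dL/dW_ji = dL/dNET_j * X_ji
    return [[d_net[j] * inputs[j][i] for i in range(3)] for j in range(3)]
```

Away from the kinks of the leaky ReLU, these analytic derivatives agree with central finite differences of the loss to high precision.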

### 4.3 Designing a CRN to Directly Compute the Loss Derivatives

Given the mathematical definition of our network’s desired operation, both for output generation and for learning, we can now start to build an abstract CRN implementation of this behavior. This is a key advantage of our approach; by grounding our system in a well-studied and well-understood network implementation and learning algorithm, we have a concrete target against which to test our CRN implementation so as to gauge its performance. In this section, we describe the various features of our CRN neural network structure and implementation. Although the various components of the CRN design are presented separately, our system is a single well-mixed solution in which all reactions are assumed to be occurring simultaneously within a one-pot reaction vessel. As outlined in what follows, the timing of reactions is mediated by the use of a chemical clock signal, but the reactions are all assumed to be occurring in parallel and competing for reactants; indeed, this competition is a key aspect of our system design.

#### 4.3.1 Regulation by a Molecular Clock Signal

The timing of operations in our system is regulated by a molecular clock oscillator comprising 26 clock species, *C*1, …, *C*26, arranged in a ring in which each clock species catalyzes the conversion of its predecessor into itself. Starting with clock species *C*25 and *C*26 with concentration 1.0 (written [*C*25] = [*C*26] = 1.0) and with [*Ci*] = 10^{−6} for all other clock species *Ci*, we obtain a robust oscillation that cycles through precisely one of the clock species going high (with a maximum concentration of 2.0) at any given time, in order. By using these clock species as catalysts to drive other reactions, we can ensure that those reactions experience a nonnegligible flux only when the particular clock species in question is the high one. As in previous work (Arredondo & Lakin, 2022; Vasić, Soloveichik, & Khurshid, 2020a), we use only odd-numbered clock species as catalysts, to prevent any overlap between adjacent phases. Our design uses a relatively large number of distinct clock phases, though it is possible that this could be optimized in future work. The division of different parts of the feedforward network computation and backpropagation learning process between the various clock phases is summarized in Table 1; the details of these operations and their implementations are described in the following pages.

| Clock phase | Operations carried out |
|---|---|
| C1 | Input fanout |
| C3 | Input copying and weight calculations for hidden layer neurons (N_{0} and N_{1}) |
| C5 | Preparing leaky ReLU calculations for N_{0} and N_{1} |
| C7 | Executing leaky ReLU calculations for N_{0} and N_{1} |
| C9 | Input copying and weight calculations for output layer neuron (N_{2}) |
| C11 | Preparing leaky ReLU calculation for N_{2} |
| C13 | Executing leaky ReLU calculation for N_{2} |
| C15 | Calculation of network loss |
| C17 | Calculation of backpropagation signal into N_{2} |
| C19 | Calculation of weight updates for N_{2} and backpropagation signals into N_{1} and N_{0} |
| C21 | Calculation of weight updates for N_{1} and N_{0} |
| C23 | Application of weight updates |
| C25 | Cleanup (degradation of various signals) in preparation for next training round |

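The qualitative behavior of such a ring oscillator can be sketched numerically. The following reduced example uses 6 clock species rather than 26 for brevity, and assumes the ring reaction scheme *C*_{i} + *C*_{i+1} → 2 *C*_{i+1} used in prior CRN oscillator designs; the rate constant is illustrative:

```python
# Reduced 6-species sketch of the chemical ring-oscillator clock.
# Assumed ring reactions (following prior CRN oscillator designs):
#   C_i + C_{i+1} -> 2 C_{i+1}
# so each species in turn consumes its predecessor and spikes high.
import numpy as np
from scipy.integrate import solve_ivp

N, K = 6, 1.0

def derivatives(t, c):
    dc = np.zeros(N)
    for i in range(N):
        j = (i + 1) % N
        flux = K * c[i] * c[j]   # C_i + C_{i+1} -> 2 C_{i+1}
        dc[i] -= flux
        dc[j] += flux
    return dc

c0 = np.full(N, 1e-6)
c0[-2:] = 1.0                    # analogue of starting with C25, C26 high
sol = solve_ivp(derivatives, (0.0, 60.0), c0, rtol=1e-8, atol=1e-10,
                max_step=0.1)
```

Because every ring reaction conserves the total concentration, the species concentrations sum to a constant, and each species transiently spikes toward that total (here roughly 2.0) in ring order.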

#### 4.3.2 Dual-Rail Signal Representation

Because signal and weight values in the neural network may be negative, whereas chemical concentrations cannot be, we use a *dual-rail* representation of each signal *X* as a *pair* of chemical species: *X*_{p} to represent the positive component and *X*_{m} to represent the negative component. We interpret the true value of the signal *X* as [*X*_{p}] − [*X*_{m}]. This representation is not unique, because increasing both concentrations by the same amount leaves the represented value unchanged. However, for each such species pair, we include a dual-rail annihilation reaction of the form

$$X_p + X_m \xrightarrow{\;k_{\mathrm{Annih}}\;} \emptyset$$

with *k*_{Annih} = 1.0. These reactions do not require a clock species as catalyst and can therefore fire at any time. This ensures that only one of the two dual-rail species will be present at any given time with a nonnegligible concentration, which simplifies the interpretation of the system state.
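The key property of the annihilation reaction, namely, that it preserves the represented value [*X*_{p}] − [*X*_{m}] while driving the smaller component toward zero, can be verified with a small ODE sketch (initial concentrations are illustrative):

```python
# Sketch of the dual-rail annihilation reaction Xp + Xm -> (nothing)
# under mass-action kinetics. Initial concentrations are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

K_ANNIH = 1.0

def derivatives(t, c):
    xp, xm = c
    flux = K_ANNIH * xp * xm     # annihilation consumes one of each species
    return [-flux, -flux]

sol = solve_ivp(derivatives, (0.0, 20.0), [1.5, 0.5], rtol=1e-9)
xp, xm = sol.y[:, -1]            # represented value xp - xm stays at 1.0
```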

#### 4.3.3 Input Fan-out

The inputs to the network comprise the two input signals (*X*1 and *X*2) and the bias input (BIAS). The latter always takes the value −1; however, we include both dual-rail species for consistency with the other, similar signals. In the first (*C*1) clock cycle, these inputs are transferred into multiple locations as required for the calculation of multiple neuron signals that operate on the same inputs. This is achieved by simple reactions, catalyzed by the *C*1 clock signal, that consume the inputs entirely and produce the same concentration of several output species as products. For example, the dual-rail positive input species *X*1p is fanned out into corresponding input species *X*01p and *X*11p for neurons *N*_{0} and *N*_{1}, respectively, via the single reaction

$$X1_p + C1 \xrightarrow{\;k\;} X01_p + X11_p + C1$$

with *k* = 1.0. Similar reactions are used for the fan-out of *X*1m and both dual-rail components of the *X*2 signal. The fan-out reactions for the bias signals are similar but include an additional product species because the bias signal must be duplicated for all three neurons in the network, not just for the two hidden layer neurons, for example, for BIASm:

$$\mathrm{BIAS}_m + C1 \xrightarrow{\;k\;} X00_m + X10_m + X20_m + C1$$

#### 4.3.4 Input Weighting

Input weighting is based on a competition motif: The concentration of an “input” species *X* can be transferred into the concentration of an “output” species *Y*, scaled by the concentration of a “weight” species *W*, via the pair of competitive reactions

$$W + X \xrightarrow{\;k\;} W + X + Y \qquad\qquad X \xrightarrow{\;k\;} \emptyset$$

The flux through the first reaction (which produces *Y*) is *k* · [*W*] · [*X*], while the flux through the second reaction is just *k* · [*X*]. In the absence of other reactions simultaneously producing or consuming these species (which is effectively the case in our clocked CRN implementation), each input *X* will thus catalyze the first reaction [*W*] times before it is consumed in the second reaction, leading to the steady state concentration [*Y*] being the product of the initial values of [*W*] and [*X*]. The input *X* is consumed by this process, which is slightly different from the approach of Buisman et al. (2009), in which the inputs act catalytically to produce the outputs with a scaled concentration. This feature fits well with some aspects of our network design, for example, the fact that the calculation of the leaky ReLU nonlinearity depends on one of the positive or negative dual-rail signals annihilating the other. However, we do need the original input values for the backpropagation calculations, because the partial loss derivatives with respect to the weights do depend on the input values. Therefore we must also copy the input values so they are available during backpropagation, as outlined in the Input Copying section.

Because both the weights and the inputs are dual-rail signals, the weighting operation for each input requires one production reaction per combination of signs, with the product species chosen according to the sign rule for multiplication, together with a single clock-catalyzed degradation reaction per input species. For example, for the *X*01 input to neuron *N*_{0}, the reactions occurring in the *C*3 clock phase include

$$W01_p + X01_p + C3 \longrightarrow W01_p + X01_p + \mathrm{NET0}_p + C3$$
$$W01_m + X01_p + C3 \longrightarrow W01_m + X01_p + \mathrm{NET0}_m + C3$$
$$X01_p + C3 \longrightarrow C3$$

with analogous reactions for *X*01m and for the other inputs to *N*_{0}. Because all of these reactions produce their outputs into the shared NET0 dual-rail species, the resulting concentration of NET0 is the *sum* of the weighted inputs, as required.
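The steady-state claim for the basic two-reaction weighting motif can be checked numerically. In this sketch, the weight acts purely catalytically and the input decays while catalyzing output production, so the output converges to the product of the initial concentrations (the concentrations used are illustrative):

```python
# Numerical check of the weighting motif: W + X -> W + X + Y competes with
# X -> 0, giving steady-state [Y] = [W] * [X]. Values are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

K = 1.0
W0, X0 = 1.7, 0.6                # initial weight and input concentrations

def derivatives(t, c):
    w, x, y = c
    produce = K * w * x          # W + X -> W + X + Y (both catalytic)
    degrade = K * x              # X -> 0
    return [0.0, -degrade, produce]

sol = solve_ivp(derivatives, (0.0, 40.0), [W0, X0, 0.0], rtol=1e-9)
y_final = sol.y[2, -1]           # approaches W0 * X0 = 1.02
```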

#### 4.3.5 Input Copying

The original input values must also be recorded for later use in the backpropagation calculations. For neuron *N*_{0}, this can be achieved via reactions in which each input species catalyzes the production of a corresponding copy (*K*) species, for example,

$$X01_p + C3 \longrightarrow X01_p + K01_p + C3$$

Because these reactions are occurring in parallel with the input weighting reactions outlined previously, the combination of this catalysis reaction and the input degradation reactions shown as part of the input weighting reactions means that the final concentration of each copy species matches the initial concentration of the corresponding input species. Thus the input copying reactions record the input values for use in the subsequent weight update calculations specified in the backpropagation algorithm, and because the inputs serve as catalysts in these reactions, the operation of the input weighting reactions is unaffected. Similar reactions take place in clock phase *C*3 for the *N*_{1} inputs, and similar reactions also take place in clock phase *C*9 for the inputs to the output neuron *N*_{2} (which are the outputs from the hidden layer neurons, as shown in Figure 1(c)).

#### 4.3.6 Calculation of Leaky ReLU Nonlinearity

Having calculated the weighted sums of the inputs, and copied the input values as appropriate, the next step in the feedforward network computation is to pass the weighted inputs through the leaky ReLU nonlinearity. Previous work (Linder et al., 2021; Vasić, Soloveichik, & Khurshid, 2020b) has uncovered rate-independent CRNs for calculating the nonleaky variant of the ReLU nonlinearity; however, here we wish to use the leaky variant and thus must design our own, new CRN.

In the *C*5 clock phase, we select the leaky ReLU gradient to be applied within neuron *N*_{0} based on the value of the corresponding weighted sum of the inputs, which we call NET0 in the case of *N*_{0}. We assume the existence of a pair of species that store the values of the two different gradients for the leaky ReLU unit, corresponding to the *α* and *β* constants from the definition of the leaky ReLU transfer function. These do not need to be dual-rail species, as the leaky ReLU gradients must be positive. There is a separate pair of such species for each neuron, so the gradients could be varied on a per-neuron basis if desired, though here we do not do so. For neuron *i*, these species are DPOS*i* and DNEG*i* for the gradients for positive and negative *x* input values, respectively. One of these will then be copied into the *Di* species, which represents the correct gradient to be applied to the waiting input value. This is achieved for neuron *N*_{0} via the following reactions:

$$\mathrm{NET0}_p + \mathrm{DPOS0} + C5 \longrightarrow \mathrm{NET0}_p + \mathrm{DPOS0} + D0_p + C5$$
$$\mathrm{NET0}_p + D0_p + C5 \longrightarrow \mathrm{NET0}_p + C5$$
$$\mathrm{NET0}_m + \mathrm{DNEG0} + C5 \longrightarrow \mathrm{NET0}_m + \mathrm{DNEG0} + D0_p + C5$$
$$\mathrm{NET0}_m + D0_p + C5 \longrightarrow \mathrm{NET0}_m + C5$$

The corner case of *x* = 0 could also be handled explicitly, if required (see Discussion). These are simple copying reactions using the scheme of Buisman et al. (2009): Production of *D*0 at a rate proportional to the stored gradient value competes with degradation of *D*0 at a rate proportional to its own concentration, so [*D*0] converges to the value of the selected gradient species. They work as expected because the dual-rail annihilation reactions will have already ensured that only one of NET0p and NET0m is present at the start of the *C*5 clock phase. Then, the value of either DPOS0 or DNEG0 will be copied into the *D*0 species. (For the sake of simplifying the model setup code, we use a dual-rail species for *Di*, even though it will only ever take a positive value.) We assume that the correct concentrations of the DPOS*i* and DNEG*i* species for each neuron have been supplied in the initial system setup, and our network does not modify these values during its execution. We also assume that the values of the *Di* species are negligible at the start of each training round; this is true in our initial setup, and the cleanup phase at the end of each training round (outlined later) ensures that this invariant holds for subsequent training rounds. Similar reactions take place in the *C*5 clock phase for the second hidden layer neuron, *N*_{1}, and in the *C*11 clock phase for the output neuron, *N*_{2}.

The leaky ReLU output itself is then computed in the *C*7 clock phase by multiplying the stored NET0 value by the selected gradient, using the same competition-based multiplication motif as for the input weighting. The corresponding reactions for neuron *N*_{0} are as follows:

$$\mathrm{NET0}_p + D0_p + C7 \longrightarrow \mathrm{NET0}_p + D0_p + H0_p + C7 \qquad \mathrm{NET0}_p + C7 \longrightarrow C7$$
$$\mathrm{NET0}_m + D0_p + C7 \longrightarrow \mathrm{NET0}_m + D0_p + H0_m + C7 \qquad \mathrm{NET0}_m + C7 \longrightarrow C7$$

Thus the output value from neuron *N*_{0} is stored in the *H*0 dual-rail species. Similar reactions take place in the *C*7 clock phase for neuron *N*_{1}, producing the *H*1 output signal. Finally, similar reactions take place in the *C*13 clock phase for the output neuron *N*_{2}, producing the *Y* signal. The concentration of this dual-rail signal represents the overall output from the feedforward network computation. Importantly, the chosen gradient values remain stored in the *Di* species after this phase has completed, which means they are available for subsequent use in the backpropagation algorithm that calculates the weight updates, as outlined in what follows.
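The two-phase leaky ReLU computation can be sketched numerically by integrating the gradient-selection phase and the multiplication phase in sequence. This simplified model treats each clock phase as a separate integration interval rather than simulating the clock species explicitly; the *α* and *β* values are illustrative:

```python
# Numerical sketch of the two-phase leaky ReLU CRN: a "C5"-style phase
# copies the appropriate gradient into D, then a "C7"-style phase
# multiplies NET by D via the competition motif. Clock phases are modeled
# as sequential integrations; ALPHA/BETA values are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

ALPHA, BETA = 0.1, 1.0   # concentrations of the DNEG and DPOS species

def leaky_relu_crn(net):
    netp, netm = (net, 0.0) if net >= 0 else (0.0, -net)

    # Phase C5: produce D at a rate prop. to the selected gradient species
    # and degrade D at a rate prop. to itself, so [D] -> chosen gradient.
    def phase5(t, c):
        d = c[0]
        return [netp * (BETA - d) + netm * (ALPHA - d)]
    d = solve_ivp(phase5, (0.0, 200.0), [0.0], rtol=1e-9).y[0, -1]

    # Phase C7: each NET molecule catalyzes H production [D] times
    # before it degrades, so the final H value is D * NET (sign preserved).
    def phase7(t, c):
        xp, xm, hp, hm = c
        return [-xp, -xm, d * xp, d * xm]
    xp, xm, hp, hm = solve_ivp(phase7, (0.0, 40.0), [netp, netm, 0.0, 0.0],
                               rtol=1e-9).y[:, -1]
    return hp - hm           # represented dual-rail output value
```

For a positive input the output approaches *β* · NET, and for a negative input it approaches *α* · NET, as required.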

#### 4.3.7 Calculation of Network Error

Having completed the feedforward network computation, before we can start the backpropagation learning stage, we must first calculate the network error (also known as loss) between the output signal *Y* produced by the output neuron and the expected output value TARGET, which we assume is supplied by the experimenter along with the input signals at the start of each training round.

In fact, what we actually calculate to serve as the input to the backpropagation algorithm is the partial derivative of the loss with respect to the network output, $\frac{\partial L}{\partial Y}$. Given that we define the loss as *L* = $\frac{1}{2}$ · (*Y* − TARGET)^{2}, this partial derivative can be computed straightforwardly as $\frac{\partial L}{\partial Y}$ = *Y* − TARGET. This obviates the need to calculate the square of the difference between the output *Y* and the target output TARGET, although this could in principle be computed by a CRN using previously reported techniques (Buisman et al., 2009).

We compute this quantity as the dual-rail error signal *E* using the following four simple reactions, which occur in the *C*15 clock phase:

$$Y_p + C15 \longrightarrow E_p + C15 \qquad\qquad Y_m + C15 \longrightarrow E_m + C15$$
$$\mathrm{TARGET}_p + C15 \longrightarrow E_m + C15 \qquad\qquad \mathrm{TARGET}_m + C15 \longrightarrow E_p + C15$$

These reactions transfer the concentrations of the *Y* and TARGET species into the *E* dual-rail signal, with the signs matched in the case of *Y* and inverted in the case of TARGET, so as to correctly realize the subtraction operation.

#### 4.3.8 Calculation of Loss Derivatives via Backpropagation

The loss derivative $\frac{\partial L}{\partial Y}$ having been calculated as the dual-rail species *E*, this can serve as the input to the backpropagation calculations required to compute the weight updates for gradient descent learning. This is the crux of our CRN learning algorithm. The outline structure of the backpropagation calculations is summarized in Figure 2. Once the input signals and chosen leaky ReLU gradients have been copied as outlined earlier, the backpropagation phase is just a series of straightforward multiplications, following the derivations presented in section S1 of the Supplemental Material.

For each neuron, we divide the backpropagation calculation into two parts, the computing of the backpropagation signal “into” that neuron, which is the same for all weight updates associated with that neuron, and the computing of the individual weight updates, which differ between the various weights associated with a given neuron. To simplify the analysis of the CRN, these occur in distinct clock phases, although they could be combined if reducing clock phases were a concern. We use species *Qi* to represent the shared incoming value for neuron *N*_{i} in the backpropagation calculation.

For the output neuron *N*_{2}, the value stored in the *Q*2 signal represents the value of $\frac{\partial L}{\partial \mathrm{NET}_2}$, which is calculated (see section S1 of the Supplemental Material) by multiplying the *E* signal generated earlier by the leaky ReLU gradient for *N*_{2} in this training round, which is stored in the *D*2 species, as outlined previously. This is a straightforward dual-rail multiplication process, implemented by the following reactions, which occur in the *C*17 clock phase:

$$E_p + D2_p + C17 \longrightarrow E_p + D2_p + Q2_p + C17 \qquad\qquad E_p + C17 \longrightarrow C17$$
$$E_m + D2_p + C17 \longrightarrow E_m + D2_p + Q2_m + C17 \qquad\qquad E_m + C17 \longrightarrow C17$$

The *Q*2 signal having been calculated, it can be used to compute the partial loss derivatives for the *N*_{2} weights. It must also be propagated backward through the network to serve as one of the inputs for the equivalent calculations for the hidden layer neurons, *N*_{1} and *N*_{0}. Because these reactions will consume the *Q*2 species, as earlier, these processes need to happen in parallel, during the *C*19 clock phase. With regard to the *N*_{2} weight updates, these are calculated by multiplying the value stored in *Q*2 by the copied input value associated with each weight, producing a dual-rail signal Delta*Wij* representing the partial derivative of the network loss with respect to weight *Wij*. (These will be scaled by the learning rate parameter and actually applied to the weights in a subsequent phase, outlined later.) Given that the input scaled by weight *Wij* has already been copied into the species *Kij*, it follows that we must multiply *Q*2 by *K*2*j* to produce the corresponding weight update Delta*W*2*j*. For Delta*W*20, this is implemented via the following reactions:

$$Q2_p + K20_p + C19 \longrightarrow Q2_p + K20_p + \mathrm{DeltaW20}_p + C19$$
$$Q2_p + K20_m + C19 \longrightarrow Q2_p + K20_m + \mathrm{DeltaW20}_m + C19$$
$$Q2_m + K20_p + C19 \longrightarrow Q2_m + K20_p + \mathrm{DeltaW20}_m + C19$$
$$Q2_m + K20_m + C19 \longrightarrow Q2_m + K20_m + \mathrm{DeltaW20}_p + C19$$
$$Q2_p + C19 \longrightarrow C19 \qquad\qquad Q2_m + C19 \longrightarrow C19$$

Similar reactions are run in the *C*19 clock phase to calculate Δ*W*21 and Δ*W*22. Note, however, that the degradation reactions shown for *Q*2p and *Q*2m should *not* be duplicated; otherwise, the effect would be equivalent to multiplying the rate constant of those reactions, and the calculations would be incorrect because the competition would be biased in favor of the degradation reactions.
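To make the multiplication motif concrete, one plausible realization of the reactions computing Δ*W*2*j* = *Q*2 · *K*2*j* in the *C*19 phase is sketched below; this is an illustrative sketch of the general pattern only (superscript plus and minus denote the two rails), and the exact reactions used in our design are those listed in section S3 of the Supplemental Material:

$$Q_2^{+} + K_{2j}^{+} + C_{19} \to Q_2^{+} + K_{2j}^{+} + C_{19} + \Delta W_{2j}^{+}$$
$$Q_2^{+} + K_{2j}^{-} + C_{19} \to Q_2^{+} + K_{2j}^{-} + C_{19} + \Delta W_{2j}^{-}$$
$$Q_2^{-} + K_{2j}^{+} + C_{19} \to Q_2^{-} + K_{2j}^{+} + C_{19} + \Delta W_{2j}^{-}$$
$$Q_2^{-} + K_{2j}^{-} + C_{19} \to Q_2^{-} + K_{2j}^{-} + C_{19} + \Delta W_{2j}^{+}$$
$$Q_2^{+} + C_{19} \to C_{19}, \qquad Q_2^{-} + C_{19} \to C_{19}$$

With equal rate constants, the total amount of Δ*W*2*j* produced before *Q*2 is fully degraded equals the product of the initial *Q*2 value and the catalytic *K*2*j* value, with the four sign combinations implementing signed dual-rail multiplication; duplicating the degradation reactions would bias this competition, as noted above.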

Simultaneously, the *Q*2 species must be used to calculate *Q*1 and *Q*0, as outlined in Figure 2. Considering neuron *N*_{1} as an example, the value of *Q*1 represents $\partial \mathit{NET}_2/\partial \mathit{NET}_1 \cdot \partial L/\partial \mathit{NET}_2$; thus we must multiply the value of *Q*2 by *W*22 and also by the leaky ReLU gradient from neuron *N*_{1} in this cycle, which is stored as *D*1. This double multiplication can be achieved via reactions that follow the same approach as the earlier multiplications, except with an additional catalyst representing the extra multiplier; note that eight reactions are now required to cover all combinations of signs for the three dual-rail inputs to this multiplication process.

These reactions rely on the degradation reactions for *Q*2p and *Q*2m already specified to run in the *C*19 clock phase to provide the competition for these multiplication reactions. Similar reactions are run simultaneously in the *C*19 phase to compute the corresponding *Q*0 signal for *N*_{0}. These are followed by reactions during the *C*21 clock phase to calculate the weight updates for the weights associated with *N*_{1} and *N*_{0}; these follow the same pattern as described for computing the weight updates for the output neuron *N*_{2}. Thus, once the *C*21 clock phase finishes, the backpropagation phase of our learning CRN will have computed all of the partial loss derivatives $\partial L/\partial W_{ij}$ and stored them in the corresponding Δ*Wij* dual-rail chemical species.

#### 4.3.9 Application of Weight Updates

Having calculated the partial loss derivatives, we can now apply the updates to the weights. Note that each weight cannot be updated immediately as the partial loss derivatives are calculated because the original weight values used in the feedforward network computation phase of each training cycle are required to compute the backpropagation signals; these computations are going on at the same time, as outlined previously. Thus we compute the partial loss derivatives as earlier for all weights, then apply the weight updates in the next clock phase (*C*23) after the backpropagation process has concluded.

To apply the update for each weight, we must multiply the corresponding partial loss derivative by the learning rate *α* (a small positive constant that scales the size of the step taken at each training round) and subtract these products from the current values of the corresponding weights to produce the new weight values. Given that the learning rate must be positive, we represent it as the concentration of a single species Alpha, not as a dual-rail species, which simplifies these reactions. Furthermore, the subtraction can be combined with the multiplication by inverting the signs of the products with respect to the dual-rail inputs. Reactions of this form, running in the *C*23 clock phase, update weight *W*00, and similar reactions update all of the other weights, meaning that all weights are updated together after the backpropagation phase has concluded, in preparation for the next round of training.
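In equation form, the update applied in the *C*23 phase is the standard gradient descent rule, with the subtraction realized by the sign inversion just described:

$$W_{ij} \leftarrow W_{ij} - \alpha \cdot \frac{\partial L}{\partial W_{ij}} = W_{ij} - \alpha \cdot \Delta W_{ij}.$$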

#### 4.3.10 Cleanup at the End of Training Cycle

At the end of each training cycle, a number of species must be returned to approximately zero concentration in preparation for the next round: the *E* dual-rail species that stored the partial derivative of network loss with respect to network output, the *Di* dual-rail species that stored the leaky ReLU gradient used by each neuron in the last cycle, and the *Kij* dual-rail signals that stored the duplicated values of each input value presented to the network in the preceding training round. These are removed via simple degradation reactions catalyzed by the *C*25 clock signal, which is the last one actually used to drive reactions in our circuit design. For example, the *E* dual-rail signal is reset to approximately zero by the following two reactions: *E*p + *C*25 → *C*25 and *E*m + *C*25 → *C*25.

#### 4.3.11 Chemical Reaction Network Size

The numbers of species and reactions required are a common metric for CRN design complexity; the CRN design outlined herein includes a total of 135 species and 315 reactions. This is comparable to the simpler of the two networks from our previous work that we trained via weight perturbation (Arredondo & Lakin, 2022); that CRN involved 144 species and 286 reactions. However, only one weight could be adjusted per training round in that version, whereas our backpropagation network adjusts every weight in each training cycle. The network from that work that could adjust every weight in each training cycle via weight perturbation was significantly larger, involving 583 species and 1,213 reactions. Thus the design outlined herein improves on previous work by enabling the entire network to be trained using a more compact design than previously available. A full listing of the backpropagation CRN species and reactions is presented in section S3 of the Supplemental Material.

#### 4.3.12 Initial Network State

In the initial state of the CRN, the concentrations of the clock species *C*25 and *C*26 are set to 1.0, with all other clock species concentrations set to 10^{−6}. Thus the system starts in the “last” cycle of the oscillator, and the first full training round will begin shortly after the simulation begins. The dual-rail weight species *Wij* are initialized with values corresponding to the initial values of all the corresponding weights in the untrained network. The DPOS*i* and DNEG*i* species for each neuron *N*_{i} are initialized with the values for the leaky ReLU gradient parameters *α* and *β* for each neuron. (These could, in principle, be different for each neuron.) Finally, the Alpha species is assigned a concentration corresponding to the value of the learning rate parameter for the network. The initial concentrations of all other CRN species are set to zero.
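For reference, with gradient parameters $\alpha$ and $\beta$ for the positive and negative arms, the leaky ReLU transfer function and its derivative take the standard form

$$f(x) = \begin{cases} \alpha x & \text{if } x \ge 0, \\ \beta x & \text{if } x < 0, \end{cases} \qquad f'(x) = \begin{cases} \alpha & \text{if } x \ge 0, \\ \beta & \text{if } x < 0, \end{cases}$$

so DPOS*i* and DNEG*i* hold the two possible values of $f'$ for neuron *N*_{i}.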

#### 4.3.13 Training Implementation

The CRN is trained by increasing the concentrations of the species representing the dual-rail input signals (*X*1, *X*2, and BIAS) at the start of each *C*1 oscillator phase, which corresponds to the start of a new training cycle. The amount of the increase represents the value of that signal for this training cycle. An additional concentration of the TARGET species, representing the expected output of the network given these inputs, is also supplied. These additions represent the actions of an experimenter external to the system; thus the system undergoes *supervised learning*. We assume that the volume increase associated with these additions is negligible, so we do not need to model dilution effects. This is repeated at the start of each subsequent training cycle. These additions are modeled via a system of *model perturbations* implemented in our simulation engine, which we describe later.
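Using the perturbation functions described in the next section, one plausible sketch of such a training perturbation is shown below. The species naming convention (suffixes p/m for the two rails), the helper functions, and the fixed BIAS value of 1.0 per round are illustrative assumptions, not the exact code used in our models:

```python
# Sketch of a supervised-learning perturbation for a dual-rail CRN.
# Dual-rail species are assumed to be named with 'p'/'m' suffixes.

def add_dual_rail(adjust, x0, name, value):
    """Add a signed value to the dual-rail pair (name + 'p', name + 'm')."""
    if value >= 0:
        adjust(x0, name + 'p', value)
    else:
        adjust(x0, name + 'm', -value)

def make_training_pfun(instances):
    """Build a perturbation function that presents one training instance
    (X1, X2, TARGET) per scheduled call, cycling through the list."""
    counter = {'round': 0}
    def pfun(t, x0, get, set, adjust):
        x1, x2, target = instances[counter['round'] % len(instances)]
        add_dual_rail(adjust, x0, 'X1', x1)
        add_dual_rail(adjust, x0, 'X2', x2)
        add_dual_rail(adjust, x0, 'BIAS', 1.0)
        add_dual_rail(adjust, x0, 'TARGET', target)
        counter['round'] += 1
    return pfun

# One call would be scheduled at the start of each C1 phase.
xor_instances = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
train = make_training_pfun(xor_instances)
```

The returned function matches the perturbation function signature described in section 4.4.3, so it could be scheduled once per training cycle.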

### 4.4 Simulation Methodology

Having established a design for abstract CRNs capable of backpropagation learning, we need a methodology to easily simulate their behavior. To this end, we have developed ProBioSim, a simulator that incorporates features specifically aimed at such systems, in which the CRN component interacts with an environment that is specified externally to the CRN itself but that can modify the CRN in an arbitrarily complex, state- and time-dependent manner. Because many existing designs for molecular systems are currently beyond our experimental capabilities, the potential applications of DNA nanotechnology have largely been studied *in silico*. This motivates the development of software tools targeted specifically at the study of the interactions of engineered molecular systems with externally specified environments.

#### 4.4.1 ProBioSim Simulator Outline

As outlined in Figure 3(a), ProBioSim is a Python library that incorporates a domain-specific language (DSL) for molecular circuits (Lakin & Phillips, 2020). The DSL included within ProBioSim provides a simple syntax for specifying the reactions, rate constants, and simulation settings required to define and simulate the behavior of abstract CRNs. This language also includes the capability to specify simple external interventions that add or remove quantities of certain chemical species at specified points in simulation time. This syntax can encode unconditional, deterministic perturbations of the system like those used in previous work on supervised learning in chemical systems (Lakin & Stefanovic, 2016) and allows for rapid prototyping and testing of candidate CRN designs. Because ProBioSim is based on Python, it can be trivially integrated with systems for rapid prototyping and visual feedback, such as Jupyter Notebooks (Kluyver et al., 2016), which also enhances the reproducibility of modeling and simulation workflows.

There are two types of perturbation in ProBioSim: perturbation *actions* and perturbation *functions*. In each case, the perturbation is scheduled to occur at a specific time point in the simulation, at which point the simulation (either stochastic or deterministic) is paused, the specified changes to the simulation state are applied, and the simulation is restarted, as outlined in Figure 3(b). Multiple perturbation actions and functions can be specified to execute at the same time point.

#### 4.4.2 Perturbation Actions

*Perturbation actions* are explicitly specified modifications to the state of the simulation. They can be specified within the text-based CRN specification language included within ProBioSim and can be exported to and imported from text files that specify the model. The set of perturbation actions that can be specified in this way is limited to the following:

- *increment* the concentration or count of a species by a specified amount,
- *decrement* the concentration or count of a species by a specified amount, or
- *set* the concentration or count of a species to a specified absolute amount.

For example, the following perturbation action increments the concentration (or count) of species X by 10 at simulation time 5.0:

```
perturbation X += 10 @ 5.0
```

#### 4.4.3 Arbitrary Perturbation Functions Expressed as Python Code

The real power of the ProBioSim system comes from its ability to define perturbations to the CRN system in terms of arbitrary code. This is possible because, in contrast to some other molecular design tools (Lakin et al., 2011), ProBioSim is embedded within a host language: the Python general-purpose programming language. This provides a ready-made language for expressing arbitrary responses to CRN behaviors, which can be used to simulate the interactions of the CRN with an experimenter or a responsive external environment. Such *perturbation functions* are modifications to the simulation state that are specified in terms of functions written in Python. As such, perturbation functions are significantly more powerful than perturbation actions, as they allow arbitrary computations to be carried out to determine how the state of the system should be updated. ProBioSim itself does not check the content of user-supplied perturbation functions. However, those perturbation functions must be written in a particular way to enable them to be called by the simulator as required. Specifically, the form of a perturbation function must be as follows:

```python
def pfun(t, x0, get, set, adjust):
    # ... perturbation code here ...
```

In this signature:

- t is the simulation time at which the function is called;
- x0 is the simulation state passed into the function;
- get is a function that, when called as get(x0, sp), returns the current value associated with species sp in the current state x0;
- set is a function that, when called as set(x0, sp, n), updates the state x0 by assigning the value n to the species sp; and
- adjust is a function that, when called as adjust(x0, sp, n), updates the state x0 by adding n (which could be negative) to the value currently assigned to the species sp.

For example, the following perturbation function adds 10.0 to the value of species X, printing the value of X before and after the perturbation:

```python
def pfun(t, x0, get, set, adjust):
    print('>>> Value of X before perturbation at time ' +
          str(t) + ' = ' + str(get(x0, 'X')))
    adjust(x0, 'X', 10.0)
    print('>>> Value of X after perturbation at time ' +
          str(t) + ' = ' + str(get(x0, 'X')))
```

This approach to defining perturbations obviates the need to define a full DSL capable of implementing a desired set of environmental responses to the behavior of the simulated CRN. In addition, free variables in the perturbation function can be used to pass in additional information to the perturbation function. For example, random number generator objects can be straightforwardly defined in the simulation setup code and referenced as free variables in the perturbation function. In this way, randomized responses can be trivially implemented, as outlined in section S4 in the Supplemental Material. Other uses of this technique could include passing a file handle into the simulator to enable facile logging of the simulation state at each perturbation.
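For instance, a randomized perturbation might be sketched as follows, with the random number generator captured as a free variable; the species name X and the perturbation amounts are illustrative:

```python
import random

rng = random.Random(42)  # defined once in the simulation setup code

def noisy_pfun(t, x0, get, set, adjust):
    # The free variable rng is captured from the enclosing scope, so each
    # scheduled call draws a fresh random amount of species X to add.
    adjust(x0, 'X', rng.gauss(10.0, 1.0))
```

Because the generator is seeded in the setup code, randomized training protocols of this kind remain reproducible across simulation runs.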

#### 4.4.4 Applying Perturbations

Perturbations can be defined in any order in the input and are organized into time order by the simulator. If multiple perturbation actions or functions are specified at the same time point (or at time points that are indistinguishable up to the tolerance of floating-point arithmetic), then all of the perturbation *actions* will be applied to the simulation first, followed by all of the perturbation *functions*. Within these two categories, the actions and functions will be applied in the order in which they were added to the model. (Note that changing the order of addition could therefore change the overall result of the perturbations applied at a given time point.)

After the perturbation actions and functions are applied, the resulting state must be checked to ensure the integrity of the simulation. First, the resulting state of the system is checked to ensure that there are no negative values; if there are, these are set to zero (negative concentrations or species counts being unrealistic). Then, if the simulation is stochastic, any noninteger values are converted to integers, by *rounding down* if necessary. This prevents fractional species counts being recorded. The simulator is then restarted, beginning from the new state that resulted from the application of the perturbations.
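These integrity checks amount to a clamp followed by an optional floor, which can be sketched as follows (an illustrative helper, not ProBioSim's actual code):

```python
import math

def sanitize_state(values, stochastic):
    """Clamp negative values to zero; for stochastic simulations,
    round noninteger counts down to the nearest integer."""
    cleaned = [max(v, 0.0) for v in values]
    if stochastic:
        cleaned = [math.floor(v) for v in cleaned]
    return cleaned
```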

#### 4.4.5 Model Import and Export

ProBioSim models can be saved to a human-readable text file and subsequently reparsed back from that file (with the exception of perturbation functions, which must be added to the model programmatically). Models can also be typeset via generation of corresponding LaTeX code. However, all model construction and modification functions can also be carried out programmatically from a Python script, enabling the automated, modular construction of large-scale CRNs. See section S4 in the Supplemental Material for example model definitions specified partially as text and defined programmatically using Python code.

#### 4.4.6 Simulating ProBioSim Models

Deterministic simulations are carried out by numerical integration of the system of mass-action ordinary differential equations (ODEs) corresponding to the model. For each species *X*_{i}, the corresponding ODE is given by

$$\frac{d[X_i]}{dt} = \sum_{j} \mathrm{stoich}(X_i, r_j) \cdot \mathrm{rate}(r_j) \cdot \prod_{S \in \mathrm{reactants}(r_j)} [S],$$

where the *r*_{j} are the reactions included in the model, the stoich(*X*_{i}, *r*_{j}) function returns the stoichiometry (net change) in *X*_{i} from one occurrence of reaction *r*_{j}, and the rate(*r*_{j}) function returns the rate constant value associated with reaction *r*_{j}. The system of such mass-action ODEs for all species *X*_{i} in the model is formed and then integrated over time using the solve_ivp function from the scipy.integrate module (Virtanen et al., 2020). The user can choose between a stiff solver, which uses the LSODA method, and a nonstiff solver, which uses the RK45 Runge–Kutta method.

Stochastic simulations are implemented using Gillespie’s (1977) direct method, which provides an exact implementation of stochastic mass-action kinetics. Briefly, for each reaction *r*_{j}, the counts of all reactant species are multiplied together and by the reaction’s rate constant rate(*r*_{j}) to calculate the propensity *ρ*_{j} of that reaction. The next reaction is chosen at random, with the probability of choosing *r*_{j} given by *ρ*_{j}/(∑_{j}*ρ*_{j}), that is, with probability proportional to the corresponding reaction propensity. The *time* until the next reaction fires is drawn from an exponential distribution with mean 1/(∑_{j}*ρ*_{j}). The reaction is applied to update the state, the propensities are recalculated, and the simulation loop moves on to the next iteration. (Note, however, that our backpropagation learning simulations will only use the deterministic ODE solving simulation method.)
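A compact sketch of one iteration of this loop, using a simplified propensity calculation that multiplies raw reactant counts (ignoring combinatorial factors for repeated reactants), might look as follows; this is illustrative only and not ProBioSim's actual implementation:

```python
import math

def gillespie_step(state, reactions, rng):
    """One iteration of Gillespie's direct method.

    state: dict mapping species name -> integer count.
    reactions: list of (rate_constant, reactants, stoichiometry), where
      reactants is a list of species names and stoichiometry is a dict of
      net changes applied when the reaction fires.
    Returns the elapsed time, or None if no reaction can fire.
    """
    # Propensity: rate constant times the product of reactant counts.
    props = [k * math.prod(state[sp] for sp in reactants)
             for (k, reactants, _) in reactions]
    total = sum(props)
    if total == 0.0:
        return None
    # Time to the next reaction is exponential with mean 1/total.
    dt = rng.expovariate(total)
    # Choose the next reaction with probability proportional to propensity.
    r = rng.random() * total
    for p, (_, _, stoich) in zip(props, reactions):
        r -= p
        if r <= 0.0:
            for sp, change in stoich.items():
                state[sp] += change
            break
    return dt
```

Repeatedly calling this step function until it returns None runs the system to exhaustion of all reaction propensities.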

In the following section, we present results from learning simulations of our backpropagation-based chemical neural network design, carried out using the ProBioSim simulator. We refer the reader to section S4 of the Supplemental Material for further examples of ProBioSim model code and simulation outputs, including examples involving state-dependent and probabilistic perturbations that show the full power of the ProBioSim approach, as well as examples of associative learning in simple abstract chemical reaction networks.

### 4.5 Learning Simulation Results

To demonstrate the successful operation of our CRN, we simulated it learning the XOR logic function. The truth table for this function can be summarized as follows:

- F xor F = F
- F xor T = T
- T xor F = T
- T xor T = F

The specific CRN instance that we simulated starts with the following initial weight values: *W*00 = −0.8, *W*01 = −0.8, *W*02 = −0.8, *W*10 = −2.0, *W*11 = −0.8, *W*12 = −1.0, *W*20 = −1.2, *W*21 = 1.6, and *W*22 = −1.0. For each neuron *i*, the values of DPOS*i* and DNEG*i*, representing the gradients for the positive and negative arms of the leaky ReLU transfer function, are 1.0 and 0.1, respectively. The value of the learning rate parameter was *α* = 0.5. To train the system, the following sequence of eight training instances was repeated three times, for a total of 24 training rounds:

- *X*1 = −1, *X*2 = −1, TARGET = −1;
- *X*1 = −1, *X*2 = 1, TARGET = 1;
- *X*1 = 1, *X*2 = −1, TARGET = 1;
- *X*1 = 1, *X*2 = 1, TARGET = −1;
- *X*1 = −0.5, *X*2 = −0.5, TARGET = −1;
- *X*1 = −0.5, *X*2 = 0.5, TARGET = 1;
- *X*1 = 0.5, *X*2 = −0.5, TARGET = 1; and
- *X*1 = 0.5, *X*2 = 0.5, TARGET = −1.

The learning CRN corresponding to this system was compiled into reactions and perturbations, and the resulting ProBioSim model is presented in section S3 of the Supplemental Material. This CRN was simulated by integrating a deterministic ODE model, and the resulting weights were used to calculate the decision surface after certain training rounds; these are plotted as heat maps in Figure 4. The heat maps show the decision surface induced by the weights shifting toward the desired form over the course of the training rounds; eventually, it ends up in the expected shape, with a band of high output running diagonally between the (−1, +1) and (+1, −1) corners of the space, with low output in the vicinity of the (−1, −1) and (+1, +1) corners. This qualitatively indicates that our CRN system is indeed able to learn the challenging XOR example using the backpropagation learning algorithm. To illustrate the sequence of key operations that takes place during training, section S2 of the Supplemental Material presents example time course figures from the first training round of this simulation, covering both the feedforward and backpropagation phases.

To assess the quantitative performance of our learning CRN, we compared the sequence of weights obtained in this simulation with those that would be expected to arise from a direct reference implementation of the backpropagation algorithm. We calculated the difference between the weight values observed at the end of each training round in our simulated learning CRN and the corresponding weight values obtained from a reference implementation written in Python. These values are presented in Table 2. We observe accuracy to at least two decimal places in all cases and to three or more in many cases. This indicates that our system is quantitatively correct in that its behavior closely matches that of the underlying learning algorithm that it aims to implement. We briefly outline some potential sources of error in the following pages.

Table 2. Weight value errors at the end of each training round.

| Round | W00 | W01 | W02 | W10 | W11 | W12 | W20 | W21 | W22 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.0029 | −0.0029 | −0.0029 | 0.0018 | 0.0018 | 0.0018 | −0.0010 | 0.0028 | 0.0043 |
| 2 | −0.0030 | −0.0030 | −0.0028 | −0.0013 | −0.0013 | 0.0049 | −0.0007 | 0.0028 | 0.0032 |
| 3 | −0.0031 | −0.0029 | −0.0029 | −0.0048 | 0.0022 | 0.0014 | −0.0003 | 0.0029 | 0.0007 |
| 4 | −0.0005 | −0.0055 | −0.0056 | −0.0045 | 0.0018 | 0.0011 | −0.0000 | 0.0045 | 0.0002 |
| 5 | −0.0005 | −0.0055 | −0.0056 | −0.0046 | 0.0018 | 0.0010 | −0.0000 | 0.0045 | 0.0003 |
| 6 | −0.0006 | −0.0056 | −0.0055 | −0.0054 | 0.0014 | 0.0014 | 0.0000 | 0.0045 | 0.0006 |
| 7 | −0.0006 | −0.0055 | −0.0056 | −0.0060 | 0.0017 | 0.0011 | 0.0000 | 0.0045 | 0.0010 |
| 8 | −0.0004 | −0.0057 | −0.0057 | −0.0060 | 0.0017 | 0.0010 | 0.0000 | 0.0047 | 0.0010 |
| 9 | −0.0004 | −0.0057 | −0.0057 | −0.0063 | 0.0014 | 0.0008 | 0.0001 | 0.0048 | 0.0010 |
| 10 | −0.0005 | −0.0057 | −0.0057 | −0.0067 | 0.0010 | 0.0012 | 0.0001 | 0.0048 | 0.0014 |
| 11 | −0.0005 | −0.0057 | −0.0057 | −0.0072 | 0.0015 | 0.0007 | 0.0001 | 0.0048 | 0.0018 |
| 12 | −0.0006 | −0.0056 | −0.0057 | −0.0072 | 0.0015 | 0.0007 | 0.0002 | 0.0047 | 0.0018 |
| 13 | −0.0005 | −0.0056 | −0.0057 | −0.0069 | 0.0017 | 0.0009 | 0.0001 | 0.0046 | 0.0020 |
| 14 | −0.0006 | −0.0056 | −0.0057 | −0.0072 | 0.0016 | 0.0011 | 0.0001 | 0.0046 | 0.0024 |
| 15 | −0.0006 | −0.0056 | −0.0057 | −0.0075 | 0.0017 | 0.0009 | 0.0000 | 0.0046 | 0.0028 |
| 16 | −0.0003 | −0.0058 | −0.0058 | −0.0075 | 0.0017 | 0.0009 | 0.0000 | 0.0048 | 0.0028 |
| 17 | −0.0003 | −0.0057 | −0.0058 | −0.0074 | 0.0018 | 0.0010 | 0.0000 | 0.0048 | 0.0029 |
| 18 | −0.0004 | −0.0058 | −0.0058 | −0.0077 | 0.0015 | 0.0013 | −0.0000 | 0.0048 | 0.0033 |
| 19 | −0.0004 | −0.0057 | −0.0058 | −0.0080 | 0.0018 | 0.0010 | −0.0001 | 0.0048 | 0.0037 |
| 20 | −0.0005 | −0.0056 | −0.0057 | −0.0081 | 0.0019 | 0.0010 | 0.0001 | 0.0046 | 0.0037 |
| 21 | −0.0005 | −0.0056 | −0.0057 | −0.0074 | 0.0022 | 0.0013 | −0.0000 | 0.0045 | 0.0037 |
| 22 | −0.0006 | −0.0057 | −0.0056 | −0.0067 | 0.0026 | 0.0010 | −0.0007 | 0.0045 | 0.0040 |
| 23 | −0.0006 | −0.0057 | −0.0056 | −0.0066 | 0.0025 | 0.0010 | −0.0015 | 0.0045 | 0.0042 |
| 24 | −0.0003 | −0.0058 | −0.0057 | −0.0065 | 0.0025 | 0.0010 | −0.0015 | 0.0047 | 0.0041 |


*Note*. Error values were calculated as CRN output minus expected output and are shown to four decimal places. These values demonstrate that the learning calculations carried out by our CRN design produce results very close to those expected based on a direct mathematical calculation of the expected system behavior.
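For concreteness, one training round of the reference computation can be sketched in Python as follows. This is a minimal reimplementation consistent with the network described earlier, not the exact reference script: it assumes a squared-error loss $L = (Y - \mathit{TARGET})^2/2$, input ordering (BIAS, *X*1, *X*2) for weights *Wi*0, *Wi*1, *Wi*2, and that *W*21 and *W*22 weight the outputs of *N*_{0} and *N*_{1}, respectively:

```python
# Minimal reference sketch of one backpropagation training round for the
# 2-2-1 leaky ReLU network; wiring and loss are assumptions noted above.
def leaky_relu(x, dpos=1.0, dneg=0.1):
    return dpos * x if x >= 0 else dneg * x

def leaky_relu_grad(x, dpos=1.0, dneg=0.1):
    return dpos if x >= 0 else dneg

def train_round(W, x1, x2, target, alpha=0.5):
    """Return updated weights after one gradient descent step."""
    bias = 1.0
    # Feedforward phase.
    net0 = W['W00'] * bias + W['W01'] * x1 + W['W02'] * x2
    net1 = W['W10'] * bias + W['W11'] * x1 + W['W12'] * x2
    out0, out1 = leaky_relu(net0), leaky_relu(net1)
    net2 = W['W20'] * bias + W['W21'] * out0 + W['W22'] * out1
    y = leaky_relu(net2)
    # Backpropagation phase.
    e = y - target                   # dL/dY for squared-error loss
    q2 = e * leaky_relu_grad(net2)   # dL/dNET2
    q1 = q2 * W['W22'] * leaky_relu_grad(net1)
    q0 = q2 * W['W21'] * leaky_relu_grad(net0)
    grads = {'W20': q2 * bias, 'W21': q2 * out0, 'W22': q2 * out1,
             'W00': q0 * bias, 'W01': q0 * x1, 'W02': q0 * x2,
             'W10': q1 * bias, 'W11': q1 * x1, 'W12': q1 * x2}
    # Gradient descent weight update.
    return {k: W[k] - alpha * grads[k] for k in W}
```

Starting from the initial weights listed in section 4.5 and repeatedly applying train_round to the training sequence would generate reference weight trajectories of the kind used for the comparison in Table 2.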

## 5 Discussion

### 5.1 CRN Learning Performance

The results presented in Table 2 demonstrate that our learning CRN design quantitatively implements the backpropagation learning algorithm for our leaky ReLU neural network with relatively small errors. However, although the errors are small, they are nonzero, which means there is some room for improvement in our designs. One possible source of error is that the various reactions in our CRN design are scheduled via a molecular clock; although the concentrations of the off-cycle clock species are low between their correct cycles, they are not zero, which means that the reactions they catalyze still proceed in our model, albeit with very low fluxes. This introduces small errors into the computation that cannot be avoided if we wish to use an autonomous clock of this form to organize the reactions, as in previous work (Arredondo & Lakin, 2022; Vasić, Soloveichik, & Khurshid, 2020a). An additional potential source of inaccuracy is the fact that our leaky ReLU calculations have some inherent error around the discontinuity in the derivative when the input equals zero. This is because the species representing the input signal catalyzes the copying of the species representing the gradient values into the corresponding species required for computing the partial loss derivatives during the backpropagation phase; this process might fail to complete before the associated clock cycle ends if the input value is very close to zero. This could be alleviated by increasing the value of the kCopy rate constant.

### 5.2 Learning CRN Design Considerations

The design of our CRN and neural network reflects certain limitations of the CRN formalism and the ability to train within that formalism. In particular, the basic arithmetic operations that our system must perform (addition, subtraction, and multiplication) can all be implemented straightforwardly in an abstract CRN using established techniques (Buisman et al., 2009). We use the molecular clock methodology to sequence reactions into discrete phases in large part because this makes it simpler to calculate the discontinuous derivatives of the leaky ReLU units by simply copying the appropriate value into a specific output species, before using that value for computation in a subsequent step. Other approaches, such as rate-independent calculations of the nonleaky ReLU function (Vasić, Soloveichik, & Khurshid, 2020b), are valid only for integer-valued signal representations in stochastic models; furthermore, as has been established, that nonlinearity suffers from the “dying ReLU” problem in that neurons can no longer learn if their input goes negative and the derivative is zero. Note also that in previous work (Arredondo & Lakin, 2022), we used a tanh nonlinearity and trained the system via weight perturbation rather than backpropagation; that was done because our approach to calculating the tanh(*x*) function could not be straightforwardly applied to compute its derivative, which is 1 − tanh^{2}(*x*), because the second derivative at *x* = 0 is already zero. By not using a sigmoidal nonlinearity here, we avoid that problem, but there is the alternative possibility of unbounded output signals being produced by our network, which can cause training to fail. Thus the approach taken here is shaped by the desire to produce a system that can be trained within the chemistry using relatively well-known approaches, while still using a transfer function that is of practical relevance.

### 5.3 Future Directions for Work on Learning CRNs

Future work on learning CRNs such as ours could take a number of directions. For example, the approach taken here of using sequential phases to execute the various parts of the calculation could be adapted to other transfer functions with discontinuities in their derivatives. Previous work (Linder et al., 2021) also used the “hardtanh” transfer function (a piecewise linear approximation to tanh), which could be amenable to our approach, although it suffers from some of the same issues as the nonleaky ReLU function because it has gradients of zero in both the positive and negative directions. In addition, the complexity of CRN designs such as those presented here would make them extremely challenging to realize in an actual wet chemistry experiment. Some of these challenges could be ameliorated in future iterations of our learning CRN design, for example, by reducing the number of distinct clock phases required to drive the system or perhaps by doing away with the autonomous molecular clock altogether and using manually controlled signals to regulate the operation of the circuit.

Ideally, chemical learning systems would leverage the massive parallelism inherent in chemistry to advantage, and possible approaches might eventually move beyond feedforward networks to recurrent networks. Related work on chemical Boltzmann machines (Poole et al., 2017) and message-passing inference algorithms (Napp & Adams, 2013) could offer inspiration here, as could work on chemical implementations of reservoir computing systems (Goudarzi et al., 2013), which use a highly recurrent fixed network attached to a linear readout layer that could be trained relatively straightforwardly.

Given the practical difficulties with building molecular computing networks in the laboratory, the development of adaptive and trainable molecular systems is an important goal for the field of molecular programming, not least because it would enable a “build now, train later” approach that could greatly simplify the development and deployment of molecular circuits with specific autonomous behaviors for practical tasks, such as the diagnosis of disease (Lopez et al., 2018; C. Zhang et al., 2020). Finally, molecular learning systems could find applications in Artificial Life, as they could serve as adaptive control programs for engineered lifelike systems and synthetic cells.

Thus a major challenge in the field of chemical learning networks is to find a formalism that both fits naturally with the CRN framework and is also tractable to simulate and, ultimately, build in the laboratory. The latter criterion would imply that the system would need to be very simple; it could well be that learning systems that go beyond supervised learning to embrace alternative frameworks, like reinforcement learning (Banda et al., 2014) or associative learning (Fernando et al., 2009), could hold the answer.

With regard to our particular design, the presence of trimolecular reactions is a possible sticking point for future experimental implementations; however, we note that CRN implementation frameworks such as DNA strand displacement reactions (Soloveichik et al., 2010; D. Y. Zhang & Seelig, 2011) are theoretically capable of realizing such systems via their multistep encodings. Moreover, the third reactant in the trimolecular reactions in our scheme is always a clock catalyst; this could make them easier to realize experimentally via alternative means, such as the presence or absence of a nonchemical signal that controls whether the reactions may proceed, for example, an optical signal (Prokup et al., 2012). Furthermore, because our design relies on competition for input species to drive the computation, rate constants need to be well balanced to ensure that the results are accurate; this is a fundamental difficulty in analog computation. Recent work on rate-independent chemical implementations of ReLU networks (Vasić et al., 2022) offers a possible way forward here; alternatively, for an experimental implementation using DNA strand displacement, models now exist to predict strand displacement kinetics from sequence (J. X. Zhang et al., 2021), and experimentalists can tune the concentrations of other components as necessary to recover the desired kinetic balance (Thubagere, Thachuk, et al., 2017).
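The role of the clock catalyst can be illustrated with a minimal mass-action simulation of a clock-gated trimolecular reaction A + B + Clk → P + Clk. The rate constant, concentrations, and integration scheme are illustrative and are not taken from the CRN design in the text:

```python
def simulate(clk, k=1.0, dt=1e-3, steps=5000):
    # Forward-Euler integration of the mass-action ODEs for
    #   A + B + Clk -> P + Clk
    # The clock species Clk is a pure catalyst: it gates the rate
    # but is never consumed, so its concentration stays fixed here.
    a, b, p = 1.0, 1.0, 0.0
    for _ in range(steps):
        flux = k * a * b * clk * dt
        a -= flux
        b -= flux
        p += flux
    return p
```

When the clock catalyst is absent (`clk = 0`), the reaction flux is identically zero and no product forms; this is why the clock phase (or an equivalent external signal such as light) can cleanly switch each stage of the computation on and off.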

### 5.4 Future Directions for Work on the ProBioSim Simulator

Future work to further develop the ProBioSim simulator may include performance optimizations, the implementation of alternative simulation algorithms such as tau-leaping (Gillespie, 2001), and the implementation of additional rate laws such as Michaelis–Menten kinetics. To enhance the expressiveness of the system, the format of the perturbation functions outlined herein could be generalized to modify other simulation parameters. In particular, the ability to perturb rate constant values would allow ProBioSim to straightforwardly model physical changes in the system, such as changes in temperature or pH (Idili et al., 2014) or the activation of optical control signals (Prokup et al., 2012). These are of interest because they could be used to modulate the behavior of an experimental implementation of the learning CRNs outlined earlier. The simulator could also be modified to explicitly model dilution effects when species are added to perturb the system; these are *not* currently modeled, which is reasonable provided that the volume of new sample added during a perturbation is negligible compared to the overall reaction volume. It is worth noting, however, that the flexibility of our perturbation function approach could already be used to model these effects; any changes in this regard would be purely for convenience. One could also imagine implementing triggers for events based on state-based predicates, such as the counts or concentrations of various species. These could be implemented (to some extent) within the perturbation function framework, although such conditional perturbations could only be applied at the prespecified times at which the perturbation function is invoked.
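A state-dependent perturbation of the kind just described could be written along the following lines. The signature `(time, state) -> state` and the species names are purely illustrative; they do not reproduce ProBioSim's actual API:

```python
def conditional_perturbation(time, state):
    # 'state' maps species names to their concentrations (or counts) at a
    # scheduled perturbation time. The predicate below is checked only at
    # these prespecified times, not continuously during the simulation.
    if state.get("Error", 0.0) > 0.5:
        state = dict(state)  # copy, leaving the caller's state untouched
        # Inject an extra dose of an (illustrative) trigger species.
        state["Trigger"] = state.get("Trigger", 0.0) + 1.0
    return state
```

Because the perturbation is an arbitrary function in the host language, conditionals, logging, and even calls out to external training code can all be expressed without modifying the simulator itself.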
Finally, the ProBioSim simulator could in principle be integrated with techniques for parameter estimation to infer rate constants and other model parameters from experimental data, although this would only be of real use if such data were available, which is not yet the case for experimental implementations of learning in chemical reaction networks.

## 6 Conclusions

In conclusion, we have reported a design for novel abstract CRNs that implement supervised learning via the standard backpropagation training algorithm. This CRN design directly implements the mathematics of backpropagation for a class of nonlinear neurons using the leaky ReLU transfer function. We have demonstrated the capabilities of this system by training it to learn the two-input XOR logic function, a function that is not linearly separable and thus provably requires a multilayer network. In addition, we have reported ProBioSim, a simulator for CRNs that incorporates the novel capability to define perturbations to the concentrations (or counts) of species as the results of arbitrary pieces of code expressed in the host programming language, Python. This enables the straightforward modeling of nontrivial environmental responses to the behavior of the CRN component of the system, such as our training protocols, which would otherwise require ad hoc infrastructure for calculating and applying perturbations to be constructed around the underlying simulator. Our approach allows such calculations to be integrated directly into the simulation engine via a straightforward mechanism, enabling facile programming of complex interdependencies between an abstract CRN and the simulated training environment. ProBioSim is released under an open source license, and the source code is available online at https://github.com/matthewlakin/ProBioSim/.

## Acknowledgments

This material is based on work supported by the National Science Foundation under grants 1518861 and 1935087.

## References
