## Abstract

Recently, a new so-called energy complexity measure has been introduced and studied for feedforward perceptron networks. This measure is inspired by the fact that biological neurons require more energy to transmit a spike than not to fire, and the activity of neurons in the brain is quite sparse, with only about 1% of neurons firing. In this letter, we investigate the energy complexity of recurrent networks, which counts the number of active neurons at any time instant of a computation. We prove that any deterministic finite automaton with $m$ states can be simulated by a neural network of optimal size $s=\Theta(\sqrt{m})$ with the time overhead of $\tau=O(s/e)$ per one input bit, using the energy $O(e)$, for any $e$ such that $e=\Omega(\log s)$ and $e=O(s)$, which shows the time-energy trade-off in recurrent networks. In addition, for the time overhead $\tau$ satisfying $\tau^\tau=o(s)$, we obtain the lower bound of $s^{c/\tau}$ on the energy of such a simulation for some constant $c>0$ and for infinitely many $s$.

## 1.  Introduction

In biological neural networks, the energy cost of a firing neuron is relatively high, while energy supplied to the brain is limited and hence the activity of neurons in the brain is quite sparse, with only about 1% of neurons firing (Lennie, 2003). This is in contrast to artificial neural networks in which, on average, every second unit fires during a computation. This fact has recently motivated the definition of a new complexity measure for feedforward perceptron networks (threshold circuits), the so-called energy complexity (Uchizawa, Douglas, & Maass, 2006), which is the maximum number of units in the network that output 1, taken over all the inputs to the circuit. Energy complexity has been shown to be closely related by trade-off results to other complexity measures such as network size (i.e., the number of neurons) (Uchizawa & Takimoto, 2008; Uchizawa, Takimoto, & Nishizeki, 2011), circuit depth (i.e., parallel computational time) (Uchizawa, Nishizeki, & Takimoto, 2010; Uchizawa & Takimoto, 2008), and fan-in (i.e., the maximum number of inputs to a single unit) (Suzuki, Uchizawa, & Zhou, 2011). In addition, energy complexity has found its use in circuit complexity, for example, as a tool for proving lower bounds (Uchizawa & Takimoto, 2011).

In this letter, we investigate, for the first time, the energy complexity of recurrent neural networks, which we define to be the maximum number of neurons outputting 1 at any time instant, taken over all possible computations. It has been known for a long time that the computational power of binary-state recurrent networks corresponds to that of finite automata since a network of $s$ units can reach only a finite number (at most $2^s$) of different states (Šíma & Orponen, 2003). A simple way of simulating a given deterministic finite automaton A with $m$ states by a neural network N of size $O(m)$ is to implement each of the $2m$ transitions of A (having 0 and 1 transitions for each state) by a single unit in N, which checks whether the input bit agrees with the respective type of transition (Minsky, 1967). Clearly this simple linear-size implementation of finite automata requires only a constant energy.
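The linear-size simulation described above can be sketched in a few lines. The following is a minimal illustration rather than Minsky's exact construction: the `start` unit, the concrete weights and thresholds, and the example automaton (number of 1s divisible by 3) are assumptions made for this sketch. It counts active units per step, excluding the clamped input, to show that the energy stays constant.

```python
from itertools import product

def build_transition_network(delta, q0):
    """One threshold unit per transition (q, a), plus a hypothetical 'start' unit."""
    target = dict(delta)       # unit (q, a) leads to state delta[(q, a)]
    target["start"] = q0       # the start unit leads to the initial state
    preds, in_weight, threshold = {}, {}, {}
    for (q, a) in delta:
        # weight 1 from every unit whose transition enters state q
        preds[(q, a)] = [i for i, tgt in target.items() if tgt == q]
        in_weight[(q, a)] = 1 if a == 1 else -1   # agreement with the input bit
        threshold[(q, a)] = 2 if a == 1 else 1
    return target, preds, in_weight, threshold

def run_network(delta, q0, finals, word):
    """Return (accepted, max number of simultaneously active units)."""
    target, preds, in_weight, threshold = build_transition_network(delta, q0)
    active = {"start"}
    max_energy = len(active)
    for x in word:
        nxt = set()
        for j in delta:
            excitation = (sum(1 for i in preds[j] if i in active)
                          + in_weight[j] * x - threshold[j])
            if excitation >= 0:          # Heaviside activation
                nxt.add(j)
        active = nxt
        max_energy = max(max_energy, len(active))
    (unit,) = active                     # exactly one unit is active
    return target[unit] in finals, max_energy

# Example automaton (assumption): accept words whose number of 1s is divisible by 3.
delta = {(q, a): (q + a) % 3 for q in range(3) for a in (0, 1)}
accepted, energy = run_network(delta, 0, {0}, (1, 0, 1, 1))
```

For a total transition function, exactly one transition unit is active after each input bit, which is the constant-energy behavior the paragraph refers to.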

Much effort has been devoted to reducing the size of neural automata (e.g., Alon, Dewdney, & Ott, 1991; Horne & Hush, 1996; Indyk, 1995; Šíma & Wiedermann, 1998) and, indeed, neural networks of size $\Theta(\sqrt{m})$ implementing a given deterministic finite automaton with $m$ states were proposed and proven to be size optimal (Horne & Hush, 1996; Indyk, 1995). A natural question arises: What is the energy consumption when simulating finite automata by optimal-size neural networks? We answer this question by proving the trade-off between the energy and the time overhead of the simulation. In particular, we prove that an optimal-size neural network of $s=\Theta(\sqrt{m})$ units can be constructed to simulate a deterministic finite automaton with $m$ states using the energy $O(e)$ for any function $e$ such that $e=\Omega(\log s)$ and $e=O(s)$, while the time overhead for processing one input bit is $\tau=O(s/e)$. For this purpose, we adapt the asymptotically optimal method of threshold circuit synthesis due to Lupanov (1973).

In addition, we derive lower bounds on the energy consumption e of a neural network of size s simulating a finite automaton within the time overhead per one input bit, by using the technique due to Uchizawa and Takimoto (2008), which is based on communication complexity (Kushilevitz & Nisan, 1997). In particular, for less than sublogarithmic time overhead satisfying , we obtain the lower bound , which implies for some constant c>0 and for infinitely many s. For example, this means that for time overhead , the energy of any simulation must fulfill for some constant such that , and for infinitely many s, which can be compared to the energy e=O(s) consumed by our simulation. For where , any simulation requires for any , while is sufficient for our implementation.

This letter is organized as follows. After a brief review of the basic definitions in section 2, the main result concerning a low-energy simulation of finite automata by neural nets is formulated in section 3, including the basic ideas of the proof. The subsequent two sections are devoted to the technical details of the proof: section 4 deals with a decomposition of the transition function, and section 5 describes the construction of low-energy neural automata. The lower bounds on the energy consumption of such neural automata are derived and compared to the respective upper bounds in section 6. A concluding summary is given in section 7. A preliminary version appeared as an extended abstract (Šíma, 2013).

## 2.  Neural Networks as Finite Automata

We first specify the model of an (artificial) neural network N. The network consists of $s$ units (neurons, threshold gates), indexed as $V=\{1,\ldots,s\}$, where $s$ is called the network size. The units are connected into a directed graph representing the architecture of N, in which each edge $(i, j)$ leading from unit $i$ to $j$ is labeled with an integer weight $w(i, j)$. The absence of a connection within the architecture corresponds to a zero weight between the respective neurons, and vice versa.

In contrast to general recurrent networks, which have cyclic architectures, the architecture of a feedforward network (or a so-called threshold circuit) is an acyclic graph. Hence, units in a feedforward network can be grouped in a unique minimal way into a sequence of d+1 pairwise disjoint layers so that neurons in any layer are connected only to neurons in subsequent layers , u>t. Usually the zeroth, or input layer , consists of external inputs and is not counted in the number of layers and in the network size. The last, or output layer , is composed of output neurons. The number of layers d excluding the input one is called the depth of threshold circuit.

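The computational dynamics of (not necessarily feedforward) network N determines for each unit $j \in V=\{1,\ldots,s\}$ its binary state (output) $y_j^{(t)} \in \{0,1\}$ at discrete time instants $t=0,1,2,\ldots$. We say that neuron $j$ is active (fires) at time $t$ if $y_j^{(t)}=1$, while $j$ is passive for $y_j^{(t)}=0$. This establishes the network state $y^{(t)}=(y_1^{(t)},\ldots,y_s^{(t)}) \in \{0,1\}^s$ at each discrete time instant $t \ge 0$. At the beginning of a computation, the neural network N is placed in an initial state $y^{(0)}$, which may also include an external input. At discrete time instant $t \ge 0$, an excitation of any neuron $j \in V$ is defined as
$$\xi_j^{(t)} = \sum_{i=1}^{s} w(i,j)\, y_i^{(t)} - h(j), \tag{2.1}$$
including an integer threshold $h(j)$ local to unit $j$. At the next instant, $t+1$, the neurons $j \in \alpha_{t+1}$ from a selected subset $\alpha_{t+1} \subseteq V$ update their states $y_j^{(t+1)} = H(\xi_j^{(t)})$ in parallel by applying the Heaviside function $H$, which is defined as
$$H(\xi) = \begin{cases} 1 & \text{for } \xi \ge 0, \\ 0 & \text{for } \xi < 0. \end{cases} \tag{2.2}$$
The remaining units do not change their outputs, that is, $y_j^{(t+1)}=y_j^{(t)}$ for $j \notin \alpha_{t+1}$. In this way, the new network state $y^{(t+1)}$ at time $t+1$ is determined.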

Without loss of efficiency (Orponen, 1997), we implicitly assume synchronous computations. Thus, the sets $\alpha_t \subseteq V$ of units updated at time $t$, which define the computational dynamics of N, are predestined deterministically for each time instant $t$ (e.g., $\alpha_t = V$ for any $t \ge 1$ means fully parallel synchronous updates). Note that computations in feedforward networks, which implement Boolean functions, proceed layer by layer from the input layer up to the output one (i.e., sets $\alpha_t$ naturally coincide with layers). We define the energy complexity of N to be the maximum number of active units at any time instant $t \ge 0$, taken over all the computations of N.
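The dynamics and the energy measure of this section can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: a unit fires when its weighted input sum reaches its threshold (excitation at least 0), the weight matrix is indexed as `w[i][j]` for the edge from unit `i` to unit `j`, and the three-unit ring network is a made-up example. It reports the peak number of active units along one computation; the energy complexity itself is the maximum of this quantity over all computations.

```python
def excitation(y, w, h, j):
    """Weighted sum of current states feeding unit j, minus its threshold h[j]."""
    return sum(w[i][j] * y[i] for i in range(len(y))) - h[j]

def update(y, w, h, alpha):
    """Units in alpha apply the Heaviside function in parallel; others keep their state."""
    return [
        (1 if excitation(y, w, h, j) >= 0 else 0) if j in alpha else y[j]
        for j in range(len(y))
    ]

def run(y0, w, h, schedule):
    """Run one computation; schedule lists the sets of units updated at t = 1, 2, ..."""
    y = list(y0)
    peak = sum(y)                  # number of active units in the initial state
    for alpha in schedule:
        y = update(y, w, h, alpha)
        peak = max(peak, sum(y))   # track the largest number of firing units
    return y, peak

# Hypothetical example: a ring of three units passing a single active "token".
w = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
h = [1, 1, 1]
final_state, peak_energy = run([1, 0, 0], w, h, [{0, 1, 2}] * 3)
```

With fully parallel updates the token circulates and only one unit fires at a time, whereas updating a single unit per step can leave stale active states behind and raise the peak.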

The computational power of recurrent neural networks has been studied analogously to that of the traditional models of computations so that the networks are exploited as acceptors of formal languages over the binary alphabet. For the finite networks that are to recognize regular languages, the following input-output protocol has been used (Alon et al., 1991; Horne et al., 1996; Indyk, 1995; Siegelmann & Sontag, 1995; Šíma & Wiedermann, 1998; Šíma & Orponen, 2003). A binary input word (string) of arbitrary length is sequentially presented to the network bit by bit via an input neuron . The state of this unit is externally set (and clamped) to the respective input bits at prescribed time instants regardless of any influence from the remaining neurons in the network, that is,
2.3
for where an integer parameter is the period or time overhead for processing a single input bit. Then an output neuron signals at time whether the input word belongs to underlying language L, that is,
2.4

As usual, we will describe the limiting behavior (rate of growth) of functions when the argument tends toward infinity in terms of simpler functions by using Landau or big O notation. Recall that for functions $f$ and $g$ defined for all natural numbers, notations $g=O(f)$ and $g=\Omega(f)$ mean that for some real constant $c>0$ and for all but finitely many natural numbers $n$, $g(n) \le c\cdot f(n)$ and $g(n) \ge c\cdot f(n)$, respectively. In addition, $g=\Theta(f)$ if $g=O(f)$ and $g=\Omega(f)$ simultaneously. Similarly, $g=o(f)$ denotes that for every real constant $c>0$ and for all but finitely many natural numbers $n$, $g(n) \le c\cdot f(n)$, while $g=\Omega_\infty(f)$ means that for some real constant $c>0$ and for infinitely many natural numbers $n$, $g(n) \ge c\cdot f(n)$. Clearly, $g=o(f)$ iff $f=\omega(g)$ iff $g\ne\Omega_\infty(f)$.

## 3.  Low-Energy Optimal-Size Neural Finite Automata

Now we can formulate our main result concerning a low-energy implementation of finite automata by optimal-size neural nets:

Theorem 1.

A given deterministic finite automaton A with $m$ states can be simulated by a neural network N of optimal size $s=\Theta(\sqrt{m})$ neurons with time overhead $\tau=O(s/e)$ per one input bit, using the energy $O(e)$, where $e$ is any function satisfying $e=\Omega(\log s)$ and $e=O(s)$.

Proof.

We first outline the main ideas of the proof; the following two sections provide the detailed argument. Because we are interested in asymptotic analysis, we hereafter assume m to be sufficiently large. A set Q of m states of a given deterministic finite automaton A can be arbitrarily enumerated so that each is binary encoded using bits, including one additional (e.g., the pth) bit, which indicates the final states (i.e., its value is 1 just for the final states of A). Then the respective transition function of automaton A, producing its new state from the old state and current input bit , can be viewed as a vector Boolean function in terms of binary encoding of automaton states.

Furthermore, “transition” function f is implemented by a four-layer neural network C of asymptotically optimal size using the method of threshold circuit synthesis due to Lupanov (1973). Feedforward network C implementing the transition function of A can then simply be transformed to a recurrent neural network N simulating A by adding the recurrent connections from the fourth layer to the first one (in fact, the fourth output layer of C is identified with the zeroth input layer in N), which replace the code of the old state of A by the new one. Using this approach, we can implement a finite automaton by an optimal-size neural net (Horne et al., 1996).
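The step from a feedforward transition circuit to a recurrent automaton can be illustrated as follows. This sketch makes several assumptions that are not taken from the construction itself: states are numbered in list order, the code length is taken as the number of index bits plus one final-state flag, and a plain lookup table stands in for the threshold circuit C (it is not Lupanov's synthesis). The point is only the feedback loop that overwrites the old state code with the new one.

```python
from math import ceil, log2

def encode_states(states, finals):
    """Binary code per state: index bits followed by a flag marking final states."""
    bits = max(1, ceil(log2(len(states))))
    code = {}
    for idx, q in enumerate(states):
        index_bits = tuple((idx >> k) & 1 for k in reversed(range(bits)))
        code[q] = index_bits + (1 if q in finals else 0,)
    return code

def make_transition_function(delta, code):
    """Vector Boolean function f(a, old_code) -> new_code, here a table lookup
    standing in for the feedforward circuit C."""
    table = {(a,) + code[q]: code[delta[(q, a)]] for (q, a) in delta}
    return lambda a, state_bits: table[(a,) + state_bits]

def run_recurrent(f, code, q0, word):
    """Feed the output of f back as its next input, one input bit per round."""
    y = code[q0]
    for a in word:
        y = f(a, y)          # the recurrent connection replaces the old state code
    return y[-1] == 1        # the last bit indicates a final state

# Example automaton (assumption): number of 1s divisible by 3.
states, finals = [0, 1, 2], {0}
delta = {(q, a): (q + a) % 3 for q in states for a in (0, 1)}
code = encode_states(states, finals)
f = make_transition_function(delta, code)
accepted = run_recurrent(f, code, 0, (1, 1, 0, 1))
```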

Unfortunately, the second layer of C in Lupanov's construction (Lupanov, 1973) contains $\Theta(s)$ neurons, half of which fire for any input to C, which results in an unacceptably high energy consumption $\Theta(s)$. In order to achieve a low-energy implementation of A, this layer of neurons is properly partitioned into $O(s/e)$ blocks of $O(e)$ units each. Then so-called control units are introduced that ensure that these blocks are updated successively one by one so that the energy consumption (i.e., the maximum number of simultaneously active neurons) of $O(e)$ is guaranteed while the time overhead for processing a single input bit increases to $O(s/e)$. In addition, the results of computation of particular blocks must somehow be preserved using the available energy.

In particular, we decompose transition function f in section 4 using the method of threshold circuit synthesis (Lupanov, 1973) so that f can be implemented by a four-layer threshold circuit C of asymptotically optimal size. This decomposition is then used in section 5 for constructing a low-energy recurrent neural network N that incorporates circuit C and thus simulates finite automaton A.

## 4.  The Transition Function Decomposition

In this section, we employ the asymptotically optimal method of threshold circuit synthesis due to Lupanov (1973) for a decomposition of function $f$, which implements the transition function of finite automaton A in terms of binary encoding of its states. This decomposition allows $f$ to be evaluated by a four-layer threshold circuit C of asymptotically optimal size that will be incorporated (in section 5) into a low-energy recurrent neural network N simulating A. In particular, the resulting formula, 4.14, for function $f$ is derived below and will be exploited for computing $f$ using four layers of perceptron units, as summarized at the beginning of section 5. Unlike the result by Horne and Hush (1996), the decomposition technicalities are needed for minimizing the energy demands when A is being implemented by N. Therefore, in the rest of this section, we completely reformulate the method due to Lupanov (1973) in a detailed and compact form, which is simplified and adapted for our future use.

The p+1 arguments of vector function f(u, v, z) are split into three groups , , and , respectively, where
4.1
4.2
4.3
Parameters p1, p2, p3 are chosen so that the resulting circuit for f will have the asymptotically optimal size (see section 5.6). Then each function element () of vector function is decomposed to
4.4
where the respective literals are defined as
4.5
Furthermore, we define vector functions for as
4.6
where denotes an n-bit binary representation of integer , that is, . Note that we use for denoting the inverse value to [j]n, e.g., . The vector produced by gk in equation 4.6 has p1 elements out of which the first items are defined using fk for all possible values of argument , while the remaining ones are 0s, which is a correct definition since for sufficiently large p according to formulas 4.1 and 4.2.

In the following lemma, we further decompose functions $g_k$ ($1 \le k \le p$). For this purpose, we split the first part $u$ of $g_k$'s argument into its first bit $a$ and the remaining $r$ bits $u'$ where $r=p_1-1$. Thus, we will use notation $g_k(a, u', v)$ as an alternative to $g_k(u, v)$ introduced in equation 4.6 when the first bit of $g_k$'s argument needs to be specified explicitly (similarly for functions $f$, $f_k$, and so on).

Lemma 1.

For each gk (), one can construct four vector functions, and , for satisfying the following two conditions:

1. For any , , and ,
where $\oplus$ denotes a bitwise parity.
2. Functions $g_k^a$, $h_k^a$ are injective in the first vector argument, that is, for any , , and ,

Recall that the parity used in equation 4.7 is defined for vectors , , and as
4.9
(i.e., zi=1 iff ) for every , which is an associative operation.
Proof.
For any , the function values $g_k^a([i]_r, v)$ are defined inductively for as $g_k^a([i]_r, v)$ is chosen arbitrarily from where
4.10
and functions $h_k^a$ are defined so that equation 4.7 is met:
4.11

Note that and according to definition 4.10, which implies . Hence, for any , which ensures that is correctly defined for all the values of argument . Moreover, condition 4.8 is satisfied because for any such that i>j, definition 4.10 secures and by using condition 4.11 and the fact that . This completes the proof of lemma 1.
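The inductive choice in the proof of lemma 1 can be pictured with a small greedy procedure. The sketch below is an illustration under assumptions, not the exact definition 4.10: bit vectors are encoded as Python integers, the argument `v` and the bit `a` are treated as fixed, and "chosen arbitrarily" is resolved by taking the smallest admissible candidate. It splits a given sequence of values into two injective sequences whose bitwise parity (XOR) gives back the original.

```python
import random

def decompose(g_values, width):
    """Split g into ga and ha with g[i] == ga[i] ^ ha[i], both without repeated values.

    g_values[i] encodes a width-bit vector; at most 2**(width - 1) values are allowed,
    which mirrors r = p1 - 1 index bits against p1 output bits.
    """
    used_g, used_h = set(), set()
    ga, ha = [], []
    for g in g_values:
        for cand in range(2 ** width):
            # cand must be new for ga, and cand ^ g must be new for ha
            if cand not in used_g and (cand ^ g) not in used_h:
                break
        else:
            raise ValueError("no admissible value; too many inputs for this width")
        ga.append(cand)
        ha.append(cand ^ g)
        used_g.add(cand)
        used_h.add(cand ^ g)
    return ga, ha

# Hypothetical example: r = 3 index bits, width p1 = 4, arbitrary (even repeated) values.
rng = random.Random(0)
g_values = [rng.randrange(16) for _ in range(8)]
ga, ha = decompose(g_values, width=4)
```

At step $i$ at most $2i$ candidates are excluded, which is fewer than $2^{r+1}$ whenever $i < 2^r$, so a free candidate always exists; this is the counting argument behind the correctness of the definition.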

We further rewrite $g_k^a$ and $h_k^a$ from lemma 1 by using the functions $\varphi_k^a$ and $\psi_k^a$ so that
4.12
respectively, which for any , , and , satisfy
4.13
according to condition 4.8. Now we can plug formulas 4.6, 4.7, 4.9, and 4.12 into equation 4.4, which results in the final f’s decomposition,
4.14
where (x)i denotes the ith element of vector x.

Formula 4.14 can be used for implementing a four-layer circuit C, which computes the transition function f of finite automaton A using the asymptotically optimal number of threshold gates. The details of such an implementation are presented in section 5, where this circuit C will be incorporated into a low-energy recurrent neural network N simulating A.

## 5.  The Finite Automaton Implementation

In this section, we introduce the construction of a low-energy recurrent neural network N simulating a given finite automaton A. In particular, a set of neurons V is composed of four disjoint layers that mainly evaluate the transition function f according to its decomposition, equation 4.14, as follows.

The zeroth layer is composed of p+1 units storing an input bit and a current state of A, which are split according to f’s arguments (see section 5.1). The gates in the first layer compute all possible monomials for over variables v (see section 5.2), which are used for evaluating functions and in the second layer (see section 5.3). The third layer computes conjunctions and from formula 4.14, while their disjunction for is evaluated for each fk in the fourth layer, which coincides with zeroth layer as the old state of A is replaced by the new one (see section 5.5).

In addition, so-called control units are introduced in layers and for minimizing the energy demands of N (see section 5.4). The global computational dynamics of N, together with its energy complexity, is specified in section 5.6. The network is schematically depicted in Figure 1, where the layers or their parts are indicated and only a few representative units and connections are drawn. The directed edges connecting units are labeled with the corresponding weights, whereas the edges drawn without an originating unit correspond to the threshold parameters.

Figure 1:

A schema of neural network N implementing finite automaton A.


### 5.1.  Layer $\nu_0$ Stores an Input Bit and Automaton's State.

An input bit and a current state of A are stored using p+1 neurons, which constitute layer . Thus, set includes the input neuron and the output neuron , which stores the bit (in the state encoding) that indicates the final states. We will implement formula 4.14 in N for evaluating the transition function f in terms of binary encoding of states in order to compute the new state of A. For this purpose, layer is disjointly split into four parts corresponding to the partition of arguments of , respectively, that is, the input neuron in represents the first variable a as yin=a, , , and , where neuron corresponds to the output unit out as the last bit of state encoding has been chosen to indicate the final states of A. Note that we identify the names of neurons in with the arguments of f encoding the automaton's state so that these variables describe the states of corresponding units (e.g., ). For notational simplicity, we often omit the time index in the states of neurons (e.g., we write yin=a instead of y(t)in=a), and we implicitly assume step-by-step timing, which follows the directed connections among neurons in the natural order as they are introduced when describing the implementation of f in N. The precise timing will be formalized within the global computational dynamics of N simulating A in section 5.6.

### 5.2.  Layer $\nu_1$ Computes Monomials $\bigwedge_{i=1}^{p_2} \ell_{b_i}(v_i)$.

The next layer, , consists of neurons in for computing all possible monomials for over variables v, and two so-called control units in that indicate the input bit value. These monomials will be used in section 5.3 for evaluating functions and in formula 4.14. Thus, we introduce weights (i.e., for bi=1 whereas for bi=0) for , and threshold , for any so that the following lemma trivially holds:

Lemma 2.

Unit fires for input iff $b=v$.

In addition, we define , and , , which ensures that fires just one computational step after the current input bit is presented to in iff yin=1, and similarly, is active iff yin=0 (see lemma 6).
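The behavior stated in lemma 2 can be checked exhaustively for small inputs. The concrete values below are assumptions chosen for this sketch (weight $+1$ where $b_i=1$, weight $-1$ where $b_i=0$, and a threshold equal to the number of 1s in $b$); they are one standard way to make a single threshold unit recognize exactly one bit pattern and need not coincide with the weights used in the construction.

```python
from itertools import product

def monomial_unit(b):
    """Weights and threshold of a unit meant to fire only on input v == b (assumed values)."""
    weights = [1 if bit == 1 else -1 for bit in b]
    threshold = sum(b)
    return weights, threshold

def fires(weights, threshold, v):
    """Heaviside activation: the unit fires when the excitation is nonnegative."""
    return sum(w * x for w, x in zip(weights, v)) - threshold >= 0

def active_units(v):
    """All patterns b whose unit fires on input v."""
    n = len(v)
    return [b for b in product((0, 1), repeat=n) if fires(*monomial_unit(b), v)]

winners = active_units((1, 0, 1))
```

The weighted sum can reach the threshold only if every position with $b_i=1$ contributes 1 and no position with $b_i=0$ subtracts anything, which forces $v=b$; hence exactly one of the $2^{n}$ units fires for each input.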

### 5.3.  Layer $\nu_2$ Computes $\varphi_k^a(u', v)$ and $\psi_k^a(u', v)$.

Furthermore, layer where , and with , serves for a low-energy computation of functions and for any and .

In this section, we first show how to implement functions for any and with no constraints on energy by using the outputs of neurons from and , which provide the values of argument (see section 5.1) and monomials for all over variables v (see section 5.2), respectively. In particular, pairs of neurons for are employed for this purpose having zero thresholds for now (their thresholds are defined in section 5.4 for the low-energy implementation) and weights for and such that for . Note that is uniquely defined according to condition 4.13. For this definition of weights, the following lemma shows how the outputs of for represent the function value :

Lemma 3.

For any inputand, at least one unit from the pair fires for each . In addition, both unitsandfire simultaneously iff.

Proof.
It follows that for given and , neuron fires iff
5.1
since iff b=v according to lemma 2. Similarly, neuron is active iff . Clearly, either or and, hence, at least one unit from the pair is always active, while both neurons fire at the same time iff iff , which implements function . This ends the proof of lemma 3.

Functions for any and are implemented in an analogous way (replace by above) using pairs of neurons for so that the following lemma holds (compare with lemma 3):

Lemma 4.

For any inputand, at least one unit from the pairfires for each. In addition, both unitsandfire simultaneously iff.

### 5.4.  Low-Energy Implementation of $\varphi_k^a(u', v)$ and $\psi_k^a(u', v)$.

We employ so-called control units for and , for synchronizing the computation of functions , by neurons from as described in section 5.3 so that their energy consumption is bounded by e+2.

For this purpose, we split set into two parts of size according to , and both parts are further partitioned into d blocks, each of size at most 2e, that is, , where . In addition, we require that any pair , respectively, , is included into one block, which means for each , if , then , and if , then . Furthermore, all neurons whose thresholds were originally assumed to be zero (see section 5.3) are now blocked by large thresholds h(j)=W where W=2r, which make them passive. Then, for any and , the neurons in block are released to fire by control unit using the weights for all . Thus, if control unit fires, then threshold h(j)=W of unit is canceled by weight and neuron j takes part in the computation of and according to lemmas 3 and 4, respectively, as described in section 5.3. This is summarized in the following lemma:

Lemma 5.

For any and , neurons in can fire (according to lemmas 3 and 4) iff control unit is active.

For current input bit , the control units fire successively one by one, which is achieved by weights for , for , and thresholds for , as the following lemma formally proves:

Lemma 6.

For everyand for current input bit, which is first presented to network N at time instant (i.e., ), whereasfor anysuch that.

Proof.

The argument for proceeds by induction on . For i=0, we know from section 5.2 that for (all the control units are assumed to be passive at time instant t0). For i>0, the excitation of unit at time instant t0+i can be evaluated as because by induction hypothesis. Hence, . It follows that for since and , which makes and keeps neuron () passive after fires at time instant t0+i+1, that is, for any such that . This concludes the proof for lemma 6.

Lemmas 5 and 6 ensure that only the neurons from one block of size at most 2e can fire simultaneously. Note that the respective units from , which were released by active control unit , fire at the same time as the next control unit becomes active. In fact, we know from lemmas 3 and 4 that just one unit of each pair or is active except for the special pairs of both firing units and such that and , respectively. Hence, the energy consumption of is bounded by e+2.

In addition, we must also guarantee that the resulting function values , are preserved, that is, neurons remain active without any support from corresponding control units until all the blocks perform computations and are blocked, which is indicated by control unit . This is implemented by symmetric weights for , , . Note that in the case when only one unit from the pair (similarly for ) is active, say, fires and is passive, the introduced weight does not cause the other unit to fire, although this weight cancels its threshold since is passive for zero threshold anyway. Moreover, neuron eventually resets all neurons in before becoming itself passive, which is accomplished by weights for all and .

### 5.5.  Layers $\nu_3$ and $\nu_0$ Evaluate $f$.

Finally, layer is composed of pairs of neurons for each , which evaluate conjunctions and from formula 4.14, respectively, for current input . For this purpose, the states of neurons from , which store the values of argument z, and the outputs of units for , which represent the function values according to lemmas 3 and 4 after fires, are used. For , we define weights for , , and for , and thresholds . The correctness of this definition is shown in the following lemma:

Lemma 7.

For any values of f’s arguments, for every, and, unitfires iffand neuronis active iff.

Proof.

We know that for yin=a and each , only one pair of neurons for fires such that by lemma 3, and only one pair of units for is active such that according to lemma 4, while the remaining units in v21 are passive (blocked) after fires, which follows from lemmas 5 and 6. Hence, neuron is active iff and , and for every , when active neurons contribute to excitation by 2, while the contribution of units to reaches , which altogether equals the threshold (compare with lemma 2). Analogously, neuron fires iff and , and for every . This concludes the proof of lemma 7.

It follows from lemma 7 that for any , at most one unit among over is active, which determines the value of according to formula 4.14, as described in the following lemma:

Lemma 8.

For any values of $f$'s arguments and for every , iff either or fires. In addition, the remaining units for are passive.

Thus, a binary encoding of the new state of automaton A is computed as disjunctions 4.14 over for by units from (which rewrite the code of the old state of A) using the recurrent connections leading from neurons of . After reindexing the units in layer properly, for each , the kth disjunction is implemented by weights for every , and threshold h(k)=1, according to lemma 8.

### 5.6.  Computational Dynamics and Complexity of N.

Now we specify the computational dynamics of neural network N simulating the finite automaton A. At the beginning, the states of neurons from are placed in an initial state of A. Each bit xi () of input word , which is read by input neuron at time instant (i.e., ), is being processed by N within the desired period of time steps. The states of neurons in N are successively updated in the order following the architecture of layers. Thus, we define sets of units updated at time instants as , for , , and , for . Eventually the output neuron signals at time instant whether input word x belongs to underlying language L, that is, iff .

The size of N simulating the finite automaton A with $m$ states can be expressed as $s=\Theta(\sqrt{m})$ in terms of $m$ according to formulas 4.1 to 4.3, which matches the known lower bound (Horne & Hush, 1996; Indyk, 1995). Finally, energy consumption can be bounded for particular layers as follows. Layer $\nu_0$ can possibly require all $p+1$ units to fire for storing the binary encoding of a current automaton's state (see section 5.1). Moreover, there is only one active unit among the neurons in the first layer that serve for evaluating all possible monomials over variables $v$ according to lemma 2, and also only one control unit fires at one time instant by lemma 6. In addition, we know that the energy consumption of the blocks in the second layer is at most $e+2$ (see section 5.4), and at most $p$ neurons from the third layer fire (one for each $k$) according to lemma 8. Altogether, the global energy consumption of N is bounded by $(p+1)+1+1+(e+2)+p=e+2p+5=O(e+\log s)=O(e)$ as $e=\Omega(\log s)$ is assumed. This completes the proof of theorem 1.

## 6.  The Lower Bound

In this section, we will show lower bounds on the energy complexity of neural networks implementing finite automata. For this purpose, we will employ the technique due to Uchizawa and Takimoto (2008), which is based on communication complexity (Kushilevitz & Nisan, 1997). Assume that is a Boolean function whose value f(x, y) has to be computed by two players with unlimited computational power, each receiving only his or her part of the input and , respectively, while they wish to exchange with each other the least possible number of bits. In particular, they communicate according to a randomized protocol, additionally making use of the same public random bit string. For any error probability satisfying , the communication complexity of function f is defined to be the maximum number of bits needed to be exchanged for the best randomized protocol to make the two players compute the correct value of f(x, y) with probability at least , for every input assignment x and y.

It is well known (Kushilevitz & Nisan, 1997) that almost all Boolean functions f of 2n variables have a large communication complexity,
6.1
for any error probability such that . An example of a particular function that meets condition 6.1 is the Boolean inner product , defined as
6.2
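For readers who want a concrete handle on the example function, the Boolean inner product is sketched below, assuming its usual definition as the sum of the bitwise products $x_i y_i$ taken modulo 2. The function name and the tuple representation of the inputs are choices made for this sketch.

```python
def inner_product_mod2(x, y):
    """Boolean inner product of two equal-length bit vectors: parity of sum(x_i * y_i)."""
    if len(x) != len(y):
        raise ValueError("both players must hold vectors of the same length n")
    return sum(a & b for a, b in zip(x, y)) % 2

# Each player holds one half of the 2n input bits.
value = inner_product_mod2((1, 1, 0), (1, 0, 0))
```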

Uchizawa and Takimoto (2008) proved the upper bound on the communication complexity of Boolean function f in terms of the size, depth, and energy complexity of a feedforward network computing f:

Theorem 2
(Uchizawa & Takimoto, 2008). If a Boolean function $f$ can be computed by a threshold circuit of size $S$, depth $d$, and energy complexity $E$, then
6.3
for error probability
6.4

The lower and upper bounds on the communication complexity, equations 6.1 and 6.3, respectively, are put together in the following lemma:

Lemma 9.

Let $f$ be a Boolean function of $2n$ variables whose communication complexity satisfies condition 6.1, which can be computed by a threshold circuit of size $S$, depth $d$, and energy complexity $E$ such that $n=O(S)$ and $d=O(E)$. Then $n=O(E^{d+1}\log S)$.

Proof.
It follows from condition 6.1 applied to formula 6.4 that there is a constant such that
6.5
for sufficiently large n. On the other hand, formula 6.3, together with n=O(S) and d=O(E), gives
6.6
for some constant $c_u>0$, as the term $(1+1/E)^{E+1}$ is bounded. Putting inequalities 6.5 and 6.6 together, we get
6.7
which implies $n=O(E^{d+1}\log S)$.

Now we will formulate the result providing the lower bound (for some constant c>0 and for infinitely many s) on the energy complexity e of a recurrent neural network of size s neurons implementing a given finite automaton with time overhead such that . This means the lower bound is valid for less than sublogarithmic time overheads:

Theorem 3.

Let . There exists a neural network of size s neurons simulating a finite automaton with time overhead per one input bit that needs energy e such that .

Proof.

Let N be a neural network of size $s$ neurons simulating a finite automaton A with time overhead $\tau$ per one input bit. The states of A are represented by the $2^{s-1}$ states of N (excluding the input neuron), and the transition function of A is computed by N within $\tau$ time steps. Clearly network N can be “unwound” into a threshold circuit C of depth $\tau$ and size $s\tau$, which implements the transition function of A so that each layer is a copy of N (Savage, 1972). Thus, the states of neurons in the $i$th layer of C coincide with the network state at the $i$th time step for $i=1,\ldots,\tau$, when the new state of A is produced from the old one, including the current input bit. Hence, the energy complexity $E$ of C is a multiple of the energy $e$ consumed by N, that is, $E \le \tau e$.
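The unwinding argument can be illustrated numerically. The sketch below assumes fully parallel updates, the firing convention "excitation at least 0", and a made-up three-unit ring network; the helper names are hypothetical. It copies the network once per time step and compares the number of firing gates in the resulting layered circuit with the per-step energy of the recurrent network.

```python
def step(y, w, h):
    """One fully parallel update of all units (w[i][j] is the weight from i to j)."""
    s = len(y)
    return tuple(
        1 if sum(w[i][j] * y[i] for i in range(s)) - h[j] >= 0 else 0
        for j in range(s)
    )

def unwind(y0, w, h, tau):
    """Layer i of the unwound circuit holds the network state after i steps."""
    layers, y = [], tuple(y0)
    for _ in range(tau):
        y = step(y, w, h)
        layers.append(y)
    return layers

def circuit_stats(y0, w, h, tau):
    layers = unwind(y0, w, h, tau)
    size = len(y0) * tau                                 # tau copies of s units
    depth = tau
    circuit_energy = sum(sum(layer) for layer in layers) # gates outputting 1
    network_energy = max(sum(layer) for layer in layers) # peak per time step
    return size, depth, circuit_energy, network_energy

# Hypothetical ring network in which a single active unit moves around.
w = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
h = [1, 1, 1]
size, depth, circuit_energy, network_energy = circuit_stats((1, 0, 0), w, h, tau=4)
```

On any single input, the circuit's count of firing gates is a sum over $\tau$ layers of at most the recurrent network's peak, which is where the factor $\tau$ between the two energies comes from.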

As component (for ) of the transition function defining A can be arbitrary, there is a neural network N simulating A such that f_k implemented by C has large communication complexity satisfying condition 6.1. Moreover, and , which meets the remaining assumptions of lemma 9. It follows that
6.8
according to lemma 9. On the contrary, suppose that
6.9
We will prove that , which contradicts equation 6.8. For this purpose, it suffices to show that . This can be rewritten as , which follows from the assumption of the theorem and equation 6.9, completing the argument.
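The unwinding step used in this proof can be illustrated by a minimal sketch. It is not the construction from the letter: the three-neuron network, its weights, and all function names are hypothetical, and the toy network has no separate input neuron. The sketch only shows that a layered circuit whose layers are copies of the recurrent network reproduces the network states step by step, so that the number of circuit units that output 1 on a given input is at most the number of layers times the maximum number of simultaneously active neurons.

```python
from typing import List, Sequence, Tuple

Weights = Sequence[Sequence[float]]
Layer = Tuple[Weights, Sequence[float]]


def step(weights: Weights, thresholds: Sequence[float],
         state: Sequence[int]) -> List[int]:
    """One synchronous update: a unit fires iff its weighted input reaches its threshold."""
    return [
        1 if sum(w * y for w, y in zip(row, state)) >= h else 0
        for row, h in zip(weights, thresholds)
    ]


def run_recurrent(weights: Weights, thresholds: Sequence[float],
                  state: Sequence[int], tau: int) -> List[List[int]]:
    """Network states after each of tau consecutive updates of the recurrent net."""
    states = []
    for _ in range(tau):
        state = step(weights, thresholds, state)
        states.append(state)
    return states


def unwind(weights: Weights, thresholds: Sequence[float], tau: int) -> List[Layer]:
    """Unwind the recurrent net into tau layers, each a copy of the whole net."""
    return [(weights, thresholds) for _ in range(tau)]


def run_circuit(layers: Sequence[Layer], inputs: Sequence[int]) -> List[List[int]]:
    """Evaluate the layered circuit and return the outputs of every layer."""
    outputs, values = [], list(inputs)
    for weights, thresholds in layers:
        values = step(weights, thresholds, values)
        outputs.append(values)
    return outputs


# Hypothetical 3-neuron net (an assumption for illustration):
# neuron j copies the state of neuron (j + 1) mod 3.
W = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
H = [1, 1, 1]
TAU = 3
START = [1, 0, 0]

states = run_recurrent(W, H, START, TAU)
layer_outputs = run_circuit(unwind(W, H, TAU), START)

# Energy of the recurrent run: most neurons active at any single time instant.
network_energy = max(sum(s) for s in states)
# Energy of the circuit on this one input: units that output 1, over all layers.
circuit_energy = sum(sum(layer) for layer in layer_outputs)
```

Note that the energy complexity of a circuit is defined as a maximum over all inputs, whereas the sketch evaluates a single input only; the same per-input inequality holds for every input, which is what the multiplicative relation between the two energies relies on.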

In the following corollary, we present the lower bounds on energy complexity in terms of the network size for selected cases of sublogarithmic time overhead.

Corollary 1.

1. If , then for some such that and for infinitely many s.

2. If , then for any such that .

3. If for some , then for any such that .

Proof.

1. For , the assumption trivially holds and the proposition follows straightforwardly from theorem 3.

2. For , there is c_u>0 such that for all but finitely many s, we have . According to theorem 3, there is such that for infinitely many s. On the contrary, suppose that for some satisfying , which implies , leading to a contradiction .

3. If for some , then there is c_u>0 such that for all but finitely many s, we have . According to theorem 3, there is such that for infinitely many s. On the contrary, suppose that for some satisfying , which implies , leading to a contradiction .

We can compare the lower bounds on the energy complexity of simulating finite automata by neural nets, presented in corollary 1, to the respective upper bounds provided by theorem 1. For the constant time overhead , the construction from theorem 1 achieves the energy consumption of e=O(s), while any simulation requires energy for some constant such that and for infinitely many s, according to corollary 1. Similarly, for the time overhead of where , we have the upper bound of , which compares to the lower bound of . Clearly, gaps remain between these lower and upper bounds, and they still need to be closed.

## 7.  Conclusions

We have, for the first time, applied the energy complexity measure to recurrent neural nets. This measure has recently been introduced and studied for feedforward perceptron networks. Binary-state recurrent neural networks recognize exactly the regular languages, so we have investigated the energy they consume when simulating finite automata with the asymptotically optimal number of neurons. We have presented a low-energy implementation of finite automata by optimal-size neural nets, with a trade-off between the time overhead for processing one input bit and the energy, which varies from the logarithm of the network size to the full network size. We have also obtained lower bounds on the energy consumption of neural finite automata, which are valid for less than sublogarithmic time overheads and are not yet tight. Whether these bounds can be improved remains an open problem for further research. In addition, we have so far assumed worst-case energy consumption; an average-case analysis would be another challenge.

## Acknowledgments

The presentation of this letter benefited significantly from valuable comments of anonymous reviewers. My research was done with institutional support RVO: 67985807 and partially supported by grant P202/10/1333 of the Czech Science Foundation.

## References

Alon, N., Dewdney, A. K., & Ott, T. J. (1991). Efficient simulation of finite automata by neural nets. Journal of the ACM, 14(2), 495–514.

Horne, B. G., & Hush, D. R. (1996). Bounds on the complexity of recurrent neural network implementations of finite state machines. Neural Networks, 9(2), 243–252.

Indyk, P. (1995). Optimal simulation of automata by neural nets. In E. W. Mayr & C. Puech (Eds.), Proceedings of the 12th Annual Symposium on Theoretical Aspects of Computer Science (STACS 1995), LNCS 900 (pp. 337–348). Berlin: Springer-Verlag.

Kushilevitz, E., & Nisan, N. (1997). Communication complexity. Cambridge: Cambridge University Press.

Lennie, P. (2003). The cost of cortical computation. Current Biology, 13(6), 493–497.

Lupanov, O. (1973). On the synthesis of threshold circuits. Problemy Kibernetiki, 26, 109–140.

Minsky, M. (1967). Computations: Finite and infinite machines. Englewood Cliffs, NJ: Prentice Hall.

Orponen, P. (1997). Computing with truly asynchronous threshold logic networks. Theoretical Computer Science, 174(1–2), 123–136.

Savage, J. E. (1972). Computational work and time on finite machines. Journal of the ACM, 19(4), 660–674.

Siegelmann, H. T., & Sontag, E. D. (1995). Computational power of neural networks. Journal of Computer System Science, 50(1), 132–150.

Šíma, J. (2013). A low-energy implementation of finite automata by optimal-size neural nets. In V. Mladenov, P. D. Koprinkova-Hristova, G. Palm, A. E. P. Villa, B. Appollini, & N. Kasabov (Eds.), Proceedings of the 23rd International Conference on Artificial Neural Networks (ICANN 2013), LNCS 8131 (pp. 114–121). Berlin: Springer-Verlag.

Šíma, J., & Orponen, P. (2003). General-purpose computation with neural networks: A survey of complexity theoretic results. Neural Computation, 15(12), 2727–2778.

Šíma, J., & Wiedermann, J. (1998). Theory of neuromata. Journal of the ACM, 45(1), 155–178.

Suzuki, A., Uchizawa, K., & Zhou, X. (2011). Energy and fan-in of threshold circuits computing mod functions. In M. Ogihara & J. Tarui (Eds.), Proceedings of the 8th Annual Conference on Theory and Applications of Models of Computation (TAMC 2011), LNCS 6648 (pp. 154–163). Berlin: Springer-Verlag.

Uchizawa, K., Douglas, R., & Maass, W. (2006). On the computational power of threshold circuits with sparse activity. Neural Computation, 18(12), 2994–3008.

Uchizawa, K., Nishizeki, T., & Takimoto, E. (2010). Energy and depth of threshold circuits. Theoretical Computer Science, 411(44–46), 3938–3946.

Uchizawa, K., & Takimoto, E. (2008). Exponential lower bounds on the size of constant-depth threshold circuits with small energy complexity. Theoretical Computer Science, 407(1–3), 474–487.

Uchizawa, K., & Takimoto, E. (2011). Lower bounds for linear decision trees via an energy complexity argument. In F. Murlak & P. Sankowski (Eds.), Proceedings of the 36th International Symposium on Mathematical Foundations of Computer Science (MFCS 2011), LNCS 6907 (pp. 568–579). Berlin: Springer-Verlag.

Uchizawa, K., Takimoto, E., & Nishizeki, T. (2011). Size-energy tradeoffs for unate circuits computing symmetric Boolean functions. Theoretical Computer Science, 412(8–10), 773–782.

## Notes

1. The relatively small difference in the oxygen consumption of a used versus an unused functional part of the brain (e.g., the visual cortex), which is documented by fMRI studies, is caused by the fact that the corresponding neurons fire even when the respective region is not employed for its purpose.

2. To the best of our knowledge, the classic result due to Lupanov (1973) is described in full detail only in his original Russian paper.