We present an investigation of threshold circuits and other discretized neural networks in terms of four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all input assignments. As our main result, we prove that if a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$ (i.e., a classification task) of $n$ variables, it holds that $\log(\mathrm{rk}(f)) \le ed(\log s + \log w + \log n)$ regardless of the algorithm employed by $C$ to compute $f$, where $\mathrm{rk}(f)$ is a parameter solely determined by the scale of $f$ and defined as the maximum rank of a communication matrix of $f$ taken over all possible partitions of the $n$ input variables. For example, given the Boolean function $\mathrm{CD}_n(\xi) = \bigvee_{i=1}^{n/2} \xi_i \wedge \xi_{n/2+i}$, we can prove that $n/2 \le ed(\log s + \log w + \log n)$ holds for any circuit $C$ computing $\mathrm{CD}_n$. While its left-hand side is linear in $n$, its right-hand side is bounded by the product of the logarithmic factors of $s, w, n$ and the linear factors of $d, e$. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: $n/2$ needs to be smaller than the product of $e$ and $d$. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.

Nervous systems receive an abundance of environmental stimuli and encode them within neural networks, where the internal representations formed play a crucial role in fine neural information processing. More formally, DiCarlo and Cox (2007) considered an internal representation as a firing pattern, that is, a vector in a very high-dimensional space, where each axis is one neuron's activity and the dimensionality equals the number (e.g., approximately 1 million) of neurons in a feedforward neural network. They then argued that for object recognition tasks, a representation is good if, for a given pair of images that are hard to distinguish in the input space, there exist representations that are easy to separate by simple classifiers such as linear classifiers.

However, there is a severe constraint on forming useful internal representations. While the total energy resources supplied to the brain are limited, both the resting and the signaling of neurons incur high energetic costs (Attwell & Laughlin, 2001; Lennie, 2003). Niven and Laughlin (2008) claimed that selective pressures both to improve behavioral performance and to reduce energy consumption can affect all levels of components and mechanisms in the brain. Thus, in addition to developing subcellular structures and individual neurons, neural networks are likely to develop useful and energy-efficient internal representations.

Sparse coding is a strategy in which a relatively small number of neurons are simultaneously active out of a large population and is considered a plausible principle for constructing internal representations in the brain. Sparse coding can reconcile the issue of representational capacity and energy expenditure (Földiák, 2003; Levy & Baxter, 1996; Olshausen & Field, 2004) and has been experimentally observed in various sensory systems (see, for example, Barth & Poulet, 2012; Shoham et al., 2006), including the visual cortex, where a representation of a few active neurons conveys information useful enough to reconstruct or classify natural images (Tang et al., 2018; Yoshida & Ohki, 2020).

Because sparse coding restricts the available internal representations of a neural network, it limits its representational power. Can we quantitatively evaluate the effects of a sparse coding strategy on neural information processing?

Uchizawa et al. (2008) sought to address this question from the viewpoint of circuit complexity. Let $\mathcal{C}$ be a class of feedforward circuits modeling neural networks. In a typical circuit complexity argument, we introduce a complexity measure for Boolean functions and show that any Boolean function computable by a circuit $C \in \mathcal{C}$ has complexity bounded by the computational resources available to $\mathcal{C}$. If a Boolean function $f$ has inherently high complexity relative to its scale, we can conclude that any circuit in $\mathcal{C}$ requires a sufficiently large amount of resources to compute $f$. A Boolean function can model a binary classification task, so such a bound implies that neural networks cannot construct good internal representations for the task if the computational resources are limited or, equivalently, if the scale of the task is too large.

More formally, Uchizawa et al. (2008) employed threshold circuits as a model for neural networks, where a threshold circuit is a feedforward logic circuit whose basic computational element computes a linear threshold function (McCulloch & Pitts, 1943; Minsky & Papert, 1988; Parberry, 1994; Rosenblatt, 1958; Siu & Bruck, 1991; Siu et al., 1995; Siu & Roychowdhury, 1994). Size, depth, and weight are computational resources that have been extensively studied. For a threshold circuit $C$, the size $s$ is defined as the number of gates in $C$, the depth $d$ is the number of layers of $C$, and the weight $w$ is the degree of resolution of the weights among the gates. These resources are likely to be bounded in the neural networks of the brain: the number of neurons in the brain is clearly limited, and the number of neurons performing a particular task could be further restricted. Depth is related to the reaction time required to perform a task; therefore, low depth values are preferred. A single synaptic weight may take an analog value; however, it is unlikely that a neuron maintains an infinitely high resolution against neuronal noise; hence, a neuron may have a bounded degree of resolution. In addition, inspired by sparse coding, Uchizawa et al. (2008) introduced a new complexity measure called energy complexity, where the energy of a circuit is defined as the maximum number of gates outputting nonzero values, taken over all input assignments to the circuit. Studies on the energy complexity of other types of logic circuits have also been reported (Dinesh et al., 2020; Kasim-zade, 1992; Silva & Souza, 2022; Sun et al., 2019; Vaintsvaig, 1961).

Uchizawa et al. (2008) then showed that energy-bounded threshold circuits have substantial computational power by observing that threshold circuits can simulate linear decision trees, where a linear decision tree is a binary decision tree in which the query at each internal node is given by a linear threshold function. In particular, they proved that any linear decision tree of $\ell$ leaves can be simulated by a threshold circuit of size $O(\ell)$ and energy $O(\log \ell)$. Hence, any linear decision tree of $\mathrm{poly}(n)$ leaves can be simulated by a threshold circuit of size $\mathrm{poly}(n)$ and energy only $O(\log n)$, where $n$ is the number of input variables.

Following Uchizawa et al. (2008), a sequence of papers showed relations among other resources such as size, depth, and weight (Maniwa et al., 2018; Suzuki et al., 2011, 2013; Uchizawa & Takimoto, 2008; Uchizawa, 2020, 2014; Uchizawa et al., 2011). In particular, Uchizawa and Takimoto (2008) showed that any threshold circuit $C$ of depth $d$ and energy $e$ requires size $s = 2^{\Omega(n/ed)}$ if $C$ computes a function of high bounded-error communication complexity, such as the inner product modulo 2. Even for functions of low communication complexity, an exponential lower bound on the size is known for constant-depth threshold circuits: any threshold circuit $C$ of depth $d$ and energy $e$ requires size $s = 2^{\Omega(n/(e2^e + d\log^e n))}$ if $C$ computes the parity function (Uchizawa, 2020). These results provide exponential lower bounds if the depth is constant and the energy is sublinear (Uchizawa & Takimoto, 2008) or sublogarithmic (Uchizawa, 2020), while both the inner product modulo 2 and parity are computable by linear-size, constant-depth, and linear-energy threshold circuits. These results imply that energy is strongly related to the representational power of threshold circuits and is an important computational resource. However, these lower bounds break down when we consider threshold circuits of larger depth and energy, say, nonconstant depth and sublinear energy.

Here, we provide a more refined relation among size, depth, energy, and weight. Our main result is formulated as follows. Let $f$ be a Boolean function of $n$ variables and $(X,Y)$ be a binary partition of $\{1,2,\dots,n\}$. Then we can express any assignment $\xi \in \{0,1\}^n$ as $(a,b) \in \{0,1\}^{|X|} \times \{0,1\}^{|Y|}$. A communication matrix $M_f^{X:Y}$ is a $2^{|X|} \times 2^{|Y|}$ matrix, where each row (resp., each column) is indexed by an assignment $a \in \{0,1\}^{|X|}$ (resp., $b \in \{0,1\}^{|Y|}$), and the value $M_f^{X:Y}[a,b]$ is defined to be the output of $f$ given $a$ and $b$. We denote by $\mathrm{rk}(f)$ the maximum rank of $M_f^{X:Y}$ over the real numbers, where the maximum is taken over all partitions $(X,Y)$ of $\{1,2,\dots,n\}$. We establish the following relation.
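As a concrete illustration of this definition, the following brute-force sketch (our own illustration; the article contains no code) computes $\mathrm{rk}(f)$ by enumerating every partition $(X,Y)$, building the communication matrix, and taking the maximum rank over the reals. It is feasible only for very small $n$:

```python
# Brute-force computation of rk(f) for very small n (illustration only).
from itertools import combinations, product
import numpy as np

def rk(f, n):
    """Maximum rank of a communication matrix of f over all partitions."""
    best = 0
    indices = range(n)
    # Every nonempty proper subset X of {0, ..., n-1} defines a partition.
    for size in range(1, n):
        for X in combinations(indices, size):
            Y = [i for i in indices if i not in X]
            M = np.zeros((2 ** len(X), 2 ** len(Y)), dtype=int)
            for r, a in enumerate(product([0, 1], repeat=len(X))):
                for c, b in enumerate(product([0, 1], repeat=len(Y))):
                    xi = [0] * n
                    for i, v in zip(X, a):
                        xi[i] = v
                    for i, v in zip(Y, b):
                        xi[i] = v
                    M[r, c] = f(xi)
            best = max(best, np.linalg.matrix_rank(M))
    return best

# Example: the two-variable AND has a 2x2 communication matrix of rank 1.
print(rk(lambda xi: xi[0] & xi[1], 2))  # prints 1
```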

Theorem 1.
Let $s, d, e$, and $w$ be integers satisfying $2 \le s$, $2 \le d$, $7 \le e$, and $1 \le w$. If a threshold circuit $C$ computes a Boolean function $f$ of $n$ variables and has size $s$, depth $d$, energy $e$, and weight $w$, then it holds that
$$\log(\mathrm{rk}(f)) \le ed(\log s + \log w + \log n). \tag{1.1}$$
Equation 1.1 implies a relationship among size, depth, energy, and weight and may provide insights into the challenges associated with the construction of internal representations in neural networks. As illustrative examples, let us consider the Boolean functions $\mathrm{CD}_n$ and $\mathrm{EQ}_n$ defined as follows. Let $n$ be an even positive integer. For every $\xi = (\xi_1, \xi_2, \dots, \xi_n) \in \{0,1\}^{[n]}$,
$$\mathrm{CD}_n(\xi) = \left[\left[\,\sum_{i=1}^{n/2} \xi_i \cdot \xi_{n/2+i} \ge 1\,\right]\right] \quad\text{and}\quad \mathrm{EQ}_n(\xi) = \left[\left[\,\xi_i = \xi_{n/2+i} \text{ for every } i \in [n/2]\,\right]\right],$$
where $[[P]]$ for a statement $P$ denotes the function that outputs one if $P$ is true and zero otherwise. These functions, $\mathrm{CD}_n$ and $\mathrm{EQ}_n$, have been extensively studied in circuit complexity research and are also biologically motivated Boolean functions. Maass (1997) defined $\mathrm{CD}_n$ to model coincidence detection or pattern matching, and Lynch and Musco (2022) introduced a related problem, called the filter problem, for studying theoretical aspects of spiking neural networks. Furthermore, $\mathrm{CD}_n$ and $\mathrm{EQ}_n$ can be considered simple tasks of visual information processing. The function $\mathrm{CD}_n$ is related to a visual search that asks whether there is an object satisfying two conditions specified by $\xi_1, \dots, \xi_{n/2}$ and $\xi_{n/2+1}, \dots, \xi_n$ among $n/2$ objects. For instance, if we assume that $\xi_i = 1$ for $i \in [n/2]$ if and only if the $i$th object is a letter X, and $\xi_{n/2+i} = 1$ for $i \in [n/2]$ if and only if the $i$th object is red, then $\mathrm{CD}_n$ asks whether there exists a red letter X. The function $\mathrm{EQ}_n$ is also related to a simple pattern-matching task that asks whether the left half of $\xi$ is identical to the right half of $\xi$. It is well known that $\mathrm{rk}(\mathrm{CD}_n) = 2^{n/2}$ and $\mathrm{rk}(\mathrm{EQ}_n) = 2^{n/2}$; the theorem implies that
$$\frac{n}{2} \le ed(\log s + \log w + \log n) \tag{1.2}$$
holds for any threshold circuit $C$ computing $\mathrm{CD}_n$ or $\mathrm{EQ}_n$. While the left-hand side of equation 1.2 is solely determined by the scale of the task and is independent of the four resources, the right-hand side consists of linear terms of $e, d$ and logarithmic terms of $s, w, n$. Therefore, if we view the logarithmic terms as having negligible impact, equation 1.2 implies a trade-off between $e$ and $d$: regardless of the algorithm that a neural network employs to compute $\mathrm{CD}_n$ or $\mathrm{EQ}_n$, the term $n/2$ needs to be smaller than the product of $e$ and $d$.
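For a quick numeric reading of this trade-off (our example, with arbitrarily chosen resource values), fixing $s$, $w$, and $n$ shows how large the product $ed$ must be:

```python
# With n = 1024 and a circuit of size s = 10**6 and weight w = 10**6,
# the bound n/2 <= e*d*(log s + log w + log n) forces e*d to be large.
from math import log2

n, s, w = 1024, 10**6, 10**6
rhs_per_ed = log2(s) + log2(w) + log2(n)  # about 49.9
print(n / 2 / rhs_per_ed)                 # e*d must be at least ~10.3
```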

The theorem also improves known lower bounds for threshold circuits. By rearranging equation 1.2, the following lower bound can be obtained: $2^{n/(2ed)}/(wn) \le s$, which is exponential in $n$ if both $d$ and $e$ are sublinear and $w$ is subexponential. For example, an exponential lower bound $s = 2^{\Omega(n^{1/3})}$ can be obtained even for threshold circuits of depth $n^{1/3}$, energy $n^{1/3}$, and weight $2^{o(n^{1/3})}$. Similar lower bounds can be obtained for the inner product modulo 2 and the equality function because their communication matrices also have rank $2^{\Omega(n)}$. Upon comparing the lower bound $s = 2^{\Omega(n/ed)}$ from the study by Uchizawa and Takimoto (2008) to ours, our lower bound is meaningful only for subexponential weight but affords a two-fold improvement: the lower bound is exponential even if $d$ is sublinear, and it provides an exponential lower bound for Boolean functions under a much weaker condition, namely, that the Boolean function has rank $2^{\Omega(n)}$ over the real numbers. Threshold circuits have received considerable attention in circuit complexity, and several lower-bound arguments have been developed for threshold circuits under restrictions on computational resources including size, depth, energy, and weight (Amano, 2020; Amano & Maruoka, 2005; Chen et al., 2018; Hajnal et al., 1993; Håstad & Goldmann, 1991; Impagliazzo et al., 1997; Kane & Williams, 2016; Nisan, 1993; Razborov & Sherstov, 2010; Sherstov, 2007; Uchizawa, 2020; Uchizawa & Takimoto, 2008; Uchizawa et al., 2011). However, those arguments are designed for constant-depth threshold circuits and cannot provide meaningful bounds when the depth is not constant. In particular, $\mathrm{CD}_n$ and $\mathrm{EQ}_n$ are computable by polynomial-size and constant-depth threshold circuits. Thus, directly applying known techniques is unlikely to yield our lower bound.

To complement theorem 1, we show that the relation is tight up to a constant factor if the product of $e$ and $d$ is small.

Theorem 2.
For any integers $e$ and $d$ such that $2 \le e$ and $2 \le d$, $\mathrm{CD}_n$ is computable by a threshold circuit of size
$$(e-1)(d-1) \cdot 2^{\frac{n}{2(e-1)(d-1)}} + 1,$$
depth $d$, energy $e$, and weight
$$\frac{n^2}{2(e-1)^2(d-1)^2}.$$
Substituting the $s, d, e$, and $w$ of a threshold circuit given in theorem 2 into the right-hand side of equation 1.2, we have
$$ed(\log s + \log w + \log n) = O\!\left(\frac{n}{2} + ed\log n\right),$$
which almost matches the left-hand side of equation 1.2 if $ed = o(n/\log n)$. Thus, theorem 1 captures the computational aspect of optimal threshold circuits computing $\mathrm{CD}_n$. Recall that any linear decision tree of a polynomial number of leaves can be simulated by a polynomial-size and logarithmic-energy threshold circuit (Uchizawa et al., 2008). Further, it is known that $\mathrm{CD}_n$ is computable by a threshold circuit of size $O(n)$, depth $O(n)$, energy $O(1)$, and weight $O(1)$, and any Boolean function is computable by a threshold circuit of depth two and energy one if exponential size is allowed (Maniwa et al., 2018). Thus, the condition $ed = o(n/\log n)$ is not considered too restrictive. We also show in theorem 9 that the lower bound is tight for $\mathrm{EQ}_n$.

Apart from threshold circuits, we consider other well-studied models of neural networks in which an activation function and weights are discretized (e.g., discretized sigmoid and ReLU circuits). Size, depth, energy, and weight are also important parameters for artificial neural networks. Size and depth are major topics in the study of deep learning. Energy is related to important techniques for deep learning methods such as the sparse autoencoder (Ng, 2011). Weight resolution is closely related to chip resources in neuromorphic hardware systems (Pfeil et al., 2012); accordingly, quantization schemes have received considerable attention (Courbariaux et al., 2015; Hubara et al., 2018).

For discretized circuits, we also show that a trade-off similar to the one for threshold circuits exists. For example, the following proposition can be obtained for discretized sigmoid circuits:

Theorem 3.
If a discretized sigmoid circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$ of $n$ variables, then it holds that
$$\log(\mathrm{rk}(f)) = O(ed(\log s + \log w + \log n)).$$

Therefore, artificial neural networks obtained through a machine learning process may face a challenge in acquiring good internal representations: hardware constraints are imposed on the number of neurons and the weight resolution, the depth is predefined, and the learning algorithm may force the resulting network to have sparse activity. Further, a $c$ times larger depth is comparable to a $2^c$ times larger size. Thus, increasing the depth could significantly help neural networks increase their expressive power and can evidently aid neural networks in acquiring sparse activity. Consequently, our bound may afford insight into the reason for the success of deep learning.

The remainder of the article is organized as follows. In section 2, we formally introduce the terminologies and notations that are used in the rest of the article. In section 3, we present our main lower-bound result. In section 4, we show the tightness of the lower bound. In section 5, we show a bound for discretized circuits. In section 6, we conclude with some remarks.

For an integer $n$, we denote by $[n]$ the set $\{1,2,\dots,n\}$. For a finite set $Z$ of integers, we denote by $\{0,1\}^Z$ the set of $2^{|Z|}$ Boolean assignments, where each assignment consists of $|Z|$ elements, each of which is indexed by $i \in Z$. The base of the logarithm is two unless stated otherwise. In section 2.1, we define terms for threshold circuits and discretized circuits. In section 2.2, we define the communication matrix together with some related terms and summarize some facts.

2.1  Circuit Model

2.1.1  Threshold Circuits

Let $k$ be a positive integer. A threshold gate $g$ with $k$ input variables has a weight $w_i$ for each $i \in [k]$ and a threshold $t$. We define the output $g(\xi)$ of $g$ for $\xi = (\xi_1, \xi_2, \dots, \xi_k) \in \{0,1\}^{[k]}$ as
$$g(\xi) = \left[\left[\,\sum_{i=1}^{k} w_i \xi_i \ge t\,\right]\right].$$
To evaluate the weight resolution, we assume synaptic weights to be discrete and $w_1, w_2, \dots, w_k$ to be integers. The weight $w_g$ of $g$ is defined as the maximum of the absolute values of the weights of $g$. In other words, we assume that $w_1, w_2, \dots, w_k$ are $O(\log w_g)$-bit coded discrete values. Throughout the article, we allow a gate to have both positive and negative weights, even though biological neurons are either excitatory (all the weights are positive) or inhibitory (all the weights are negative). As Maass (1997) noted, this relaxation has basically no impact on circuit complexity investigations unless one cares about a constant blow-up in computational resources.

A threshold circuit $C$ is a combinatorial circuit consisting of threshold gates and is expressed by a directed acyclic graph. The nodes of in-degree 0 correspond to input variables, and the other nodes correspond to gates. Let $G$ be the set of gates in $C$. For each gate $g \in G$, the level of $g$, denoted by $\mathrm{lev}(g)$, is defined as the length of a longest path from an input variable to $g$ in the underlying graph of $C$. For each $\ell \in [d]$, we define $G_\ell$ as the set of gates in the $\ell$th level: $G_\ell = \{g \in G : \mathrm{lev}(g) = \ell\}$. We denote by $g_{\mathrm{clf}}$ the unique output gate, which is a linear classifier separating the internal representations given by the gates in the lower levels (possibly together with the input variables). We say that $C$ computes a Boolean function $f : \{0,1\}^{[n]} \to \{0,1\}$ if $g_{\mathrm{clf}}(\xi) = f(\xi)$ for every $\xi \in \{0,1\}^{[n]}$. Although the inputs to a gate $g$ in $C$ may be not only the input variables but also the outputs of gates in the lower levels, we write $g(\xi)$ for the output of $g$ on $\xi \in \{0,1\}^{[n]}$ because $\xi$ inductively determines the output of $g$.

Let $C$ be a threshold circuit. We define the size $s$ of $C$ as the number of gates in $C$ and the depth $d$ of $C$ as the level of $g_{\mathrm{clf}}$. We define the energy $e$ of $C$ as
$$e = \max_{\xi \in \{0,1\}^{[n]}} \sum_{g \in G} g(\xi).$$
We define the weight $w$ of $C$ as the maximum of the weights of the gates in $C$: $w = \max_{g \in G} w_g$.
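The following toy sketch (our illustration, not code from the article) makes the four measures concrete for a small circuit computing $\mathrm{CD}_4$:

```python
# Gates are listed in topological order; each gate sees the n inputs
# followed by the outputs of all earlier gates. Energy is the maximum
# number of gates outputting one over all 2^n input assignments.
from itertools import product

class Gate:
    def __init__(self, weights, threshold, level):
        self.weights, self.threshold, self.level = weights, threshold, level

def evaluate(gates, xi):
    outputs = []
    for g in gates:
        signal = list(xi) + outputs          # inputs, then earlier gates
        pot = sum(w * x for w, x in zip(g.weights, signal))
        outputs.append(1 if pot >= g.threshold else 0)
    return outputs

def resources(gates, n):
    size = len(gates)
    depth = max(g.level for g in gates)
    weight = max(abs(w) for g in gates for w in g.weights)
    energy = max(sum(evaluate(gates, xi)) for xi in product([0, 1], repeat=n))
    return size, depth, weight, energy

# CD_4: two first-level AND gates xi_i AND xi_{2+i}, then an OR on top.
gates = [
    Gate([1, 0, 1, 0], 2, level=1),
    Gate([0, 1, 0, 1], 2, level=1),
    Gate([0, 0, 0, 0, 1, 1], 1, level=2),    # OR of the two AND gates
]
print(resources(gates, 4))   # (3, 2, 1, 3)
```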
In this article, we sometimes consider a Boolean function $f : \{0,1\}^{[n]} \to \{0,1\}$ together with a partition $(X,Y)$ of $[n]$. We can then write $f$ as $f : \{0,1\}^X \times \{0,1\}^Y \to \{0,1\}$. Let $C$ be a circuit computing $f$, and consider a gate $g$ in $C$. Let $w_i$ be the weight of $g$ for the $i$th input variable for every $i \in [n]$, and let $t_g$ be the threshold of $g$. For each gate $h$ directed to $g$ in $C$, let $w_{h,g}$ be the weight of $g$ for the output of $h$. Then the output $g(a,b)$ of $g$ for $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$ is defined as
$$g(a,b) = \left[\left[\, p_g(a,b) \ge t_g \,\right]\right],$$
where $p_g(a,b)$ denotes the potential of $g$ invoked by the input assignments and gates:
$$p_g(a,b) = \sum_{i \in X} w_i a_i + \sum_{i \in Y} w_i b_i + \sum_{h : h \to g} w_{h,g}\, h(a,b).$$
We may write $p_g^X(a)$ (resp., $p_g^Y(b)$) for the potential invoked by $a$ (resp., $b$):
$$p_g^X(a) = \sum_{i \in X} w_i a_i \quad\text{and}\quad p_g^Y(b) = \sum_{i \in Y} w_i b_i.$$

2.1.2  Discretized Circuits

Let $\phi$ be an activation function, and let $\delta$ be a discretizer that maps a real number to a number representable with a bit width $b$. We define a discretized activation function $\delta\phi$ as the composition of $\phi$ and $\delta$, that is, $\delta\phi(x) = \delta(\phi(x))$ for any number $x$. We say that $\delta\phi$ has a silent range for an interval $I$ if $\delta\phi(x) = 0$ if $x \in I$, and $\delta\phi(x) \ne 0$ otherwise. For example, if we use the ReLU function as the activation function $\phi$, then $\delta\phi$ has a silent range for $I = (-\infty, 0]$ for any discretizer $\delta$. If we use the sigmoid function as the activation function $\phi$ and a linear partition as the discretizer $\delta$, then $\delta\phi$ has a silent range for $I = (-\infty, t_{\max}]$, where $t_{\max} = \ln(1/(2^b - 1))$ and $\ln$ is the natural logarithm.
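A minimal numeric sketch of these definitions (our illustration; the discretizer below floors to multiples of $2^{-b}$, one possible linear partition):

```python
# The discretized sigmoid vanishes below t_max and is nonzero above it,
# so it has a silent range.
import math

b = 4                                    # assumed bit width

def delta(y):                            # discretizer: floor to b bits
    return math.floor(y * 2 ** b) / 2 ** b

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

t_max = math.log(1.0 / (2 ** b - 1))     # about -2.71 for b = 4
print(delta(sigmoid(t_max - 0.01)))      # 0.0
print(delta(sigmoid(t_max + 0.01)))      # 0.0625 (= 2**-4)
```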

Let $\delta\phi$ be a discretized activation function with a silent range. A $(\delta\phi)$-gate $g$ with $k$ input variables has a weight $w_i$ for each $i \in [k]$ and a threshold $t$, where the weights and threshold are discretized by $\delta$. The output $g(\xi)$ of $g$ for $\xi = (\xi_1, \xi_2, \dots, \xi_k) \in \{0,1\}^{[k]}$ is then defined as
$$g(\xi) = \delta\phi\!\left(\sum_{i=1}^{k} w_i \xi_i - t\right).$$
A $(\delta\phi)$-circuit is a combinatorial circuit consisting of $(\delta\phi)$-gates, except that the top gate $g_{\mathrm{clf}}$ is a threshold gate, that is, a linear classifier. We define the size and depth of a $(\delta\phi)$-circuit in the same way as for a threshold circuit. We define the energy $e$ of a $(\delta\phi)$-circuit as the maximum number of gates outputting nonzero values in the circuit:
$$e = \max_{\xi \in \{0,1\}^{[n]}} \left|\left\{\, g \in G : g(\xi) \ne 0 \,\right\}\right|.$$

We define the weight $w$ of $C$ as $w = 2^{2b}$, where $2b$ is the bit width possibly needed to represent a potential value invoked by a single input of a gate in $C$.

2.2  Communication Matrix and Its Rank

Let $f : \{0,1\}^{[n]} \to \{0,1\}$ be a Boolean function. For a partition $(X,Y)$ of $[n]$, we can view $f$ as $f : \{0,1\}^X \times \{0,1\}^Y \to \{0,1\}$. We define the communication matrix $M_f^{X:Y}$ as a $2^{|X|} \times 2^{|Y|}$ matrix where each row and column is indexed by $a \in \{0,1\}^X$ and $b \in \{0,1\}^Y$, respectively, and each entry is defined as $M_f^{X:Y}(a,b) = f(a,b)$. For $I \subseteq \{0,1\}^X$ and $J \subseteq \{0,1\}^Y$, we call $R = I \times J$ a combinatorial rectangle and say that $R$ is monochromatic with respect to $M_f^{X:Y}$ if $M_f^{X:Y}$ is constant on $R$. If a circuit $C$ computes $f$, we may write $M_C^{X:Y}$ instead of $M_f^{X:Y}$. Figure 1a shows a communication matrix $M_f^{X:Y}$, where $f(\xi) = \mathrm{sign}(\sum_{i=1}^{6} \xi_i - 3)$, $X = \{1,2,3\}$, and $Y = \{4,5,6\}$. Figure 1b shows a monochromatic combinatorial rectangle $R = I \times J$ with respect to $M_f^{X:Y}$, where $I = \{(0,1,1),(1,0,1),(1,1,0)\}$ and $J = \{(0,0,1),(0,1,0),(0,1,1),(1,0,1)\}$.

Figure 1: (a) Communication matrix $M_f^{X:Y}$, where $f(\xi) = \mathrm{sign}(\sum_{i=1}^{6} \xi_i - 3)$, $X = \{1,2,3\}$, and $Y = \{4,5,6\}$. (b) Monochromatic combinatorial rectangle $R = I \times J$ with respect to $M_f^{X:Y}$, where $I = \{(0,1,1),(1,0,1),(1,1,0)\}$ and $J = \{(0,0,1),(0,1,0),(0,1,1),(1,0,1)\}$.
For a Boolean function $f : \{0,1\}^{[n]} \to \{0,1\}$ and a partition $(X,Y)$ of $[n]$, we denote by $\mathrm{rk}(M_f^{X:Y})$ the rank of $M_f^{X:Y}$ over the real numbers. We then define the rank $\mathrm{rk}(f)$ of $f$ as
$$\mathrm{rk}(f) = \max_{(X,Y)} \mathrm{rk}(M_f^{X:Y}),$$
where the maximum is taken over all partitions of $[n]$. Note that the rank of $f$ is defined for any Boolean function $f$, even one without an obvious partition of the input variables.
Let $n$ be an even positive integer. The disjointness function $\mathrm{DISJ}_n$ of $n$ input variables is defined as follows: for every $\xi = (\xi_1, \xi_2, \dots, \xi_n) \in \{0,1\}^{[n]}$,
$$\mathrm{DISJ}_n(\xi) = \left[\left[\,\sum_{i=1}^{n/2} \xi_i \cdot \xi_{n/2+i} = 0\,\right]\right].$$
Jukna (2011) provided a simple proof showing that $M_{\mathrm{DISJ}_n}^{X:Y}$ has full rank for the natural partition $(X,Y)$ of $[n]$.
Theorem 4.

For $X = \{1,\dots,n/2\}$ and $Y = \{n/2+1,\dots,n\}$, $\mathrm{rk}(M_{\mathrm{DISJ}_n}^{X:Y}) = 2^{n/2}$. Thus, $\mathrm{rk}(\mathrm{DISJ}_n) = 2^{n/2}$.

The coincidence detection function $\mathrm{CD}_n$ is defined as the complement of $\mathrm{DISJ}_n$, and we can obtain the same bound for $\mathrm{CD}_n$. We also consider another Boolean function $\mathrm{EQ}_n$ of $n$ input variables asking whether $\xi_i = \xi_{n/2+i}$ for every $i \in [n/2]$:
$$\mathrm{EQ}_n(\xi) = \left[\left[\,\xi_i = \xi_{n/2+i} \text{ for every } i \in [n/2]\,\right]\right].$$
For $X = \{1,\dots,n/2\}$ and $Y = \{n/2+1,\dots,n\}$, $M_{\mathrm{EQ}_n}^{X:Y}$ is the identity matrix, which has full rank. Thus, we have the following proposition:
Proposition 1.

$\mathrm{rk}(\mathrm{CD}_n) = 2^{n/2}$ and $\mathrm{rk}(\mathrm{EQ}_n) = 2^{n/2}$.
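The ranks underlying theorem 4 and proposition 1 can be confirmed numerically for small $n$; the following sketch (our verification code) builds the two communication matrices for $n = 6$ under the natural partition:

```python
# Ranks of the DISJ_6 and EQ_6 communication matrices, natural partition.
from itertools import product
import numpy as np

def comm_matrix(f, half):
    M = np.zeros((2 ** half, 2 ** half), dtype=int)
    for r, a in enumerate(product([0, 1], repeat=half)):
        for c, b in enumerate(product([0, 1], repeat=half)):
            M[r, c] = f(a, b)
    return M

DISJ = lambda a, b: int(not any(x & y for x, y in zip(a, b)))
EQ = lambda a, b: int(a == b)

print(np.linalg.matrix_rank(comm_matrix(DISJ, 3)))  # 8 = 2**(n/2)
print(np.linalg.matrix_rank(comm_matrix(EQ, 3)))    # 8 = 2**(n/2)
```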

We also use well-known facts about rank. Let $A$ and $B$ be two matrices of the same dimensions. We denote by $A + B$ the sum of $A$ and $B$, and by $A \circ B$ the Hadamard product of $A$ and $B$.

Fact 1.

For two matrices, $A$ and $B$, of the same dimensions, we have

  • $\mathrm{rk}(A+B) \le \mathrm{rk}(A) + \mathrm{rk}(B)$,

  • $\mathrm{rk}(A \circ B) \le \mathrm{rk}(A) \cdot \mathrm{rk}(B)$.

In this section, we provide our main result showing the relationship among the four resources and the trade-offs it implies.

Theorem 5
(theorem 1 restated). Let $s, d, e$, and $w$ be integers satisfying $2 \le s$, $2 \le d$, $7 \le e$, and $1 \le w$. Suppose a threshold circuit $C$ computes a Boolean function $f$ of $n$ input variables and has size $s$, depth $d$, energy $e$, and weight $w$. Then it holds that
$$\log(\mathrm{rk}(f)) \le ed(\log s + \log w + \log n).$$

Let $C$ be a threshold circuit computing a Boolean function $f$ of $n$ variables. We prove the theorem by showing that for any partition $(X,Y)$ of $[n]$, we can express $M_C^{X:Y}$ as a sum of matrices, each of which corresponds to an internal representation that arises in $C$. Since $C$ has bounded energy, the number of internal representations is also bounded. We then show by the inclusion-exclusion principle that each matrix corresponding to an internal representation has bounded rank. Thus, fact 1 implies the theorem.
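To see the decomposition in miniature, the following sketch (our illustration, not the article's code) carries out this step for a toy depth-2 circuit computing $\mathrm{CD}_4$: it groups the inputs $(a,b)$ by the firing pattern that arises and checks that the pattern matrices whose output gate fires sum to $M_C$:

```python
# M_C equals the sum of the 0/1 matrices M_P, one per internal
# representation P that arises with output one.
from itertools import product
import numpy as np

def run(a, b):                       # returns (pattern, output) for CD_4
    g1 = int(a[0] + b[0] >= 2)       # xi_1 AND xi_3
    g2 = int(a[1] + b[1] >= 2)       # xi_2 AND xi_4
    out = int(g1 + g2 >= 1)          # OR gate, the linear classifier
    return (g1, g2, out), out

X = list(product([0, 1], repeat=2))  # partition X = {1, 2}, Y = {3, 4}
M_C = np.array([[run(a, b)[1] for b in X] for a in X])

patterns = {}                        # one 0/1 matrix per firing pattern
for r, a in enumerate(X):
    for c, b in enumerate(X):
        p, out = run(a, b)
        if out == 1:
            patterns.setdefault(p, np.zeros((4, 4), dtype=int))[r, c] = 1

total = sum(patterns.values())
print(len(patterns), np.array_equal(M_C, total))  # 3 True
```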

Proof.
Let $C$ be a threshold circuit computing a Boolean function $f$ of $n$ variables. Suppose $C$ has size $s$, depth $d$, energy $e$, and weight $w$. Consider an arbitrary partition $(X,Y)$ of $[n]$. Without loss of generality, we assume that $|X| \le |Y|$, and hence $|X| \le n/2$. We evaluate the rank of $M_C^{X:Y}$ and prove that
$$\mathrm{rk}(M_C^{X:Y}) \le \left(\frac{cs}{e-1}\right)^{e-1} \left(\left(\frac{cs}{e}\right)^{e} (nw+1)^{e}\right)^{d-1} (nw+1), \tag{3.1}$$
where $c < 3$. Equation 3.1 implies that
$$\mathrm{rk}(M_C^{X:Y}) \le \left(\frac{cs}{e}\right)^{ed} (nw+1)^{ed} \le (snw)^{ed},$$
where the last inequality holds if $7 \le e$. Since equation 3.1 holds for any partition $(X,Y)$ of $[n]$, we obtain the theorem by taking the logarithm of the inequality.
Let $G$ be the set of gates in $C$. For $\ell \in [d]$, let $G_\ell$ be the set of gates in the $\ell$th level of $C$. Without loss of generality, we assume that $G_d = \{g_{\mathrm{clf}}\}$. Let $P = (P_1, P_2, \dots, P_d)$, where $P_\ell$ is a subset of $G_\ell$ for each $\ell \in [d]$. Given an input $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$, we say that an internal representation $P$ arises for $(a,b)$ if, for every $\ell \in [d]$, $g(a,b) = 1$ for every $g \in P_\ell$, and $g(a,b) = 0$ for every $g \in G_\ell \setminus P_\ell$. We denote by $P^*(a,b)$ the internal representation that arises for $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$. We then define $\mathcal{P}_1$ as the set of internal representations that arise for some $(a,b)$ such that $g_{\mathrm{clf}}(a,b) = 1$:
$$\mathcal{P}_1 = \left\{ P^*(a,b) : (a,b) \in \{0,1\}^X \times \{0,1\}^Y,\ g_{\mathrm{clf}}(a,b) = 1 \right\}.$$
For any $P = (P_1, P_2, \dots, P_d) \in \mathcal{P}_1$, we have $|P_1| + |P_2| + \dots + |P_{d-1}| \le e - 1$ and $|P_d| = 1$. Thus, a standard upper bound on a sum of binomial coefficients implies that
$$|\mathcal{P}_1| \le \sum_{i=0}^{e-1} \binom{s}{i} \le \left(\frac{cs}{e-1}\right)^{e-1} \tag{3.2}$$
for some constant $c < 3$.
For each $P \in \mathcal{P}_1$, let $M_P$ be a $2^{|X|} \times 2^{|Y|}$ matrix such that for every $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$,
$$M_P(a,b) = \left[\left[\, P^*(a,b) = P \,\right]\right].$$
By the definitions of $\mathcal{P}_1$ and $M_P$, we have
$$M_C^{X:Y} = \sum_{P \in \mathcal{P}_1} M_P,$$
and fact 1(i) therefore implies that
$$\mathrm{rk}(M_C^{X:Y}) \le \sum_{P \in \mathcal{P}_1} \mathrm{rk}(M_P).$$
Thus, equation 3.2 implies that
$$\mathrm{rk}(M_C^{X:Y}) \le \left(\frac{cs}{e-1}\right)^{e-1} \max_{P \in \mathcal{P}_1} \mathrm{rk}(M_P).$$
We complete the proof by showing that for any $P \in \mathcal{P}_1$, it holds that
$$\mathrm{rk}(M_P) \le \left(\left(\frac{cs}{e}\right)^{e} (nw+1)^{e}\right)^{d-1} (nw+1).$$
In the following argument, we consider an arbitrary fixed internal representation $P = (P_1, P_2, \dots, P_d)$ in $\mathcal{P}_1$. We call a gate a threshold function if the input to the gate consists of the $n$ input variables alone. For each $g \in G$ and $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$, we denote by $\tau[g,P](a,b)$ the threshold function defined as
$$\tau[g,P](a,b) = \left[\left[\, p_g^X(a) + p_g^Y(b) \ge t_g[P] \,\right]\right],$$
where $t_g[P]$ is the threshold of $g$ based on the assumption that the internal representation $P$ arises. That is, for every $g \in G_1$, we have $t_g[P] = t_g$, and for every $g \in G_\ell$, where $2 \le \ell$,
$$t_g[P] = t_g - \sum_{\ell'=1}^{\ell-1} \sum_{h \in P_{\ell'}} w_{h,g}.$$
For each $\ell \in [d]$, we define a set $T_\ell$ of threshold functions as $T_\ell = \{\tau[g,P] : g \in G_\ell\}$. Since every gate in $G_1$ is a threshold function, $T_1$ is identical to $G_1$.
For any set $T$ of threshold functions, we denote by $M[T]$ a $2^{|X|} \times 2^{|Y|}$ matrix such that for every $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$,
$$M[T](a,b) = \prod_{\tau \in T} \tau(a,b).$$
It is well known that the rank of $M[T]$ is bounded (Forster et al., 2001; Hajnal et al., 1993). We give a proof for completeness.
Claim 1.

$\mathrm{rk}(M[T]) \le (nw+1)^{|T|}$.

Proof.
Let $z = |T|$, and let $\tau_1, \tau_2, \dots, \tau_z$ be an arbitrary ordering of the threshold functions in $T$. For each $k \in [z]$, we define
$$R_k = \left\{ p_{\tau_k}^X(a) : a \in \{0,1\}^X \right\}.$$
Since a threshold function receives a value between $-w$ and $w$ from a single input, we have $|R_k| \le 2|X|w + 1 \le nw + 1$. For $r = (r_1, r_2, \dots, r_z) \in R_1 \times R_2 \times \dots \times R_z$, we define $R(r) = I(r) \times J(r)$ as the combinatorial rectangle where
$$I(r) = \left\{ a \in \{0,1\}^X : p_{\tau_k}^X(a) = r_k \text{ for every } k \in [z] \right\}$$
and
$$J(r) = \left\{ b \in \{0,1\}^Y : r_k + p_{\tau_k}^Y(b) \ge t_{\tau_k} \text{ for every } k \in [z] \right\}.$$
Clearly, all the rectangles are disjoint and monochromatic with respect to $M[T]$; hence, $M[T]$ can be expressed as a sum of rank-1 matrices given by the $R(r)$'s taken over all $r$. Thus, fact 1(i) implies that its rank is at most $|R_1 \times R_2 \times \dots \times R_z| \le (nw+1)^z$.
For each $\ell \in [d]$, based on $P_\ell$ in $P$, we define a set $Q_\ell$ of threshold functions as
$$Q_\ell = \left\{ \tau[g,P] : g \in P_\ell \right\},$$
and a family $\mathcal{T}(Q_\ell)$ of sets $T$ of threshold functions as
$$\mathcal{T}(Q_\ell) = \left\{ T \subseteq T_\ell : Q_\ell \subseteq T,\ |T| \le e \right\}.$$
Following the inclusion-exclusion principle, we define a $2^{|X|} \times 2^{|Y|}$ matrix,
$$H[Q_\ell] = \sum_{T \in \mathcal{T}(Q_\ell)} (-1)^{|T| - |Q_\ell|} M[T].$$
We can show that $M_P$ is expressed as the Hadamard product of $H[Q_1], H[Q_2], \dots, H[Q_d]$:
Claim 2.

$M_P = H[Q_1] \circ H[Q_2] \circ \dots \circ H[Q_d]$.

Proof.
Consider an arbitrary fixed assignment $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$. We show that
$$H[Q_1](a,b) \cdot H[Q_2](a,b) \cdots H[Q_d](a,b) = 0$$
if $M_P(a,b) = 0$, and
$$H[Q_1](a,b) \cdot H[Q_2](a,b) \cdots H[Q_d](a,b) = 1$$
if $M_P(a,b) = 1$. We write $P^* = (P_1^*, P_2^*, \dots, P_d^*)$ to denote $P^*(a,b)$ for simpler notation.
Suppose $M_P(a,b) = 0$. In this case, we have $P \ne P^*$, and hence there exists a level $\ell \in [d]$ such that $P_\ell \ne P_\ell^*$, while $P_{\ell'} = P_{\ell'}^*$ for every $\ell' \in [\ell - 1]$ (vacuously if $\ell = 1$). For such $\ell$, it holds that
$$\tau[g,P](a,b) = \tau[g,P^*](a,b) \tag{3.3}$$
for every $g \in G_\ell$. We show that $H[Q_\ell](a,b) = 0$, which implies that the Hadamard product is 0, as desired. We consider the following two cases: $P_\ell \not\subseteq P_\ell^*$ and $P_\ell \subseteq P_\ell^*$.
Consider the case where $P_\ell \not\subseteq P_\ell^*$; then there exists $g \in P_\ell \setminus P_\ell^*$. Since $g \notin P_\ell^*$, we have $\tau[g,P^*](a,b) = 0$. Thus, equation 3.3 implies that $\tau[g,P](a,b) = 0$, and hence $M[T](a,b) = 0$ for every $T$ such that $Q_\ell \subseteq T$. Therefore, for every $T \in \mathcal{T}(Q_\ell)$, we have $M[T](a,b) = 0$, and hence
$$H[Q_\ell](a,b) = \sum_{T \in \mathcal{T}(Q_\ell)} (-1)^{|T| - |Q_\ell|} M[T](a,b) = 0.$$
Consider the other case, where $P_\ell \subseteq P_\ell^*$. Let $Q^* = \{\tau[g,P^*] : g \in P_\ell^*\}$. Equation 3.3 implies that $M[T](a,b) = 1$ if $T$ satisfies $Q_\ell \subseteq T \subseteq Q^*$, and $M[T](a,b) = 0$ otherwise. Thus,
$$H[Q_\ell](a,b) = \sum_{T : Q_\ell \subseteq T \subseteq Q^*} (-1)^{|T| - |Q_\ell|}.$$
Therefore, by the binomial theorem,
$$H[Q_\ell](a,b) = \sum_{i=0}^{|Q^*| - |Q_\ell|} \binom{|Q^*| - |Q_\ell|}{i} (-1)^{i} = 0,$$
where the sum vanishes because $P_\ell \ne P_\ell^*$ implies $|Q^*| - |Q_\ell| \ge 1$.
Suppose $M_P(a,b) = 1$. In this case, we have $P = P^*$. Thus, for every $\ell \in [d]$, equation 3.3 implies that $M[T](a,b) = 1$ if $T = Q_\ell$, and $M[T](a,b) = 0$ otherwise. Therefore,
$$H[Q_\ell](a,b) = M[Q_\ell](a,b) = 1.$$
Consequently, $H[Q_1](a,b) \cdot H[Q_2](a,b) \cdots H[Q_d](a,b) = 1$, as desired.
We finally evaluate $\mathrm{rk}(M_P)$. Claim 2 and fact 1(ii) imply that
$$\mathrm{rk}(M_P) \le \prod_{\ell=1}^{d} \mathrm{rk}(H[Q_\ell]). \tag{3.4}$$
Since
$$|\mathcal{T}(Q_\ell)| \le \sum_{i=0}^{e} \binom{s}{i} \le \left(\frac{cs}{e}\right)^{e},$$
fact 1(i) and claim 1 imply that
$$\mathrm{rk}(H[Q_\ell]) \le \sum_{T \in \mathcal{T}(Q_\ell)} \mathrm{rk}(M[T]) \le \left(\frac{cs}{e}\right)^{e} (nw+1)^{e} \tag{3.5}$$
for every $\ell \in [d-1]$, and
$$\mathrm{rk}(H[Q_d]) \le nw + 1. \tag{3.6}$$
Equations 3.4 to 3.6 imply that
$$\mathrm{rk}(M_P) \le \left(\left(\frac{cs}{e}\right)^{e} (nw+1)^{e}\right)^{d-1} (nw+1),$$
as desired. Thus, equation 3.1 is verified.

Combining proposition 1 and theorem 5, we obtain the following corollaries:

Corollary 1.
Let $s, d, e$, and $w$ be integers satisfying $2 \le s$, $2 \le d$, $7 \le e$, and $1 \le w$. Suppose a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes $\mathrm{CD}_n$. Then it holds that
$$\frac{n}{2} \le ed(\log s + \log w + \log n).$$
Equivalently, we have $2^{n/(2ed)}/(nw) \le s$.
Corollary 2.
Let $s, d, e$, and $w$ be integers satisfying $2 \le s$, $2 \le d$, $7 \le e$, and $1 \le w$. Suppose a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes $\mathrm{EQ}_n$. Then it holds that
$$\frac{n}{2} \le ed(\log s + \log w + \log n).$$
Equivalently, we have $2^{n/(2ed)}/(nw) \le s$.

We can obtain a similar relationship for discretized circuits, as follows.

Theorem 6.
Let $\delta$ be a discretizer and $\phi$ be an activation function such that $\delta\phi$ has a silent range. Suppose a $(\delta\phi)$-circuit $C$ computes a Boolean function $f$ of $n$ variables and has size $s$, depth $d$, energy $e$, and weight $w$. Then it holds that
$$\log(\mathrm{rk}(f)) = O(ed(\log s + \log w + \log n)).$$

We establish the trade-off by showing that any discretized circuit can be simulated using a threshold circuit with a moderate increase in size, depth, energy, and weight. Theorem 5 then implies the claim. The detailed proof is included in the supplemental appendix.

In this section, we show that the trade-off given in Theorem 5 is tight if the depth and energy are small.

4.1  Definitions

Let $z$ be a positive integer and $f$ be a Boolean function of $n$ variables. We say that $f$ is $z$-piecewise with $f_1, f_2, \dots, f_z$ if the following conditions are satisfied. Let
$$B_j = \{\, i \in [n] : \text{the } i\text{th input variable is fed into } f_j \,\}$$
for each $j \in [z]$; then
  1. $B_1, B_2, \dots, B_z$ compose a partition of $[n]$.

  2. $|B_j| \le n/z$ for every $j \in [z]$.

  3. For every assignment $\xi \in \{0,1\}^{[n]}$,
$$f(\xi) = \bigvee_{j=1}^{z} f_j(\xi) \quad\text{or}\quad f(\xi) = \neg\bigvee_{j=1}^{z} f_j(\xi).$$

We say that a set of threshold gates sharing input variables is a neural set, and a neural set is selective if at most one of the gates in the set outputs one for any input assignment. A selective neural set $S$ computes a Boolean function $f$ if for every assignment in $f^{-1}(0)$, no gate in $S$ outputs one, while for every assignment in $f^{-1}(1)$, exactly one gate in $S$ outputs one. We define the size and weight of $S$ as $|S|$ and $\max_{g \in S} w_g$, respectively. Below, we assume that $S$ does not contain a threshold gate computing a constant function.

Since any conjunction of literals can be computed by a threshold gate, we can obtain by a DNF-like construction a selective neural set of exponential size that computes f for any Boolean function f (see example 7 and theorem 2.3 in Uchizawa, 2014).

Theorem 7.

For any Boolean function $f$ of $n$ variables, there exists a selective neural set of size $2^n$ and weight one that computes $f$.
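A minimal sketch of this DNF-like construction (our illustration): for each satisfying assignment we build one weight-one gate that fires on that assignment alone, so the resulting set is selective and computes $f$:

```python
# One threshold gate per xi in f^{-1}(1); the gate fires iff input == xi.
from itertools import product

def selective_neural_set(f, n):
    gates = []
    for xi in product([0, 1], repeat=n):
        if f(xi):
            weights = [1 if v == 1 else -1 for v in xi]   # weight one
            threshold = sum(xi)                           # met only by xi
            gates.append((weights, threshold))
    return gates

def fires(gate, xi):
    w, t = gate
    return int(sum(wi * x for wi, x in zip(w, xi)) >= t)

f = lambda xi: int(sum(xi) % 2 == 1)          # parity of 3 variables
S = selective_neural_set(f, 3)
assert all(sum(fires(g, xi) for g in S) == f(xi)
           for xi in product([0, 1], repeat=3))
print(len(S))                                 # 4 <= 2**3 gates
```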

4.2  Upper Bounds

The following lemma shows that we can construct threshold circuits of small energy for piecewise functions.

Lemma 1.
Let $e$ and $d$ be integers satisfying $2 \le e$ and $2 \le d$, and let $z = (e-1)(d-1)$. Suppose $f : \{0,1\}^{[n]} \to \{0,1\}$ is a $z$-piecewise function with $f_1, f_2, \dots, f_z$. If $f_j$ is computable by a selective neural set of size at most $s'$ and weight $w'$ for every $j \in [z]$, then $f$ is computable by a threshold circuit of size
$$zs' + 1,$$
depth $d$, energy $e$, and weight
$$\frac{w'n}{z}.$$
Proof.
For simplicity, we assume that $n$ is divisible by $z$. Let $f : \{0,1\}^{[n]} \to \{0,1\}$ be $z$-piecewise with $f_1, f_2, \dots, f_z$. We first prove the lemma for the case where $f$ satisfies
$$f(\xi) = \bigvee_{j=1}^{z} f_j(\xi).$$
Let us relabel $f_j$ for $j \in [z]$ as $f_{k,\ell}$ for $k \in [e-1]$ and $\ell \in [d-1]$. We then denote by $B_{k,\ell} = \{\, i \in [n] : \text{the } i\text{th input variable is fed into } f_{k,\ell} \,\}$ for $k \in [e-1]$ and $\ell \in [d-1]$, each of which contains $n/z$ integers. Thus, we have
$$f(\xi) = \bigvee_{\ell=1}^{d-1} \bigvee_{k=1}^{e-1} f_{k,\ell}(\xi).$$
By the assumption, $f_{k,\ell}$ is computable by a selective neural set $S_{k,\ell}$ of size $s'$ for every pair of $k \in [e-1]$ and $\ell \in [d-1]$.

We construct the desired threshold circuit C by arranging and connecting the selective neural sets, where C has a simple layered structure consisting of the selective neural sets. After we complete the construction of C, we show that C computes f and then evaluate its size, depth, energy, and weight.

We start from the bottom level. We place in the first level the gates in $S_{k,1}$ for every $k \in [e-1]$. Then, for each $\ell$, $2 \le \ell \le d-1$, we add at the $\ell$th level the gates $g \in S_{k,\ell}$ for every $k \in [e-1]$ and connect the outputs of all the gates at the lower levels to every $g \in S_{k,\ell}$ with weight $-w'n/z$. For $g \in S_{k,\ell}$ and $\xi \in \{0,1\}^{[n]}$, we denote by $g^C(\xi)$ the output of the gate $g$ placed in $C$. Finally, we add $g_{\mathrm{clf}}$, which computes a disjunction of all the gates in the layers $1, 2, \dots, d-1$: for every $\xi \in \{0,1\}^{[n]}$,
$$g_{\mathrm{clf}}(\xi) = \left[\left[\, \sum_{\ell=1}^{d-1} \sum_{k=1}^{e-1} \sum_{g \in S_{k,\ell}} g^C(\xi) \ge 1 \,\right]\right].$$

We here show that C computes f. By construction, the following claim is easy to verify:

Claim 3.
For any $g \in S_{k,\ell}$,
$$g^C(\xi) = \begin{cases} g(\xi) & \text{if } h^C(\xi) = 0 \text{ for every gate } h \text{ at levels } 1, \dots, \ell-1, \\ 0 & \text{otherwise.} \end{cases}$$
Proof.

If every gate at levels $1, \dots, \ell-1$ outputs zero, the output of $g^C$ is identical to that of $g$, and hence $g^C(\xi) = g(\xi)$. Otherwise, there is a gate outputting one at a lower level. Since $g^C$ receives the output of that gate, the value $-w'n/z$ is added to the potential of $g^C$. Since $g$ receives at most $n/z$ weights whose absolute values are bounded by $w'$, the potential of $g^C$ is then below its threshold.

Suppose $f(\xi) = 0$. In this case, for every $k \in [e-1]$, $\ell \in [d-1]$, and $g \in S_{k,\ell}$, we have $g(\xi) = 0$. Therefore, claim 3 implies that no gate in $C$ outputs one, and hence $g_{\mathrm{clf}}(\xi) = 0$.

Suppose $f(\xi) = 1$. In this case, there exists $\ell^* \in [d-1]$ such that $f_{k,\ell^*}(\xi) = 1$ for some $k \in [e-1]$, while $f_{k,\ell}(\xi) = 0$ for every $1 \le k \le e-1$ and $1 \le \ell \le \ell^* - 1$. Since $S_{k,\ell^*}$ computes $f_{k,\ell^*}$, claim 3 implies that there exists $g \in S_{k,\ell^*}$ such that $g^C(\xi) = 1$, which implies that $g_{\mathrm{clf}}(\xi) = 1$.

Finally, we evaluate the size, depth, energy, and weight of $C$. Since $C$ contains at most $s'$ gates for each pair of $k \in [e-1]$ and $\ell \in [d-1]$, where $z = (e-1)(d-1)$, we have in total $s \le zs' + 1$, where the additional one corresponds to the output gate. Because the gates $g \in S_{k,\ell}$ are placed at the $\ell$th level for $\ell \in [d-1]$, the level of $g_{\mathrm{clf}}$ is clearly $d$; hence, $C$ has depth $d$. Claim 3 implies that if there is a gate outputting one at level $\ell$, then no gate at a higher level outputs one. In addition, since $S_{k,\ell}$ is selective, at most one gate $g \in S_{k,\ell}$ outputs one. Therefore, at most $e-1$ gates at the $\ell$th level output one, followed by $g_{\mathrm{clf}}$ outputting one. Thus, $C$ has energy at most $e$. Any connection in $C$ has weight at most $w'$ or $w'n/z$. Thus, the weight of $C$ is $w'n/z$.

Consider the other case, where
$$f(\xi) = \neg\bigvee_{j=1}^{z} f_j(\xi).$$
We can obtain the desired circuit $C$ by the same construction as above, except that $g_{\mathrm{clf}}$ computes the complement of the disjunction of all the gates in the layers $1, 2, \dots, d-1$:
$$g_{\mathrm{clf}}(\xi) = \left[\left[\, -\sum_{\ell=1}^{d-1} \sum_{k=1}^{e-1} \sum_{g \in S_{k,\ell}} g^C(\xi) \ge 0 \,\right]\right].$$
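The following sketch (our own instantiation of the lemma 1 construction, with the hypothetical choice $e = d = 3$ for $\mathrm{CD}_8$) builds the layered circuit from trivially selective one-gate sets and verifies by brute force that it computes $\mathrm{CD}_8$ with energy at most $e$:

```python
# z = (e-1)(d-1) = 4 blocks; each f_{k,l} tests one coincidence pair and
# is realized by a single AND gate (a selective set of size one). Gates
# at level 2 are inhibited by every level-1 gate; the classifier fires
# iff some gate fires.
from itertools import product

PAIRS = [(0, 4), (1, 5), (2, 6), (3, 7)]      # block j tests xi_i AND xi_{4+i}

def circuit(xi):
    inhibit = -2                              # -w' * n/z with w' = 1, n/z = 2
    level1 = [int(xi[i] + xi[j] >= 2) for i, j in PAIRS[:2]]
    fired1 = sum(level1)
    level2 = [int(xi[i] + xi[j] + inhibit * fired1 >= 2)
              for i, j in PAIRS[2:]]
    out = int(sum(level1) + sum(level2) >= 1) # disjunction classifier
    return level1 + level2 + [out]

energy, ok = 0, True
for xi in product([0, 1], repeat=8):
    outs = circuit(xi)
    cd = int(any(xi[i] & xi[j] for i, j in PAIRS))
    ok &= (outs[-1] == cd)
    energy = max(energy, sum(outs))
print(ok, energy)                             # True 3 (= e)
```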

Clearly, $\mathrm{CD}_n$ is a piecewise function, and so the lemma gives our upper bound for $\mathrm{CD}_n$.

Theorem 8.
(theorem 2 restated). For any integers $e$ and $d$ such that $2 \le e$ and $2 \le d$, $\mathrm{CD}_n$ is computable by a threshold circuit of size
$$(e-1)(d-1) \cdot 2^{\frac{n}{2(e-1)(d-1)}} + 1,$$
depth $d$, energy $e$, and weight
$$\frac{n^2}{2(e-1)^2(d-1)^2}.$$
Proof.

For simplicity, we consider the case where $n/2$ is a multiple of $z = (e-1)(d-1)$. By lemma 1, it suffices to show that $\mathrm{CD}_n$ is $z$-piecewise with $f_1, \dots, f_z$, each of which is computable by a selective neural set of size $s' = 2^{n/(2z)}$ and weight $w' = n/(2z)$.

We can verify that $\mathrm{CD}_n$ is $z$-piecewise, as follows. For each $j$, $1 \le j \le z$, let
$$B_j = I_j \cup \left\{ \frac{n}{2} + i : i \in I_j \right\}, \quad\text{where}\quad I_j = \left\{ (j-1)\frac{n}{2z} + 1, \dots, j\frac{n}{2z} \right\}.$$
We then have
$$\mathrm{CD}_n(\xi) = \bigvee_{j=1}^{z} f_j(\xi)$$
for every $\xi = (\xi_1, \xi_2, \dots, \xi_n) \in \{0,1\}^{[n]}$, where
$$f_j(\xi) = \left[\left[\, \sum_{i \in I_j} \xi_i \cdot \xi_{n/2+i} \ge 1 \,\right]\right].$$
Consider an arbitrary fixed $j \in [z]$. Below, we show that $f_j$ is computable by a selective neural set of size
$$s' \le 2^{n/(2z)}$$
and weight $w' \le n/(2z)$.
Recall that we denote by $[[P]]$ for a statement $P$ the function that outputs one if $P$ is true and zero otherwise. For any assignment $\xi \in \{0,1\}^{[n]}$, we define $B^*(\xi)$ as
$$B^*(\xi) = \{\, i \in I_j : \xi_i = 1 \,\}.$$
Then $f_j$ can be expressed as
$$f_j(\xi) = \sum_{B \in \mathcal{B}_j'} F_j^B(\xi), \tag{4.1}$$
where $\mathcal{B}_j'$ is the family of nonempty subsets of $I_j$ and
$$F_j^B(\xi) = \left[\left[\, B = B^*(\xi) \text{ and } \sum_{i \in B} \xi_{n/2+i} \ge 1 \,\right]\right].$$
Moreover, for every $\xi \in \{0,1\}^{[n]}$,
$$F_j^B(\xi) = 0 \quad\text{for every } B \in \mathcal{B}_j' \setminus \{B^*(\xi)\}. \tag{4.2}$$
The function $F_j^B$ is computable by a threshold gate.
Claim 4.
For any $B \in \mathcal{B}_j'$, $F_j^B$ can be computed by a threshold gate $g_j^B$ whose weights $w_1, w_2, \dots, w_n$ for the $n$ input variables and threshold $t$ are defined as follows: for every $i \in [n/2]$,
$$w_i = \begin{cases} |B| & \text{if } i \in B, \\ -|B| & \text{if } i \in I_j \setminus B, \\ 0 & \text{otherwise,} \end{cases}$$
and
$$w_{n/2+i} = \begin{cases} 1 & \text{if } i \in B, \\ 0 & \text{otherwise,} \end{cases}$$
and $t = |B|^2 + 1$.
Proof.
Suppose $F_j^B(\xi) = 1$, that is, $B = B^*(\xi)$ and there exists $i^* \in B$ such that $\xi_{n/2+i^*} = 1$. Then we have
$$\sum_{i=1}^{n} w_i \xi_i = |B| \cdot |B| + \sum_{i \in B} \xi_{n/2+i} \ge |B|^2 + 1 = t;$$
thus, $g_j^B$ outputs one.

Suppose $F_j^B(\xi) = 0$. There are two cases: $B \ne B^*(\xi)$, and $\xi_{n/2+i} = 0$ for every $i \in B$.

Consider the first case. If there exists $i^* \in B^*(\xi) \setminus B$, then $w_{i^*} = -|B|$. Thus,
$$\sum_{i=1}^{n} w_i \xi_i \le |B| \cdot |B| - |B| + \sum_{i \in B} \xi_{n/2+i} \le |B|^2 < t,$$
and consequently, $g_j^B$ outputs zero. If $B^*(\xi) \subsetneq B$, then
$$\sum_{i \in [n/2]} w_i \xi_i \le |B|(|B| - 1).$$
Thus,
$$\sum_{i=1}^{n} w_i \xi_i \le |B|^2 - |B| + \sum_{i \in B} \xi_{n/2+i} \le |B|^2 < t;$$
hence, $g_j^B$ outputs zero.

In the second case, we have
$$\sum_{i \in B} \xi_{n/2+i} = 0.$$
Thus,
$$\sum_{i=1}^{n} w_i \xi_i \le |B| \cdot |B| = |B|^2 < t;$$
hence, $g_j^B$ outputs zero.
For any $\xi \in \{0,1\}^{[n]}$, equation 4.2 implies that only $g_j^{B^*(\xi)}$ is allowed to output one. Thus, by equation 4.1, the selective neural set
$$S_j = \left\{ g_j^B : B \in \mathcal{B}_j' \right\}$$
computes $f_j$. Since every $B \in \mathcal{B}_j'$ is a subset of $I_j$ and $|I_j| = n/(2z)$, we have $|S_j| \le 2^{n/(2z)}$, and claim 4 implies that $w' \le n/(2z)$.
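The gate of claim 4 can be checked by brute force; the sketch below (our verification code, using a hypothetical block $I_j = \{0,1,2\}$ with right partners offset by 3) confirms that the stated weights and the threshold $|B|^2 + 1$ realize $F_j^B$:

```python
# Exhaustive check that g_j^B computes F_j^B on all 2^6 assignments.
from itertools import product, combinations

I_j = [0, 1, 2]                        # left block; right partner is i + 3

def gate(B, xi):
    k = len(B)
    pot = sum(k * xi[i] if i in B else -k * xi[i] for i in I_j)
    pot += sum(xi[i + 3] for i in B)
    return int(pot >= k * k + 1)       # threshold |B|^2 + 1

def F(B, xi):                          # the target function F_j^B
    B_star = {i for i in I_j if xi[i] == 1}
    return int(B_star == set(B) and any(xi[i + 3] for i in B))

ok = True
for size in range(1, 4):
    for B in combinations(I_j, size):
        for xi in product([0, 1], repeat=6):
            ok &= gate(set(B), xi) == F(B, xi)
print(ok)                              # True
```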

We can also obtain a similar proposition for $\mathrm{EQ}_n$.

Theorem 9.
For any integers $e$ and $d$ such that $2 \le e$ and $2 \le d$, $\mathrm{EQ}_n$ is computable by a threshold circuit of size
$$(e-1)(d-1) \cdot 2^{\frac{n}{(e-1)(d-1)}} + 1,$$
depth $d$, energy $e$, and weight
$$\frac{n}{(e-1)(d-1)}.$$
Proof.
For simplicity, we consider the case where $n/2$ is a multiple of $z = (e-1)(d-1)$. Similar to $\mathrm{CD}_n$, $\mathrm{EQ}_n$ is $z$-piecewise, as follows. For each $j$, $1 \le j \le z$, let
$$B_j = I_j \cup \left\{ \frac{n}{2} + i : i \in I_j \right\}, \quad\text{where}\quad I_j = \left\{ (j-1)\frac{n}{2z} + 1, \dots, j\frac{n}{2z} \right\}.$$
We then have
$$\mathrm{EQ}_n(\xi) = \neg\bigvee_{j=1}^{z} f_j(\xi),$$
where
$$f_j(\xi) = \left[\left[\, \xi_i \ne \xi_{n/2+i} \text{ for some } i \in I_j \,\right]\right].$$
Since each $f_j$ depends on $|B_j| = n/z$ variables, theorem 7 implies that $f_j$ is computable by a selective neural set of size $s' = 2^{n/z}$ and weight $w' = 1$.

We have proved that a threshold circuit can compute only Boolean functions whose communication matrices have logarithm of rank bounded by the product of linear factors of depth and energy and logarithmic factors of size and weight. This bound implies a trade-off between depth and energy if we view the logarithmic terms as having negligible impact. We also proved that a similar trade-off exists for discretized circuits, which suggests that increasing the depth linearly improves the ability of neural networks to decrease the number of neurons outputting nonzero values, subject to hardware constraints on the number of neurons and the weight resolution.

It is natural to ask whether there exists a similar trade-off tailored to a practically plausible distribution. For a threshold circuit $C$ and a distribution $Q$ over the input assignments to $C$, Uchizawa et al. (2008) defined the average energy complexity $EC_Q$ of $C$ as
$$EC_Q = \mathop{\mathrm{E}}_{\xi \sim Q}\left[\, \sum_{g \in G} g(\xi) \,\right],$$
where the expectation is taken over $Q$. They then showed that the average energy complexity can be bounded by the entropy of the internal representations under $Q$. It would be interesting to ask whether there exists a trade-off with regard to the average energy complexity. For circuit complexity, our trade-off implies a lower bound $\Omega(n/\log n)$ on the energy of polynomial-size, constant-depth, and polynomial-weight threshold circuits computing $\mathrm{CD}_n$. It would also be interesting to ask whether there exists a Boolean function that needs linear or even superlinear energy for polynomial-size threshold circuits.
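As a small illustration of this definition (our sketch, reusing the toy $\mathrm{CD}_4$ circuit from section 2), the expected number of firing gates can be computed exactly under a product distribution $Q$:

```python
# Average energy of the depth-2 CD_4 circuit when each input bit is one
# independently with probability 1/4.
from itertools import product

p = 0.25
avg = 0.0
for xi in product([0, 1], repeat=4):
    prob = 1.0
    for v in xi:
        prob *= p if v == 1 else 1 - p
    g1 = int(xi[0] + xi[2] >= 2)
    g2 = int(xi[1] + xi[3] >= 2)
    out = int(g1 + g2 >= 1)
    avg += prob * (g1 + g2 + out)
print(avg)   # expected energy, about 0.246 (far below the worst case 3)
```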

Since we have simplified and ignored many aspects of neural computation, our results are not sufficient to perfectly explain the representational power of neural networks in the brain. However, circuit complexity arguments can potentially aid in devising a plausible principle behind neural computation. Beyond the three-level approach to understanding brain computation (the computational, algorithmic, and implementation levels) reported by Marr (1982), Valiant (2014) added the requirement that such an understanding incorporate the quantitative constraints faced by the cortex. Circuit complexity arguments could provide quantitative constraints through complexity measures. Further, Maass et al. (2019) identified the difficulty of uncovering a neural algorithm employed by the brain: its hardware could be so strongly adapted to the task that the algorithm effectively vanishes; even if the precise structure, connectivity, and vast array of numerical parameters were known in the minutest detail, extracting an algorithm implemented in the network would still be difficult. A trade-off does not provide a description of an explicit neural algorithm, but it can afford insights relevant to formulating computational principles because its argument necessarily concerns every algorithm that a theoretical model of a neural network can implement.

The preliminary version of our article was presented at MFCS2023 (Uchizawa & Abe, 2023). We thank the anonymous reviewers of MFCS2023 for their careful reading and helpful comments. We also thank the anonymous reviewers of Neural Computation for constructive suggestions greatly improving the presentation and organization. This work was supported by JSPS KAKENHI grant JP22K11897.

References

Amano, K. (2020). On the size of depth-two threshold circuits for the inner product mod 2 function. In A. Leporati, C. Martín-Vide, D. Shapira, & C. Zandron (Eds.), Language and automata theory and applications (pp. 235–247). Springer.
Amano, K., & Maruoka, A. (2005). On the complexity of depth-2 circuits with threshold gates. In Proceedings of the 30th International Conference on Mathematical Foundations of Computer Science (pp. 107–118).
Attwell, D., & Laughlin, S. B. (2001). An energy budget for signaling in the grey matter of the brain. Journal of Cerebral Blood Flow and Metabolism, 21(10), 1133–1145.
Barth, A. L., & Poulet, J. F. (2012). Experimental evidence for sparse firing in the neocortex. Trends in Neurosciences, 35(6), 345–355.
Chen, R., Santhanam, R., & Srinivasan, S. (2018). Average-case lower bounds and satisfiability algorithms for small threshold circuits. Theory of Computing, 14(9), 1–55.
Courbariaux, M., Bengio, Y., & David, J.-P. (2015). BinaryConnect: Training deep neural networks with binary weights during propagations. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28 (pp. 3123–3131). MIT Press.
DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8), 333–341.
Dinesh, K., Otiv, S., & Sarma, J. (2020). New bounds for energy complexity of Boolean functions. Theoretical Computer Science, 845, 59–75.
Földiák, P. (2003). Sparse coding in the primate cortex. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 1064–1068). MIT Press.
Forster, J., Krause, M., Lokam, S. V., Mubarakzjanov, R., Schmitt, N., & Simon, H. U. (2001). Relations between communication complexity, linear arrangements, and computational complexity. In Proceedings of the 21st International Conference on Foundations of Software Technology and Theoretical Computer Science (pp. 171–182).
Hajnal, A., Maass, W., Pudlák, P., Szegedy, M., & Turán, G. (1993). Threshold circuits of bounded depth. Journal of Computer and System Sciences, 46, 129–154.
Håstad, J., & Goldmann, M. (1991). On the power of small-depth threshold circuits. Computational Complexity, 1(2), 113–129.
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., & Bengio, Y. (2018). Quantized neural networks: Training neural networks with low precision weights and activations. Journal of Machine Learning Research, 18(187), 1–30.
Impagliazzo, R., Paturi, R., & Saks, M. E. (1997). Size-depth tradeoffs for threshold circuits. SIAM Journal on Computing, 26(3), 693–707.
Jukna, S. (2011). Extremal combinatorics with applications in computer science. Springer.
Kane, D. M., & Williams, R. (2016). Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (pp. 633–643).
Kasim-zade, O. M. (1992). On a measure of active circuits of functional elements. Mathematical Problems in Cybernetics, 4, 218–228. (In Russian)
Lennie, P. (2003). The cost of cortical computation. Current Biology, 13, 493–497.
Levy, W. B., & Baxter, R. A. (1996). Energy efficient neural codes. Neural Computation, 8(3), 531–543.
Lynch, N., & Musco, C. (2022). A basic compositional model for spiking neural networks (pp. 403–449). Springer.
Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9), 1659–1671.
Maass, W., Papadimitriou, C. H., Vempala, S., & Legenstein, R. (2019). Brain computation: A computer science perspective (pp. 184–199). Springer.
Maniwa, H., Oki, T., Suzuki, A., Uchizawa, K., & Zhou, X. (2018). Computational power of threshold circuits of energy at most two. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E101.A(9), 1431–1439.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Freeman.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
Minsky, M., & Papert, S. (1988). Perceptrons: An introduction to computational geometry. MIT Press.
Ng, A. (2011). Sparse autoencoder. CS294A Lecture notes. Stanford University. https://web.stanford.edu/class/cs294a/handouts.html
Nisan, N. (1993). The communication complexity of threshold gates. In Proceedings of Combinatorics, Paul Erdős Is Eighty (pp. 301–315).
Niven, J. E., & Laughlin, S. B. (2008). Energy limitation as a selective pressure on the evolution of sensory systems. Journal of Experimental Biology, 211(11), 1792–1804.
Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487.
Parberry, I. (1994). Circuit complexity and neural networks. MIT Press.
Pfeil, T., Potjans, T., Schrader, S., Potjans, W., Schemmel, J., Diesmann, M., & Meier, K. (2012). Is a 4-bit synaptic weight resolution enough? Constraints on enabling spike-timing dependent plasticity in neuromorphic hardware. Frontiers in Neuroscience, 6, 90.
Razborov, A. A., & Sherstov, A. A. (2010). The sign-rank of AC0. SIAM Journal on Computing, 39(5), 1833–1855.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
Sherstov, A. A. (2007). Separating AC0 from depth-2 majority circuits. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (pp. 294–301).
Shoham, S., O'Connor, D. H., & Segev, R. (2006). How silent is the brain: Is there a "dark matter" problem in neuroscience? Journal of Comparative Physiology A, 192, 777–784.
Silva, J. C. N., & Souza, U. S. (2022). Computing the best-case energy complexity of satisfying assignments in monotone circuits. Theoretical Computer Science, 932, 41–55.
Siu, K. Y., & Bruck, J. (1991). On the power of threshold circuits with small weights. SIAM Journal on Discrete Mathematics, 4(3), 423–435.
Siu, K.-Y., & Roychowdhury, V. P. (1994). On optimal depth threshold circuits for multiplication and related problems. SIAM Journal on Discrete Mathematics, 7(2), 284–292.
Siu, K.-Y., Roychowdhury, V., & Kailath, T. (1995). Discrete neural computation: A theoretical foundation. Prentice Hall.
Sun, X., Sun, Y., Wu, K., & Xia, Z. (2019). On the relationship between energy complexity and other Boolean function measures. In Proceedings of the 25th International Computing and Combinatorics Conference (pp. 516–528).
Suzuki, A., Uchizawa, K., & Zhou, X. (2011). Energy-efficient threshold circuits computing MOD functions. In Proceedings of the 17th Computing: The Australasian Theory Symposium (pp. 105–110).
Suzuki, A., Uchizawa, K., & Zhou, X. (2013). Energy-efficient threshold circuits computing MOD functions. International Journal of Foundations of Computer Science, 24(1), 15–29.
Tang, S., Zhang, Y., Li, Z., Li, M., Liu, F., Jiang, H., & Lee, T. S. (2018). Large-scale two-photon imaging revealed super-sparse population codes in the V1 superficial layer of awake monkeys. eLife, 7, e33370.
Uchizawa, K. (2014). Lower bounds for threshold circuits of bounded energy. Interdisciplinary Information Sciences, 20(1), 27–50.
Uchizawa, K. (2020). Size, depth and energy of threshold circuits computing parity function. In Y. Cao, S.-W. Cheng, & M. Li (Eds.), Proceedings of the 31st International Symposium on Algorithms and Computation (pp. 54:1–54:13).
Uchizawa, K., & Abe, H. (2023). Exponential lower bounds for threshold circuits of sub-linear depth and energy. In J. Leroux, S. Lombardy, & D. Peleg (Eds.), Proceedings of the 48th International Symposium on Mathematical Foundations of Computer Science (pp. 85:1–85:15).
Uchizawa, K., Douglas, R., & Maass, W. (2008). On the computational power of threshold circuits with sparse activity. Neural Computation, 18(12), 2994–3008.
Uchizawa, K., & Takimoto, E. (2008). Exponential lower bounds on the size of constant-depth threshold circuits with small energy complexity. Theoretical Computer Science, 407(1–3), 474–487.
Uchizawa, K., Takimoto, E., & Nishizeki, T. (2011). Size-energy tradeoffs of unate circuits computing symmetric Boolean functions. Theoretical Computer Science, 412, 773–782.
Vaintsvaig, M. N. (1961). On the power of networks of functional elements. Doklady Akademii Nauk, 139(2), 320–323. (In Russian)
Valiant, L. G. (2014). What must a global theory of cortex explain? Current Opinion in Neurobiology, 25, 15–19.
Yoshida, T., & Ohki, K. (2020). Natural images are reliably represented by sparse and variable populations of neurons in visual cortex. Nature Communications, 11(872).