## Abstract

We present an investigation of threshold circuits and other discretized neural networks in terms of the following four computational resources: size (the number of gates), depth (the number of layers), weight (weight resolution), and energy, where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$ (i.e., a classification task) of $n$ variables, then $\log(\mathrm{rk}(f)) \le ed(\log s + \log w + \log n)$ regardless of the algorithm employed by $C$ to compute $f$, where $\mathrm{rk}(f)$ is a parameter solely determined by the scale of $f$ and defined as the maximum rank of a communication matrix with regard to $f$, taken over all possible partitions of the $n$ input variables. For example, given the Boolean function $CD_n(\xi) = \bigvee_{i=1}^{n/2} \xi_i \wedge \xi_{n/2+i}$, we can prove that $n/2 \le ed(\log s + \log w + \log n)$ holds for any circuit $C$ computing $CD_n$. While the left-hand side is linear in $n$, the right-hand side is bounded by the product of the logarithmic factors of $s, w, n$ and the linear factors of $d, e$. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: $n/2$ needs to be smaller than the product of $e$ and $d$. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.

## 1 Introduction

Nervous systems receive an abundance of environmental stimuli and encode them within neural networks, in which the internal representations formed play a crucial role in neural information processing. More formally, DiCarlo and Cox (2007) considered an internal representation as a firing pattern, that is, a vector in a very high-dimensional space, where each axis is one neuron’s activity and the dimensionality equals the number of neurons (e.g., approximately 1 million) in a feedforward neural network. They then argued that, for object recognition tasks, a representation is good if, for a given pair of images that are hard to distinguish in the input space, there exist representations that are easy to separate by simple classifiers such as linear classifiers.

However, there is a severe constraint on forming useful internal representations. While the total energy resources supplied to the brain are limited, high energetic costs are incurred by both the resting and the signaling of neurons (Attwell & Laughlin, 2001; Lennie, 2003). Niven and Laughlin (2008) claimed that selective pressures both to improve behavioral performance and to reduce energy consumption can affect all levels of components and mechanisms in the brain. Thus, in addition to developing subcellular structures and individual neurons, neural networks are likely to develop useful and energy-efficient internal representations.

Sparse coding is a strategy in which a relatively small number of neurons are simultaneously active out of a large population and is considered a plausible principle for constructing internal representations in the brain. Sparse coding can reconcile the issue of representational capacity and energy expenditure (Földiák, 2003; Levy & Baxter, 1996; Olshausen & Field, 2004) and has been experimentally observed in various sensory systems (see, for example, Barth & Poulet, 2012; Shoham et al., 2006), including the visual cortex, where a representation of a few active neurons conveys information useful enough to reconstruct or classify natural images (Tang et al., 2018; Yoshida & Ohki, 2020).

Because sparse coding restricts the available internal representations of a neural network, it limits its representational power. Can we quantitatively evaluate the effects of a sparse coding strategy on neural information processing?

Uchizawa et al. (2008) sought to address this question from the viewpoint of circuit complexity. Let $\mathcal{C}$ be a class of feedforward circuits modeling neural networks. In a typical circuit complexity argument, we introduce a complexity measure for Boolean functions and show that any Boolean function computable by a circuit $C \in \mathcal{C}$ has complexity bounded in terms of the computational resources available to $C$. If a Boolean function $f$ has inherently high complexity relative to its scale, we can conclude that any circuit in $\mathcal{C}$ requires a sufficiently large amount of resources to compute $f$. A Boolean function can model a binary classification task, implying that neural networks cannot construct good internal representations for the task if the computational resources are limited or, equivalently, if the scale of the task is too large.

More formally, Uchizawa et al. (2008) employed threshold circuits as a model for neural networks, where a threshold circuit is a feedforward logic circuit whose basic computational element computes a linear threshold function (McCulloch & Pitts, 1943; Minsky & Papert, 1988; Parberry, 1994; Rosenblatt, 1958; Siu & Bruck, 1991; Siu et al., 1995; Siu & Roychowdhury, 1994). Size, depth, and weight are computational resources that have been extensively studied. For a threshold circuit $C$, the size $s$ is defined as the number of gates in $C$, the depth $d$ is the number of layers of $C$, and the weight $w$ is the degree of resolution of the weights among the gates. These resources are likely to be bounded for neural networks in the brain: the number of neurons in the brain is clearly limited, and the number of neurons performing a particular task could be further restricted. Depth is related to the reaction time required for performing tasks; therefore, low depth values are preferred. A single synaptic weight may take an analog value; however, it is unlikely that a neuron has an infinitely high resolution against neuronal noise; hence, a neuron may have a bounded degree of resolution. In addition, inspired by sparse coding, Uchizawa et al. (2008) introduced a new complexity measure called energy complexity, where the energy of a circuit is defined as the maximum number of internal gates outputting nonzero values, taken over all the input assignments to the circuit. Studies on the energy complexities of other types of logic circuits have also been reported (Dinesh et al., 2020; Kasim-zade, 1992; Silva & Souza, 2022; Sun et al., 2019; Vaintsvaig, 1961).

Uchizawa et al. (2008) then showed that energy-bounded threshold circuits have a certain computational power by observing that threshold circuits can simulate linear decision trees, where a linear decision tree is a binary decision tree in which the query at each internal node is given by a linear threshold function. In particular, they proved that any linear decision tree of $\ell$ leaves can be simulated by a threshold circuit of size $O(\ell)$ and energy $O(\log \ell)$. Hence, any linear decision tree of $\mathrm{poly}(n)$ leaves can be simulated by a threshold circuit of size $\mathrm{poly}(n)$ and energy only $O(\log n)$, where $n$ is the number of input variables.

Following Uchizawa et al. (2008), a sequence of papers established relations among other resources such as size, depth, and weight (Maniwa et al., 2018; Suzuki et al., 2011, 2013; Uchizawa & Takimoto, 2008; Uchizawa, 2020, 2014; Uchizawa et al., 2011). In particular, Uchizawa and Takimoto (2008) showed that any threshold circuit $C$ of depth $d$ and energy $e$ requires size $s = 2^{\Omega(n/ed)}$ if $C$ computes a function of high bounded-error communication complexity such as the inner product modulo 2. Even for functions of low communication complexity, an exponential lower bound on the size is known for constant-depth threshold circuits: any threshold circuit $C$ of depth $d$ and energy $e$ requires size $s = 2^{\Omega(n/e2^{e+d}\log^e n)}$ if $C$ computes the parity function (Uchizawa, 2020). These results provide exponential lower bounds if the depth is constant and the energy is sublinear (Uchizawa & Takimoto, 2008) or sublogarithmic (Uchizawa, 2020), while both the inner product modulo 2 and parity are computable by linear-size, constant-depth, and linear-energy threshold circuits. These results imply that the energy is strongly related to the representational power of threshold circuits and is an important computational resource. However, these lower bounds break down when we consider threshold circuits of larger depth and energy, say, nonconstant depth and sublinear energy.

Here, we provide a more sophisticated relation among size, depth, energy, and weight. Our main result is formulated as follows. Let $f$ be a Boolean function of $n$ variables and $(X,Y)$ be a binary partition of $\{1,2,\ldots,n\}$. Then we can express any assignment $\xi \in \{0,1\}^n$ as $(a,b) \in \{0,1\}^{|X|} \times \{0,1\}^{|Y|}$. A communication matrix $M_f^{X:Y}$ is a $2^{|X|} \times 2^{|Y|}$ matrix, where each row (resp., each column) is indexed by an assignment $a \in \{0,1\}^{|X|}$ (resp., $b \in \{0,1\}^{|Y|}$), and the value $M_f^{X:Y}[a,b]$ is defined to be the output of $f$ given $a$ and $b$. We denote by $\mathrm{rk}(f)$ the maximum rank of $M_f^{X:Y}$ over the real numbers, where the maximum is taken over all the partitions $(X,Y)$ of $\{1,2,\ldots,n\}$. We establish the following relation.

The theorem also improves known lower bounds for threshold circuits. By rearranging equation 1.2, the following lower bound can be obtained: $2^{n/(2ed)}/(wn) \le s$, which is exponential in $n$ if both $d$ and $e$ are sublinear and $w$ is subexponential. For example, an exponential lower bound $s = 2^{\Omega(n^{1/3})}$ can be obtained even for threshold circuits of depth $n^{1/3}$, energy $n^{1/3}$, and weight $2^{o(n^{1/3})}$. Similar lower bounds can be obtained for the inner product modulo 2 and the equality function because they have linear ranks. Comparing the lower bound $s = 2^{\Omega(n/ed)}$ of Uchizawa and Takimoto (2008) to ours, our lower bound is meaningful only for subexponential weight but affords a two-fold improvement: the lower bound is exponential even if $d$ is sublinear, and it provides an exponential lower bound under a much weaker condition on the Boolean function, namely, that it has rank $\Omega(n)$ over the real numbers. Threshold circuits have received considerable attention in circuit complexity, and several lower-bound arguments have been developed for threshold circuits under restrictions on computational resources including size, depth, energy, and weight (Amano, 2020; Amano & Maruoka, 2005; Chen et al., 2018; Hajnal et al., 1993; Håstad & Goldmann, 1991; Impagliazzo et al., 1997; Kane & Williams, 2016; Nisan, 1993; Razborov & Sherstov, 2010; Sherstov, 2007; Uchizawa, 2020; Uchizawa & Takimoto, 2008; Uchizawa et al., 2011). However, these lower-bound arguments are designed for constant-depth threshold circuits and cannot provide meaningful bounds when the depth is not constant. In particular, $CD_n$ and $EQ_n$ are computable by polynomial-size and constant-depth threshold circuits. Thus, directly applying known techniques is unlikely to yield our lower bound.
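The rearrangement above is elementary algebra; as a sanity check, the following sketch (function names are ours) confirms numerically that the theorem-style inequality and the rearranged size bound agree at a sample parameter setting.

```python
import math

def bound_holds(n, s, d, e, w):
    """Theorem form: n/2 <= e*d*(log2 s + log2 w + log2 n)."""
    return n / 2 <= e * d * (math.log2(s) + math.log2(w) + math.log2(n))

def size_lower_bound(n, d, e, w):
    """Rearranged form: any circuit computing CD_n needs s >= 2^{n/(2ed)} / (w*n)."""
    return 2 ** (n / (2 * e * d)) / (w * n)

# The two forms agree: s satisfies the theorem iff s >= size_lower_bound(...)
n, d, e, w = 64, 2, 2, 4
s_min = size_lower_bound(n, d, e, w)   # 2^8 / 256 = 1.0
assert not bound_holds(n, s_min / 2, d, e, w)   # below the threshold: bound violated
assert bound_holds(n, s_min * 2, d, e, w)       # above the threshold: bound satisfied
print(s_min)
```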

To complement theorem 1, we show that the relation is tight up to a constant factor if the product of $e$ and $d$ is small.

Apart from threshold circuits, we consider other well-studied models of neural networks in which the activation function and weights are discretized (e.g., discretized sigmoid and ReLU circuits). The size, depth, energy, and weight are also important parameters for artificial neural networks. The size and depth are central to the success of deep learning. The energy is related to important deep learning techniques such as the sparse autoencoder (Ng, 2011). The weight resolution is closely related to chip resources in neuromorphic hardware systems (Pfeil et al., 2012); accordingly, quantization schemes have received considerable attention (Courbariaux et al., 2015; Hubara et al., 2018).

For discretized circuits, we also show that there exists a trade-off similar to the one for threshold circuits. For example, the following proposition can be obtained for discretized sigmoid circuits:

Therefore, artificial neural networks obtained through machine learning may face a challenge in obtaining good internal representations: hardware constraints are imposed on the number of neurons and the weight resolution, the depth is predefined, and the learning algorithm may force a resulting network to acquire sparse activity. Further, a $c$ times larger depth is comparable to a $2^c$ times larger size. Thus, increasing the depth could significantly increase the expressive power of neural networks and can evidently aid them in acquiring sparse activity. Consequently, our bound may afford insight into the reason for the success of deep learning.

The remainder of the article is organized as follows. In section 2, we formally introduce the terminologies and notations that are used in the rest of the article. In section 3, we present our main lower-bound result. In section 4, we show the tightness of the lower bound. In section 5, we show a bound for discretized circuits. In section 6, we conclude with some remarks.

## 2 Preliminaries

For an integer $n$, we denote by $[n]$ the set $\{1,2,\ldots,n\}$. For a finite set $Z$ of integers, we denote by $\{0,1\}^Z$ the set of $2^{|Z|}$ Boolean assignments, where each assignment consists of $|Z|$ elements, each of which is indexed by $i \in Z$. The base of the logarithm is two unless stated otherwise. In section 2.1, we define terms on threshold circuits and discretized circuits. In section 2.2, we define the communication matrix together with some related terms and summarize some facts.

### 2.1 Circuit Model

#### 2.1.1 Threshold Circuits

A threshold circuit $C$ is a combinatorial circuit consisting of threshold gates and is expressed by a directed acyclic graph. The nodes of in-degree 0 correspond to input variables, and the other nodes correspond to gates. Let $G$ be the set of the gates in $C$. For each gate $g \in G$, the level of $g$, denoted by $\mathrm{lev}(g)$, is defined as the length of a longest path from an input variable to $g$ in the underlying graph of $C$. For each $\ell \in [d]$, we define $G_\ell$ as the set of gates in the $\ell$th level: $G_\ell = \{g \in G \mid \mathrm{lev}(g) = \ell\}$. We denote by $g_{\mathrm{clf}}$ the unique output gate, which is a linear classifier separating internal representations given by the gates in the lower levels (possibly together with input variables). We say that $C$ *computes* a Boolean function $f: \{0,1\}^{[n]} \to \{0,1\}$ if $g_{\mathrm{clf}}(\xi) = f(\xi)$ for every $\xi \in \{0,1\}^{[n]}$. Although the inputs to a gate $g$ in $C$ may be not only the input variables but also the outputs of gates in the lower levels, we write $g(\xi)$ for the output of $g$ on $\xi \in \{0,1\}^{[n]}$ because $\xi$ inductively determines the output of $g$.
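To make the model concrete, here is a minimal evaluator for such circuits together with the energy measure from the introduction; the class and function names are ours, and we count every gate we are given (the paper counts internal gates, so the caller can simply exclude $g_{\mathrm{clf}}$ from the list).

```python
from itertools import product

class Gate:
    """A threshold gate: outputs 1 iff the weighted sum of its inputs
    reaches the threshold. A wire source is an input index or another Gate."""
    def __init__(self, wires, threshold):
        self.wires = wires            # list of (source, weight) pairs
        self.threshold = threshold

    def __call__(self, xi, cache):
        if self not in cache:
            pot = sum(w * (xi[src] if isinstance(src, int) else src(xi, cache))
                      for src, w in self.wires)
            cache[self] = int(pot >= self.threshold)
        return cache[self]

def energy(gates, n):
    """Energy: the maximum number of gates outputting one,
    taken over all 2^n input assignments."""
    best = 0
    for xi in product([0, 1], repeat=n):
        cache = {}
        for g in gates:
            g(xi, cache)
        best = max(best, sum(cache.values()))
    return best

# AND of two variables as a single threshold gate: fires iff x0 + x1 >= 2
g_and = Gate([(0, 1), (1, 1)], 2)
print(energy([g_and], 2))  # 1: the lone gate fires only on input (1, 1)
```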

#### 2.1.2 Discretized Circuits

Let $\varphi$ be an activation function. Let $\delta$ be a discretizer that maps a real number to a number representable by a bit width $b$. We define a discretized activation function $\delta \circ \varphi$ as the composition of $\varphi$ and $\delta$, that is, $\delta \circ \varphi(x) = \delta(\varphi(x))$ for any number $x$. We say that $\delta \circ \varphi$ has a silent range for an interval $I$ if $\delta \circ \varphi(x) = 0$ for $x \in I$, and $\delta \circ \varphi(x) \neq 0$ otherwise. For example, if we use the ReLU function as the activation function $\varphi$, then $\delta \circ \varphi$ has a silent range for $I = (-\infty, 0]$ for any discretizer $\delta$. If we use the sigmoid function as the activation function $\varphi$ and a linear partition as the discretizer $\delta$, then $\delta \circ \varphi$ has a silent range for $I = (-\infty, t_{\max}]$, where $t_{\max} = \ln(1/(2^b - 1))$ and $\ln$ is the natural logarithm.
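The sigmoid silent range can be checked numerically. The sketch below assumes a floor-based linear-partition discretizer (one plausible reading of "linear partition"; the exact boundary may shift slightly under a different rounding convention): with bit width $b$, the sigmoid output stays below the lowest nonzero level up to roughly $t_{\max} = \ln(1/(2^b - 1))$.

```python
import math

b = 8                                   # assumed bit width
levels = 2 ** b - 1

def disc(y):
    """Floor-based linear-partition discretizer (our assumption):
    snap y in [0, 1] to a multiple of 1/(2^b - 1)."""
    return math.floor(y * levels) / levels

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

t_max = math.log(1 / levels)            # = -ln(2^b - 1), the silent-range bound
print(disc(sigmoid(t_max)))             # 0.0: inside the silent range
print(disc(sigmoid(t_max + 0.01)))      # nonzero: just outside it
```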

We define the weight $w$ of $C$ as $w = 2^{2b}$, where $2b$ is the bit width possibly needed to represent a potential value invoked by a single input of a gate in $C$.

### 2.2 Communication Matrix and Its Rank

Let $f: \{0,1\}^{[n]} \to \{0,1\}$ be a Boolean function. For a partition $(X,Y)$ of $[n]$, we can view $f$ as $f: \{0,1\}^X \times \{0,1\}^Y \to \{0,1\}$. We define the communication matrix $M_f^{X:Y}$ as a $2^{|X|} \times 2^{|Y|}$ matrix where each row and column is indexed by $a \in \{0,1\}^X$ and $b \in \{0,1\}^Y$, respectively, and each entry is defined as $M_f^{X:Y}(a,b) = f(a,b)$. For $I \subseteq \{0,1\}^X$ and $J \subseteq \{0,1\}^Y$, we call $R = I \times J$ a combinatorial rectangle and say that $R$ is monochromatic with respect to $M_f^{X:Y}$ if $M_f^{X:Y}$ is constant on $R$. If a circuit $C$ computes $f$, we may write $M_C^{X:Y}$ instead of $M_f^{X:Y}$. Figure 1a shows a communication matrix $M_f^{X:Y}$ where $f(\xi) = \mathrm{sign}(\sum_{i=1}^{6} \xi_i - 3)$, $X = \{1,2,3\}$, and $Y = \{4,5,6\}$. Figure 1b shows a monochromatic combinatorial rectangle $R = I \times J$ with respect to $M_f^{X:Y}$, where $I = \{(0,1,1),(1,0,1),(1,1,0)\}$ and $J = \{(0,0,1),(0,1,0),(0,1,1),(1,0,1)\}$.
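The rectangle in figure 1b can be verified mechanically. The sketch below takes $\mathrm{sign}(z) = 1$ for $z \ge 0$ (the threshold-gate convention, which is an assumption on our part), builds the $8 \times 8$ communication matrix, and checks that $R = I \times J$ is monochromatic.

```python
from itertools import product

# f(a, b) = sign(sum of all six bits - 3), with sign(z) = 1 iff z >= 0
def f(a, b):
    return int(sum(a) + sum(b) - 3 >= 0)

rows = list(product([0, 1], repeat=3))          # assignments a to X = {1,2,3}
cols = list(product([0, 1], repeat=3))          # assignments b to Y = {4,5,6}
M = {(a, b): f(a, b) for a in rows for b in cols}

I = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
J = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 1)]
vals = {M[(a, b)] for a in I for b in J}
print(vals)  # {1}: every entry of the rectangle I x J equals 1
```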

For $X = \{1,\ldots,n/2\}$ and $Y = \{n/2+1,\ldots,n\}$, $\mathrm{rk}(M_{DISJ_n}^{X:Y}) = 2^{n/2}$. Thus, $\mathrm{rk}(DISJ_n) = 2^{n/2}$.
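This full-rank property is easy to confirm for small $n$; the sketch below builds the disjointness matrix for $n = 8$ (so $2^{n/2} = 16$) and computes its rank over the reals.

```python
from itertools import product
import numpy as np

m = 4                                   # n = 2m; rows a and columns b range over {0,1}^m
pts = list(product([0, 1], repeat=m))

# DISJ_n(a, b) = 1 iff a and b share no common 1-position
M = np.array([[int(all(ai * bi == 0 for ai, bi in zip(a, b))) for b in pts]
              for a in pts])
print(np.linalg.matrix_rank(M))  # 16 = 2^{n/2}: the disjointness matrix has full rank
```

The matrix is the $m$-fold Kronecker power of $\begin{pmatrix}1&1\\1&0\end{pmatrix}$, whose determinant is $\pm 1$, which is why the rank is full.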

$\mathrm{rk}(CD_n) = 2^{n/2}$ and $\mathrm{rk}(EQ_n) = 2^{n/2}$.

We also use well-known facts on the rank. Let $A$ and $B$ be two matrices of the same dimensions. We denote by $A+B$ the summation of $A$ and $B$, and by $A\u2218B$ the Hadamard product of $A$ and $B$.

For two matrices, $A$ and $B$, of the same dimensions, we have

$\mathrm{rk}(A+B) \le \mathrm{rk}(A) + \mathrm{rk}(B)$,

$\mathrm{rk}(A \circ B) \le \mathrm{rk}(A) \cdot \mathrm{rk}(B)$.
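Both facts are standard; the following sketch spot-checks them on random low-rank matrices built from outer-product factorizations (variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
rk = np.linalg.matrix_rank

# A has rank at most 2, B has rank at most 3, by construction
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))
B = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))

assert rk(A + B) <= rk(A) + rk(B)      # subadditivity of rank
assert rk(A * B) <= rk(A) * rk(B)      # Hadamard (entrywise) product bound
print(rk(A), rk(B), rk(A + B), rk(A * B))
```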

## 3 Trade-Off for Threshold Circuits

In this section, we provide our main results showing the relationship among the four resources and trade-offs.

Let $C$ be a threshold circuit computing a Boolean function $f$ of $n$ variables. We prove the theorem by showing that for any partition $(X,Y)$ of $[n]$, we can express $M_C^{X:Y}$ as a sum of matrices, each of which corresponds to an internal representation that arises in $C$. Since $C$ has bounded energy, the number of internal representations is also bounded. We then show by the inclusion-exclusion principle that each matrix corresponding to an internal representation has a bounded rank. Thus, fact 1 implies the theorem.

We say that a gate computes a *threshold function* if the input to the gate consists of the $n$ input variables alone. For each $g \in G$ and $(a,b) \in \{0,1\}^X \times \{0,1\}^Y$, we denote by $\tau[g,P](a,b)$ a threshold function defined as

$\mathrm{rk}(M[T]) \le (nw+1)^{|T|}$.

$M_P = H[Q_1] \circ H[Q_2] \circ \cdots \circ H[Q_d]$.

We can obtain a similar relationship also for discretized circuits, as follows.

We establish the trade-off by showing that any discretized circuit can be simulated using a threshold circuit with a moderate increase in size, depth, energy, and weight. Theorem 5 then implies the claim. The detailed proof is included in the supplemental appendix.

## 4 Tightness of the Trade-Off

In this section, we show that the trade-off given in Theorem 5 is tight if the depth and energy are small.

### 4.1 Definitions

- $B_1, B_2, \ldots, B_z$ compose a partition of $[n]$.

- $|B_j| \le \lceil n/z \rceil$ for every $j \in [z]$.

- For every assignment $\xi \in \{0,1\}^{[n]}$, $f(\xi) = \bigvee_{j=1}^{z} f_j(\xi)$ or $f(\xi) = \overline{\bigvee_{j=1}^{z} f_j(\xi)}$.

We say that a set of threshold gates sharing input variables is a neural set, and a neural set is selective if at most one of the gates in the set outputs one for any input assignment. A selective neural set $S$ computes a Boolean function $f$ if for every assignment in $f^{-1}(0)$, no gate in $S$ outputs one, while for every assignment in $f^{-1}(1)$, exactly one gate in $S$ outputs one. We define the size and weight of $S$ as $|S|$ and $\max_{g \in S} w_g$, respectively. Below we assume that $S$ does not contain a threshold gate computing a constant function.

Since any conjunction of literals can be computed by a threshold gate, for any Boolean function $f$, we can obtain by a DNF-like construction a selective neural set of exponential size that computes $f$ (see example 7 and theorem 2.3 in Uchizawa, 2014).

For any Boolean function $f$ of $n$ variables, there exists a selective neural set of size $2^n$ and weight one that computes $f$.
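A minimal sketch of this DNF-like construction (function names are ours): one weight-one threshold gate per satisfying assignment $a$, with weight $+1$ where $a_i = 1$, weight $-1$ where $a_i = 0$, and threshold equal to the number of ones in $a$, so that the gate fires exactly on input $a$. The set is therefore selective.

```python
from itertools import product

def selective_set(f, n):
    """One weight-one threshold gate per satisfying assignment a of f;
    the gate for a fires iff the input equals a exactly."""
    gates = []
    for a in product([0, 1], repeat=n):
        if f(a):
            w = [1 if ai == 1 else -1 for ai in a]   # weights in {-1, +1}
            t = sum(a)                               # threshold = number of ones in a
            gates.append((w, t))
    return gates

def fire(gates, xi):
    """Output bit of every gate in the set on input xi."""
    return [int(sum(wi * x for wi, x in zip(w, xi)) >= t) for w, t in gates]

# Parity of 3 bits: 4 satisfying assignments, hence 4 gates
f = lambda a: int(sum(a) % 2 == 1)
S = selective_set(f, 3)
for xi in product([0, 1], repeat=3):
    assert sum(fire(S, xi)) == f(xi)   # exactly one gate fires on f^{-1}(1), none on f^{-1}(0)
print(len(S))  # 4
```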

### 4.2 Upper Bounds

The following lemma shows that we can construct threshold circuits of small energy for piecewise functions.

We construct the desired threshold circuit $C$ by arranging and connecting the selective neural sets, where $C$ has a simple layered structure consisting of the selective neural sets. After we complete the construction of $C$, we show that $C$ computes $f$ and then evaluate its size, depth, energy, and weight.

We here show that $C$ computes $f$. By construction, the following claim is easy to verify:

If every gate at levels $1, \ldots, \ell-1$ outputs zero, the output of $g_C$ is identical to that of $g$, and hence $g_C(\xi) = g(\xi)$. Otherwise, there is a gate outputting one at a lower level. Since $g_C$ receives the output of that gate, the value $-w'n/z$ is added to the potential of $g_C$. Since $g$ receives at most $n/z$ inputs whose weights have absolute values bounded by $w'$, the potential of $g_C$ is now below its threshold.

Suppose $f(\xi) = 0$. In this case, for every $k \in [e-1]$, $\ell \in [d-1]$, and $g \in S_{k,\ell}$, $g(\xi) = 0$. Therefore, claim 3 implies that no gate in $C$ outputs one.

Suppose $f(\xi) = 1$. In this case, there exists $\ell^* \in [d-1]$ such that $f_{k,\ell^*}(\xi) = 1$ for some $k \in [e-1]$, while $f_{k,\ell}(\xi) = 0$ for every $1 \le k \le e-1$ and $1 \le \ell \le \ell^* - 1$. Since $S_{k,\ell^*}$ computes $f_{k,\ell^*}$, claim 3 implies that there exists $g \in S_{k,\ell^*}$ such that $g_C(\xi) = 1$, which implies that $g_{\mathrm{clf}}(\xi) = 1$.

Finally, we evaluate the size, depth, energy, and weight of $C$. Since $C$ contains at most $s'$ gates for each pair of $k \in [e-1]$ and $\ell \in [d-1]$, where $z = (e-1)(d-1)$, we have in total $s \le zs' + 1$; the additional one corresponds to the output gate. Because the gates $g \in S_{k,\ell}$ are placed at the $\ell$th level for $\ell \in [d-1]$, the level of $g_{\mathrm{clf}}$ is clearly $d$, and, hence, $C$ has depth $d$. Claim 3 implies that if there is a gate outputting one at level $\ell$, then no gate at a higher level outputs one. In addition, since $S_{k,\ell}$ is selective, at most one gate $g \in S_{k,\ell}$ outputs one. Therefore, at most $e-1$ gates at the $\ell$th level output one, followed possibly by $g_{\mathrm{clf}}$ outputting one. Thus, $C$ has energy $e$. Any connection in $C$ has weight at most $w'$ or $w'n/z$. Thus, the weight of $C$ is $w'n/z$.

Clearly, $CD_n$ is a piecewise function, and so the lemma gives our upper bound for $CD_n$.

For simplicity, we consider the case where $n/2$ is a multiple of $z = (e-1)(d-1)$. It suffices to show that $CD_n$ is $z$-piecewise and computable by a neural set of size $s' = 2^{n/(2z)}$ and weight $w' = n/(2z)$.

Suppose $F_j^B(\xi) = 0$. There are two cases: $B \neq B^*(\xi)$, and $\xi_{n/2+i} = 0$ for every $i \in B$.

We can also obtain a similar proposition for $EQ_n$.

## 5 Conclusion

We prove here that a threshold circuit can compute only a Boolean function whose communication matrix has rank bounded by a product of logarithmic factors of size and weight and linear factors of depth and energy. This bound implies a trade-off between depth and energy if we view the logarithmic terms as having negligible impact. We also prove that a similar trade-off exists for discretized circuits, which suggests that increasing the depth linearly improves the ability of neural networks to decrease the number of neurons outputting nonzero values, subject to hardware constraints on the number of neurons and weight resolution.

where the expectation is taken over $Q$. They then showed that the average energy complexity can be bounded by the entropy of the internal representations on $Q$. It would be interesting to ask whether a trade-off exists with regard to the average energy complexity. For circuit complexity, our trade-off implies a lower bound of $\Omega(n/\log n)$ on the energy of polynomial-size, constant-depth, and polynomial-weight threshold circuits computing $CD_n$. It would also be interesting to ask whether there exists a Boolean function that needs linear or even superlinear energy for polynomial-size threshold circuits.

Since we have simplified and ignored many aspects of neural computation, our results are not enough to perfectly explain the representational power of neural networks in the brain. However, circuit complexity arguments can potentially aid in devising a plausible principle behind neural computation. To the three-level approach to understanding brain computation (the computational, algorithmic, and implementation levels) reported by Marr (1982), Valiant (2014) added the requirement that an explanation incorporate some understanding of the quantitative constraints faced by the cortex. Circuit complexity arguments could provide quantitative constraints through complexity measures. Further, Maass et al. (2019) identified the difficulty of uncovering a neural algorithm employed by the brain because its hardware could be extremely adapted to the task; consequently, the algorithm vanishes: even if its precise structure, connectivity, and vast array of numerical parameters are known in the minutest detail, extracting an algorithm implemented in the network would still be difficult. A trade-off does not provide a description of an explicit neural algorithm, but it can afford insights relevant for formulating computational principles because its argument necessarily concerns every algorithm that a theoretical model of a neural network can implement.

## Acknowledgments

The preliminary version of our article was presented at MFCS2023 (Uchizawa & Abe, 2023). We thank the anonymous reviewers of MFCS2023 for their careful reading and helpful comments. We also thank the anonymous reviewers of *Neural Computation* for constructive suggestions greatly improving the presentation and organization. This work was supported by JSPS KAKENHI grant JP22K11897.

## References

*Language and automata theory and applications*

*Proceedings of the 30th International Conference on Mathematical Foundations of Computer Science*

*Journal of Cerebral Blood Flow and Metabolism*

*Trends in Neurosciences*

*Theory of Computing*

*Advances in neural information processing systems, 28*

*Trends in Cognitive Sciences*

*Theoretical Computer Science*

*The handbook of brain theory and neural networks*

*Proceedings of the 21st International Conference on Foundations of Software Technology and Theoretical Computer Science*

*Journal of Computer and System Sciences*

*Computational Complexity*

*Journal of Machine Learning Research*

*SIAM Journal on Computing*

*Extremal combinatorics with applications in computer science*

*Proceedings of the 48th Annual ACM Symposium on Theory of Computing*

*Mathematical Problems in Cybernetics*

*Current Biology*

*Neural Computation*

*A basic compositional model for spiking neural networks*

*Neural Networks*

*Brain computation: A computer science perspective*

*IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*

*Vision: A computational investigation into the human representation and processing of visual information*

*Bulletin of Mathematical Biophysics*

*Perceptrons: An introduction to computational geometry*

*Proceedings of Combinatorics, Paul Erdös Is Eighty*

*Journal of Experimental Biology*

*Current Opinion in Neurobiology*

*Circuit complexity and neural networks*

*Frontiers in Neuroscience*


*SIAM Journal on Computing*

*Psychological Review*

^{0}from depth-2 majority circuits

*Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing*

*Journal of Comparative Physiology A*

*Theoretical Computer Science*

*SIAM Journal on Discrete Mathematics*

*SIAM Journal on Discrete Mathematics*

*Discrete neural computation: A theoretical foundation*

*Proceedings of the 25th International Computing and Combinatorics Conference*

*Proceedings of the 17th Computing: The Australasian Theory Symposium*

*International Journal of Foundations of Computer Science*

*eLife*

*Interdisciplinary Information Sciences*

*Proceedings of the 31st International Symposium on Algorithms and Computation*

*Proceedings of the 48th International Symposium on Mathematical Foundations of Computer Science*

*Neural Computation*

*Theoretical Computer Science*

*Theoretical Computer Science*

*Doklady Akademii Nauk*

*Current Opinion in Neurobiology*

*Nature Communications*