## Abstract

Winner-take-all (WTA) refers to the neural operation that selects a (typically small) group of neurons from a large neuron pool. It is conjectured to underlie many of the brain's fundamental computational abilities. However, not much is known about the robustness of a spike-based WTA network to the inherent randomness of the input spike trains. In this work, we consider a spike-based $k$–WTA model wherein $n$ randomly generated input spike trains compete with each other based on their underlying firing rates and $k$ winners are supposed to be selected. We slot the time evenly with each time slot of length 1 ms and model the $n$ input spike trains as $n$ independent Bernoulli processes. We analytically characterize the minimum waiting time needed so that a target minimax decision accuracy (success probability) can be reached.

We first derive an information-theoretic lower bound on the waiting time. We show that to guarantee a (minimax) decision error $≤δ$ (where $δ∈(0,1)$), the waiting time of any WTA circuit is at least
$((1-δ)\log(k(n-k)+1)-1)\,T_R,$
where $R⊆(0,1)$ is a finite set of rates and $T_R$ is a difficulty parameter of a WTA task with respect to set $R$ for independent input spike trains. Additionally, $T_R$ is independent of $δ$, $n$, and $k$. We then design a simple WTA circuit whose waiting time is
$O\left(\left(\log\frac{1}{δ}+\log k(n-k)\right)T_R\right),$
provided that the local memory of each output neuron is sufficiently long. It turns out that for any fixed $δ$, this decision time is order-optimal (i.e., it matches the above lower bound up to a multiplicative constant factor) in terms of its scaling in $n$, $k$, and $T_R$.

## 1  Introduction

Humans and animals can form a stable perception and make robust judgments under ambiguous conditions. For example, we can easily recognize a dog in a picture regardless of its posture, hair color, and whether it stands in the shadow or is occluded by other objects. One fundamental feature of brain computation is its robustness to the randomness introduced at different stages, such as sensory representations (Kinoshita & Komatsu, 2001; Hubel & Wiesel, 1959), feature integration (Kourtzi, Tolias, Altmann, Augath, & Logothetis, 2003; Majaj, Carandini, & Movshon, 2007), decision formation (Platt & Glimcher, 1999; Shadlen & Newsome, 2001), and motor planning (Harris & Wolpert, 1998; Li, Chen, Guo, Gerfen, & Svoboda, 2015). It has been shown that neurons encode information in a stochastic manner in the brain (Baddeley et al., 1997; Kara, Reinagel, & Reid, 2000; Maimon & Assad, 2009; Ferrari, Deny, Marre, & Mora, 2018); even when the exact same sensory stimulus is presented or when the same kinematics are achieved, no deterministic patterns in the spike trains exist. Facing environmental ambiguity, humans and animals adaptively refine their behaviors by incorporating prior knowledge with their current sensory measurements (Faisal, Selen, & Wolpert, 2008; Knill & Pouget, 2004; Stocker & Simoncelli, 2006; Ernst & Banks, 2002; Körding & Wolpert, 2004). Nevertheless, it remains relatively unclear how neurons carry out robust computation facing ambiguity. Sparse coding is a common strategy in brain computation; to encode a task-relevant variable, often only a small group of neurons from a large neuron pool are activated (Olshausen & Field, 2004; Perez-Orive et al., 2002; Hromádka, DeWeese, & Zador, 2008; Quiroga, Kreiman, Koch, & Fried, 2008; Karlsson & Frank, 2008; Redgrave, Prescott, & Gurney, 1999). Understanding the underlying neuron selection mechanism is highly challenging.

Winner-take-all (WTA) is a hypothesized mechanism for selecting the proper neurons from a competitive network of neurons and is conjectured to be a fundamental primitive of cognitive functions such as attention and object recognition (Riesenhuber & Poggio, 1999; Itti, Koch, & Niebur, 1998; Yuille & Geiger, 1998; Maass, 2000; Hertz, Krogh, Palmer, & Horner, 1991; Shamir, 2006). Among these studies, it is commonly assumed that neurons transmit information with a continuous variable such as the firing rate. This assumption, however, ignores how temporal coding may also contribute to cortical computations. For example, some neurons in the auditory cortex respond to auditory events with bursts at a fixed latency (Gerstner, Kempter, van Hemmen, & Wagner, 1996; Nelken, 2004). This phase-locking property is also observed in the hippocampus as well as the prefrontal cortex (Siapas, Lubenov, & Wilson, 2005; Hahn, Sakmann, & Mehta, 2006; Buzsáki & Chrobak, 1995). Another feature that has been neglected in rate-based models is the inherent noise in the inputs. Although some studies used additive Gaussian noise (Kriener, Chaudhuri, & Fiete, 2017; Li, Li, & Wang, 2013; Lee, Itti, Koch, & Braun, 1999; Rougier & Vitay, 2006) to account for input randomness, such WTA circuits are very sensitive to noise and cannot successfully select even a single winner unless an extra robustness strategy, such as an additional nonlinearity, is introduced into the dynamics (Kriener et al., 2017). Finally, neurons have a refractory period, which prevents spikes from backpropagating in axons (Berry & Meister, 1998), and such a feature is usually neglected in rate-based models. In contrast, a spike-based model may capture these neglected features. Nevertheless, how WTA computation can be implemented and its algorithmic characterization remain relatively underexplored (Shamir, 2006, 2009).

In this letter, we study a spike-based $k$-WTA model wherein $n$ randomly generated input spike trains compete with each other based on their underlying firing rates, and the true winners are the $k$ input spike trains whose underlying firing rates are higher than the others (Hertz et al., 1991). A desired WTA circuit should quickly respond to these random input spike trains and should successfully select the $k$ true winners with high probability. We analytically characterize the minimum amount of waiting time needed so that a target minimax decision accuracy (defined in section 3.2) can be reached. More precisely, we slot the time evenly with each time slot of length 1 ms and assume that these $n$ input spike trains are generated by $n$ independent Bernoulli processes with different rates. We use Bernoulli processes to capture the randomness in the input spike trains rather than using the popular Poisson processes because a Bernoulli process can be viewed as the time-slotted version of a refractory-period-modified Poisson process. Notably, a Bernoulli process with a 1 ms time slot is just a simplified approximation to the real dynamics in the brain, given that in the brain, the refractory period varies across neurons and the refractory period of some neurons could extend beyond 1 ms. In our model, we implicitly assume that the absolute refractory period is 1 ms, a value commonly reported in the literature (Teleńczuk, Kempter, Curio, & Destexhe, 2017; Nicholls, Martin, Wallace, & Fuchs, 2001).$^{1}$ A WTA circuit contains $n$ output neurons, each of which is paired with an input spike train. In addition, the behaviors (spike patterns) of these output neurons encode which input spike trains are declared to be the winners.
For special cases where $k=1$, different winner declaration strategies are considered in the literature (Shamir, 2006, 2009; Lynch, Musco, & Parter, 2016; Kriener et al., 2017), such as the identity of an output neuron that spikes much more frequently than the other output neurons (Kriener et al., 2017), of the neuron that fires the first spike in a population of neurons (Shamir, 2009, 2006), and of the output neuron that fires alone for a sufficiently long time (Lynch et al., 2016). Clearly, the minimum amount of waiting time needed to achieve a given accuracy varies with the choice of winner declaration strategy. Nevertheless, in order to derive a lower bound that holds for all winner declaration strategies, at this point, we do not specify the winner declaration strategy used in our circuit construction; this specification is postponed to section 5. In this letter, we investigate two closely related problems: (1) the fundamental limits of any WTA circuit in selecting $k$ true winners from $n$ independent Bernoulli input spike trains (in terms of waiting time to achieve a target accuracy) and (2) the existence of WTA circuits that can achieve the above fundamental limits.

To answer the first question, we consider a general model (formally described in section 2) without restricting the adopted network architectures, activation functions, winner declaration strategies, and so on, so that the derived lower bound can provide guidance for and insight into constructing a large family of WTA circuits. We derive a lower bound on the waiting and decision time in order to achieve a given decision accuracy. We show that no WTA circuit can have a waiting time strictly less than
$((1-δ)\log(k(n-k)+1)-1)\,T_R,$
(1.1)
where $R⊆[c,C]⊆(0,1)$ is a finite set of rates, $T_R$ is a difficulty parameter of a WTA task with respect to set $R$ for independent input spike trains, $n$ is the number of input spike trains, $k$ is the number of winners, and $(1-δ)$ is the given target decision accuracy. Here $c,C$ are two absolute constants such that $0<c≤C<1$ and $δ∈(0,1)$. Moreover, $T_R$ is independent of $δ$, $n$, and $k$. In many practical settings, we care about the sparse coding region where $k≪n$. Not surprisingly, the lower bound grows with the network size $n$ when other parameters are fixed. This is because the larger $n$ is, the noisier the WTA competition. Similarly, when $n$ and $k$ are fixed, the easier it is to distinguish two independent spike trains with different rates (i.e., the smaller $T_R$), the shorter the necessary decision time is. Our lower bound is obtained by an information-theoretic argument and holds for all WTA circuits without restricting their winner declaration strategies, circuit architectures, and the adopted activation functions. Throughout this letter, we are interested in the decision time's scaling in $n$, $k$, and $T_R$, while treating $δ∈(0,1)$ as a small but fixed constant.
To answer the second question, we construct a simple circuit whose decision time is
$O\left(\left(\log\frac{1}{δ}+\log k(n-k)\right)T_R\right),$
provided that the local memory of each output neuron is sufficiently long.$^{2}$ In this circuit, there are $n$ pairs of input and output neurons and no hidden neurons. Each input neuron is connected to the corresponding output neuron, and the $n$ output neurons mutually inhibit each other. Each output neuron has a local memory of length $m$ (formally defined in section 2.2) and adopts a simple threshold activation function (specified in section 5.1.3). The first $k$ output neurons that spike in the same time slot are declared to be the winners; the identities of such $k$ output neurons are the circuit's estimate of the $k$ true winners. The formal circuit construction can be found in section 5. We show that for any fixed $δ∈(0,1)$, provided that
$m>\frac{8C^2(1-c)}{c^2(1-C)}\left(\log\frac{3}{δ}+\log k(n-k)\right)T_R:=m^*,$
(1.2)
with probability at least $1-δ$, it holds that by time $m^*$, there exist exactly $k$ output neurons that spike in the same time slot, and the first set of such $k$ output neurons is indeed the set of true winners. It turns out that this decision time ($m^*$) is order-optimal in terms of its scaling in $n$, $k$, and $T_R$; $m^*$ matches the lower bound in equation 1.1 up to a constant multiplicative factor. (The formal argument showing order-optimality can be found in remark 11.) In a sense, the local memory of each output neuron plays a crucial role in “denoising” the randomness in the input spike trains. In practice, an output neuron's local memory might not satisfy the condition in equation 1.2. Nevertheless, this does not exclude the application of our WTA circuit to the contexts where $m$ is small. This is because the memory variable might be implemented via some neural code near an output neuron. The detailed implementation of the local memory affects only the circuit's architecture; it does not affect the order optimality of our WTA circuit. The typical dynamics of our circuit are that the number of output neurons that spike simultaneously (i.e., spike in the same time slot) increases monotonically until exactly $k$ output neurons spike simultaneously. The simultaneous spikes of these $k$ output neurons cause strong inhibition of other output neurons; in particular, no other output neuron can spike within a sufficiently long period $Ω\left(\left(\log\frac{1}{δ}+\log k(n-k)\right)T_R\right)$.

In addition, our results give a set of testable hypotheses on neural recordings and human and animal behaviors in decision making (detailed discussion is found in section 6).

## 2  Computational Model: Spiking Neuron Networks

In this section, we provide a general description of our computation model; there is much freedom in choosing the detailed specification of the model. We consider such a general model so that our derived lower bound applies to WTA circuits with, for example, many alternative network architectures, activation functions, and winner declaration strategies (i.e., the desired behaviors of the output neurons). In section 5, we provide a circuit construction (for solving the $k$–WTA competition) under this computation model but with specific choices for the adopted network architecture, activation function, and winner declaration strategy.

### 2.1  Network Structure

A spiking neuron network (SNN) $N=(U,E)$ consists of a collection of neurons $U$ that are connected through synapses $E$ (see Figure 1). We assume that an SNN can be conceptually partitioned into three nonoverlapping layers: input layer $N_{in}$, hidden layer $N_h$, and output layer $N_{out}$. The neurons in each of these layers are referred to as input neurons, hidden neurons, and output neurons, respectively. The synapses $E$ are essentially directed edges: $E⊆\{(ν,ν'):ν,ν'∈U\}$. For each $ν∈U$, define $PRE_ν:=\{ν':(ν',ν)∈E\}$ and $POST_ν:=\{ν':(ν,ν')∈E\}$. Intuitively, $PRE_ν$ is the collection of neurons that can directly influence neuron $ν$; similarly, $POST_ν$ is the collection of neurons that can be directly influenced by neuron $ν$.$^{3}$ We assume that the input neurons cannot be influenced by other neurons in the network: $PRE_ν=∅$ for all $ν∈N_{in}$. Each edge $(ν,ν')$ in $E$ has a weight, denoted by $w(ν,ν')$, which captures the strength of the interaction between neuron $ν$ and neuron $ν'$. The sign of $w(ν,ν')$ indicates whether neuron $ν$ excites or inhibits neuron $ν'$: if neuron $ν$ excites neuron $ν'$, then $w(ν,ν')>0$; if neuron $ν$ inhibits neuron $ν'$, then $w(ν,ν')<0$. The set $E$ might contain self-loops, with $w(ν,ν)$ capturing the self-excitatory and self-inhibitory effects. Typically, in neuroscience, a neuron is either excitatory or inhibitory: $sign(w(ν,ν_1))=sign(w(ν,ν_2))$ for all $ν_1,ν_2∈POST_ν$. Our order-optimal WTA circuit in section 5 indeed assumes this common sign restriction. Nevertheless, our lower bound holds even for the general case where there exist $ν_1,ν_2∈POST_ν$ such that $sign(w(ν,ν_1))≠sign(w(ν,ν_2))$.

Figure 1:

An SNN consists of three layers: the input layer, the output layer, and the hidden layer. The hidden neurons might connect to both the input neurons and the output neurons to assist the computation of the neuron network. Neurons are connected through synapses. WTA circuits are a family of SNNs in which the number of output neurons equals the number of the input neurons.


#### 2.1.1  Generic Network Structure for WTA Circuits

The family of WTA circuits under consideration is rather generic. We only assume that $|N_{in}|=|N_{out}|=n$; that is, the numbers of input neurons and of output neurons are equal. For ease of exposition, denote
$N_{in}=\{u_1,…,u_n\},\quad\text{and}\quad N_{out}=\{v_1,…,v_n\}.$
The hidden neuron subset $N_h$ can be arbitrary. The output neurons and the hidden neurons may be connected to each other in an arbitrary manner.

### 2.2  Network State

In an SNN, the communication among neurons is abstracted as spikes. We assume each neuron $ν$ has two local variables: spiking state variable $S(ν)$ and memory state variable $M(ν)$. Nevertheless, for input neurons, we consider only their spiking states, assuming that their memory states are not influenced by the dynamics of the spiking neuron network under consideration. We slot the time evenly with each time slot of length 1 ms. Let $t=1,2,…$ be the indices of the time slots. Henceforth, by saying time $t$, we mean the time interval $[t-1,t)$ ms. For $t≥1$, let $S_t(ν)∈\{0,1\}$ be the spiking state of neuron $ν$ at time $t$, indicating whether neuron $ν$ spikes at time $t$. For a noninput neuron $ν$ and for $t≥1$, let $M_t(ν)$ be the memory state of neuron $ν$ at time $t$, summarizing the cumulative influence caused by the spikes of the neurons in $PRE_ν$ during the most recent $m$ times: times $t-1,t-2,…,t-m$. Concretely, let $V_t(ν)$ be the charge of (noninput) neuron $ν$ at time $t$ (for $t≥1$) defined as
$V_t(ν):=\sum_{ν'∈PRE_ν}w(ν',ν)S_t(ν').$
Let $\mathbf{V}_t(ν)$ be the sequence of length $m$ such that
$\mathbf{V}_t(ν):=(V_t(ν),…,V_{t-m+1}(ν)),$
and let $\mathbf{S}_t(ν)$ be the sequence of length $m$ such that
$\mathbf{S}_t(ν):=(S_t(ν),…,S_{t-m+1}(ν)).$
By convention, when $1≤t≤m$, let
$\mathbf{V}_t(ν):=(V_t(ν),…,V_1(ν),0,…,0)$
and
$\mathbf{S}_t(ν):=(S_t(ν),…,S_1(ν),0,…,0).$
For $t≥2$, define the memory variable $M_t(ν)$ as a pair of vectors $\mathbf{S}_{t-1}(ν)$ and $\mathbf{V}_{t-1}(ν)$:
$M_t(ν):=(\mathbf{S}_{t-1}(ν),\mathbf{V}_{t-1}(ν)).$
By convention, let $M_1(ν):=(\mathbf{0},\mathbf{0})$, where $\mathbf{0}$ is the length-$m$ zero vector. Notably, as can be seen from our analysis, our lower bound holds provided that $M_1(ν)$ does not contain any information about the circuit's dynamics for time $t≤0$; that is, no information on the past $t≤0$ is used in determining the generation of a spike at time $t≥1$.
At time $t+1$, the memory variable $M_{t+1}(ν)$ is updated by shifting the two sequences forward by one time unit: fetching $S_t(ν)$ and $V_t(ν)$, respectively, and removing $S_{t-m}(ν)$ and $V_{t-m}(ν)$, respectively. The memory state $M_t(ν)$ is known to neuron $ν$ only, and it can influence the probability of generating a spike at time $t$ through an activation function $φ_ν$:
$S_t(ν)=φ_ν(M_t(ν)),\quad ∀t≥1.$
(2.1)
Notably, $φν$ might be a random function.
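
The shift-and-fetch memory update described above can be sketched with two fixed-length queues. This is a minimal illustration only; the class name `LocalMemory` and its method names are hypothetical and not part of the model specification.

```python
from collections import deque


class LocalMemory:
    """Length-m memory holding (S_{t-1},...,S_{t-m}) and (V_{t-1},...,V_{t-m})."""

    def __init__(self, m):
        # Zero-initialized, matching the convention M_1 = (0, 0).
        self.spikes = deque([0] * m, maxlen=m)
        self.charges = deque([0.0] * m, maxlen=m)

    def update(self, spike, charge):
        # The newest entry enters at the front; with maxlen set,
        # the oldest entry is dropped from the back automatically.
        self.spikes.appendleft(spike)
        self.charges.appendleft(charge)


mem = LocalMemory(m=3)
for s, v in [(1, 1.0), (0, -0.5), (1, 1.0), (0, 0.0)]:
    mem.update(s, v)
```

After four updates with $m=3$, only the three most recent spike and charge values remain, most recent first.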

In most neurons, the synaptic plasticity time window is about 80 to 120 ms, but it could also vary across brain regions and across timescales under different behavioral contexts. In a sense, the synaptic plasticity time window is closely related to $m$. As can be seen in section 5, our order-optimal WTA circuit construction requires $m$ to be sufficiently large. Nevertheless, this does not exclude the application of our WTA circuit to the contexts where $m$ is small. This is because the memory variable can be implemented by a chain of hidden neurons near neuron $ν$. The detailed implementation of the local memory does not affect the order optimality of our WTA circuit.

## 3  Minimax Decision Accuracy and Success Probability

### 3.1  Random Input Spike Trains

We study the $k$–WTA model, wherein $n$ randomly generated input spike trains are competing with each other, and as a result of this competition, $k$ out of them are selected to be the winners. In contrast, most existing works (Verzi et al., 2018; Maass, 1997; Lynch et al., 2016) assume deterministic input spike trains.

Recall that time is slotted into intervals of length 1 ms. We assume that the $n$ input spike trains are generated from $n$ independent Bernoulli processes with unknown parameters $p_1,…,p_n$, respectively. We refer to $p=(p_1,…,p_n)$ as a rate assignment of the WTA competition for a given external stimulus. For example, suppose an external stimulus induces two input spike trains with rates 0.6 and 0.8, respectively: $n=2$ and $p=(0.6,0.8)$. In each time slot, the first input spike train has a spike with probability 0.6, independently of whether the second input spike train has a spike; likewise, the second input spike train has a spike with probability 0.8, independently of the first. Notably, different external stimuli induce different rate assignment vectors $p$. Henceforth, we use the terms rate assignment and external stimulus interchangeably.
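
As a minimal sketch of this input model, the following draws independent Bernoulli spike trains for a given rate assignment. The function name `bernoulli_spike_trains` and the seed handling are illustrative assumptions.

```python
import random


def bernoulli_spike_trains(p, T, seed=0):
    """Return len(p) spike trains of length T; train i spikes in each 1 ms slot w.p. p[i]."""
    rng = random.Random(seed)
    # Each slot of each train is an independent Bernoulli draw.
    return [[1 if rng.random() < p_i else 0 for _ in range(T)] for p_i in p]


trains = bernoulli_spike_trains([0.6, 0.8], T=10_000)
empirical_rates = [sum(train) / len(train) for train in trains]
```

For long trains, the empirical rates should be close to the underlying rates 0.6 and 0.8; for short trains they fluctuate, which is exactly the randomness a WTA circuit has to cope with.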

Note that in the most general scenario, the spikes of the input neurons might be correlated (see section 6 for detailed comments). We would like to explore the more general input spikes in our future work.

### 3.2  Minimax Performance Metric

We adopt the minimax framework (Wu, 2017) to evaluate the performance of a WTA circuit.

Let $R⊆[c,C]$ be an arbitrary but finite set of rates where $c$ and $C$ are two absolute constants such that $0<c≤C<1$. A rate assignment $p$ (i.e., an external stimulus) is chosen by nature from $R^n$ for which there exists a subset of $[n]:=\{1,…,n\}$, denoted by $W(p)$, such that
$|W(p)|=k,\quad\text{and}\quad p_i>p_j\quad ∀\,i∈W(p),\ j∉W(p).$
(3.1)
Here $|·|$ denotes the cardinality of a set. That is, $W(p)$ is the set of true winners that should be selected when the external stimulus that induces $p$ is presented. We refer to set $W(p)$ as the true winners with respect to the rate assignment $p$. For example, suppose $n=5$, $k=2$, and
$p=(p_1=0.2,\ p_2=0.1,\ p_3=0.2,\ p_4=0.8,\ p_5=0.85).$
Here the true winners are 4 and 5, that is, $W(p)=\{4,5\}$. In this letter, we consider the following collection of rate assignments, denoted by $A_R⊂R^n$:
$A_R:=\{p:∃\,W(p)⊆[n]\ \text{s.t.}\ |W(p)|=k,\ \text{and}\ p_i>p_j\ ∀\,i∈W(p),\ j∉W(p)\}.$
(3.2)
Intuitively, $A_R$ corresponds to the collection of external stimuli considered. For ease of reference, we refer to an element in $A_R$ as an admissible rate assignment. Recall that the input of a WTA circuit is a collection of $n$ independent spike trains. For a given rate assignment $p$, let $\{S_t(u_i)\}_{t=1}^{T}$ denote the spike train of length $T$ at input neuron $u_i$. The circuit designer wants to design a WTA circuit that outputs a good guess or estimate $\widehat{win}$ of $W(p)$ for any choice of rate assignment $p$ in $A_R$. Note that conditioning on
$S:=\left(\{S_t(u_1)\}_{t=1}^{T},…,\{S_t(u_n)\}_{t=1}^{T}\right),$
the estimate $\widehat{win}$ is independent of $p$. Here $S$ is used with a slight abuse of notation, as this notation hides its connection with $T$ and the rate parameter $p$.$^{4}$ Later, we use the same notation to denote the $n$ spike trains with random rate assignment (i.e., where $p$ is randomly generated). Nevertheless, this abuse of notation significantly simplifies the exposition without sacrificing clarity.
Under the minimax framework, we are interested in the minimax error probability
$\min_{\widehat{win}}\max_{p∈A_R}P\{\widehat{win}(S)≠W(p)\}.$
(3.3)
For a given deterministic WTA circuit $\widehat{win}$ (i.e., the activation functions used are deterministic), the probability in $P\{\widehat{win}(S)≠W(p)\}$ is taken with regard to the randomness in the stochastic spikes of each input neuron. For a randomized WTA circuit $\widehat{win}$ (i.e., the activation functions are stochastic), in addition to the aforementioned source of randomness, the probability in $P\{\widehat{win}(S)≠W(p)\}$ is also taken with regard to the randomness in the activation functions. In equation 3.3, the performance metric of a WTA circuit is the worst-case error probability:
$\max_{p∈A_R}P\{\widehat{win}(S)≠W(p)\}.$
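
To make the definitions in equations 3.1 and 3.2 concrete, the sketch below computes $W(p)$ for a rate assignment and enumerates the admissible set for a small example. The function name `true_winners` and the two-rate example set are illustrative assumptions; the code assumes $1≤k≤n-1$.

```python
from itertools import product


def true_winners(p, k):
    """Return W(p) as a set of 1-based indices, or None if p is not admissible."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    winners, rest = order[:k], order[k:]
    # Admissible only if the weakest winner strictly beats the strongest non-winner.
    if rest and p[winners[-1]] <= p[rest[0]]:
        return None
    return {i + 1 for i in winners}


# The worked example from the text: n = 5, k = 2.
example = true_winners((0.2, 0.1, 0.2, 0.8, 0.85), k=2)

# Enumerate the admissible set for a hypothetical R = {0.2, 0.8}, n = 3, k = 1.
rates = (0.2, 0.8)
admissible = [p for p in product(rates, repeat=3) if true_winners(p, 1) is not None]
```

In the small enumeration, only assignments with exactly one high rate are admissible, because a tie between the $k$-th and $(k+1)$-th largest rates leaves the winner set undefined.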

## 4  Information-Theoretic Lower Bound on Decision Time

In this section, we provide a lower bound on the decision time for a given decision accuracy. The lower bounds derived in this section hold universally for all possible network structures (including the hidden layer), synapse weights, activation functions, and winner declaration strategies.

One observation is that the decision time is naturally lower-bounded by the sample complexity, which is closely related to the Kullback-Leibler (KL) divergence between two Bernoulli distributions.$^{5}$ The KL divergence between Bernoulli random variables with parameters $r$ and $r'$, respectively, is defined as
$d(r∥r'):=r\log\frac{r}{r'}+(1-r)\log\frac{1-r}{1-r'},$
(4.1)
where, by convention, $0\log\frac{0}{0}:=0$. Notably, $d(·∥·)$ is not symmetric in $r$ and $r'$. In addition, if $r∈(0,1)$ and $r'∈\{0,1\}$, then $d(r∥r')=∞$. Recall that set $R$ is an arbitrary but finite set contained in the interval $[c,C]$, where $c,C∈(0,1)$ are two constants. It holds that $d(r∥r')<∞$ for all $r,r'∈R$. For more general distributions over a common discrete alphabet $A$, say, distributions $P$ and $Q$, the KL divergence between $P$ and $Q$ is defined as follows:
Definition 1
(KL Divergence). Let $A$ be a discrete alphabet (finite or countably infinite), and $P$ and $Q$ be two distributions over $A$. Then define
$D(P∥Q):=\sum_{a∈A}P(a)\log\frac{P(a)}{Q(a)},$
where $0·\log\frac{0}{0}=0$ by convention.

Note that $D(P∥Q)≥0$ and $D(P∥Q)=0$ if and only if $P=Q$ except for measure 0. Similar to $d(·∥·)$, $D(P∥Q)$ is not symmetric in $P$ and $Q$. In this letter, we choose the base of the logarithm to be 2.$^{6}$ Recall that the set of admissible rate assignments $A_R$ is defined in equation 3.2.
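
The Bernoulli KL divergence in equation 4.1 can be evaluated directly. The following is a small sketch using base-2 logarithms as in the text; the function name `bernoulli_kl` is a hypothetical choice.

```python
import math


def bernoulli_kl(r, r_prime):
    """d(r || r') in bits, with the convention 0 * log(0/0) = 0."""
    def term(a, b):
        if a == 0:
            return 0.0       # covers 0*log(0/b) and the 0*log(0/0) convention
        if b == 0:
            return math.inf  # positive mass where the reference has none
        return a * math.log2(a / b)

    return term(r, r_prime) + term(1 - r, 1 - r_prime)


d_forward = bernoulli_kl(0.5, 0.25)
d_backward = bernoulli_kl(0.25, 0.5)
```

The two directions differ, which illustrates the asymmetry noted above, and a reference rate of exactly 0 or 1 gives an infinite divergence for any $r∈(0,1)$.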

Lemma 1.
Fix a finite set $R$ of rates. Let $p=(p_1,…,p_n)$ and $q=(q_1,…,q_n)$ be two rate assignments in $A_R$. Let $P_S$ and $Q_S$ be the distributions of the $n$ spike sequences of the input neurons under rate assignments $p$ and $q$, respectively. Then,
$D(P_S∥Q_S)=T\sum_{i=1}^{n}d(p_i∥q_i).$

The two different rate assignments $p$ and $q$ correspond to two different external stimuli, and $D(P_S∥Q_S)$ is the “distance” between the $n$ input spike trains of length $T$ induced by the first external stimulus and those induced by the second external stimulus. Lemma 1 is proved in appendix B.

For a given $R$, define task complexity $T_R$ as
$T_R:=\max_{r_1,r_2∈R\ \text{s.t.}\ r_1≠r_2}\frac{1}{d(r_2∥r_1)+d(r_1∥r_2)}.$
(4.2)
It is closely related to the smallest KL divergence between two distinct rates in $R$: $T_R$ is large when some pair of distinct rates in $R$ is hard to tell apart. The task complexity $T_R$ arises from the adoption of the minimax decision framework, equation 3.3. It turns out that if the input spike train length $T$ is not sufficiently large (specified in theorem 1), no matter how elegant the design of a WTA circuit is (no matter which activation function we choose, how many hidden neurons we use, and how we connect the hidden neurons and output neurons), its minimax decision accuracy cannot exceed the target decision accuracy $(1-δ)$.
Theorem 1.
For any $1≤k≤n-1$, any set $R$, and any $δ∈(0,1)$, if
$T≤\left((1-δ)\log(k(n-k)+1)-1\right)T_R,$
then
$\min_{\widehat{win}}\max_{p∈A_R}P\{\widehat{win}(S)≠W(p)\}≥δ,$
where the min is taken over all possible WTA circuits with different choices of activation functions and circuit architectures.
Theorem 1 says that if $T≤\left((1-δ)\log(k(n-k)+1)-1\right)T_R$, the worst-case error probability of any WTA circuit is at least $δ$:
$\max_{p∈A_R}P\{\widehat{win}(S)≠W(p)\}≥δ.$
Theorem 1 is proved in appendix C.
Remark 1

(Tightness of the Lower Bound in Theorem 1). The proof of theorem 1 uses a technical supporting lemma (lemma 13, presented in appendix C). Following our line of argument, by considering a richer family of critical rate assignments in lemma 13, we might be able to obtain a tighter lower bound. Nevertheless, the constructed WTA circuit in section 5 turns out to be order-optimal; its decision time matches the lower bound in theorem 1 up to a multiplicative constant factor. This immediately implies that the lower bound obtained in theorem 1 is tight up to a multiplicative constant factor.
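
As a numerical illustration of equation 4.2 and the lower bound, the sketch below evaluates $T_R$ and the bound for a hypothetical two-rate set $R=\{0.2,0.8\}$. The function names are assumptions, and the rates are assumed to lie strictly inside $(0,1)$ so that all divergences are finite.

```python
import math
from itertools import permutations


def bernoulli_kl(r, r_prime):
    """d(r || r') in bits; both rates are assumed to lie strictly in (0, 1)."""
    return (r * math.log2(r / r_prime)
            + (1 - r) * math.log2((1 - r) / (1 - r_prime)))


def task_complexity(rates):
    """T_R: maximum over ordered pairs of distinct rates of 1 / (d(r2||r1) + d(r1||r2))."""
    return max(1.0 / (bernoulli_kl(r2, r1) + bernoulli_kl(r1, r2))
               for r1, r2 in permutations(set(rates), 2))


def waiting_time_lower_bound(rates, n, k, delta):
    """((1 - delta) * log(k(n-k) + 1) - 1) * T_R, in units of 1 ms time slots."""
    return ((1 - delta) * math.log2(k * (n - k) + 1) - 1) * task_complexity(rates)


R = [0.2, 0.8]
bound_small = waiting_time_lower_bound(R, n=5, k=2, delta=0.1)
bound_large = waiting_time_lower_bound(R, n=100, k=2, delta=0.1)
```

With well-separated rates, $T_R$ is small and the bound is correspondingly loose; it grows logarithmically in $k(n-k)$. For very small $n$ the expression can be nonpositive, in which case the bound is vacuous.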

## 5  Order-Optimal WTA Circuits

In section 2, we provided a general description of the computation model we are interested in. In this section, we construct a specific WTA circuit whose decision time is order-optimal among the WTA circuits under the general computation model. To do that, we need to specify (1) the network structure, including the number of hidden neurons, the collection of synapses (directed communication links) between neurons, and the weights of these synapses; (2) the memorization capability of each neuron, that is, the magnitude of $m$; and (3) $φ_ν$, the activation function used by neuron $ν$. In the constructed circuit, we declare the first $k$ output neurons that spike simultaneously as winners.

### 5.1  Circuit Design

In our designed circuit, there are four parameters, $R$, $m$, $b$, and $δ$, where $R⊆[c,C]$ $^{7}$ is a finite set of rates from which the $p_i$'s of the input spike trains are chosen, $m$ is the memory range, $b$ is the bias at the noninput neurons, and $(1-δ)$ is the target decision accuracy (i.e., success probability). Here, we assume that every noninput neuron has the same bias: $b_ν=b$ for all noninput neurons $ν$. The four parameters $R$, $m$, $b$, and $δ$ can be viewed as some prior knowledge of the WTA circuit; they might be learned through some unknown network development procedure outside the scope of this work. In sections 5.1.1, 5.1.3, and 5.1.4, we present the network structure and the activation functions adopted, and the requirement on $m$. For completeness, we specify the local memory update (in particular, the vector $\mathbf{V}$) separately in section 5.1.2. The dynamics of our WTA circuit are summarized in section 5.1.5.

#### 5.1.1  Network Structure

We propose a WTA circuit with the following network structure:

• All output neurons are connected to each other by a complete graph. That is, $(v_i,v_j)∈E$ for all $v_i,v_j∈N_{out}$ such that $v_i≠v_j$.

• Each input neuron $u_i$ is connected to its paired output neuron $v_i$ by an edge of weight 1: $w(u_i,v_i)=1$ for $i=1,…,n$.

• All edges among the output neurons have weight $-\frac{1}{k}$; that is, $w(v_i,v_j)=-\frac{1}{k}$ for all $v_i,v_j∈N_{out}$ such that $v_i≠v_j$.

• There are no hidden neurons: $N_h=∅$.

#### 5.1.2  Update Local Charge Vector

With the above choice of network structure, the charge $V_{t-1}(v_i)$ at the output neuron $v_i$ at time $t-1$ is
$V_{t-1}(v_i)=S_{t-1}(u_i)-\frac{1}{k}\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j).$
(5.1)
When $k=1$, the above update becomes
$V_{t-1}(v_i)=S_{t-1}(u_i)-\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j),$
which can be viewed as a spike model counterpart of the potential update under the traditional continuous rate model (Kriener et al., 2017; Mao & Massaquoi, 2007) with lateral inhibition.

It is easy to see the following claims hold. For brevity, their proofs are omitted:

• Claim 5. For $t≥1$ and for $i=1,…,n$, $V_{t-1}(v_i)>0$ if and only if $S_{t-1}(u_i)=1$ and $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)≤k-1$; that is, at time $t-1$, input neuron $u_i$ spikes, and at most $k-1$ other output neurons spike.

• Claim 6. For $t≥1$ and for $i=1,…,n$, $V_{t-1}(v_i)≤-1$ only if $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)≥k$, that is, at time $t-1$, at least $k$ other output neurons spike. Note that $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)≥k$ is not a sufficient condition to have $V_{t-1}(v_i)≤-1$. To see this, suppose $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)=k$ and $S_{t-1}(u_i)=1$. In this case it holds that $V_{t-1}(v_i)=0$.

• Claim 7. For $t≥1$ and for $i=1,…,n$, if $V_{t-1}(v_i)=0$, one of the following holds: (1) $S_{t-1}(u_i)=1$ and $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)=k$, that is, at time $t-1$, input neuron $u_i$ spikes, and exactly $k$ other output neurons spike, or (2) $S_{t-1}(u_i)=0$ and $\sum_{j:1≤j≤n,\ j≠i}S_{t-1}(v_j)=0$, that is, at time $t-1$, input neuron $u_i$ does not spike, and no other output neurons spike.
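
Since the proofs of the claims are omitted, a brute-force check over all spike configurations for small $n$ and $k$ may be a useful sanity test. The sketch below evaluates the charge in equation 5.1 with exact rational arithmetic; the function names `charge` and `check_claims` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def charge(input_spike, other_output_spikes, k):
    """V(v_i) = S(u_i) - (1/k) * (number of other output neurons that spike)."""
    return input_spike - Fraction(sum(other_output_spikes), k)


def check_claims(n, k):
    """Exhaustively verify claims 5 to 7 for one output neuron in an n-neuron circuit."""
    for s_in in (0, 1):
        for others in product((0, 1), repeat=n - 1):
            count, v = sum(others), charge(s_in, others, k)
            # Claim 5: positive charge iff the input spikes and at most k-1 others spike.
            assert (v > 0) == (s_in == 1 and count <= k - 1)
            # Claim 6: charge <= -1 only if at least k others spike.
            if v <= -1:
                assert count >= k
            # Claim 7: zero charge leaves exactly two possible situations.
            if v == 0:
                assert (s_in == 1 and count == k) or (s_in == 0 and count == 0)
    return True


all_hold = all(check_claims(n, k) for n in range(2, 8) for k in range(1, n))
```

The check passing for small cases does not replace a proof, but it does confirm the boundary conditions (at most $k-1$ in claim 5, at least $k$ in claim 6).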

#### 5.1.3  Activation Functions

There are many different choices of activation functions; see (Activation function, n.d.) for a detailed list. In our construction, we use a simple threshold activation function,
$St(vi)=1,if(b-1)1St-1(vi)=1+∑r=1m1Vt-r(vi)>0-m∑r=1m1Vt-r(vi)≤-1+≥b;0,otherwise,$
where $·+=max·,0$, and $b≥1$ is the bias at the output neuron $vi$ for $i=1,…,n$. It is easy to see that this activation function falls under the general form given by equation 2.1.
Remark 2.
If the output neuron $v_i$ does not spike at time $t-1$ (i.e., $S_{t-1}(v_i)=0$), then in order for $v_i$ to spike at time $t$, the following needs to hold:
$\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+\ge b.$
In contrast, if the output neuron $v_i$ does spike at time $t-1$ (i.e., $S_{t-1}(v_i)=1$), then
$\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+\ge 1$
is enough for $v_i$ to spike at time $t$. That is, under our activation rule, $S_{t-1}(v_i)=1$ makes the activation of $v_i$ much easier in the next round. However, if there exists $r\in\{1,2,\dots,m\}$ such that
$\mathbf{1}\{V_{t-r}(v_i)\le -1\}=1,$
then
$\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\le\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\le m-m=0.$
Thus,
$(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}+\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+=(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}+0\le b-1<b,$
that is, the output neuron $v_i$ does not spike at time $t$. In other words, provided that there exists $r\in\{1,2,\dots,m\}$ such that $\mathbf{1}\{V_{t-r}(v_i)\le -1\}=1$, the activation of $v_i$ is inhibited at time $t$.
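
The threshold rule and the inhibition effect described in remark 2 can be illustrated with a few lines of code. This is a minimal sketch of the activation rule only; the function name `spikes_next` and the list-based history are illustrative choices, not part of the original construction.

```python
def spikes_next(prev_spike, v_history, b, m):
    """Return S_t(v_i) given S_{t-1}(v_i) and past potentials (most recent last).

    Only the last m potentials V_{t-1}, ..., V_{t-m} are used.
    """
    window = v_history[-m:]
    excit = sum(1 for v in window if v > 0)    # count of V_{t-r} > 0
    inhib = sum(1 for v in window if v <= -1)  # count of V_{t-r} <= -1
    drive = max(excit - m * inhib, 0)          # the [.]_+ term
    total = (b - 1) * (1 if prev_spike == 1 else 0) + drive
    return 1 if total >= b else 0
```

For instance, with $m=5$ and $b=3$, a neuron that spiked in the previous step needs only one positive potential in its window to spike again, whereas a single potential at or below $-1$ in the window blocks the spike regardless of the other entries.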

#### 5.1.4  Local Memorization Capability

In our proposed circuit, we require that $m$ satisfies
$m\ge\frac{8C^2(1-c)}{c^2(1-C)}\Big(\log\frac{3}{\delta}+\log k(n-k)\Big)T_R=:m^*$
(5.2)
for target decision accuracy $1-\delta\in(0,1)$. In addition, we set $b=cm^*$. Recall that $c,C\in(0,1)$ are two absolute constants that are lower and upper bounds, respectively, of the rates in any $R$.

Intuitively, when other parameters are fixed, the higher the desired accuracy (i.e., the smaller $\delta$), the larger the required minimum memory $m^*$, that is, the more memory is needed for selecting the winners in our WTA circuit. Similarly, the easier it is to distinguish two independent spike trains with different rates (i.e., the lower $T_R$), the smaller $m^*$ is. Interestingly, with other parameters fixed, $m^*$ depends on $k$ as follows: $m^*$ is increasing in $k$ when $k\in\{1,\dots,\lfloor n/2\rfloor\}$, and $m^*$ is decreasing in $k$ when $k\in\{\lceil n/2\rceil,\dots,n-1\}$. In many practical settings, we care about the region where $k\ll n$. Besides, with the choice of bias $b=cm^*$, a larger $m^*$ also implies that a longer time is needed for our WTA circuit to declare $k$ winners (details can be found in statement 1 in theorem 6).
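
The dependence of $m^*$ on $\delta$ and $k$ can be made concrete with a short calculation. The sketch below evaluates the right-hand side of equation 5.2; it assumes natural logarithms (the base is not fixed in this section), and the parameter values in the usage note are arbitrary illustrations.

```python
import math


def m_star(delta, n, k, T_R, c, C):
    """Minimum memory length m* from equation 5.2 (natural log assumed)."""
    const = 8 * C**2 * (1 - c) / (c**2 * (1 - C))
    return const * (math.log(3 / delta) + math.log(k * (n - k))) * T_R
```

With, say, $n=20$, $T_R=1$, $c=0.2$, and $C=0.8$, the value grows as $k$ moves from 1 to 10 and is symmetric under $k\mapsto n-k$, because $m^*$ depends on $k$ only through $k(n-k)$.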

On the other hand, in most neurons the synaptic plasticity time window is about 80 to 120 ms, and it is unclear whether equation 5.2 can be immediately satisfied. Fortunately, even if a single neuron cannot satisfy equation 5.2 on its own because of such biological constraints, its local memory might be realized via population codes such as a chain of hidden neurons.

#### 5.1.5  Algorithm 1

The dynamics of our WTA circuit are summarized in algorithm 1, which is fully determined by what has been described in sections 5.1.1 to 5.1.4. For algorithm 1, we declare the first $k$ output neurons that spike simultaneously as winners.
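
To make the dynamics concrete, the following is a minimal simulation sketch of the circuit, not a transcription of algorithm 1. It combines the threshold activation rule of section 5.1.3 with a hypothetical potential $V_t(v_i)=S_t(u_i)-\frac{1}{k}\sum_{j\ne i}S_t(v_j)$, which is consistent with claims 5 to 7 but is an assumption of this example. The function names are illustrative, all neurons are taken to be silent before time 1, and "winners" are read off at the first step where exactly $k$ outputs spike.

```python
from fractions import Fraction


def simulate_wta(inputs, k, m, b):
    """Run the circuit on inputs[t-1][i] = S_t(u_i); return output spikes per step."""
    n = len(inputs[0])
    v_hist = [[] for _ in range(n)]  # past potentials V_1, V_2, ... per neuron
    prev_out = [0] * n               # S_{t-1}(v_i); all silent before time 1
    outputs = []
    for s_in in inputs:
        out = []
        for i in range(n):
            window = v_hist[i][-m:]  # last m potentials (fewer early on)
            excit = sum(1 for v in window if v > 0)
            inhib = sum(1 for v in window if v <= -1)
            drive = (b - 1) * prev_out[i] + max(excit - m * inhib, 0)
            out.append(1 if drive >= b else 0)
        total = sum(out)
        for i in range(n):
            # Assumed potential: own input minus 1/k per other spiking output.
            v_hist[i].append(Fraction(s_in[i]) - Fraction(total - out[i], k))
        outputs.append(out)
        prev_out = out
    return outputs


def first_winners(outputs, k):
    """First time (1-indexed) at which exactly k outputs spike, with their indices."""
    for t, out in enumerate(outputs, start=1):
        if sum(out) == k:
            return t, [i for i, s in enumerate(out) if s == 1]
    return None
```

As a sanity check under these assumptions, if two of four inputs spike at every step and the other two never spike, then with $k=2$, $m=6$, and $b=3$ the two active neurons are declared winners at time $b+1=4$ and keep spiking afterward, while the other two outputs stay silent.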

### 5.2  Circuit Performance

Recall that $W(p)$ and $m*$ are defined in equations 3.1 and 5.2, respectively.

Theorem 6.

Fix $\delta\in(0,1]$ and $1\le k\le n-1$. Choose $m\ge m^*$ and $b=\max\{cm^*,2\}$. Then for any admissible rate assignment $p$, with probability at least $1-\delta$, the following statements hold:

• There exist $k$ output neurons that spike simultaneously by time $m^*$.

• The first set of such $k$ output neurons are the true winners $W(p)$.

• From the first time in which these $k$ output neurons spike simultaneously, these $k$ output neurons spike consecutively for at least $b$ times, and no other output neurons can spike within $b$ times.

The proof of theorem 6 can be found in appendix D. The first statement in that theorem implies that our WTA circuit can provide an output (a selection of $k$ output neurons) by time $m^*$; the second statement says that the circuit's output indeed corresponds to the $k$ true winners; and the third says that the $k$ simultaneous spikes of the selected winners are stable: the $k$ selected winners continue to spike consecutively for at least $b$ times. The proof of theorem 6 essentially says that with high probability, under algorithm 1, the number of output neurons that spike simultaneously is monotonically increasing until it reaches $k$. Upon the simultaneous spike of $k$ output neurons, by our threshold activation rule, we know that the other output neurons are likely to be inhibited. In particular, if these $k$ output neurons are the first $k$ output neurons that spike simultaneously, then the activation of the other output neurons is likely to be inhibited for at least $b$ times.

Remark 3
(Controlling Stability). As can be seen from the proof of theorem 6, in the activation function of algorithm 1,
$(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}+\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+\ge b,$
the first term $(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}$ is crucial in achieving statement 3 in theorem 6. In fact, we can increase the stability period by introducing a stability parameter $s>1$ and modifying the activation rule. Details can be found in algorithm 2. It is easy to see that the activation function falls under the general form in equation 2.1. In the new activation function in algorithm 2, for output neuron $v_i$, once it spikes, it continues to spike for at least $s$ times. Following our line of analysis in the proof of theorem 6, it can be seen that the declared $k$ winners, from the first time they spike simultaneously, continue to spike consecutively for at least $s$ times.
Remark 4

(Order-Optimality). The decision time performance in statement 1 of theorem 6 matches the information-theoretic lower bound in theorem 3 up to a multiplicative constant factor both when $\delta$ is sufficiently small and does not depend on $n$, $k$, $T_R$, $c$, and $C$, and when $\delta$ decays to zero at a speed at most $1/(k(n-k))^{c_0}$, where $c_0>0$ is some fixed constant. The detailed order-optimality argument is given next.

Suppose that $δ$ is sufficiently small and does not depend on $n$, $k$, $TR$, $c$, and $C$. Here, for ease of exposition, we illustrate the order-optimality with a specific choice of $δ$. In fact, the order-optimality holds generally for constant $δ∈(0,1)$ provided that it does not depend on $n$, $k$, $TR$, $c$, and $C$.

Suppose the target decision accuracy is $1-δ=0.9$ (i.e., $δ=0.1$). Then provided that $n≥31$, for any $1≤k≤n-1$,
$m^*=\frac{8C^2(1-c)}{c^2(1-C)}\Big(\log\frac{3}{0.1}+\log k(n-k)\Big)T_R\le\frac{16C^2(1-c)}{c^2(1-C)}\log\big(k(n-k)\big)\,T_R.$
On the other hand, recall from theorem 3 that to have $\delta=0.1$, the decision time is no less than
$\big((1-\delta)\log(k(n-k)+1)-1\big)T_R\ge\frac{1}{2}\log(k(n-k)+1)\,T_R\ge\frac{1}{2}\log\big(k(n-k)\big)\,T_R,$
where the first inequality holds provided that $n\ge 8$. Thus, when $n\ge 31$, in order to achieve the decision accuracy $1-\delta=0.9$, the decision time of our WTA circuit is of the same order as the information-theoretic lower bound in theorem 3.
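
The constant-$\delta$ comparison can also be checked numerically. The sketch below evaluates $m^*$ from equation 5.2 and the lower bound of theorem 3 and reports their ratio; it assumes natural logarithms, and the values of $c$, $C$, and $T_R$ used in the usage note are arbitrary illustrations. Under the natural-log assumption and $n\ge 31$, the two displayed inequalities imply that the ratio is at most $32C^2(1-c)/(c^2(1-C))$.

```python
import math


def ratio_constant(c, C):
    """K = C^2 (1 - c) / (c^2 (1 - C)), the constant shared by both bounds."""
    return C**2 * (1 - c) / (c**2 * (1 - C))


def m_star(delta, n, k, T_R, c, C):
    """Upper bound on the decision time: m* from equation 5.2."""
    spread = math.log(3 / delta) + math.log(k * (n - k))
    return 8 * ratio_constant(c, C) * spread * T_R


def lower_bound(delta, n, k, T_R):
    """Lower bound on the waiting time from theorem 3."""
    return ((1 - delta) * math.log(k * (n - k) + 1) - 1) * T_R


def optimality_ratio(delta, n, k, T_R, c, C):
    """Ratio of the circuit's decision time to the lower bound."""
    return m_star(delta, n, k, T_R, c, C) / lower_bound(delta, n, k, T_R)
```

For example, `optimality_ratio(0.1, 50, 5, 1.0, 0.2, 0.8)` stays below `32 * ratio_constant(0.2, 0.8)`, and the ratio does not depend on $T_R$ because $T_R$ multiplies both bounds.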
Suppose $\delta$ decays to zero at a moderate speed. The decision time of our WTA circuit is order-optimal even for diminishing decision error $\delta$ provided that $\delta=\Omega\big(3/(k(n-k))^{c_0}\big)$, where $c_0>0$; that is, $\delta$ does not decay to zero “too fast” in $k(n-k)$. To see this, let $\delta=3/(k(n-k))^{c_0}$ for some constant $c_0>0$. We have
$\frac{8C^2(1-c)}{c^2(1-C)}\Big(\log\frac{3}{3/(k(n-k))^{c_0}}+\log k(n-k)\Big)T_R=\frac{8C^2(1-c)(c_0+1)}{c^2(1-C)}\log\big(k(n-k)\big)\,T_R.$
(5.3)

Resetting the circuit when the input spike trains become quiescent. In algorithm 1, if the input spike trains become quiescent, then the output neurons of the circuit also become quiescent, albeit with some delay.

Lemma 9.

If all input neurons are quiescent at time $t_0$ and remain quiescent for all $t\ge t_0$, then $V_t(v_i)=0$ and $S_t(v_i)=0$ for any $t>t_0+m$.

Lemma 9 is proved in appendix E.

## 6  Discussion

In this letter, we investigated how $k$-WTA computation is robustly achieved in the presence of inherent noise in the input spike trains. In a spike-based $k$-WTA model, $n$ randomly generated input spike trains are competing with each other, and the neurons with the top $k$ highest underlying firing rates are the true winners. Given the stochastic nature of the spike trains, it is not trivial to properly select winners among a group of neurons. We derived an information-theoretic lower bound on the decision time for a given decision accuracy. Notably, this lower bound holds universally for any WTA circuit that falls within our model framework, regardless of their circuit architectures or their adopted activation functions. Furthermore, we constructed a circuit whose decision time matches this lower bound up to a constant multiplicative factor, suggesting that our derived lower bound is order-optimal. Here the order-optimality is stated in terms of its scaling in $n$, $k$, and $TR$. In addition, our results also give a set of testable hypotheses on neural recordings and human and animal behaviors in decision making.

### 6.1  Comparison to Previous WTA Models

Randomness is introduced at different stages of brain computation, and the stochastic nature of the spike trains is well observed (Baddeley et al., 1997; Kara et al., 2000; Maimon & Assad, 2009; Shamir, 2009, 2006; Hertz et al., 1991; Ferrari et al., 2018). In our work, we focused on how to robustly achieve $k$-WTA computation in the face of the intrinsic randomness in the spike trains. A common WTA model assumes that neurons transmit information by a continuous variable such as the firing rate (Dayan & Abbott, 2001; Hertz et al., 1991), which often ignores the intrinsic randomness in spike trains. Although some studies used additive gaussian noise (Kriener et al., 2017; Li et al., 2013; Lee et al., 1999; Rougier & Vitay, 2006) in their rate-based WTA circuits to account for input randomness, these circuits are usually very sensitive to noise and could not successfully select even a single winner unless additional nonlinearity is added (Kriener et al., 2017). In fact, a neuron with a second nonlinearity is similar to an output neuron in our constructed WTA circuit in that both integrate their local inputs. Unfortunately, only simulation results were provided in Kriener et al. (2017); a theoretical justification of why such second nonlinearity makes their WTA circuit robust to input noise is lacking. Random response of rate-based WTA is also considered in Shamir (2006) with a focus on characterizing the scaling of WTA accuracy with the population size for a two-interval, two-alternative forced choice (2I2AFC) discrimination task.

Though we focused on a spike-based model, we hope our results can provide some insight into the rate-based model as well. On top of that, a rate-based model would require a high communication bandwidth, yet communication bandwidth is limited in the brain. Our spiking neural network model captures this feature by having a low communication cost, since it broadcasts 1 bit only. However, we did not try to model every biologically relevant feature. In several studies using spiking network models, individual units are often modeled with details like ion channels and specific synaptic connectivity. Though more biologically relevant than our spiking neuron network model, those details significantly complicate the analysis. In fact, it could be challenging and intricate to move beyond computer simulation to characterize the model dynamics (e.g., the spiking nature of each unit, the time it takes to stabilize), analytically.

Spike-based WTA is also considered in the insightful work (Shamir, 2009) under a statistical model for a two-alternative forced-choice (2AFC) discrimination task. In particular, Shamir (2009) undertook an elegant study on the accuracy of his WTA mechanism focusing on the effects of population size, noise correlations, and baseline firing. Compared to Shamir (2009), our model is more restrictive in the sense that we do not consider the effects of population size, noise correlations, and baseline firing, yet it is more general in the sense that we consider $n≥2$ alternatives. In addition, we take a slightly different but closely related angle; instead of focusing on characterizing the accuracy with regard to a particular WTA circuit, we provide a general lower bound that provides insight into the fundamental limits of a WTA circuit on the waiting time in deciding among independent Bernoulli input spike trains. Nevertheless, all of the features studied in Shamir (2009)—population size, noise correlations, and baseline firing—are interesting, and we would like to try to extend our results to incorporate these features in our future work.

### 6.2  Potential Applications for Physiological Experiments

Our work might further provide hypotheses on inferring the changes of the network sizes, of the similarities between input spike trains, and of the synaptic memory capacities based on the changes of the performance accuracy. For example, in behavioral experiments using electrolytic lesions or pharmacological inhibition (Clark, Manes, Antoun, Sahakian, & Robbins, 2003; Hanks, Ditterich, & Shadlen, 2006; Yttri, Liu, & Snyder, 2013; Katz, Yates, Pillow, & Huk, 2016), the changes in performance are often highly variable and nonlinear. Such variability and nonlinearity might arise from the experimental difficulties in precisely manipulating network size and disentangling sensory perception and motor planning from a core decision-making (winner-selecting) process. With an analytical characterization, one might be able to estimate changes in the network size given its performance changes. Several pioneering works studied the impact of the network size on accuracy (Seung & Sompolinsky, 1993; Shamir, 2009, 2006). While these works characterized this trade-off based on investigating specific WTA circuits, our work provides a complementary viewpoint by characterizing a lower bound on a large family of WTA circuits.

Besides the effect of network size, the distribution of feature representations (i.e., different set $R$s of different individual animals) could be used to account for between-subject variability in decision making. Consider a random-dot coherent motion task where animals need to decide in which of two directions the majority of dots are moving (Shadlen & Newsome, 2001). In this task, performance accuracy and reaction time vary across animals. If we perform neural recordings in their visual cortex (i.e., to record their $R$s), we might be able to decode their reaction time or accuracy, given population representations of dot motion in these cortical neurons (Shadlen & Newsome, 1996; Jazayeri & Movshon, 2006). For example, an animal whose stimulus-evoked responses are more heterogeneous in the visual cortex might be able to react faster given the same accuracy, governed by our derived lower bound.

Our work also offers predictions on how local memory capacity could affect performance in decision making. For example, when there is more ambiguity in input representations, to achieve the same accuracy, a larger minimum time window for memory storage in synapses (Knoblauch, Palm, & Sommer, 2010) is required. From previous experimental work (Bittner, Milstein, Grienberger, Romani, & Magee, 2017), we know that synaptic plasticity has timescales ranging from milliseconds to seconds across different brain regions, and such plasticity could efficiently store entire behavioral sequences within synaptic weights. Combined with our analytical characterization, when performance accuracy changes over time, assuming other parameters such as input rates, decision time, and network size are fixed, one might be able to predict how synaptic plasticity changes.

### 6.3  Limitations and Extensions

When $δ$ is a constant, our lower bound is order-optimal in terms of its scaling in $n$, $k$, and $TR$. Nevertheless, the scaling of the derived lower bound in terms of $δ$ is not tight. It would be interesting to know the optimal scaling in $δ$ when other parameters ($n$, $k$, and $TR$) are fixed. We leave it as one future direction.

To reduce complexity, our model makes a few assumptions that ignore some features of the brain (Shamir, 2009). One of these assumptions is that the input neurons are independent. However, various degrees of average noise correlations between cortical neurons have been reported. For example, average noise correlations in primary visual cortex could be close to 0.1 (Schölvinck, Saleem, Benucci, Harris, & Carandini, 2015), 0.18 (Smith & Kohn, 2008), or even much larger, such as 0.35 (Gutnisky & Dragoi, 2008). Similarly, noise correlations have been observed in other sensory brain regions (Cohen & Kohn, 2011). In our work, we ignore correlations between these neurons, but incorporating them into our spiking network model would be an interesting future direction. Unfortunately, the impact of the noise correlation on the lower bound is unclear at first glance. One of the challenges in answering such questions is that, in general, the details of correlations might matter, especially when there is more than one true winner, and it is unclear whether general statements such as “correlations always hurt” or “correlations always help” can be concluded in the end. Specifically, on the one hand, the insightful work of Shamir (2009) showed that, similar to the effect of noise correlation that has been observed in population coding theory, noise correlations in the proposed temporal winner-take-all (tWTA) limit and harm the accuracy of the tWTA readout. In fact, in population coding theory, it is commonly reported that noise correlation harms decoding accuracy (Eyherabide & Samengo, 2013). On the other hand, correlations in the variabilities of neuronal firing rates do not, in general, limit the increase in coding accuracy; in some cases, correlations improve the accuracy of a population code (Abbott & Dayan, 1999; Averbeck, Latham, & Pouget, 2006).
Additionally, for the problem of $k$-WTA where $k\ge 2$, it could be possible that the noise correlation is neither purely positively correlated nor purely negatively correlated. In particular, it could be possible that one true winner is positively correlated with other true winners and is negatively correlated with nonwinners, and another true winner is negatively correlated with other true winners and is positively correlated with nonwinners. Thus, extra care is needed when one is trying to make claims on the impact of noise correlation on a WTA circuit.

Second, our model uses a threshold activation function; in this letter, we assume that synaptic transmission is essentially noise free and that the input is the only source of noise. However, synaptic transmission is highly unreliable in biological networks (Allen & Stevens, 1994; Faisal et al., 2008; Borst, 2010), and a deterministic activation function would fail to capture this feature compared to a stochastic activation function. Nevertheless, our lower bound in theorem 3 holds even if the activation functions are random. This is because the probability in $\mathbb{P}\{\widehat{\mathrm{win}}(S)\ne W(p)\}$ incorporates the possible randomness in the activation functions, and our lower-bound characterization is independent of the activation functions used.

Another assumption in our circuit is that the output neurons can inhibit each other. In common scenarios, an output neuron is usually excitatory and does not inhibit other neurons directly without recruiting inhibitory cells. We incorporate stability in these output neurons by assuming they can inhibit each other in our circuit implementation. For a model where an output neuron is limited to be excitatory only, we can add a chain of inhibitory neurons to achieve stable WTA computation.

Additionally, for our lower bound to hold, we need that the initial memory of each neuron, $M_1(\nu)$, contains no information about the system's state in the past $t\le 0$. That is, except for the input spike trains, no side information (especially about previous network dynamics) is available at a WTA circuit, and nothing happens before the start of WTA competition to affect the WTA dynamics. We impose this assumption on $M_1(\nu)$ in order to derive an information-theoretic lower bound on the observation time. On the other hand, spontaneous firings before the presence of an external stimulus might affect the initial states of neurons' local memory. For those scenarios, our results are applicable provided that the spontaneous firings are very sparse or even negligible. Nevertheless, it would be interesting to relax this assumption and study how the spontaneous firings of the neurons in the past (i.e., $t\le 0$) could affect $M_1(\nu)$ in general.

Finally, in our $k$-WTA circuit, the number of output neurons that spike simultaneously increases monotonically until there are exactly $k$ output neurons that spike simultaneously. We acknowledge that this might not be biologically plausible in most cases in the brain, especially considering the possibility of spontaneous firings. From large-scale neural recordings, we know that the number of neurons that spike simultaneously is usually variable, so constructing a circuit that better matches experimental observations is a possible future direction.

## Appendix A:  Preliminaries

In this section, we present some preliminaries on information measures and Fano's inequality. Interested readers are referred to Polyanskiy and Wu (2014) for comprehensive background.

### A.1  Information Measures

Let $X$ and $Y$ be two random variables. The mutual information between $X$ and $Y$, denoted by $I(X;Y)$, measures the dependence between $X$ and $Y$, or the information about $X$ (resp. $Y$) provided by $Y$ (resp. $X$).

Definition 2
(Mutual Information). Let $X$ and $Y$ be two random variables. Then
$I(X;Y):=D(P_{XY}\,\|\,P_XP_Y),\qquad D(P\,\|\,Q):=\sum_{a\in\mathcal{A}}P(a)\log\frac{P(a)}{Q(a)},$
where $P_{XY}$ denotes the joint distribution of $X$ and $Y$, and $P_XP_Y$ denotes the product of the marginal distributions of $X$ and $Y$.

In the following, we use the notation $X\to Y$ to denote that $Y$ is a (possibly random) function of $X$. Thus, $W\to X\to Y\to\hat{W}$ means that $X$ is a (possibly random) function of $W$; $Y$ is a (possibly random) function of $X$; and $\hat{W}$ is a (possibly random) function of $Y$. We now state Fano's inequality.

Theorem 11
(Polyanskiy & Wu, 2014, Corollary 5.1). Let $T:\Theta\to[M]$, and let $\theta\to X\to Y\to\hat{T}(\theta)$ be an arbitrary Markov chain. Suppose both $\theta$ and $T(\theta)$ are uniformly distributed over a set of size $M$. Then
$P_e:=\mathbb{P}\{T(\theta)\ne\hat{T}(\theta)\}\ge 1-\frac{I(X;Y)+1}{\log M}.$
Theorem 12

(Chernoff Bound). Let $X_1,\dots,X_n$ be i.i.d. with $X_i\in\{0,1\}$ and $\mathbb{P}\{X_1=1\}=p$. Set $X=\sum_{i=1}^{n}X_i$. Then

• For any $t\in[0,1-p]$, we have $\mathbb{P}\{X\ge(p+t)n\}\le\exp\big(-n\,d(p+t\,\|\,p)\big)$.

• For any $t\in[0,p]$, we have $\mathbb{P}\{X\le(p-t)n\}\le\exp\big(-n\,d(p-t\,\|\,p)\big)$.
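
As a quick illustration of the Chernoff bound above, the sketch below compares the exact binomial tails with the bound $\exp(-n\,d(\cdot\,\|\,p))$, where $d$ is the binary KL divergence in nats. The threshold is parameterized by an integer count $j$, so that $(p\pm t)n=j$; the function names are illustrative.

```python
import math


def kl(a, p):
    """Binary KL divergence d(a || p) in nats, with the convention 0 log 0 = 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / p)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - p))
    return total


def binom_pmf(n, p, j):
    """P{X = j} for X ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def upper_tail(n, p, j):
    """Exact P{X >= j}."""
    return sum(binom_pmf(n, p, i) for i in range(j, n + 1))


def lower_tail(n, p, j):
    """Exact P{X <= j}."""
    return sum(binom_pmf(n, p, i) for i in range(0, j + 1))


def chernoff(n, p, j):
    """Chernoff bound exp(-n d(j/n || p)) for the tail starting at count j."""
    return math.exp(-n * kl(j / n, p))
```

For $n=40$ and $p=0.3$, the upper-tail bound applies to counts $j\ge 12$ and the lower-tail bound to counts $j\le 12$; at the extreme counts $j=0$ and $j=n$ the bound is met with equality.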

## Appendix B:  Proof of Lemma 2

Lemma 2 follows easily from the independence between input spike trains and the assumption that the spikes in each input spike train are i.i.d. For completeness, we present the proof as follows.

Proof of Lemma 2.
Recall that
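$S:=\big[\{S_t(u_1)\}_{t=1}^{T},\dots,\{S_t(u_n)\}_{t=1}^{T}\big].$
Denote $s=[s_1,\dots,s_n]$ as one realization of $S$, wherein each component $s_i$ is a binary sequence of length $T$:
$s_i=[b_1^i,\dots,b_T^i]\in\{0,1\}^T.$
For each $i=1,\dots,n$, let $P_S(\{S_t(u_i)\}_{t=1}^{T})$ and $Q_S(\{S_t(u_i)\}_{t=1}^{T})$ be the marginal distributions of the $i$th length-$T$ input spike train $\{S_t(u_i)\}_{t=1}^{T}$ under joint distributions $P_S$ and $Q_S$, respectively. Similarly, $P_S(S_t(u_i))$ and $Q_S(S_t(u_i))$ are the corresponding two marginal distributions of $S_t(u_i)$, the spiking state of input neuron $u_i$ at time $t$. Thus, we have
$\begin{aligned}
&D\big(P_S(\{S_t(u_i)\}_{t=1}^{T})\,\big\|\,Q_S(\{S_t(u_i)\}_{t=1}^{T})\big)\\
&\overset{(a)}{=}\sum_{b_1^i,\dots,b_T^i}P_S\big(\{S_t(u_i)\}_{t=1}^{T}=[b_1^i,\dots,b_T^i]\big)\log\frac{P_S\big(\{S_t(u_i)\}_{t=1}^{T}=[b_1^i,\dots,b_T^i]\big)}{Q_S\big(\{S_t(u_i)\}_{t=1}^{T}=[b_1^i,\dots,b_T^i]\big)}\\
&\overset{(b)}{=}\sum_{b_1^i,\dots,b_T^i}\prod_{t'=1}^{T}P_S(S_{t'}(u_i)=b_{t'}^i)\log\frac{\prod_{t=1}^{T}P_S(S_t(u_i)=b_t^i)}{\prod_{t=1}^{T}Q_S(S_t(u_i)=b_t^i)}\\
&=\sum_{t=1}^{T}\sum_{b_1^i,\dots,b_T^i}\prod_{t'=1}^{T}P_S(S_{t'}(u_i)=b_{t'}^i)\log\frac{P_S(S_t(u_i)=b_t^i)}{Q_S(S_t(u_i)=b_t^i)}\\
&=\sum_{t=1}^{T}\sum_{b_1^i,\dots,b_T^i}\Big(\prod_{t'=1,\,t'\ne t}^{T}P_S(S_{t'}(u_i)=b_{t'}^i)\Big)P_S(S_t(u_i)=b_t^i)\log\frac{P_S(S_t(u_i)=b_t^i)}{Q_S(S_t(u_i)=b_t^i)}\\
&\overset{(c)}{=}\sum_{t=1}^{T}\sum_{b_t^i}P_S(S_t(u_i)=b_t^i)\log\frac{P_S(S_t(u_i)=b_t^i)}{Q_S(S_t(u_i)=b_t^i)}\\
&=\sum_{t=1}^{T}\Big(p_i\log\frac{p_i}{q_i}+(1-p_i)\log\frac{1-p_i}{1-q_i}\Big)=\sum_{t=1}^{T}d(p_i\,\|\,q_i)=T\cdot d(p_i\,\|\,q_i),
\end{aligned}$
where $\sum_{b_1^i,\dots,b_T^i}$ is the summation over all binary sequences of length $T$. In the last displayed equation, equality a follows from the definition of KL divergence; equality b is true because of the independence of spikes; equality c follows from the fact that for any fixed $b_t^i$,
$\sum_{[b_1^i,\dots,b_T^i]\setminus\{t\}}\ \prod_{t'=1,\,t'\ne t}^{T}P_S(S_{t'}(u_i)=b_{t'}^i)=1,$
where we use $\sum_{[b_1^i,\dots,b_T^i]\setminus\{t\}}$ to denote the summation over all binary sequences of length $T$ with the $t$th entry fixed.
Similarly, we get
$D(P_S\,\|\,Q_S)=\sum_{s=[s_1,\dots,s_n]}P_S(S=s)\log\frac{P_S(S=s)}{Q_S(S=s)}=\sum_{i=1}^{n}D\big(P_S(\{S_t(u_i)\}_{t=1}^{T})\,\big\|\,Q_S(\{S_t(u_i)\}_{t=1}^{T})\big)=\sum_{i=1}^{n}T\,d(p_i\,\|\,q_i)=T\sum_{i=1}^{n}d(p_i\,\|\,q_i),$
proving the lemma.$□$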
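
The single-train identity $D=T\cdot d(p_i\,\|\,q_i)$ can be verified by direct enumeration for small $T$. The sketch below sums over all $2^T$ binary sequences exactly as in equality a; the function names are illustrative and logarithms are natural.

```python
import math
from itertools import product


def bernoulli_kl(p, q):
    """d(p || q) for Bernoulli(p) versus Bernoulli(q), in nats (0 < p, q < 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def sequence_kl(p, q, T):
    """KL divergence between two i.i.d. Bernoulli spike trains of length T."""
    total = 0.0
    for bits in product((0, 1), repeat=T):
        prob_p = math.prod(p if b else 1 - p for b in bits)
        prob_q = math.prod(q if b else 1 - q for b in bits)
        total += prob_p * math.log(prob_p / prob_q)
    return total
```

For example, `sequence_kl(0.3, 0.6, 5)` agrees with `5 * bernoulli_kl(0.3, 0.6)` up to floating-point rounding.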

## Appendix C:  Proof of Theorem 3

The following lemma is used in the proof of our information-theoretic lower bound. This is a technical supporting lemma, and the choice of the specific rate assignments is due to some technical convenience in proving theorem 3. See appendix A for definition of $I(·;·)$.

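Lemma 13.
For any finite set $R$, let $r_1,r_2\in R$ such that $r_1\ne r_2$. Let $p^0=[p_1^0,\dots,p_n^0]$ be
$p_\ell^0=\begin{cases}r_1, & \text{if } \ell=1,\dots,k;\\ r_2, & \text{otherwise}.\end{cases}$
(C.1)
For $i=1,\dots,k$ and $j=k+1,\dots,n$, define rate assignment $p^{ij}$ as
$p_\ell^{ij}=\begin{cases}p_\ell^0, & \text{if } \ell\ne i,\ \ell\ne j;\\ p_j^0, & \text{if } \ell=i;\\ p_i^0, & \text{if } \ell=j.\end{cases}$
Let $X_p$ be a random rate assignment. If $X_p$ is uniformly distributed over
$\{p^0\}\cup\{p^{ij}: i=1,\dots,k,\ j=k+1,\dots,n\},$
then the mutual information $I(X_p;S)$ satisfies the following:
$I(X_p;S)\le T\big(d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)\big).$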
Proof.
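Since mutual information can be viewed as distance to product distributions, by Polyanskiy and Wu (2014, theorem 3.4), we have
$I(X_p;S)=\min_{Q_{X_p},\,Q_S}D\big(P_{X_p,S}\,\|\,Q_{X_p}Q_S\big),$
where $P_{X_p,S}$ is the joint distribution of $X_p$ and $S$, and $Q_{X_p}$ and $Q_S$ are any distributions of $X_p$ and $S$, respectively.
For any fixed $Q_S$, it holds that
$\min_{Q_{X_p}}D\big(P_{X_p,S}\,\|\,Q_{X_p}Q_S\big)=\min_{Q_{X_p}}D\big(P_{S\mid X_p}P_{X_p}\,\|\,Q_{X_p}Q_S\big)\le D\big(P_{S\mid X_p}P_{X_p}\,\|\,P_{X_p}Q_S\big),$
where the equality follows from conditioning, and the inequality is true because the best choice over all $Q_{X_p}$ cannot be worse than any specific choice of $Q_{X_p}$. Here $S\mid X_p$ denotes the $n$ input spike trains conditioning on the choice of rate assignment.
For any fixed $Q_S$, we have
$\begin{aligned}
D\big(P_{S\mid X_p}P_{X_p}\,\|\,P_{X_p}Q_S\big)
&=P_{X_p}(X_p=p^0)\sum_{s}P_{S\mid X_p=p^0}(S=s)\log\frac{P_{S\mid X_p=p^0}(S=s)\,P_{X_p}(X_p=p^0)}{Q_S(S=s)\,P_{X_p}(X_p=p^0)}\\
&\quad+\sum_{i=1}^{k}\sum_{j=k+1}^{n}P_{X_p}(X_p=p^{ij})\sum_{s}P_{S\mid X_p=p^{ij}}(S=s)\log\frac{P_{S\mid X_p=p^{ij}}(S=s)\,P_{X_p}(X_p=p^{ij})}{Q_S(S=s)\,P_{X_p}(X_p=p^{ij})}\\
&=\frac{1}{k(n-k)+1}\sum_{s}P_{S\mid X_p=p^0}(S=s)\log\frac{P_{S\mid X_p=p^0}(S=s)}{Q_S(S=s)}\\
&\quad+\frac{1}{k(n-k)+1}\sum_{i=1}^{k}\sum_{j=k+1}^{n}\sum_{s}P_{S\mid X_p=p^{ij}}(S=s)\log\frac{P_{S\mid X_p=p^{ij}}(S=s)}{Q_S(S=s)}\\
&=\frac{1}{k(n-k)+1}D\big(P_{S\mid X_p=p^0}\,\|\,Q_S\big)+\frac{1}{k(n-k)+1}\sum_{i=1}^{k}\sum_{j=k+1}^{n}D\big(P_{S\mid X_p=p^{ij}}\,\|\,Q_S\big),
\end{aligned}$
where $\sum_s$ is summation over all possible $n$ binary sequences of length $T$. Here $P_{S\mid X_p=p^0}$ is the distribution of $S$ with the rate assignment $p^0$, and $P_{S\mid X_p=p^{ij}}$ is the distribution of $S$ with the rate assignment $p^{ij}$. Choosing $Q_S$ to be the distribution of $S$ with rate assignment $p^0$ defined in equation C.1, we have $D(P_{S\mid X_p=p^0}\,\|\,Q_S)=0$ and, for any $i=1,\dots,k$ and $j=k+1,\dots,n$,
$D\big(P_{S\mid X_p=p^{ij}}\,\|\,Q_S\big)=T\big(d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)\big).$
Therefore,
$I(X_p;S)\le\frac{1}{k(n-k)+1}\sum_{i=1}^{k}\sum_{j=k+1}^{n}T\big(d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)\big)\le T\big(d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)\big).$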
$□$
Proof of Theorem 3.

We prove this via a genie-aided argument (Jacobs & Berlekamp, 1967) by assuming that there is a genie that can access the firing sequences of all the $n$ input neurons. By assuming the existence of a genie, we are essentially considering the centralized setting. Clearly, if the error probability is high even in the centralized setting, then no SNNs (which are distributed algorithms) can achieve lower error probability.

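Suppose that $T\le\big((1-\delta)\log(k(n-k)+1)-1\big)T_R$. By equation 4.2, there exist $r_1,r_2$ such that $r_1\ne r_2$ and
$T\le\big((1-\delta)\log(k(n-k)+1)-1\big)\frac{1}{d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)}.$
Without loss of generality, assume that $r_1>r_2$.
Consider the $k(n-k)+1$ possible rate assignments defined in lemma 13. Let $\mathcal{P}$ be the set of such rate assignments. By Yao's minimax principle, we know the minimax probability of error is always lower-bounded by Bayes' probability of error with any prior distribution:
$\max_{p\in\mathcal{AR}_k}\mathbb{P}\{\widehat{\mathrm{win}}(S)\ne W(p)\}\ge\mathbb{E}_{X_p\sim\mathrm{Unif}(\mathcal{P})}\big[\mathbb{P}\{\widehat{\mathrm{win}}(S)\ne W(X_p)\}\big],$
where $X_p\sim\mathrm{Unif}(\mathcal{P})$ is uniformly distributed over set $\mathcal{P}$. In addition, by Fano's inequality (see theorem 11), we have
$\mathbb{E}_{X_p\sim\mathrm{Unif}(\mathcal{P})}\big[\mathbb{P}\{\widehat{\mathrm{win}}(S)\ne W(X_p)\}\big]\ge 1-\frac{I(X_p;S)+1}{\log(k(n-k)+1)}.$
(C.2)
Applying lemma 13, we get
$\max_{p\in\mathcal{AR}_k}\mathbb{P}\{\widehat{\mathrm{win}}(S)\ne W(p)\}\ge 1-\frac{I(X_p;S)+1}{\log(k(n-k)+1)}\ge 1-\frac{T\big(d(r_2\,\|\,r_1)+d(r_1\,\|\,r_2)\big)+1}{\log(k(n-k)+1)}\ge\delta.$
The last inequality holds as $T\le\big((1-\delta)\log(k(n-k)+1)-1\big)T_R$.$□$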
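
The final chain of inequalities can be traced numerically. The sketch below computes the waiting-time threshold and the resulting Fano lower bound on the error probability. It assumes a two-rate set $R=\{r_1,r_2\}$, for which $T_R$ is taken to be $1/(d(r_1\,\|\,r_2)+d(r_2\,\|\,r_1))$; this reading of $T_R$ is an assumption based on how equation 4.2 is used in the proof, and natural logarithms are used throughout.

```python
import math


def kl(a, b):
    """Binary KL divergence d(a || b) in nats for a, b in (0, 1)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def waiting_time_threshold(delta, n, k, r1, r2):
    """((1 - delta) log(k(n-k) + 1) - 1) * T_R for the assumed two-rate set."""
    T_R = 1.0 / (kl(r1, r2) + kl(r2, r1))
    return ((1 - delta) * math.log(k * (n - k) + 1) - 1) * T_R


def fano_error_bound(T, n, k, r1, r2):
    """Lower bound on the minimax error after observing T time slots."""
    info_cap = T * (kl(r1, r2) + kl(r2, r1))  # bound on I(X_p; S) from lemma 13
    return 1 - (info_cap + 1) / math.log(k * (n - k) + 1)
```

At exactly the threshold, the bound evaluates to $\delta$; for any shorter observation window it is strictly larger, which is the content of the last inequality in the proof.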

## Appendix D:  Proof of Theorem 6

The proof of theorem 6 uses the following technical fact and lemma.

Fact 1.
For any given $p\in(0,1)$ and $b>0$, let $f_{p,b}:\mathbb{R}\to\mathbb{R}$ be defined as: for all $t>0$,
$f_{p,b}(t):=\exp\Big(-t\,d\Big(\frac{b}{t}\,\Big\|\,p\Big)\Big).$
Function $f_{p,b}(\cdot)$ is increasing when $t\in(0,b/p)$ and decreasing when $t\ge b/p$.$□$

This fact follows immediately from simple algebra.
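
The monotonicity in fact 1 can be observed on a grid. The sketch below evaluates $f_{p,b}$ for $t\ge b$, which is the range where $b/t\le 1$ and the binary KL divergence is defined; restricting to $t\ge b$ is a choice of this example. Natural logarithms are assumed.

```python
import math


def kl(a, p):
    """Binary KL divergence d(a || p) in nats, with the convention 0 log 0 = 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / p)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - p))
    return total


def f(t, p, b):
    """f_{p,b}(t) = exp(-t * d(b/t || p)), evaluated for t >= b."""
    return math.exp(-t * kl(b / t, p))
```

With $p=0.3$ and $b=6$, the peak sits at $t=b/p=20$, where $f_{p,b}(20)=1$; the values rise on $[6,20]$ and fall for $t\ge 20$.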

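Lemma 14.
Assume $u,v\in[c,C]\subseteq(0,1)$. Then for any $\alpha\in(0,1)$,
$d\big((1-\alpha)u+\alpha v\,\|\,u\big)\ge\frac{\alpha^2c(1-C)}{2C(1-c)}\big(d(u\,\|\,v)+d(v\,\|\,u)\big).$
Proof.
Note that for any fixed $q\in[c,C]$, $d(x\,\|\,q)$ is a function of $x$, where $x\in[c,C]$. In addition, by simple algebra, we have
$d'(x\,\|\,q)=\log\frac{(1-q)x}{q(1-x)},\quad\text{and}\quad d''(x\,\|\,q)=\frac{1}{x(1-x)}.$
(D.1)
By Taylor expansion, we have
$d\big((1-\alpha)u+\alpha v\,\|\,u\big)=d(u\,\|\,u)+\big((1-\alpha)u+\alpha v-u\big)\,d'(u\,\|\,u)+\frac{\big((1-\alpha)u+\alpha v-u\big)^2}{2}\,d''(\xi\,\|\,u),$
where $\xi\in\big[\min\{u,(1-\alpha)u+\alpha v\},\max\{u,(1-\alpha)u+\alpha v\}\big]$. By equation D.1,
$d\big((1-\alpha)u+\alpha v\,\|\,u\big)=0+0+\frac{1}{\xi(1-\xi)}\cdot\frac{\alpha^2(u-v)^2}{2}\ge\frac{\alpha^2(u-v)^2}{2C(1-c)}.$
Since $d(u\,\|\,v)+d(v\,\|\,u)$ is symmetric in $u$ and $v$, without loss of generality, assume that $u\ge v$. We have
$d(u\,\|\,v)+d(v\,\|\,u)=(u-v)\log\frac{u(1-v)}{v(1-u)}=(u-v)\log\Big(1+\frac{u-v}{v(1-u)}\Big)\le(u-v)\frac{u-v}{v(1-u)}=\frac{(u-v)^2}{v(1-u)}\le\frac{(u-v)^2}{c(1-C)}\le\frac{2C(1-c)}{c(1-C)\alpha^2}\,d\big((1-\alpha)u+\alpha v\,\|\,u\big),$
proving the lemma.$□$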
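
The inequality of lemma 14 can be spot-checked on a grid of rates. The sketch below computes the gap between the two sides; the grid and the values $c=0.1$, $C=0.9$ in the usage note are arbitrary illustrations, and logarithms are natural.

```python
import math


def kl(a, b):
    """Binary KL divergence d(a || b) in nats for a, b in (0, 1)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def lemma14_gap(u, v, alpha, c, C):
    """Left side minus right side of lemma 14; nonnegative when u, v in [c, C]."""
    lhs = kl((1 - alpha) * u + alpha * v, u)
    rhs = alpha**2 * c * (1 - C) / (2 * C * (1 - c)) * (kl(u, v) + kl(v, u))
    return lhs - rhs
```

On the grid $u,v\in\{0.1,0.2,\dots,0.9\}$ and $\alpha\in\{0.1,\dots,0.9\}$, the gap is nonnegative up to rounding, and it is typically far from zero because the constant in the lemma is conservative.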

Now we are ready to prove theorem 6.

Proof of Theorem 6.
Without loss of generality, assume that
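$p_1\ge\cdots\ge p_k>p_{k+1}\ge\cdots\ge p_n.$
For a given rate assignment $p\in\mathcal{AR}$, define $\tau_1,\tau_2,\dots,\tau_n$ as
$\tau_i:=\inf\Big\{t:\sum_{r=1}^{\min\{t,m^*\}}S_r(u_i)\ge b\Big\},\quad\forall\,i=1,\dots,n.$
Notably, in the above definition, the set $\big\{t:\sum_{r=1}^{\min\{t,m^*\}}S_r(u_i)\ge b\big\}$ could be empty. In that case, we define $\tau_i:=\infty$ by convention. To show theorem 6, it is enough to show that with probability at least $1-\delta$,
$\tau_i<\tau_j\quad\forall\,i=1,\dots,k,\ \text{and}\ j=k+1,\dots,n;$
(D.2)
$\text{and}\quad\tau_i\le m^*\quad\forall\,i=1,\dots,k.$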
(D.3)
Before diving into proving that equations D.2 and D.3 hold with high probability, we check the sufficiency of those two equations. Let $t_0:=\max_{1\le i\le k}\tau_i$. Let $E$ be the event on which equations D.2 and D.3 hold. Clearly, conditioning on event $E$, we have
$\max_{1\le i\le k}\tau_i\mid E=t_0\mid E\le m^*-1\le m-1,$
where the last inequality follows from the assumption in theorem 6, and
$\max_{1\le i\le k}\tau_i\mid E=t_0\mid E<\tau_j\mid E\quad\forall\,j=k+1,\dots,n.$
Notably, for any $t\le t_0\le m-1$ and for $i=1,\dots,n$,
$\Big[\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)>0\}-m\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)\le -1\}\Big]_+\le\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)>0\}\le\sum_{r=1}^{t}S_r(u_i),$
recalling that $V_r(v_i)$ is defined in equation 5.1. Thus, conditioning on $E$, at most $k-1$ output neurons ever spike by time $t_0$. So we have (1) $\mathbf{1}\{V_t(v_i)\le -1\}=0$, and (2) $\mathbf{1}\{V_t(v_i)>0\}=S_t(u_i)$, for all $i=1,\dots,n$ and for all $t\le t_0$. In addition, we have for all $t\le t_0$,
$(b-1)\mathbf{1}\{S_t(v_i)=1\}+\Big[\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)>0\}-m\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)\le -1\}\Big]_+=(b-1)\mathbf{1}\{S_t(v_i)=1\}+\sum_{r=1}^{t}\mathbf{1}\{V_r(v_i)>0\}=(b-1)\mathbf{1}\{S_t(v_i)=1\}+\sum_{r=1}^{t}S_r(u_i).$
By the activation rules in algorithm 1, we know, conditioning on $E$, at time $t_0+1\le m^*$, output neurons $v_1,\dots,v_k$ spike simultaneously, and output neurons $v_{k+1},\dots,v_n$ do not spike, proving statement 1 in theorem 6. By the choice of $t_0$, we know that on $E$, $t_0+1$ is the first time that $k$ output neurons spike simultaneously, and no other $k$ output neurons ever spike simultaneously, proving statement 2 in theorem 6.
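By a simple induction argument, it can be shown that conditioning on $E$, in each time slot $t$ such that $t_0+1\le t\le m+1$, output neurons $v_1,\dots,v_k$ spike, and no other output neurons spike (i.e., output neurons $v_{k+1},\dots,v_n$ do not spike). Consider the case when $t=(m+1)+1$. As among output neurons, only $v_1,\dots,v_k$ spike, and no other output neurons spike for any $t'\le m+1$, it follows that
$m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}=0,\quad\forall\,v_1,\dots,v_k.$
Thus, for these $k$ output neurons,
$\begin{aligned}
&(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}+\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+\\
&=(b-1)+\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}\\
&=(b-1)+\sum_{r=1}^{m}\mathbf{1}\{V_{t-1-r}(v_i)>0\}+\mathbf{1}\{V_{t-1}(v_i)>0\}-\mathbf{1}\{V_{t-1-m}(v_i)>0\}\\
&\ge b-2+\sum_{r=1}^{m}\mathbf{1}\{V_{t-1-r}(v_i)>0\}=b-2+\sum_{r=1}^{m}S_r(u_i)\ge 2b-2\ge b,
\end{aligned}$
where the last inequality holds provided that $b\ge 2$. For output neurons $v_{k+1},\dots,v_n$, we have
$\begin{aligned}
&(b-1)\mathbf{1}\{S_{t-1}(v_i)=1\}+\Big[\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)\le -1\}\Big]_+\\
&\le\sum_{r=1}^{m}\mathbf{1}\{V_{t-r}(v_i)>0\}\\
&=\sum_{r=1}^{m}\mathbf{1}\{V_{t-1-r}(v_i)>0\}+\mathbf{1}\{V_{t-1}(v_i)>0\}-\mathbf{1}\{V_{t-1-m}(v_i)>0\}\\
&\overset{(a)}{=}\sum_{r=1}^{m}\mathbf{1}\{V_{t-1-r}(v_i)>0\}-\mathbf{1}\{V_{t-1-m}(v_i)>0\}\\
&\le\sum_{r=1}^{m}\mathbf{1}\{V_{t-1-r}(v_i)>0\}=\sum_{r=1}^{m}\mathbf{1}\{V_r(v_i)>0\}<b.
\end{aligned}$
Equality a follows because at time $t-1$, output neurons $v_1,\dots,v_k$ spike, resulting in $\mathbf{1}\{V_{t-1}(v_i)>0\}=0$ for $i\ne 1,\dots,k$. Thus, we know conditioning on event $E$, at time $(m+1)+1$, the output neurons $v_1,\dots,v_k$ spike, and no other output neurons spike. It can be shown by a simple induction that at each time $t$ such that $t_0+1\le t\le m+b$, the output neurons $v_1,\dots,v_k$ spike, and no other output neurons spike. This proves statement 3 in theorem 6.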
Next we prove equations D.2 and D.3. By definition of $\tau_j$, we know that $\tau_j\le m^*$ for all $j=1,\dots,n$. Thus, we only need to show that with probability at least $1-\delta$,
$\tau_i<\tau_j\quad\forall\,i=1,\dots,k,\ \text{and}\ j=k+1,\dots,n,$

which is the focus of the remainder of our proof.

Note that
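$\mathbb{P}\{\tau_i<\tau_j,\ \forall i\in\{1,\dots,k\},\ \forall j\in\{k+1,\dots,n\}\}=\mathbb{P}\{\tau_i<\tau_j\ \&\ \tau_i<m^*,\ \forall i\in\{1,\dots,k\},\ \forall j\in\{k+1,\dots,n\}\}\ge 1-\sum_{i=1}^{k}\sum_{j=k+1}^{n}\mathbb{P}\{\tau_i\ge\tau_j,\ \text{or}\ \tau_i=m^*\}.$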
(D.4)
For each term in the summation of equation D.4, we have
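$\mathbb{P}\{\tau_i\ge\tau_j,\ \text{or}\ \tau_i=m^*\}=\mathbb{P}\{\tau_i=m^*\}+\mathbb{P}\{\tau_i\ge\tau_j\ \&\ \tau_i<m^*\},$
(D.5)
which follows from the fact that $\mathbb{P}\{A\cup B\}=\mathbb{P}\{A\}+\mathbb{P}\{B\setminus A\}$ for any sets $A$ and $B$. Note that $m^*p_i\ge b$. By Chernoff bound (see theorem 12), the first term in equation D.5 is bounded as
$\mathbb{P}\{\tau_i=m^*\}=\mathbb{P}\Big\{\sum_{r=1}^{m^*}S_r(u_i)\le b\Big\}\le\exp\Big(-m^*\cdot d\Big(\frac{b}{m^*}\,\Big\|\,p_i\Big)\Big).$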
(D.6)
For the second term in equation D.5, we have
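$\mathbb{P}\{\tau_i\ge\tau_j\ \text{and}\ \tau_i<m^*\}\le\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_{k+1}\Big)\Big)+\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_k\Big)\Big),$
where $t^*\in\big[\frac{b}{p_k},\frac{b}{p_{k+1}}\big]$. Thus, since $p_i\ge p_k$ and, by fact 1, $f_{p_k,b}(m^*)\le f_{p_k,b}(t^*)$, equation D.5 is upper-bounded as
$\mathbb{P}\{\tau_i\ge\tau_j,\ \text{or}\ \tau_i=m^*\}\le\exp\Big(-m^*\cdot d\Big(\frac{b}{m^*}\,\Big\|\,p_k\Big)\Big)+\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_{k+1}\Big)\Big)+\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_k\Big)\Big)\le\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_{k+1}\Big)\Big)+2\exp\Big(-t^*\cdot d\Big(\frac{b}{t^*}\,\Big\|\,p_k\Big)\Big).$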
Equation D.4 is bounded as
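$\begin{aligned}\mathbb{P}\{\tau_i<\tau_j,\ \forall i\in\{1,\dots,k\},\ \forall j\in\{k+1,\dots,n\}\}&\ge 1-\sum_{i=1}^{k}\sum_{j=k+1}^{n}\mathbb{P}\{\tau_i\ge\tau_j,\ \text{or}\ \tau_i=m^*\}\\&\ge 1-\sum_{i=1}^{k}\sum_{j=k+1}^{n}\left(\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_{k+1}\right)\right\}+2\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_k\right)\right\}\right)\\&=1-k(n-k)\left(\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_{k+1}\right)\right\}+2\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_k\right)\right\}\right).\end{aligned}$
Letting $t^*=\frac{b}{(p_k+p_{k+1})/2}$, it holds that
$\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_{k+1}\right)\right\}=\exp\left\{-\frac{b}{(p_k+p_{k+1})/2}\cdot d\left(\frac{p_k+p_{k+1}}{2}\,\Big\|\,p_{k+1}\right)\right\},\qquad 2\exp\left\{-t^*\cdot d\left(\frac{b}{t^*}\,\Big\|\,p_k\right)\right\}=2\exp\left\{-\frac{b}{(p_k+p_{k+1})/2}\cdot d\left(\frac{p_k+p_{k+1}}{2}\,\Big\|\,p_k\right)\right\}.$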
By lemma 14, we know
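$d\left(\frac{p_k+p_{k+1}}{2}\,\Big\|\,p_{k+1}\right)\ge\frac{c(1-C)}{8C(1-c)}\left(d(p_{k+1}\|p_k)+d(p_k\|p_{k+1})\right),$
and
$d\left(\frac{p_k+p_{k+1}}{2}\,\Big\|\,p_k\right)\ge\frac{c(1-C)}{8C(1-c)}\left(d(p_{k+1}\|p_k)+d(p_k\|p_{k+1})\right).$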
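The two midpoint inequalities quoted from lemma 14 can be spot-checked numerically. The Python sketch below evaluates both on a grid of rate pairs and reports the smallest slack. It assumes that all rates lie in $[c,C]$ (our reading of the role of the constants $c$ and $C$); the function names, the grid, and the values $c=0.1$, $C=0.9$ are hypothetical choices for illustration.

```python
import math


def kl_bernoulli(x, p):
    """KL divergence d(x || p) between Bernoulli(x) and Bernoulli(p), in nats."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, p) + term(1.0 - x, 1.0 - p)


def midpoint_gap(p_hi, p_lo, c, C):
    """Smaller of the two left-hand sides minus the common right-hand side.

    Assumption: c <= p_lo < p_hi <= C. A nonnegative value means both
    midpoint inequalities hold for this pair of rates.
    """
    mid = 0.5 * (p_hi + p_lo)
    kappa = c * (1.0 - C) / (8.0 * C * (1.0 - c))
    symmetric = kl_bernoulli(p_lo, p_hi) + kl_bernoulli(p_hi, p_lo)
    lhs = min(kl_bernoulli(mid, p_lo), kl_bernoulli(mid, p_hi))
    return lhs - kappa * symmetric


if __name__ == "__main__":
    c, C = 0.1, 0.9
    grid = [i / 20 for i in range(2, 19)]  # 0.10, 0.15, ..., 0.90
    worst = min(midpoint_gap(hi, lo, c, C) for lo in grid for hi in grid if lo < hi)
    print("smallest slack on the grid:", worst)
```

A grid check of this kind is no substitute for the lemma; it only guards against transcription errors in the constant $\frac{c(1-C)}{8C(1-c)}$.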
Thus, we get
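$\mathbb{P}\{\tau_i<\tau_j,\ \forall i\in\{1,\dots,k\},\ \forall j\in\{k+1,\dots,n\}\}\ge 1-3k(n-k)\exp\left\{-\frac{2b}{p_k+p_{k+1}}\cdot\frac{c(1-C)}{8C(1-c)}\left(d(p_k\|p_{k+1})+d(p_{k+1}\|p_k)\right)\right\}.$
Since $b=\frac{8C^{2}(1-c)}{c(1-C)}\left(\log\frac{3}{\delta}+\log k(n-k)\right)T_R$, we have
$3k(n-k)\exp\left\{-\frac{2b}{p_k+p_{k+1}}\cdot\frac{c(1-C)}{8C(1-c)}\left(d(p_k\|p_{k+1})+d(p_{k+1}\|p_k)\right)\right\}\le\delta.$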
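Thus, $\mathbb{P}\{\tau_i<\tau_j,\ \forall i\in\{1,\dots,k\},\ \forall j\in\{k+1,\dots,n\}\}\ge 1-\delta$. Finally, note that
$t^*=\frac{2b}{p_k+p_{k+1}}\le\frac{1}{c}b=m^*\le m,$
completing the proof of theorem 6. $\square$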
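To get a feel for the magnitudes involved, the Python sketch below evaluates the failure term $3k(n-k)\exp\{\cdot\}$ from the last display for a given threshold $b$, and solves for the smallest integer $b$ that pushes it below $\delta$. It works directly with $p_k$, $p_{k+1}$, $c$, and $C$ rather than with $T_R$, whose definition lies outside this appendix; the function names and the numerical values are hypothetical.

```python
import math


def kl_bernoulli(x, p):
    """KL divergence d(x || p) between Bernoulli(x) and Bernoulli(p), in nats."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, p) + term(1.0 - x, 1.0 - p)


def decay_rate(p_k, p_k1, c, C):
    """Coefficient of b in the exponent of the failure term."""
    kappa = c * (1.0 - C) / (8.0 * C * (1.0 - c))
    symmetric = kl_bernoulli(p_k, p_k1) + kl_bernoulli(p_k1, p_k)
    return 2.0 * kappa * symmetric / (p_k + p_k1)


def failure_bound(b, n, k, p_k, p_k1, c, C):
    """3 k (n - k) exp(-b * decay_rate): the upper bound on the failure probability."""
    return 3 * k * (n - k) * math.exp(-b * decay_rate(p_k, p_k1, c, C))


def min_threshold(delta, n, k, p_k, p_k1, c, C):
    """Smallest integer b for which failure_bound(b, ...) <= delta."""
    return math.ceil(math.log(3 * k * (n - k) / delta) / decay_rate(p_k, p_k1, c, C))


if __name__ == "__main__":
    args = dict(n=10, k=2, p_k=0.6, p_k1=0.3, c=0.1, C=0.9)
    b = min_threshold(0.05, **args)
    print("threshold b:", b, "failure bound:", failure_bound(b, **args))
```

The resulting $b$ grows logarithmically in $k(n-k)/\delta$ and inversely with the symmetrized divergence between $p_k$ and $p_{k+1}$, matching the scaling of the waiting time stated in the theorem. The constants are conservative, so the numbers are far larger than a simulation would suggest is necessary.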

## Appendix E:  Proof of Lemma 9

By the activation rules in algorithm 1, we know that
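$S_{t_0+m}(v_i)=\begin{cases}1,&\text{if }(b-1)\mathbf{1}\{S_{t_0+m-1}(v_i)=1\}+\sum_{r=1}^{m}\left(\mathbf{1}\{V_{t_0+m-r}(v_i)>0\}-m\mathbf{1}\{V_{t_0+m-r}(v_i)\le -1\}\right)>b;\\0,&\text{otherwise.}\end{cases}$
As all input neurons are quiescent at time $t_0$ and remain quiescent for all $t\ge t_0$, it follows that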
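$(b-1)\mathbf{1}\{S_{t_0+m-1}(v_i)=1\}+\sum_{r=1}^{m}\left(\mathbf{1}\{V_{t_0+m-r}(v_i)>0\}-m\mathbf{1}\{V_{t_0+m-r}(v_i)\le -1\}\right)=(b-1)\mathbf{1}\{S_{t_0+m-1}(v_i)=1\}-m\sum_{r=1}^{m}\mathbf{1}\{V_{t_0+m-r}(v_i)\le -1\}\le b-1<b.$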
Thus, $S_{t_0+m}(v_i)=0$ for all $i=1,\dots,n$. So we have $V_{t_0+m+1}(v_i)=0$ for all $i=1,\dots,n$, which again implies that $S_{t_0+m+1}(v_i)=0$ for all $i=1,\dots,n$. Therefore, we conclude that $S_t(v_i)=0$ and $V_t(v_i)=0$ for all $t>t_0+m$.
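The key step of this argument is that, once no potential in the memory window is positive, the quantity compared against $b$ cannot exceed $b-1$. The Python sketch below evaluates that quantity as we read it from the activation rule quoted above (self-excitation of $b-1$ after a spike, one unit per positive potential, and a penalty of $m$ per potential at or below $-1$) and checks the claim exhaustively for a small window. The function name `activation_score` and the parameter values are hypothetical, and the sketch does not reproduce algorithm 1 itself.

```python
from itertools import product


def activation_score(prev_spike, potentials, b):
    """Quantity compared against b in the activation rule, as read from this appendix.

    prev_spike: whether the neuron spiked in the previous slot.
    potentials: the last m potentials V_{t-1}, ..., V_{t-m} of the neuron.
    """
    m = len(potentials)
    excited = sum(1 for v in potentials if v > 0)
    inhibited = sum(1 for v in potentials if v <= -1)
    return (b - 1) * (1 if prev_spike else 0) + excited - m * inhibited


def max_score_without_excitation(b, m):
    """Largest score over all windows whose potentials are all 0 or -1."""
    return max(
        activation_score(spike, window, b)
        for spike in (False, True)
        for window in product((0, -1), repeat=m)
    )


if __name__ == "__main__":
    b, m = 5, 4
    print(max_score_without_excitation(b, m), "<=", b - 1)
```

Because the maximum is attained by a neuron that has just spiked and has received no inhibition, self-excitation alone can never sustain spiking, which is why all output neurons fall silent after the inputs do.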

## Notes

1

We plan to investigate the impact of the heterogeneity in the refractory period on waiting time in our future work.

2

In this letter, the notations $O(\cdot)$ and $\Omega(\cdot)$ are used to describe the limiting behavior of a function when the argument tends toward a particular value or infinity. In our case, the waiting time can be viewed as a function of several other parameters such as $\delta$, $R$, $n$, and $k$. Formally, for any sequences $\{a_N\}$ and $\{b_N\}$, we say $a_N=O(b_N)$ if there exists an absolute constant $c>0$ such that $a_N\le c\times b_N$. Similarly, we say $a_N=\Omega(b_N)$ if there exists an absolute constant $c>0$ such that $a_N\ge c\times b_N$.

3

In the language of computational neuroscience, incoming neighbors and outgoing neighbors are often referred to, respectively, as presynaptic units and postsynaptic units.

4

A more rigorous notation would be $S(T,p):=\left(\{S_t(u_1)\}_{t=1}^{T},\dots,\{S_t(u_n)\}_{t=1}^{T}\right)$. We use $S$ for $S(T,p)$ for ease of exposition.

5

The Kullback-Leibler (KL) divergence gauges the dissimilarity between two distributions.

6

Note that any base of the logarithm would work. See Polyanskiy and Wu (2014, chap. 1.1).

7

Recall that $c,C\in(0,1)$ are two absolute constants; they do not change with other parameters of the WTA circuit such as $n$, $k$, and $\delta$.

## Acknowledgments

We thank Christopher Quinn at Purdue University and Zhi-Hong Mao at the University of Pittsburgh for the helpful discussions and references.

## References

Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11(1), 91–101.

Activation function. (N.d.). Accessed August 8, 2018, at https://en.wikipedia.org/wiki/Activation_function

Allen, C., & Stevens, C. F. (1994). An evaluation of causes for unreliability of synaptic transmission. Proceedings of the National Academy of Sciences, 91(22), 10380–10383.

Averbeck, B. B., Latham, P. E., & Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5), 358.

Baddeley, R., Abbott, L. F., Booth, M. C., Sengpiel, F., Freeman, T., Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society of London B: Biological Sciences, 264(1389), 1775–1783.

Berry II, M. J., & Meister, M. (1998). Refractoriness and neural precision. In M. J. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in neural information processing systems, 11 (pp. 110–116). Cambridge, MA: MIT Press.

Bittner, K. C., Milstein, A. D., Grienberger, C., Romani, S., & Magee, J. C. (2017). Behavioral time scale synaptic plasticity underlies CA1 place fields. Science, 357(6355), 1033–1036.

Borst, J. G. G. (2010). The low synaptic release probability in vivo. Trends in Neurosciences, 33(6), 259–266.

Buzsáki, G., & Chrobak, J. J. (1995). Temporal structure in spatially organized neuronal ensembles: A role for interneuronal networks. Current Opinion in Neurobiology, 5(4), 504–510.

Clark, L., Manes, F., Antoun, N., Sahakian, B. J., & Robbins, T. W. (2003). The contributions of lesion laterality and lesion volume to decision-making impairment following frontal lobe damage. Neuropsychologia, 41(11), 1474–1483.

Cohen, M. R., & Kohn, A. (2011). Measuring and interpreting neuronal correlations. Nature Neuroscience, 14(7), 811.

Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429.

Eyherabide, H. G., & Samengo, I. (2013). When and why noise correlations are important in neural decoding. Journal of Neuroscience, 33(45), 17921–17936.

Faisal, A. A., Selen, L. P., & Wolpert, D. M. (2008). Noise in the nervous system. Nature Reviews Neuroscience, 9(4), 292.

Ferrari, U., Deny, S., Marre, O., & Mora, T. (2018). A simple model for low variability in neural spike trains. Neural Computation, 30(11), 3009–3036. doi:10.1162/neco_a_01125.

Gerstner, W., Kempter, R., van Hemmen, J. L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383(6595), 76.

Gutnisky, D. A., & Dragoi, V. (2008). Adaptive coding of visual information in neural populations. Nature, 452(7184), 220.

Hahn, T. T., Sakmann, B., & Mehta, M. R. (2006). Phase-locking of hippocampal interneurons' membrane potential to neocortical up-down states. Nature Neuroscience, 9(11), 1359.

Hanks, T. D., Ditterich, J., & Shadlen, M. N. (2006). Microstimulation of macaque area LIP affects decision-making in a motion discrimination task. Nature Neuroscience, 9(5), 682.

Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394(6695), 780.

Hertz, J., Krogh, A., Palmer, R. G., & Horner, H. (1991). Introduction to the theory of neural computation. Physics Today, 44, 70.

Hromádka, T., DeWeese, M. R., & Zador, A. M. (2008). Sparse representation of sounds in the unanesthetized auditory cortex. PLoS Biology, 6(1), e16.

Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. Journal of Physiology, 148(3), 574–591.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.

Jacobs, I., & Berlekamp, E. (1967). A lower bound to the distribution of computation for sequential decoding. IEEE Transactions on Information Theory, 13(2), 167–174.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690.

Kara, P., Reinagel, P., & Reid, R. C. (2000). Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27(3), 635–646.

Karlsson, M. P., & Frank, L. M. (2008). Network dynamics underlying the formation of sparse, informative representations in the hippocampus. Journal of Neuroscience, 28(52), 14271–14281.

Katz, L. N., Yates, J. L., Pillow, J. W., & Huk, A. C. (2016). Dissociated functional significance of decision-related activity in the primate dorsal stream. Nature, 535(7611), 285.

Kinoshita, M., & Komatsu, H. (2001). Neural representation of the luminance and brightness of a uniform surface in the macaque primary visual cortex. Journal of Neurophysiology, 86(5), 2559–2570.

Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12), 712–719.

Knoblauch, A., Palm, G., & Sommer, F. T. (2010). Memory capacities for synaptic and structural plasticity. Neural Computation, 22(2), 289–341.

Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244.

Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37(2), 333–346.

Kriener, B., Chaudhuri, R., & Fiete, I. (2017). How fast is neural winner-take-all when deciding between many options? bioRxiv:231753.

Lee, D. K., Itti, L., Koch, C., & Braun, J. (1999). Attention activates winner-take-all competition among visual filters. Nature Neuroscience, 2(4), 375.

Li, N., Chen, T.-W., Guo, Z. V., Gerfen, C. R., & Svoboda, K. (2015). A motor cortex circuit for motor planning and movement. Nature, 519(7541), 51.

Li, S., Li, Y., & Wang, Z. (2013). A class of finite-time dual neural networks for solving quadratic programming problems and its k-winners-take-all application. Neural Networks, 39, 27–39.

Lynch, N., Musco, C., & Parter, M. (2016). Computational tradeoffs in biological neural networks: Self-stabilizing winner-take-all networks. arXiv:1610.02084.

Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9), 1659–1671.

Maass, W. (2000). On the computational power of winner-take-all. Neural Computation, 12(11), 2519–2535.
Maimon, G., & Assad, J. A. (2009). Beyond Poisson: Increased spike-time regularity across primate parietal cortex. Neuron, 62(3), 426–440.

Majaj, N. J., Carandini, M., & Movshon, J. A. (2007). Motion integration by neurons in macaque MT is local, not global. Journal of Neuroscience, 27(2), 366–370.

Mao, Z.-H., & Massaquoi, S. G. (2007). Dynamics of winner-take-all competition in recurrent neural networks with lateral inhibition. IEEE Transactions on Neural Networks, 18(1), 55–69.

Nelken, I. (2004). Processing of complex stimuli and natural scenes in the auditory cortex. Current Opinion in Neurobiology, 14(4), 474–480.

Nicholls, J. G., Martin, A. R., Wallace, B. G., & Fuchs, P. A. (2001). From neuron to brain. Sunderland, MA: Sinauer.

Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487.

Perez-Orive, J., Mazor, O., Turner, G. C., Cassenaer, S., Wilson, R. I., & Laurent, G. (2002). Oscillations and sparsening of odor representations in the mushroom body. Science, 297(5580), 359–365.

Platt, M. L., & Glimcher, P. W. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400(6741), 233.

Polyanskiy, Y., & Wu, Y. (2014). Lecture notes on information theory. Lecture Notes for ECE563 (UIUC) and 6, 2012–2016.

Quiroga, R. Q., Kreiman, G., Koch, C., & Fried, I. (2008). Sparse but not “grandmother-cell” coding in the medial temporal lobe. Trends in Cognitive Sciences, 12(3), 87–91.

Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: A vertebrate solution to the selection problem? Neuroscience, 89(4), 1009–1023.

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019.

Rougier, N. P., & Vitay, J. (2006). Emergence of attention within a neural population. Neural Networks, 19(5), 573–581.

Schölvinck, M. L., Saleem, A. B., Benucci, A., Harris, K. D., & Carandini, M. (2015). Cortical state determines global variability and correlations in visual cortex. Journal of Neuroscience, 35(1), 170–178.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences, 90(22), 10749–10753. doi:10.1073/pnas.90.22.10749

Shadlen, M. N., & Newsome, W. T. (1996). Motion perception: Seeing and deciding. Proceedings of the National Academy of Sciences, 93(2), 628–633.

Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86(4), 1916–1936.

Shamir, M. (2006). The scaling of winner-takes-all accuracy with population size. Neural Computation, 18(11), 2719–2729. doi:10.1162/neco.2006.18.11.2719.

Shamir, M. (2009). PLOS Computational Biology, 5(2), 1–13. doi:10.1371/journal.pcbi.1000286.

Siapas, A. G., Lubenov, E. V., & Wilson, M. A. (2005). Prefrontal phase locking to hippocampal theta oscillations. Neuron, 46(1), 141–151.

Smith, M. A., & Kohn, A. (2008). Spatial and temporal scales of neuronal correlation in primary visual cortex. Journal of Neuroscience, 28(48), 12591–12603.

Stocker, A. A., & Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4), 578.

Teleńczuk, B., Kempter, R., Curio, G., & Destexhe, A. (2017). Refractoriness accounts for variable spike burst responses in somatosensory cortex. eNeuro, 4(4).

Verzi, S. J., Rothganger, F., Parekh, O. D., Quach, T.-T., Miner, N. E., Vineyard, C. M., Aimone, J. B. (2018). Computing with spikes: The advantage of fine-grained timing. Neural Computation, 30(10), 2660–2690.

Wu, Y. (2017). Lecture notes on information-theoretic methods for high-dimensional statistics. Lecture Notes for ECE598YW (UIUC).

Yttri, E. A., Liu, Y., & Snyder, L. H. (2013). Lesions of cortical area LIP affect reach onset only when the reach is accompanied by a saccade, revealing an active eye-hand coordination circuit. Proceedings of the National Academy of Sciences, 110(6), 2371–2376.

Yuille, A., & Geiger, D. (1998). The handbook of brain theory and neural networks. Cambridge, MA: MIT Press.