## Abstract

Information transfer through a single neuron is a fundamental component of information processing in the brain, and computing the information channel capacity is important for understanding this information processing. The problem is difficult since the capacity depends on coding, characteristics of the communication channel, and optimization over input distributions, among other issues. In this letter, we consider two models. The temporal coding model of a neuron as a communication channel assumes the output is τ, where τ is a gamma-distributed random variable corresponding to the interspike interval, that is, the time it takes for the neuron to fire once. The rate coding model is similar; the output is the actual rate of firing over a fixed period of time. Theoretical studies prove that the distribution of inputs that achieves channel capacity is a discrete distribution with finite mass points, for both temporal and rate coding, under a reasonable assumption. This allows us to compute numerically the capacity of a neuron. Numerical results are in a plausible range based on biological evidence to date.

## 1. Introduction

It is widely believed that neurons send information to other neurons in the form of spike trains. Although precise timings of spikes are important for information transfer, spike patterns appear to be not deterministic but noisy (Mainen & Sejnowski, 1995). Information theory shows that when a communication channel is corrupted with noise, the rate at which information can be transmitted reliably through the channel is limited. The upper bound on the rate is known as the channel capacity (Shannon, 1948) (in the rest of the letter, it is referred to simply as "capacity"). When a single neuron is considered as a channel, computing its capacity is a fundamental problem in neuroscience.

The problem has been studied theoretically (MacKay & McCulloch, 1952; Rapoport & Horvath, 1960; Stein, 1967) and biologically (Borst & Theunissen, 1999). Computing capacity is difficult since it depends on multiple factors: the type of coding, the characteristics of the channel, and the input distributions. The type of coding has long been a subject of discussion (MacKay & McCulloch, 1952; Baker & Lemon, 2000; Rullen & Thorpe, 2001). Two main types of coding, temporal and rate coding, have been considered. Temporal coding uses interspike intervals (ISIs) to code information, and rate coding uses the number of spikes in a fixed interval. This letter examines both.

The channel model is deeply related to the noise of ISIs. Baker and Lemon (2000) reported that the statistical properties of ISIs recorded from primary motor cortex and supplementary motor area (SMA) of monkeys are similar to the gamma distribution. Shinomoto, Shima, and Tanji (2003) and Shinomoto, Miyazaki, Tamura, and Fujita (2005) studied spike trains from multiple areas and proposed a statistical index that describes the randomness of ISIs.^{1} The index is deeply related to the gamma distribution (Shinomoto et al., 2003; Ikeda, 2005). In this letter, ISIs are modeled with a gamma distribution. The model is different from the channel model in MacKay and McCulloch (1952), where spikes are assumed to be aligned within a fixed time precision.

The capacity is defined as the supremum of mutual information over possible input distributions. In this letter, a natural assumption is posed: the average firing rate of a single neuron is restricted to an interval. Under this assumption, we consider all possible input distributions and prove that the capacity of each coding is achieved by a discrete distribution with only finitely many mass points. The proofs of the discreteness of the capacity-achieving distributions share their steps with other studies in information theory (Smith, 1971; Shamai (Shitz), 1990; Abou-Faycal, Trott, & Shamai (Shitz), 2001; Gursoy, Poor, & Verdú, 2002, 2005), which have shown discreteness for some channels under appropriate assumptions on the input distributions. Our result shows that information is maximally transmitted through a single neuron when the inputs to the neuron have only a fixed number of "modes." This is important for biological experiments: if the input distribution is discrete, experimentalists need only consider a discrete and finite set of input modes or stimuli. After the proof, the capacity and the capacity-achieving distribution for each coding are computed. Since we have not obtained an analytical solution, they are computed numerically. The results show that the capacity is around 15 to 50 bits per sec, the same order as the values reported in Borst and Theunissen (1999).

## 2. Single Neuron Channel

### 2.1. ISIs and Communication Channel.

It has been reported that a gamma distribution is a suitable model to describe the stochastic nature of ISIs (Baker & Lemon, 2000; Shinomoto et al., 2003). The gamma distribution has two parameters: the shape parameter κ and the scale parameter θ. Several studies suggest that κ of an individual neuron is constant (its value may depend on the type of neuron), while θ changes dynamically over time.

Figure 1 shows simulated spike trains with two different shape parameter κ's. It is 0.75 in Figure 1A and 4.5 in Figure 1B. When κ is small, spike trains become more irregular. Ikeda (2005) and Miura, Okada, and Amari (2006) studied the estimation methods of κ from spike trains. Estimation of κ is regarded as the semiparametric statistical estimation (Bickel, Klaassen, Ritov, & Wellner, 1993).
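The gamma ISI model can be simulated directly. The following sketch (assuming `numpy`; the 25 msec mean ISI is an illustrative choice, not a value from the letter) draws spike trains for the two shape parameters of Figure 1 and reports the ISI coefficient of variation, which for a gamma distribution is 1/√κ:

```python
import numpy as np

def simulate_spike_train(kappa, mean_isi, n_spikes, rng):
    """Cumulate ISIs drawn from Gamma(kappa, theta) with theta = mean_isi/kappa,
    so that the mean ISI equals kappa * theta = mean_isi."""
    isis = rng.gamma(shape=kappa, scale=mean_isi / kappa, size=n_spikes)
    return np.cumsum(isis)

rng = np.random.default_rng(0)
# Irregular (kappa = 0.75, as in Figure 1A) vs. regular (kappa = 4.5, Figure 1B),
# both with a 25 msec mean ISI.
irregular = simulate_spike_train(0.75, 25.0, 1000, rng)
regular = simulate_spike_train(4.5, 25.0, 1000, rng)

# The ISI coefficient of variation is 1/sqrt(kappa): larger kappa, more regular.
for kappa, train in ((0.75, irregular), (4.5, regular)):
    isis = np.diff(train, prepend=0.0)
    print(f"kappa={kappa}: CV={isis.std() / isis.mean():.2f} "
          f"(theory {1 / np.sqrt(kappa):.2f})")
```

Plotting the two spike-time arrays as rasters reproduces the qualitative contrast of Figure 1.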

In this letter, we focus not on the estimation but on the information processing of a single neuron. Based on the gamma distribution model, the capacity of a neuron is investigated in the following sections.

### 2.2. Communication Channel and Capacity.

Let *X* be the input to a noisy channel and *Y* be the output. In the following, we assume *X* is a one-dimensional stochastic variable, and let *F*(·) be a cumulative distribution function of *X*. A communication channel is defined as a stochastic model described as *p*(*y* ∣ *x*), and the mutual information is defined as

$$ I(X;Y) = \int\!\!\int p(y\mid x)\,\log\frac{p(y\mid x)}{p(y;F)}\,d\mu(y)\,dF(x), \qquad p(y;F) = \int p(y\mid x)\,dF(x). $$

Here, μ(*y*) denotes the measure of *Y*. Since the channel is defined as *p*(*y* ∣ *x*), *I*(*X*; *Y*) is a functional of *F*(·), and we denote it as *I*(*F*).
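For a discrete input and a discrete output, the mutual information reduces to a double sum, which the following sketch evaluates (function and variable names are illustrative, not from the letter):

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for a discrete channel.

    p_x: input distribution F, shape (n_x,).
    p_y_given_x: conditional laws p(y|x) as rows, shape (n_x, n_y).
    """
    p_y = p_x @ p_y_given_x                        # marginal p(y; F)
    ratio = np.where(p_y_given_x > 0, p_y_given_x / p_y, 1.0)
    return float(np.sum(p_x[:, None] * p_y_given_x * np.log2(ratio)))

# Sanity check on a binary symmetric channel with crossover 0.1 and uniform
# input: I(X;Y) = 1 - H2(0.1) ≈ 0.531 bits.
bsc_info = mutual_information(np.array([0.5, 0.5]),
                              np.array([[0.9, 0.1], [0.1, 0.9]]))
print(f"{bsc_info:.3f} bits")
```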

### 2.3. Single Neuron: Channel and Coding.

Let us discuss a neuron model. First, we have to define *X* and *Y* of a neuron communication channel.

The distribution of each ISI is assumed to be independent and to follow a gamma distribution. Let *T* denote an ISI, a stochastic variable following a gamma distribution, that is, *T* ∼ Γ(κ, θ), where κ>0 and θ>0 are the shape and the scale parameter, respectively.

We assume κ of each neuron is fixed and known. Shinomoto et al. (2003) define a statistical index *L _{V}* (local variation) to characterize each neuron. For *T* ∼ Γ(κ, θ),

$$ E[L_V] = \frac{3}{2\kappa + 1} $$

holds. From their investigation with biological data, most of the cells' *L _{V}* lie in the interval (0.3, 1.2), and κ is thus assumed to be in an interval κ ∈ [κ_{m}, κ_{M}] (κ_{m} and κ_{M} are set to 0.75 and 4.5, respectively, in section 4).
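The relation *E*[*L _{V}*] = 3/(2κ + 1), which holds for gamma-distributed ISIs, can be inverted to map a measured *L _{V}* value to a shape parameter; a one-line sketch:

```python
def kappa_from_lv(lv):
    """Invert E[L_V] = 3 / (2*kappa + 1), the gamma-ISI relation, for kappa."""
    return (3.0 / lv - 1.0) / 2.0

# Endpoints used in the letter: L_V = 1.2 <-> kappa = 0.75 and
# L_V = 0.3 <-> kappa = 4.5; L_V = 1 corresponds to kappa = 1 (Poisson train).
for lv in (1.2, 1.0, 0.3):
    print(lv, "->", kappa_from_lv(lv))
```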

The input of the channel is the scale parameter θ, which corresponds to *X* in section 2.2. The density function of *t* is

$$ p(t\mid\theta;\kappa) = \frac{t^{\kappa-1}}{\Gamma(\kappa)\,\theta^{\kappa}}\,e^{-t/\theta}, \qquad t>0, \tag{2.3} $$

where we denote it as *p*(*t* ∣ θ; κ) to show θ is a stochastic variable and κ is a parameter. The gamma distribution is an exponential family: the sufficient statistics are *T* and log *T*. Their expectations are

$$ E[T] = \kappa\theta, \qquad E[\log T] = \psi(\kappa) + \log\theta, $$

where ψ(·) is the digamma function defined as ψ(*x*) = Γ′(*x*)/Γ(*x*) for *x*>0. The conditional entropy becomes

$$ h(T\mid\theta) = \kappa + \log\theta + \log\Gamma(\kappa) + (1-\kappa)\,\psi(\kappa). $$

Next, let us consider the family of all possible distributions of the input θ. Noting that an ISI is positive and is not infinite if the neuron is active, it is natural to assume that the average ISI, which depends on θ and κ, is limited between *a*_{0} and *b*_{0} (*a*_{0} and *b*_{0} are set to 5 msec and 50 msec, respectively, in section 4), that is,

$$ a_0 \le \kappa\theta \le b_0. $$

Thus, θ is bounded in Θ(κ) = {θ ∣ *a*(κ) ⩽ θ ⩽ *b*(κ)}, where *a*(κ) and *b*(κ) are defined as

$$ a(\kappa) = \frac{a_0}{\kappa}, \qquad b(\kappa) = \frac{b_0}{\kappa}. $$

In the following, *a*(κ), *b*(κ), and Θ(κ) are denoted as *a*, *b*, and Θ, respectively, as far as no confusion arises. Let us define *F*(θ) as the cumulative distribution function of θ and ℱ as the set of all possible *F*(θ), that is,

$$ \mathcal{F} = \{F \mid F \text{ is a cumulative distribution function on } \Theta\}. \tag{2.4} $$

Note that every *F* ∈ ℱ is right-continuous and nondecreasing on Θ, and ℱ includes continuous and discrete distributions of θ.^{2}
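The moment identities *E*[*T*] = κθ and *E*[log *T*] = ψ(κ) + log θ can be checked by Monte Carlo; the sketch below approximates the digamma function by a central difference of `math.lgamma` (the parameters κ = 2, θ = 12.5 msec are illustrative, chosen so the mean ISI is 25 msec):

```python
import math
import numpy as np

def digamma(x, h=1e-5):
    """psi(x) = Gamma'(x)/Gamma(x) via a central difference of log Gamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

kappa, theta = 2.0, 12.5                 # mean ISI = kappa * theta = 25 msec
rng = np.random.default_rng(0)
t = rng.gamma(kappa, theta, size=200_000)

print(t.mean(), kappa * theta)                             # E[T] = kappa*theta
print(np.log(t).mean(), digamma(kappa) + math.log(theta))  # E[log T] = psi(kappa) + log theta
```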

Next, let us consider *Y*, the output of the channel of a neuron communication channel. There are mainly two different ideas in neuroscience. One idea is that *Y* is ISI, *T*, itself (see MacKay & McCulloch, 1952, for example). This is called temporal coding (see Figure 2). The other is that *Y* is the rate, which is the number of spikes in fixed time intervals (see Stein, 1967). This is called rate coding (see Figure 2). In communication theory, “coding” is often used for “source coding,” “error-control coding,” and “cryptography coding.” It seems that *modulation* is a more suitable term for the above definition. However, we follow the standard usage of the neuroscience community. How to encode (or to modulate) the input θ to the neuron channel depends on which coding is used. For temporal coding, θ is fixed during the interval *t*, while θ is fixed during Δ for the rate coding. We discuss this in section 5.

Mutual information and the capacity also depend on coding. The capacity of each coding is formally defined in the following.

#### 2.3.1. Temporal Coding.

The output of the channel is the ISI, *T*. For *F* ∈ ℱ, we define the marginal distribution as

$$ p(t;F,\kappa) = \int_{\Theta} p(t\mid\theta;\kappa)\,dF(\theta), $$

where *p*(*t* ∣ θ; κ) is defined in equation 2.3. The existence of *p*(*t*; *F*, κ) follows from the existence of *p*(*t* ∣ θ; κ). The mutual information of *T* and θ is defined as

$$ I_T(F) = \int_{\Theta}\int_0^{\infty} p(t\mid\theta;\kappa)\,\log\frac{p(t\mid\theta;\kappa)}{p(t;F,\kappa)}\,dt\,dF(\theta). $$

Let us define *g*(*t*; *F*, κ) and rewrite *p*(*t*; *F*, κ) as

$$ g(t;F,\kappa) = \int_{\Theta}\frac{e^{-t/\theta}}{\theta^{\kappa}}\,dF(\theta), \qquad p(t;F,\kappa) = \frac{t^{\kappa-1}}{\Gamma(\kappa)}\,g(t;F,\kappa). $$

The mutual information *I _{T}*(*F*) is rewritten as

$$ I_T(F) = \int_{\Theta} i_T(\theta;F)\,dF(\theta), \quad\text{where}\quad i_T(\theta;F) = \int_0^{\infty} p(t\mid\theta;\kappa)\,\log\frac{p(t\mid\theta;\kappa)}{p(t;F,\kappa)}\,dt. \tag{2.6} $$

Hence, the capacity per channel use or, equivalently, per spike is defined as

$$ C_T = \sup_{F\in\mathcal{F}} I_T(F). $$

The capacity *C _{T}* and the distribution that achieves *C _{T}* are studied in the next section.
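For a discrete *F* with mass points {θ_i, π_i}, the marginal *p*(*t*; *F*, κ) is simply a finite mixture of gamma densities. A small sketch (the two mass points are hypothetical) verifies that the mixture integrates to 1:

```python
import math
import numpy as np

def gamma_pdf(t, theta, kappa):
    """p(t | theta; kappa): gamma density of a single ISI."""
    return t**(kappa - 1) * np.exp(-t / theta) / (math.gamma(kappa) * theta**kappa)

def marginal(t, thetas, probs, kappa):
    """p(t; F, kappa) for a discrete F = {(theta_i, pi_i)}: a gamma mixture."""
    return sum(pi * gamma_pdf(t, th, kappa) for th, pi in zip(thetas, probs))

kappa = 2.0
thetas, probs = [2.5, 25.0], [0.5, 0.5]   # hypothetical two-point F (msec)

dt = 0.003
t = np.arange(dt / 2, 600.0, dt)          # midpoint grid on (0, 600) msec
mass = float(marginal(t, thetas, probs, kappa).sum() * dt)
print(mass)                               # close to 1: a proper density
```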

#### 2.3.2. Rate Coding.

In rate coding, a time window is set, and the spikes in the interval are counted. Let us denote the interval and the rate as Δ and *R*, respectively, and define the distribution of *R* as *p*(*r* ∣ θ; κ, Δ). The form of the distribution of *R* is shown in the following lemma:

*The distribution of R is given by*

$$ p(r\mid\theta;\kappa,\Delta) = P(\kappa r,\,\Delta_\theta) - P\big(\kappa(r+1),\,\Delta_\theta\big), \qquad \Delta_\theta = \frac{\Delta}{\theta}, \tag{2.8} $$

*where P(α, z) is the regularized lower incomplete gamma function and P(0, z) = 1*.

See Appendix A. The same distribution is discussed in Pawlas, Klebanov, and Prokop (2008).

When κ = 1, the distribution of *R* becomes a Poisson distribution:

$$ p(r\mid\theta;1,\Delta) = \frac{e^{-\Delta/\theta}}{r!}\left(\frac{\Delta}{\theta}\right)^{r}. $$

For an *F* ∈ ℱ, let us define the following marginal distribution *p*(*r*; *F*, κ, Δ):

$$ p(r;F,\kappa,\Delta) = \int_{\Theta} p(r\mid\theta;\kappa,\Delta)\,dF(\theta). $$

The existence of the integral follows from the existence of *p*(*r* ∣ θ; κ, Δ). The mutual information of *R* and θ is defined as

$$ I_R(F) = \int_{\Theta} i_R(\theta;F)\,dF(\theta), \quad\text{where}\quad i_R(\theta;F) = \sum_{r=0}^{\infty} p(r\mid\theta;\kappa,\Delta)\,\log\frac{p(r\mid\theta;\kappa,\Delta)}{p(r;F,\kappa,\Delta)}. \tag{2.9} $$

Hence, the capacity per channel use or, equivalently, per Δ is defined as

$$ C_R = \sup_{F\in\mathcal{F}} I_R(F). $$

The capacity *C _{R}* and the distribution that achieves *C _{R}* are studied in the next section.
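For κ = 1, the count *R* is Poisson with mean Δ/θ, so *I _{R}*(*F*) for a discrete input can be computed directly by truncating the sum over *r*. A sketch with a hypothetical two-point input at the ends of Θ(1) = [5, 50] msec and Δ = 25 msec:

```python
import math
import numpy as np

def poisson_pmf(r, mu):
    """p(r | theta; 1, Delta) with mu = Delta/theta (log domain for stability)."""
    return math.exp(r * math.log(mu) - mu - math.lgamma(r + 1))

def mutual_info_rate(thetas, probs, delta, r_max=100):
    """I_R(F) in bits for kappa = 1, truncating the sum over r at r_max."""
    cond = np.array([[poisson_pmf(r, delta / th) for r in range(r_max)]
                     for th in thetas])
    marg = np.array(probs) @ cond
    terms = cond * np.log2(cond / marg)
    return float(np.array(probs) @ terms.sum(axis=1))

# Hypothetical input: equal mass on theta = 5 and theta = 50 msec, Delta = 25 msec,
# i.e., Poisson counts with mean 5 versus mean 0.5.
info = mutual_info_rate([5.0, 50.0], [0.5, 0.5], 25.0)
print(f"{info:.3f} bits per window")
```

The result is below 1 bit: the two Poisson count distributions overlap, so even the better of the two-point inputs cannot be distinguished perfectly within one window.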

## 3. Theoretical Studies

The cumulative distribution is a right-continuous nondecreasing function on an interval Θ. Thus, θ can be a discrete or continuous random variable over Θ. In this section, the capacity-achieving distribution of a single neuron channel is proved to be a discrete distribution with finite mass points for both temporal and rate coding.

For some channels, the capacity-achieving distributions have been shown to be discrete under some conditions (Smith, 1971; Shamai (Shitz), 1990; Abou-Faycal et al., 2001; Gursoy et al., 2002; Tchamkerten, 2004). The neuron channel with temporal coding is different from those because it does not have an additive noise and the proof must be provided independently. The rate coding with κ = 1 is equivalent to the Poisson channel, and the discreteness of the capacity-achieving distribution is proved in Shamai (Shitz) (1990). The proof is easily extended to the case where κ is a positive integer. But we need to prove it for positive real κ's. Note that although the proofs of the discreteness in this letter are original, they follow the same steps of those papers.

### 3.1. Steps to Prove the Discreteness of the Capacity-Achieving Distribution.

The common steps of the proof for the discreteness of the capacity-achieving distributions are shown in this section. In the following, the results of optimization theory and probability theory will be used. Suppose *X* is a normed linear space. In optimization theory, the space of all bounded linear functionals of *X* is called the normed dual of *X* and is denoted *X*^{*}. The weak^{*} convergence is defined as follows:

*A sequence {x^{*}_{n}} in X^{*} is said to converge weak^{*} to the element x^{*} ∈ X^{*} if for every x ∈ X, x^{*}_{n}(x) → x^{*}(x). In this case, we write x^{*}_{n} → x^{*} (weak^{*}) (Luenberger, 1969, 5.10)*.

If *X* is the real normed linear space of all bounded continuous functions on Θ, *X*^{*} includes the set of all probability measures on Θ, and it is clear that "weak convergence" of probability measures is "weak^{*} convergence" on *X*^{*}. The results of optimization theory are applied to probability measures with this equivalence. The following theorem is used to prove the existence and the uniqueness of the capacity-achieving distribution:

From the above discussion, ℱ in equation 2.4 is a subset of *X*^{*}. It is clear that ℱ is convex. Thus, if ℱ is weak^{*} compact and *I _{T}*(*F*) (or *I _{R}*(*F*)) is a weak^{*} continuous function on ℱ and strictly concave in ℱ, the capacity is achieved by a unique distribution *F*_{0} in ℱ. This is the first step of the proof. The following proposition states that ℱ is compact.

*ℱ in equation 2.4 is compact in the Lévy metric topology*.

The Kuhn-Tucker (K-T) condition on the mutual information is used for the next step of the proof. Before showing the condition, we define the weak differentiability:

*Let J be a function on a convex set ℱ. Let F_{0} be a fixed element of ℱ and η ∈ [0, 1]. Suppose there exists a map J′_{F_0} such that*

$$ J'_{F_0}(F) = \lim_{\eta\downarrow 0}\frac{J\big((1-\eta)F_0 + \eta F\big) - J(F_0)}{\eta}. $$

*Then J is said to be weakly differentiable in ℱ at F_{0}, and J′_{F_0}(F) is the weak derivative in ℱ at F_{0}. If J is weakly differentiable in ℱ at F_{0} for all F_{0} ∈ ℱ, J is said to be weakly differentiable in ℱ.*

The K-T condition is described as follows:

See proposition 1 in Smith (1971).

If *I _{T}*(*F*) (or *I _{R}*(*F*)) is weakly differentiable, the K-T condition is derived immediately with the theorem. Finally, the discreteness is proved by deriving a contradiction based on the K-T condition and the assumption that *F*_{0} has infinitely many mass points in its support. Thus, in order to show the discreteness of the capacity-achieving distribution for temporal and rate coding, the following properties must be shown:

1. *I _{T}*(*F*) and *I _{R}*(*F*) are weak^{*} continuous on ℱ and strictly concave.
2. *I _{T}*(*F*) and *I _{R}*(*F*) are weakly differentiable.

After these are shown, the K-T condition is derived, and the discreteness and the finiteness will be checked.

### 3.2. Discreteness of the Capacity-Achieving Distribution for Temporal Coding.

In this section, the capacity-achieving distribution for temporal coding is shown to be a discrete distribution with a finite number of points. We start with the following lemma:

*I _{T}(F) in equation 2.6 is a weak^{*} continuous function on ℱ and strictly concave in ℱ*.

Lemma 2 and theorem 1 imply that the capacity for temporal coding is achieved by a unique distribution in ℱ. In order to show it is a discrete distribution, the following lemma and corollary are used:

*I _{T}(F) in equation 2.6 is weakly differentiable in ℱ. The weak derivative at F_{0} has the form*

$$ I'_{T,F_0}(F) = \int_{\Theta} i_T(\theta;F_0)\,dF(\theta) - I_T(F_0). $$

See section B.2.

The main result of this section is summarized in the following theorem:

*Under the constraint θ ∈ Θ, the channel capacity of a single neuron channel with temporal coding is achieved by a discrete distribution with a finite number of mass points*.

The extension *i _{T}*(*z*; *F*_{0}) of *i _{T}*(θ; *F*_{0}) to the complex plane, obtained by replacing θ with *z*, is analytic for Re *z* > 0. If *E*_{0} in corollary 1 has infinitely many points, then, since Θ is bounded and closed, *E*_{0} has a limit point. Hence, from corollary 1, the identity theorem implies *i _{T}*(*z*; *F*_{0}) = *I _{T}*(*F*_{0}) + κ for the region Re *z* > 0. This region includes the positive real line, and equation 3.4 is implied. The left-hand side of equation 3.4 is bounded as in equation 3.5 (see section B.1, equation B.4). Since the expectation of *T* with regard to *p*(*t* ∣ θ; κ) is κθ, equation 3.5 shows that the left-hand side of equation 3.4 grows linearly with θ, while the right-hand side increases only with log θ. Therefore, equation 3.4 cannot hold for all θ. This is a contradiction, and the optimal distribution has a finite number of mass points.

### 3.3. Discreteness of the Capacity-Achieving Distribution for Rate Coding.

The capacity-achieving distribution for rate coding is shown to be a discrete distribution with a finite number of points. Shamai (Shitz) (1990) proved that the capacity-achieving distribution of a Poisson channel under peak and average power constraints is a discrete distribution with a finite number of support points. Since θ ∈ Θ is a peak constraint, this directly proves the case κ = 1. For κ ≠ 1, further study is needed.

*I _{R}(F) in equation 2.9 is a weak^{*} continuous function on ℱ and strictly concave in ℱ*.

Lemma 4 and theorem 1 imply that the capacity for rate coding is achieved by a unique distribution in ℱ:

*I _{R}(F) in equation 2.9 is weakly differentiable in ℱ. The weak derivative at F_{0} has the form*

$$ I'_{R,F_0}(F) = \int_{\Theta} i_R(\theta;F_0)\,dF(\theta) - I_R(F_0). $$

The proof is identical to the proof of lemma 3 in section B.2.

Finally, the following theorem proves that the capacity-achieving distribution is a discrete distribution with a finite number of mass points:

*Under a peak constraint, the channel capacity of a single neuron channel with the rate coding is achieved by a discrete distribution with a finite number of mass points*.

The extension *i _{R}*(*z*; *F*_{0}) of *i _{R}*(θ; *F*_{0}) to the complex plane is obtained by replacing θ with *z*. Since *P*(α, *z*) and log *z* are analytic for Re *z* > 0, *i _{R}*(*z*; *F*_{0}) is analytic for Re *z* > 0.

If *E*_{0} in corollary 2 has infinitely many points, then, since Θ is bounded and closed, *E*_{0} has a limit point, and hence, from equation 3.7, the identity theorem implies *i _{R}*(*z*; *F*_{0}) = *I _{R}*(*F*_{0}) for the region Re *z* > 0. This region includes the positive real line, and equation 3.8 is implied. The proof (see section C.2) is completed by deriving a contradiction for equation 3.8. The contradiction is derived for κ ⩾ 1 and κ < 1 separately.

## 4. Numerical Studies

Although the capacity-achieving distribution of each coding has been proved to be discrete with a finite number of mass points, the position and probability of each point are not provided. Unfortunately, we do not have an analytic solution. This is also the case for related work (Smith, 1971; Shamai (Shitz), 1990; Abou-Faycal et al., 2001; Gursoy et al., 2005). In this section, the capacity and the capacity-achieving distribution are computed numerically for temporal and rate coding.

### 4.1. Common Steps of Numerical Experiments.

Computing the capacity and the capacity-achieving distribution of the neuron channel is difficult since no closed-form expression of *i _{T}*(θ; *F*) in equation 2.6 and *i _{R}*(θ; *F*) in equation 2.9 is available for a general discrete *F*(θ). Instead, we need to evaluate integrals for *i _{T}*(θ; *F*) and summations of infinite series for *i _{R}*(θ; *F*). For the numerical studies, the integrals for *i _{T}*(θ; *F*) are evaluated with the Gauss-Laguerre quadrature, and the infinite series for *i _{R}*(θ; *F*) are truncated to sufficiently long finite series.
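A sketch of the Gauss-Laguerre evaluation of *i _{T}*(θ; *F*): substituting *t* = θ*x* turns the integral into ∫_0^∞ e^{−x} *f*(*x*) d*x*, which the quadrature handles directly. The implementation details (node count, test parameters) are illustrative and not taken from the letter:

```python
import math
import numpy as np
from numpy.polynomial.laguerre import laggauss

def gamma_pdf(t, theta, kappa):
    return t**(kappa - 1) * np.exp(-t / theta) / (math.gamma(kappa) * theta**kappa)

def i_T(theta, thetas, probs, kappa, n_nodes=80):
    """i_T(theta; F) in nats: KL divergence of p(t|theta) from p(t; F),
    via Gauss-Laguerre after the substitution t = theta * x."""
    x, w = laggauss(n_nodes)            # nodes/weights for the weight e^{-x}
    t = theta * x
    cond = gamma_pdf(t, theta, kappa)
    marg = sum(pi * gamma_pdf(t, th, kappa) for th, pi in zip(thetas, probs))
    # p(theta*x | theta) dt = e^{-x} x^{kappa-1} / Gamma(kappa) dx
    f = x**(kappa - 1) / math.gamma(kappa) * np.log(cond / marg)
    return float(w @ f)

kappa = 2.0
thetas, probs = [2.5, 25.0], [0.5, 0.5]   # hypothetical two-point F (msec)
# I_T(F) is the F-average of i_T over the mass points (nats per spike):
avg = sum(pi * i_T(th, thetas, probs, kappa) for th, pi in zip(thetas, probs))
print(avg)
```

For a point mass *F* the marginal equals the conditional and *i _{T}* vanishes, which gives a quick correctness check.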

The strategy to compute the capacity and the capacity-achieving distributions for temporal and rate coding is as follows. Note that related work uses similar methods (Smith, 1971; Abou-Faycal et al., 2001; Gursoy et al., 2005).

1. Initialize *N*, the number of points, as 2.
2. Starting from some initial values, maximize the corresponding mutual information (*I _{T}*(*F*) or *I _{R}*(*F*)) with respect to {θ_{i}} and {π_{i}} until convergence with a gradient method.
3. When it converges, check the corresponding K-T condition (equation 3.3 or 3.7) to see whether it is the capacity-achieving distribution.
4. If the K-T condition is satisfied, the capacity and the capacity-achieving distribution are obtained. Otherwise, increase *N* by 1 and go to step 2.
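The steps above optimize both the positions {θ_i} and the weights {π_i} with a gradient method. As a simplified alternative (not the letter's algorithm), one can fix candidate mass points on a grid over Θ and optimize only the weights with the Blahut-Arimoto iteration; for rate coding with κ = 1 the channel is Poisson, and the optimized weights concentrate on a few grid points, illustrating the discreteness result:

```python
import math
import numpy as np

def blahut_arimoto(cond, tol=1e-10, max_iter=10_000):
    """Capacity (bits) of a discrete channel cond[i, r] = p(r | theta_i),
    optimizing the input weights only (mass-point positions stay fixed)."""
    n = cond.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        q = p @ cond                                    # output marginal
        d = np.sum(cond * np.log(cond / q), axis=1)     # per-input KL (nats)
        new_p = p * np.exp(d)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p
    q = p @ cond
    d = np.sum(cond * np.log(cond / q), axis=1)
    return float(p @ d / math.log(2)), p

# Rate coding with kappa = 1: R | theta ~ Poisson(Delta / theta).
delta, r_max = 25.0, 100
thetas = np.linspace(5.0, 50.0, 40)                     # grid over Theta(1) = [5, 50] msec
cond = np.array([[math.exp(r * math.log(delta / th) - delta / th - math.lgamma(r + 1))
                  for r in range(r_max)] for th in thetas])
cap, weights = blahut_arimoto(cond)
print(f"capacity ≈ {cap:.3f} bits per window")
print("grid points with weight > 1%:", thetas[weights > 0.01])
```

Restricting the mass points to a grid can only lower the value, so this yields a lower bound on the capacity computed by the letter's method.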

The range of θ must be specified for the numerical studies. The average ISI is restricted to the range from 5 msec to 50 msec, and hence 5/κ ⩽ θ ⩽ 50/κ. The choice of the range is discussed in section 5.

The capacity and the capacity-achieving distribution for temporal and rate coding are computed for multiple values of κ. As described in section 2.3, a statistical index *L _{V}* has been proposed that characterizes spike trains (Shinomoto et al., 2003). Its expectation is related to κ as *E*[*L _{V}*] = 3/(2κ + 1). In the following numerical studies, we vary κ from 0.75 to 4.5 (the corresponding *E*[*L _{V}*] ranges from 1.2 down to 0.3) in steps of 0.05. This range covers most of the cells' *L _{V}* in Shinomoto et al. (2003, 2005).

### 4.2. Temporal Coding.

Figure 3A shows the computed capacity for each κ. The capacity *C _{T}* (bit per channel use) increases monotonically as κ increases.^{3} This is natural since as κ increases, ISIs become more regular, and more information can be transferred. The capacity becomes larger than 1 bit when κ reaches 3.85.

The capacity-achieving distributions are shown in Figures 3C and 3D. For each κ, the distribution has only two or three points. Moreover, two of them are ends of the range Θ(κ) (*a*_{0}/κ and *b*_{0}/κ). If κ is smaller than 2.10, there are only two points. When it is equal to 2.10, the number of points becomes three. The position of the third point is very stable for different κ's. The probability of each point is shown in Figure 3D. The probabilities of both ends tend to be similar, while the probability of the third point increases gradually as κ increases.

The capacity, *C _{T}*, is the maximum information transferred per spike. It is also important to show the information rate. Since the capacity-achieving distribution is computed, the following *C*^{′}_{T} (bit per sec) is defined:

$$ C_T' = \frac{C_T}{E_{F_0}[T]}, $$

where *E*_{F_0}[*T*] is the average ISI under the capacity-achieving distribution *F*_{0}. Note that *E*_{F_0}[*T*] is around 25 msec for all κ in the experiments. The information rate is shown as a function of κ in Figure 3B. Further discussion is provided in section 5.

### 4.3. Rate Coding.

In rate coding, the time window Δ must be defined. Since the average time for sending a symbol with temporal coding is around 25 msec, Δ is set to 25 msec in the numerical experiment.

Figure 4A shows the computed channel capacity for each κ. *C _{R}* increases monotonically as κ increases. The value is larger than *C _{T}* for the same κ. It becomes larger than 1 bit when κ reaches 2.15.

The capacity-achieving distributions are shown in Figures 4C and 4D. For each κ, the distribution has two to four discrete points, and two of them are the ends of the range Θ(κ) (*a*_{0}/κ and *b*_{0}/κ). For κ < 1.25, there are only two points. For 1.25 ⩽ κ < 4, there are three points, and there are four for κ ⩾ 4.0. The probability of each point is shown in Figure 4D. The probabilities of both ends tend to be similar, while the probability of the third point increases gradually as κ increases. When the number of mass points is four, the two middle points have similar probability.

In rate coding, the information rate is easily computed. Since Δ is fixed, the rate is computed as *C*^{′}_{R} = *C _{R}*/Δ (bit per sec), which is shown in Figure 4B.

## 5. Discussion and Conclusion

We have proved the channel capacities of a single neuron with temporal and rate coding are achieved with discrete distributions. Numerical studies show that the number of mass points is from two to four depending on coding and κ. The capacity of a single neuron evaluated in this letter is lower than what has been reported in MacKay and McCulloch (1952) and Rapoport and Horvath (1960) (1000 to 4000 bits per sec), and its order is similar to biologically measured capacities of sensory neurons (Borst & Theunissen, 1999). However, this does not mean the capacity can be achieved biologically. The problem has been simplified in our study, and the details should be discussed. Since channel capacity depends on various factors, each factor is discussed separately in the rest of this section.

### 5.1. Encoding: Input Distribution of θ.

First, we discuss the input θ. Since the ISI is positive and is not infinite if the neuron is active, the constraint (α ⩽ θ ⩽ β) seems to be natural. The range of θ has been set to [5 msec, 50 msec] throughout the letter. The firing rate of each neuron depends on its type, and this range may not be plausible for some neurons. Note that for temporal coding, if the “dynamic range” of the firing rate is 10 dB, the capacity per channel use is identical to the result of this letter. The capacity of rate coding depends on the dynamic range and Δ; therefore, the capacity result of this letter may not be appropriate for some neurons.

In the range Θ(κ), the distribution of θ has been assumed to be memoryless, that is, θ can be different for every channel use. Scale parameter θ must be changed every 5 msec at most in temporal coding and 25 msec in rate coding. Biologically speaking, θ corresponds to the input to a neuron, and it cannot be changed quickly since the neuron has capacitance. Thus, the source would have memory. This implies the biologically achievable rate should be smaller than the capacity obtained in the numerical studies.

Another problem is the duration to keep the input θ, especially for temporal coding. When θ is fixed for some duration, the neuron fires according to the gamma distribution; however, the “sender” cannot know when the “receiver” receives the spike. In order to detect an ISI, the receiver must receive two spikes, and it is not clear how the sender can be synchronized with the receiver. One idea is to have a common clock and fix θ in an interval. This situation turns out to be rate coding. Another idea is to fix θ for a time proportional to the expected ISI, κ θ. In this case, the receiver may miss some spikes. In either case, the transmitted information will be lower than the numerically computed capacity.

When κ = 1, the rate coding becomes identical to the “Poisson channel” (Bar-David, 1969; Shamai (Shitz), 1990; Guo, Shamai (Shitz), & Verdú, 2008). There is a great deal of work on the Poisson channel communication, and many types of constraints on the input distributions have been considered (Verdú, 1999, provides a summary of Poisson channel communications). Our constraint is a memoryless peak energy constraint, and other constraints can be added. One of the commonly used constraints is the average energy constraint, that is, . Even if we add an average energy constraint to a peak power constraint, we believe the optimal distribution is still discrete for each coding. This has been proved for Poisson channel in Shamai (Shitz) (1990), and its extension to general values of κ seems possible. For the temporal coding, the proof can be straightforwardly extended, as in Smith (1971). However, we do not know how to set *C*, which prevents us from employing an average energy constraint. Note that adding an average energy constraint possibly makes the set , and thus the capacity, smaller, and our result is the upper bound of the capacity with an average energy constraint.

The capacity-achieving distributions are discrete distributions with finite mass points. Although this is good in the sense that neurons can transfer information maximally with a discrete number of "firing modes," it does not imply neurons are using only discrete modes. The input of each neuron may vary continuously. The result in this letter shows that even if the input has rich information, the sender cannot send more information than a Markovian source with finite discrete states.

### 5.2. Noisy Channel Model.

Characteristics of neurons strongly depend on their types. MacKay and McCulloch (1952) assumed that a neuron is able to fire within a fixed time precision. They concluded that each spike can carry up to 9 bits of information and that approximately 1000 to 3000 bits per second could be transferred theoretically. Compared to the biological studies summarized in Borst and Theunissen (1999), this estimate appears optimistic. We modeled the stochastic property of ISIs with a gamma distribution, which is quite different from the model in MacKay and McCulloch (1952).

We set the value of κ between 0.75 to 4.5, which has been indicated in Shinomoto et al. (2003); however, in Baker and Lemon (2000), κ is set to 16, which is much larger than our choice.^{4} As κ increases, the capacity and the number of mass points of the capacity achieving distribution increase; therefore, the capacity and the number of mass points for κ = 16 would be much larger than our numerical results. We have not shown numerical results for κ = 16, since it is difficult to carry out numerical experiments with a large κ because of numerical precision. This may be solved in the future.

It is also interesting to consider the communication channel with multiple neurons. If there are *m* neurons that follow the same gamma distribution Γ(κ, θ), the sum of the ISIs follows Γ(*m*κ, θ) and the average of the ISIs follows Γ(*m*κ, θ/*m*). Since the channel capacities *C _{T}* and *C _{R}* increase as κ increases, the channel capacity will be larger with multiple neurons. Note that the capacity-achieving distribution is still a discrete distribution with finite probability mass points.

### 5.3. Decoding.

Let the capacity-achieving distributions for temporal and rate coding be {θ_{T,i}, π_{T,i}} and {θ_{R,i}, π_{R,i}}, *i* = 1, …, *N*, respectively. The optimal decoder for temporal coding computes the following posterior probability when *t* is observed, while the optimal decoder for rate coding computes the following posterior probability when *r* is observed:

$$ \varpi_{T,i} = \frac{\pi_{T,i}\,p(t\mid\theta_{T,i};\kappa)}{\sum_{j}\pi_{T,j}\,p(t\mid\theta_{T,j};\kappa)}, \qquad \varpi_{R,i} = \frac{\pi_{R,i}\,p(r\mid\theta_{R,i};\kappa,\Delta)}{\sum_{j}\pi_{R,j}\,p(r\mid\theta_{R,j};\kappa,\Delta)}. $$

The discrete distributions ϖ_{T,i} and ϖ_{R,i} are the posterior distributions of the input θ conditioned on the observations. This "soft decoding" is natural from a mathematical viewpoint; however, it may not be plausible to assume that the postsynaptic neuron is computing ϖ_{T,i} and ϖ_{R,i}, since the computation is complicated and the value of κ must be known by the neuron.

In hard decoding, given *t* or *r*, only a single θ is taken as the decoding result. The Bayes optimal hard decoding is to choose the θ_{i} that maximizes the posterior distribution. In the case of single-neuron information channels, the hard decoding results for temporal and rate coding are defined as, respectively,

$$ \hat{\theta}_T = \mathop{\arg\max}_{\theta_{T,i}} \varpi_{T,i}, \qquad \hat{\theta}_R = \mathop{\arg\max}_{\theta_{R,i}} \varpi_{R,i}. $$

Each decoder becomes a simple threshold function. Figure 5 shows the hard decoding boundaries for temporal and rate coding. In temporal coding, *t* is a nonnegative real number, and decision boundaries are shown in Figure 5A. In rate coding, *r* is a nonnegative integer, and decisions for integers are shown in Figure 5B.
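For temporal coding, the hard decoder reduces to comparing π_i p(t | θ_i; κ) across the mass points, which yields threshold rules in *t*. A sketch with a hypothetical two-point distribution (the values are illustrative, not the letter's computed mass points):

```python
import math
import numpy as np

def gamma_pdf(t, theta, kappa):
    return t**(kappa - 1) * math.exp(-t / theta) / (math.gamma(kappa) * theta**kappa)

def hard_decode(t, thetas, probs, kappa):
    """MAP decision: the theta_i maximizing pi_i * p(t | theta_i; kappa)."""
    posterior = [pi * gamma_pdf(t, th, kappa) for th, pi in zip(thetas, probs)]
    return thetas[int(np.argmax(posterior))]

kappa = 2.0
thetas, probs = [2.5, 25.0], [0.5, 0.5]   # hypothetical two-point input (msec)

# Short ISIs decode to the small theta (high firing rate), long ISIs to the
# large theta; the boundary is a single threshold in t.
for t in (2.0, 10.0, 40.0, 120.0):
    print(t, "->", hard_decode(t, thetas, probs, kappa))
```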

Note that the boundary in Figure 5A is stable between κ = 0.75 to 2.6, and even if the capacity-achieving distribution has three states for κ ⩾ 2.10 (see Figure 3), the third point does not appear in decisions until κ>2.6. Similar results are observed for rate coding. Although the number of points is more than 3 if κ>1.20 (see Figure 4), the decision becomes three points only when κ>1.55. Even if the number of points is four for κ ⩾ 4.00, it does not appear as the hard decision. Decision boundaries are not sensitive to small changes of κ.

When hard decoding is employed, both input and output are discrete, and transferred information can be computed easily. The transferred information with the capacity-achieving distribution and the optimal hard decoders are shown in Figure 6. It shows that the transferred information is degraded from the optimal soft decoder; however, the lost information is not very large.

### 5.4. Related Work.

Stein (1967) discussed the channel capacity of rate coding, where a gamma distribution with a fixed κ was the ISI model. The input was assumed to be a discrete distribution of the scale parameter θ on an interval (an assumption that happened to be optimal), and the capacity was computed numerically in a similar manner.

Although the assumption corresponds to the optimal distribution, its discreteness had not been proved. We believe this letter is the first to prove the discreteness of the optimal distribution for general κ, not only for rate coding but also for temporal coding.

### 5.5. Conclusion.

The channel capacity and the capacity-achieving distribution are obtained for a single-neuron information channel. ISIs are modeled with a gamma distribution, and two types of coding, temporal and rate, are considered. Capacity-achieving distributions are proved to be discrete with a finite number of mass points. Numerical studies show that the number of points is relatively small for a moderate choice of κ. It should also be noted that neurons may not use efficient error-control codes, which would require fairly long delays. Instead, the actual encoding and decoding may be very simple and far from optimal as far as the rate is concerned.

The result does not necessarily imply that the neuron uses discrete states as ISIs or that the decoding is soft decoding. However, the information capacity gives the upper bound on the information that can be transferred through a single neuron, and this limit has implications: if the input is a continuous distribution, or if hard decoding is employed, the transferred information falls below the capacity.

In neurophysiological experiments, many trials are accumulated because signals are generally noisy. The results of this letter provide a general guide for how much information can be obtained through a single recording. They also give suggestions for the field of brain-machine interface, or brain-computer interface (BCI), which tries to extract information from neurons' spikes.

## Appendix A: Proof of Lemma 1

The lemma is proved by induction.

*p*(0 ∣ θ; κ, Δ) is the probability that *T* is larger than Δ. Since *T* ∼ Γ(κ, θ),

*p*(0 ∣ θ; κ, Δ) = 1 − *P*(κ, Δ_{θ}),

where Δ_{θ} = Δ/θ. Assuming equation 2.8 is true for an *m*, *p*(*m* + 1 ∣ θ; κ, Δ) is written as equation A.1. If relation A.2 holds for every *m*, the proof is complete. Equation A.2 is easily checked for *m* = 0, and the case *m* ∈ ℕ (ℕ denotes the set of positive integers) is justified from the relations between *P*(α, *x*) and the beta function; note that equation A.3 follows from an identity for the incomplete gamma function. Since equation A.2 holds for every *m* ∈ ℕ, equation A.1 reduces to equation 2.8 with *m* replaced by *m* + 1, and equation 2.8 holds for every *m*.
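The closed form can be checked numerically. The sketch below assumes equation 2.8 takes the form *p*(*m* ∣ θ; κ, Δ) = *P*(*m*κ, Δ_{θ}) − *P*((*m* + 1)κ, Δ_{θ}) (our reading of the incomplete-gamma expressions in this appendix) and verifies that the pmf sums to one and reduces to a Poisson distribution for κ = 1:

```python
# Numerical check of the assumed closed form of equation 2.8:
#   p(m | theta; kappa, Delta) = P(m*kappa, d) - P((m+1)*kappa, d),  d = Delta/theta,
# with P the regularized lower incomplete gamma function.
from math import exp, factorial
from scipy.special import gammainc  # gammainc(a, x) = P(a, x)

def p_count(m, kappa, d_theta):
    lower = 1.0 if m == 0 else gammainc(m * kappa, d_theta)  # P(0, x) taken as 1
    return lower - gammainc((m + 1) * kappa, d_theta)

d_theta = 3.0
# the pmf telescopes, so the truncated sum is 1 up to a negligible tail
total = sum(p_count(m, 0.5, d_theta) for m in range(400))
# for kappa = 1 the counting distribution is Poisson(d_theta)
poisson_err = max(abs(p_count(m, 1.0, d_theta) - exp(-d_theta) * d_theta**m / factorial(m))
                  for m in range(20))
```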

## Appendix B: Capacity-Achieving Distribution for Temporal Coding

### B.1. Proof of Lemma 2: *I*_{T}(F) is Weak^{*} Continuous.

*I*_{T}(*F*) is weak^{*} continuous if equation B.1 holds. *p*(*t*; *F*, κ) and *g*(*t*; *F*, κ) admit upper and lower bounds, and from these bounds, *p*(*t*; *F*_{n}, κ)log *g*(*t*; *F*_{n}, κ) is bounded for all *F*_{n} with finite *A*_{1} and *A*_{2} as in equation B.5. The right-hand side of equation B.5 is integrable; since equation B.5 is bounded from above by an integrable function, equation B.2 is justified by the Lebesgue dominated convergence theorem. Since *p*(*t* ∣ θ; κ) and exp[−*t*/θ]/θ^{κ} are continuous bounded functions of θ ∈ Θ, *p*(*t*; *F*, κ) and *g*(*t*; *F*, κ) are continuous functions of *F*, and *p*(*t*; *F*_{n}, κ)log *g*(*t*; *F*_{n}, κ) is also continuous for every *n*. These arguments justify equation B.3, and equation B.1 follows.

### B.2. Proof of Lemma 3.

Let *F*_{η} = (1 − η)*F*_{0} + η*F*, and rewrite *i*_{T}(θ; *F*_{η}) in equation 2.6 as equation B.6. The weak derivative of *I*_{T}(*F*) at *F*_{0} is defined as

*I*′_{T,F_{0}}(*F*) = lim_{η↓0} (*I*_{T}(*F*_{η}) − *I*_{T}(*F*_{0}))/η.

By dividing the term in equation B.6 by η and taking η ↓ 0, it becomes equation B.7. By noting *g*(*t*; *F*_{η}, κ) = (1 − η)*g*(*t*; *F*_{0}, κ) + η*g*(*t*; *F*, κ), the term in equation B.7 becomes 0. Thus, the weak derivative becomes

*I*′_{T,F_{0}}(*F*) = ∫_{Θ} *i*_{T}(θ; *F*_{0}) *dF*(θ) − *I*_{T}(*F*_{0}),

which exists, and *I*_{T}(*F*) is weakly differentiable.
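The weak derivative can be checked numerically. The sketch below (illustrative κ and discrete distributions of our own choosing, integrals by quadrature) compares the difference quotient (*I*_{T}(*F*_{η}) − *I*_{T}(*F*_{0}))/η at small η with ∫ *i*_{T}(θ; *F*_{0}) *dF*(θ) − *I*_{T}(*F*_{0}), the form this derivative takes for mutual-information functionals:

```python
# Numerical sanity check of the weak derivative
#   I'_{T,F0}(F) = integral of i_T(theta; F0) dF(theta) - I_T(F0)
# for the gamma temporal-coding channel, with discrete F0 and F.
# All parameter values are illustrative assumptions.
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

KAPPA = 2.0

def mix_pdf(t, thetas, weights):
    """g(t; F, kappa): output density under the input distribution F."""
    return sum(w * gamma.pdf(t, KAPPA, scale=th) for th, w in zip(thetas, weights))

def i_T(theta, thetas, weights):
    """i_T(theta; F): KL divergence between p(t | theta) and g(t; F)."""
    def integrand(t):
        p = gamma.pdf(t, KAPPA, scale=theta)
        return 0.0 if p == 0.0 else p * np.log(p / mix_pdf(t, thetas, weights))
    return quad(integrand, 0, np.inf, limit=200)[0]

def I_T(thetas, weights):
    """Mutual information for a discrete input F = sum_i w_i * delta_{theta_i}."""
    return sum(w * i_T(th, thetas, weights) for th, w in zip(thetas, weights))

F0_th, F0_w = [0.5, 2.0], [0.5, 0.5]   # base distribution F0
F_th, F_w = [1.0], [1.0]               # perturbing distribution F

deriv = sum(w * i_T(th, F0_th, F0_w) for th, w in zip(F_th, F_w)) - I_T(F0_th, F0_w)

eta = 1e-4                              # F_eta = (1 - eta) F0 + eta F
mix_th = F0_th + F_th
mix_w = [(1 - eta) * w for w in F0_w] + [eta * w for w in F_w]
quotient = (I_T(mix_th, mix_w) - I_T(F0_th, F0_w)) / eta
```

The agreement of `quotient` and `deriv` at small η illustrates the differentiability claim for this particular pair (*F*_{0}, *F*).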

## Appendix C: Capacity-Achieving Distribution for Rate Coding

### C.1. Proof of Lemma 4.

First, the following proposition is shown:

*The expectation of R with respect to p(r ∣ θ; κ, Δ) is finite*.

The expectation of *R* is written as the sum of *P*(*r*κ, Δ_{θ}) over *r* ⩾ 1. Since *P*(α, *x*) is a strictly decreasing function of α for α > 0, *x* > 0, for κ ≥ 1 each term satisfies *P*(*r*κ, Δ_{θ}) ⩽ *P*(*r*, Δ_{θ}), and the upper bound is *R*_{1,Δ_{θ}}, where *R*_{1,Δ_{θ}} = Δ_{θ} holds from the fact that *p*(*r* ∣ θ; 1, Δ_{θ}) is a Poisson distribution. For κ < 1, *P*(*r*κ, Δ_{θ}) ⩽ *P*(⌊*r*κ⌋, Δ_{θ}) holds, and the expectation is bounded in the same manner.

*I*_{R}(*F*) is weak^{*} continuous if the corresponding limit relation holds. From the definitions of *I*_{R}(*F*) and *i*_{R}(θ, *F*) in equation 2.9, and since *i*_{R}(θ, *F*) is a positive continuous function of θ, the relation is justified by the Helly-Bray theorem provided *i*_{R}(θ, *F*) is bounded from above. Boundedness is shown separately for κ ≥ 1 and κ < 1.

**For** κ ⩾ 1: Since *P*(α, Δ_{θ}) is a decreasing function of α, inequality C.2 holds from equation A.4. With it, *p*(*r*; *F*, κ, Δ) is bounded from below, where Δ_{m} = Δ/*b* and Δ_{M} = Δ/*a* are the minimum and the maximum of Δ_{θ}, respectively. Thus a bound with a constant *B* follows, and with the result of proposition 3, *i*_{R}(θ, *F*) is bounded from above.

**For** κ < 1: When κ < 1, equation C.3 holds from equation A.4, and since *P*(α, *x*) is a decreasing function of α, it gives a bound on *p*(*r*; *F*, κ, Δ). From the properties of the gamma function, Γ(*r*κ + 1)/Γ(*r*κ + κ + 1) decreases as *r* increases for *r* > 1/κ, so there exists a finite positive integer *r*_{0} ⩾ 1/κ such that, for all *r* ⩾ *r*_{0}, the corresponding inequality holds with a positive real number *C*_{1}. With the result of proposition 3, the part of *i*_{R}(θ, *F*) summed over *r* ⩾ *r*_{0}, denoted *S*_{1}, is finite. It can be shown that there exists a real number *C*_{2} > 0 such that *p*(*r* ∣ θ; κ, Δ) > *C*_{2} for all θ ∈ Θ and *r* ∈ {0, …, *r*_{0} − 1}, so the remaining sum *S*_{2} is also finite. Thus *i*_{R}(θ, *F*) = *S*_{1} + *S*_{2} is bounded from above.

### C.2. Proof of Theorem 3.

First, the following proposition is shown:

*The sum* ∑^{∞}_{r=1} *P*(*rm*, *x*) *is bounded from above by a linear function of x*.

Let us define the sum as *S*_{m}(*x*). From equation A.3, it is easily checked that *S*_{m}(*x*) satisfies a linear differential equation. When the differential equation is solved, the general solution gives *S*_{m}(*x*) as the sum of a term linear in *x* and terms proportional to exp[(−1 + α_{k})*x*]. Since ∣Re α_{k}∣ < 1, Re(−1 + α_{k}) < 0 holds for *k* ∈ {1, …, *m* − 1}, and lim_{x→∞} *S*_{m}(*x*)/*x* = 1/*m*.
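The limit lim_{x→∞} *S*_{m}(*x*)/*x* = 1/*m* is easy to observe numerically. The sketch below evaluates the truncated sum with scipy's regularized lower incomplete gamma function:

```python
# Numerical illustration that S_m(x) = sum_{r>=1} P(r*m, x) grows like x/m,
# where P(a, x) is the regularized lower incomplete gamma function.
from scipy.special import gammainc  # gammainc(a, x) = P(a, x)

def S(m, x, r_max=2000):
    # terms with r*m far above x are negligibly small, so a finite sum suffices
    return sum(gammainc(r * m, x) for r in range(1, r_max + 1))

ratios = {m: S(m, 200.0) / 200.0 for m in (1, 2, 4)}
```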

*As θ ↓ 0, the expectation of R with respect to p(r ∣ θ; κ, Δ) grows proportionally to Δ_{θ} = Δ/θ*.

Let us prove theorem 3.

**For** κ ≥ 1: From equation C.2, *p*(*r*; *F*, κ, Δ) is bounded from above, which yields a bound with a constant *D* and shows that *i*_{R}(θ, *F*) is bounded from below by a term involving the expectation of *R*. Since the expectation grows with Δ_{θ} as θ ↓ 0, the lower bound on *i*_{R}(θ, *F*) grows with Δ_{θ} log Δ_{θ}. Thus, *i*_{R}(θ, *F*) cannot remain finite and constant as θ ↓ 0, which brings the contradiction.
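The growth of the expectation as θ ↓ 0 can also be observed numerically. The sketch below (illustrative κ and Δ of our own choosing) assumes the expectation equals ∑_{r⩾1} *P*(*r*κ, Δ_{θ}), as in appendix C.1, and checks that halving θ asymptotically doubles it:

```python
# Numerical illustration that E[R] = sum_{r>=1} P(r*kappa, Delta_theta)
# grows proportionally to Delta_theta = Delta/theta as theta decreases.
# kappa and Delta are illustrative assumptions.
from scipy.special import gammainc  # gammainc(a, x) = P(a, x)

KAPPA, DELTA = 2.0, 1.0

def expected_count(theta, r_max=5000):
    d_theta = DELTA / theta
    return sum(gammainc(r * KAPPA, d_theta) for r in range(1, r_max + 1))

# halving theta doubles Delta_theta and (asymptotically) doubles E[R]
e1, e2 = expected_count(0.01), expected_count(0.005)
```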

**For** κ < 1: From equation C.3, *p*(*r*; *F*, κ, Δ) is bounded from above. Let us denote *r* as *r* = *Kr*′ + *b* with *b* ∈ {0, …, *K* − 1}; then *r*′ and *b* can be considered as stochastic variables. Let *H*_{R} and *H*_{R′} be the entropies of *R* and *R*′, respectively, and *H*_{B ∣ R′} be the conditional entropy of *B* given *R*′. Then *H*_{R} ⩽ *H*_{R′} + log *K*, which is justified from 0 ⩽ *H*_{B ∣ R′} ⩽ log *K*. Since ⌊κ*K*⌋ = 1 holds, the probability *q*(*r*′ ∣ θ; κ, Δ) is bounded, and with equations C.6 and C.7, a bound with a constant *E* is obtained. This shows *i*_{R}(θ, *F*) is bounded from below by a term involving ∑^{∞}_{r′=0} *r*′ *q*(*r*′ ∣ θ; κ, Δ). Since this sum is the expectation of *R*′, proposition 4 shows that it grows proportionally to Δ_{θ} as θ ↓ 0. Thus, *i*_{R}(θ, *F*) is bounded from below by a term that grows with Δ_{θ} log Δ_{θ}, and *i*_{R}(θ, *F*) cannot remain finite and constant as θ ↓ 0, which brings the contradiction.

## Acknowledgments

We are grateful for helpful discussions with Mark D. McDonnell. We also thank the anonymous reviewers for valuable feedback. This work was supported by Grant-in-Aid for Scientific Research No. 18079013, MEXT, Japan.

## Notes

In Stein (1967), the distribution of θ was assumed to be discrete. We do not assume it in this letter.

We used bits instead of nats by dividing the capacity defined in equation 2.2 by log 2.

One of the reviewers indicated the value of κ might be much larger than that given by current references in the literature.