## Abstract

Fox and Lu introduced a Langevin framework for discrete-time stochastic models of randomly gated ion channels such as the Hodgkin-Huxley (HH) system. They derived a Fokker-Planck equation with state-dependent diffusion tensor $D$ and suggested a Langevin formulation with noise coefficient matrix $S$ such that *SS*$\u22a4=D$. Subsequently, several authors introduced a variety of Langevin equations for the HH system. In this article, we present a natural 14-dimensional dynamics for the HH system in which each directed edge in the ion channel state transition graph acts as an independent noise source, leading to a 14 $\xd7$ 28 noise coefficient matrix $S$. We show that (1) the corresponding 14D system of ordinary differential equations is consistent with the classical 4D representation of the HH system; (2) the 14D representation leads to a noise coefficient matrix $S$ that can be obtained cheaply on each time step, without requiring a matrix decomposition; (3) sample trajectories of the 14D representation are pathwise equivalent to trajectories of Fox and Lu's system, as well as trajectories of several existing Langevin models; (4) our 14D representation (and those equivalent to it) gives the most accurate interspike interval distribution, not only with respect to moments but under both the $L1$ and $L\u221e$ metric-space norms; and (5) the 14D representation gives an approximation to exact Markov chain simulations that are as fast and as efficient as all equivalent models. Our approach goes beyond existing models, in that it supports a stochastic shielding decomposition that dramatically simplifies $S$ with minimal loss of accuracy under both voltage- and current-clamp conditions.

## 1 Introduction

Many natural phenomena exhibit stochastic fluctuations arising at the molecular scale, the effects of which impact macroscopic quantities. Understanding when and how microscale fluctuations will significantly contribute to macroscale behavior is a fundamental problem spanning the sciences. The impact of random ion channel fluctuations on the timing of action potentials in nerve cells provides an important example. Channel noise can have a significant effect on spike generation (Mainen & Sejnowski, 1995; Schneidman, Freedman, & Segev, 1998), propagation along axons (Faisal & Laughlin, 2007), and spontaneous (ectopic) action potential generation in the absence of stimulation (O'Donnell & van Rossum, 2015). At the network level, channel noise can drive the endogenous variability of vital rhythms such as respiratory activity (Yu, Dhingra, Dick, & Galán, 2017).

Hodgkin and Huxley's quantitative model for active sodium and potassium currents producing action potential generation in the giant axon of *Loligo* suggested an underlying system of gating variables consistent with a multistate Markov process description (Hill & Chen, 1972). The discrete nature of individual ion channel conductances was confirmed experimentally (Neher & Sakmann, 1976). Subsequently, numerical studies of high-dimensional discrete-state, continuous-time Markov chain models produced insights into the effects of fluctuations in discrete ion channel populations on action potentials (Skaugen & Walløe, 1979; Strassberg & DeFelice, 1993), also known as *channel noise* (White, Klink, Alonso, & Kay, 1998; White, Rubinstein, & Kay, 2000).

In the standard molecular-level HH model, which we adopt here, the K$+$ channel comprises four identical $n$ gates that open and close independently, giving a five-vertex channel-state diagram with eight directed edges; the channel conducts a current only when in the rightmost state (see Figure 1, top). The Na$+$ channel comprises three identical $m$ gates and a single $h$ gate, all independent, giving an eight-vertex diagram with 20 directed edges, of which one is conducting (see Figure 1, bottom).

Discrete-state channel noise models are numerically intensive, whether implemented using discrete-time binomial approximations to the underlying continuous-time Markov process (Skaugen & Walløe, 1979; Schmandt & Galán, 2012) or continuous-time hybrid Markov models with exponentially distributed state transitions and continuously varying membrane potential. The latter were introduced by Clay and DeFelice (1983) and are in principle exact (Anderson, Ermentrout, & Thomas, 2015). Under voltage-clamp conditions, the hybrid conductance-based model reduces to a time-homogeneous Markov chain (Colquhoun & Hawkes, 1981) that can be simulated using standard methods such as Gillespie's exact algorithm (Gillespie, 1977, 2007). Even with this simplification, such Markov chain (MC) algorithms are numerically expensive to simulate with realistic population sizes of thousands of channels or greater. Therefore, there is an ongoing need for efficient and accurate approximation methods.

Although more accurate approximations based on Gillespie's algorithm (using a piecewise constant propensity approximation: Bruce, 2009; Mino et al., 2002) and even based on exact simulations (Clay & DeFelice, 1983; Newby, Bressloff, & Keener, 2013; Anderson et al., 2015) became available, they remained prohibitively expensive for large network simulations. Meanwhile, Goldwyn and Shea-Brown's rediscovery of Fox and Lu's earlier conductance-based model (Goldwyn & Shea-Brown, 2011; Goldwyn, Shea-Brown, & Rubinstein, 2010) launched a flurry of activity seeking the best Langevin-type approximation. Goldwyn and Shea-Brown (2011) introduced a faster decomposition algorithm to simulate equations 1.1 to 1.3 and showed that Fox and Lu's (1994) method accurately captured the fractions of open channels and the interspike interval (ISI) statistics, in comparison with Gillespie-type Monte Carlo (MC) simulations. However, despite the development of efficient singular value decomposition based algorithms for solving $S=D$, this step still causes a bottleneck in the algorithms based on Fox and Lu (1994), Goldwyn and Shea-Brown (2011), and Goldwyn, Imennov, Famulare, and Shea-Brown (2011).

Many variations on Fox and Lu's (1994) Langevin model have been proposed in recent years (Dangerfield, Kay, & Burrage, 2010, 2012; Linaro, Storace, & Giugliano, 2011; Orio & Soudry, 2012; Güler, 2013b; Huang, Rüdiger, & Shuai, 2013, 2015; Pezo, Soudry, & Orio, 2014; Fox, 2018), including Goldwyn et al.'s work (Goldwyn & Shea-Brown, 2011; Goldwyn et al., 2011), each with its own strengths and weaknesses. One class of methods imposes projected boundary conditions (Dangerfield et al., 2010, 2012); as we will show in section 5, this approach leads to inaccurate interspike interval distribution and is inconsistent with a natural multinomial invariant manifold structure for the ion channels. Several methods implement correlated noise at the subunit level, as in equations 1.5 and 1.6 (Fox, 1997; Linaro et al., 2011; Güler, 2013a, 2013b). However, if one recognizes that at the molecular level, the individual directed edges represent the independent noise sources in ion channel dynamics, then the approach incorporating noise at the subunit level obscures the biophysical origin of ion channel fluctuations. Some methods introduce the noisy dynamics at the level of edges rather than nodes but lump reciprocal edges together into pairs (Orio & Soudry, 2012; Dangerfield et al., 2012; Huang et al., 2013; Pezo et al., 2014). This approach implicitly assumes, in effect, that the ion channel probability distribution satisfies a detailed balance (or microscropic reversibility) condition. However, while detailed balance holds for the HH model under stationary voltage clamp, this condition is violated during active spiking. Finally, the stochastic shielding approximation (Schmandt & Galán, 2012; Schmidt & Thomas, 2014; Schmidt et al., 2018) does not have a natural formulation in the representation associated with an $n\xd7n$ noise coefficient matrix $S$; in the cases of rectangular $S$ matrices used in Orio and Soudry (2012) and Dangerfield et al. (2012) stochastic shielding can only be applied to reciprocal pairs of edges. We elaborate on these points in section 6.

In this article, we introduce a new variation of Fox and Lu's conductance-noise model that avoids the limitations we have described. We show that preserving each directed edge in the channel transition graph (see Figure 1) as an independent noise source leads to a natural, biophysically motivated Langevin model that does not require any matrix decomposition step. Our construction lends itself to direct application of stochastic shielding methods, leading to faster simulations that retain the accuracy of Fox and Lu's method.

As an additional benefit, our method answers an open question in the literature, arising from the fact that the decomposition $D=SS\u22ba$ is not unique. As Fox (2018) recently pointed out, subblock determinants of the $D$ matrices play a major role in the structure of the $S$ matrix elements. Fox conjectured that “a universal form for $S$ may exist.” In this article, we obtain the universal form for the noise coefficient matrix $S$. Moreover, we prove that our model is equivalent to Fox and Lu's 1994 model in the strong sense of pathwise equivalence.

The remainder of the article is organized as follows. In section 2, we review the canonical deterministic 14D version of the HH model. We prove a series of lemmas that show (1) the multinomial submanifold $M$ is an invariant manifold within the 14D space and (2) the velocity on the 14D space and the pushforward of the velocity on the 4D space are identical. Moreover, we show (numerically) that (3) the submanifold $M$ is globally attracting, even under current clamp conditions. Figure 2 illustrates the relationship between the 4D and 14D deterministic HH models. Section 3 lays out our $14\xd728$ Langevin HH model. Like Orio and Soudry (2012), Dangerfield et al. (2012), and Pezo et al. (2014), we avoid matrix decomposition by computing $S$ directly. The key difference between our approach and its closest relative (Pezo et al., 2014) is to use a rectangular $n\xd7k$ matrix $S$ for which each directed edge is treated as an independent noise source rather than lumping reciprocal edges together in pairs. In the new Langevin model, the form of our $S$ matrix reflects the biophysical origins of the underlying channel noise and allows us to apply the stochastic shielding approximation by neglecting the noise on selected individual directed edges. As we prove in section 4, our model (without the stochastic shielding approximation) is pathwise equivalent to all those in a particular class of biophysically derived Langevin models, including those used in Fox and Lu (1994), Goldwyn et al. (2011), Goldwyn and Shea-Brown (2011), Orio and Soudry (2012), Pezo et al. (2014), and Fox (2018). In addition to 4D and 14D deterministic trajectories, Figure 2 shows a stochastic trajectory generated by our Langevin model. Finally, we compare our Langevin model to several alternative stochastic neural models in terms of accuracy (of the full ISI distribution) and numerical efficiency in section 5.

Matlab code to generate the figures throughout the article is available on github from https://github.com/shusenpu/Stochastic_shielding and on ModelDB at accession number 266551.

## 2 The Deterministic 4D and 14D HH Models

In this section, we review the classical four-dimensional model of Hodgkin and Huxley (1952) HH), as well as its natural 14-dimensional version (Dayan & Abbott, 2001, sec. 5.7), with variables comprising membrane voltage and the occupancies of five potassium channel states and eight sodium channel states. The deterministic 14D model is the mean field of the channel-based Langevin model proposed by Fox and Lu (1994); this article describes both the Langevin and the mean field versions of the 14D Hodgkin-Huxley system. For completeness of exposition, we briefly review the 4D deterministic HH system and its 14D deterministic counterpart. In section 4, we prove that the sample paths of a class of Langevin stochastic HH models are equivalent; in section 2.3, we review analogous results relating trajectories of the 4D and 14D deterministic ODE systems.

In particular, we show that the deterministic 14D model and the original 4D HH model are dynamically equivalent, in the sense that every flow (solution) of the 4D model corresponds to a flow of the 14D model. The consistency of trajectories between the 14D and 4D models is easy to verify for initial data on a 4D submanifold of the 14D space given by choosing multinomial distributions for the gating variables (Dayan & Abbott, 2001; Goldwyn et al., 2011). Similarly, Keener established results on multinomial distributions as invariant submanifolds of Markov models with ion channel kinetics under several circumstances (Keener, 2009, 2010; Earnshaw & Keener, 2010a, 2010b), but without treating the general current-clamped case. Consistent with these results, we show that the set of all 4D flows maps to an invariant submanifold of the state space of the 14D model. Moreover, we show numerically that solutions of the 14D model with arbitrary initial conditions converge to this submanifold. Thus the original HH model “lives inside” the 14D deterministic model in the sense that the former embeds naturally and consistently within the latter (see Figure 2).

In the stochastic case, the 14D model has a natural interpretation as a hybrid stochastic system with independent noise forcing along each edge of the potassium (8 directed edges) and sodium (20 directed edges) channel state transition graphs. The hybrid model leads naturally to a biophysically grounded Langevin model that we describe in section 3. In contrast to the ODE case, the stochastic versions of the 4D and 14D models are *not* equivalent (Goldwyn & Shea-Brown, 2011).

### 2.1 The 4D Hodgkin-Huxley Model

This system is a $C\u221e$ vector field on a four-dimensional manifold (with boundary) contained in $R4$: $X={-\u221e<v<\u221e,0\u2264m,h,n\u22641}=R\xd7[0,1]3.$ The manifold is forward and backward invariant in time. If $Iapp$ is constant then $X$ has an invariant subset given by $X\u2229{vmin\u2264v\u2264vmax}$, where $vmin$ and $vmax$ are calculated in lemma 1.

As pointed Keener and Sneyd (1998) and Keener (2009) pointed out, for voltage either fixed or given as a prescribed function of time, the equations for $m,h$, and $n$ can be interpreted as the parameterization of an invariant manifold embedded in a higher-dimensional time-varying Markov system. Several papers developed this idea for a variety of ion channel models and related systems (Keener, 2009; Earnshaw & Keener, 2010b), but the theory developed is restricted to the voltage-clamped case.

Under a fixed voltage clamp, the ion channels form a time-homogeneous Markov process with a unique (voltage-dependent) stationary probability distribution. Under a time-varying current clamp, the ion channels nevertheless form a Markov process, albeit no longer time homogeneous. Under these conditions, the ion channel state converges rapidly to a multinomial distribution indexed by a low-dimensional set of time-varying parameters ($m(t),h(t),n(t)$) (Keener, 2010). In the current-clamped case, the ion channel process, considered alone, is neither stationary nor Markovian, making the analysis of this case significantly more challenging from a mathematical point of view.

### 2.2 The Deterministic 14D Hodgkin-Huxley Model

### 2.3 Relation Between the 14D and 4D Deterministic HH Models

Earnshaw and Keener (2010b) suggest that it is reasonable to expect that the global flow of the 14D system should converge to the 4D submanifold but also that it is far from obvious that it must. Existing theory applies to the voltage-clamped case (Keener, 2009; Earnshaw & Keener, 2010b). Here, we consider instead the current-clamped case, in which the fluctuations of the ion channel state influences the voltage evolution, and vice versa.

In the remainder of this section, we (1) define a multinomial submanifold $M$ and show that it is an invariant manifold within the 14D space, and (2) we show that the velocity on the 14D space and the pushforward of the velocity on the 4D space are identical. In section 2.4, we (3) provide numerical evidence that $M$ is globally attracting within the higher-dimensional space.

^{1}and a second multinomial distribution on the Na$+$-channel,

^{2}form a submanifold of $\Delta 7\xd7\Delta 4$. In this way we define a submanifold, denoted $M=H(X)$, the image of $X$ under $H$.

14D Model . | 4D Model . |
---|---|

$(v,m00,\u2026,m31,n0,\u2026,n4)$ | $(v,m,h,n)$ |

$v$ | $v$ |

$13(m11+m10)+23(m21+m20)+m31+m30$ | $m$ |

$m01+m11+m21+m31$ | $h$ |

$n1/4+n2/2+3n3/4+n4$ | $n$‘ |

14D Model . | 4D Model . |
---|---|

$(v,m00,\u2026,m31,n0,\u2026,n4)$ | $(v,m,h,n)$ |

$v$ | $v$ |

$13(m11+m10)+23(m21+m20)+m31+m30$ | $m$ |

$m01+m11+m21+m31$ | $h$ |

$n1/4+n2/2+3n3/4+n4$ | $n$‘ |

Note: Note that both ${m00,\u2026,m31}$ and ${n0,\u2026,n4}$ follow multinomial distributions.

4D Model . | 14D Model . |
---|---|

$(v,m,h,n)$ | $(v,m00,\u2026,m31,n0,\u2026,n4)$ |

$v$ | $v$ |

$(1-n)4$ | $n0$ |

$4(1-n)3n$ | $n1$ |

$6(1-n)2n2$ | $n2$ |

$4(1-n)n3$ | $n3$ |

$n4$ | $n4$ |

$(1-m)3(1-h)$ | $m00$ |

$3(1-m)2m(1-h)$ | $m10$ |

$3(1-m)m2(1-h)$ | $m20$ |

$m3(1-h)$ | $m30$ |

$(1-m)3h$ | $m01$ |

$3(1-m)2mh$ | $m11$ |

$3(1-m)m2h$ | $m21$ |

$m3h$ | $m31$ |

4D Model . | 14D Model . |
---|---|

$(v,m,h,n)$ | $(v,m00,\u2026,m31,n0,\u2026,n4)$ |

$v$ | $v$ |

$(1-n)4$ | $n0$ |

$4(1-n)3n$ | $n1$ |

$6(1-n)2n2$ | $n2$ |

$4(1-n)n3$ | $n3$ |

$n4$ | $n4$ |

$(1-m)3(1-h)$ | $m00$ |

$3(1-m)2m(1-h)$ | $m10$ |

$3(1-m)m2(1-h)$ | $m20$ |

$m3(1-h)$ | $m30$ |

$(1-m)3h$ | $m01$ |

$3(1-m)2mh$ | $m11$ |

$3(1-m)m2h$ | $m21$ |

$m3h$ | $m31$ |

For a conductance-based model of the form 2.13 to 2.15 and for any bounded applied current $I-\u2264Iapp\u2264I+$, there exist upper and lower bounds $Vmax$ and $Vmin$ such that trajectories with initial voltage condition $V\u2208[Vmin,Vmax]$ remain within this interval for all times $t>0$, regardless of the initial channel state.

We conclude that if $V$ takes an initial condition in the interval $[Vmin,Vmax],$ then $V(t)$ remains within this interval for all $t\u22650$.$\u25a1$

Given that $y\u2208Y$ has a bounded domain, lemma 2 follows directly and establishes that the multinomial submanifold $M$ is a forward-time–invariant manifold within the 14D space. The proof of lemma 2 is in appendix C.

Let $X$ and $Y$ be the lower-dimensional and higher-dimensional Hodgkin-Huxley manifolds given by equation 2.12, and let $F$ and $G$ be the vector fields on $X$ and $Y$ defined by equations 2.1 to 2.4 and 2.7 to 2.9, respectively. Let $H:X\u2192M\u2282Y$ and $R:Y\u2192X$ be the mappings given in Tables 2 and 1, respectively, and define the multinomial submanifold $M=H(X)$. Then $M$ is forward-time-invariant under the flow generated by $G$. Moreover, the vector field $G$, when restricted to $M$, coincides with the vector field induced by $F$ and the map $H$. That is, for all $y\u2208M$, $G(y)=DxH(R(y))\xb7F(R(y)).$

Lemma 2 establishes that the 14D HH model given by equations 2.7 to 2.9 is dynamically consistent with the original 4D HH model given by equations 2.1 to 2.4.

In section 2.4 we provide numerical evidence that the flow induced by $G$ on $Y$ converges to $M$ exponentially fast. Thus, an initial probability distribution over the ion channel states that is not multinomial quickly approaches a multinomial distribution with dynamics induced by the 4D HH equations. Similar results, restricted to the voltage-clamp setting, were established by Keener and Earnshaw (Keener & Sneyd, 1998; Keener, 2009; Earnshaw & Keener, 2010b).

### 2.4 Local Convergence Rate

Keener and Earnshaw (Keener & Sneyd, 1998; Keener, 2009; Earnshaw & Keener, 2010b) showed that for Markov chains with constant (even time-varying) transition rates, (1) the multinomial probability distributions corresponding to mean-field models (such as the HH sodium or potassium models) form invariant submanifolds within the space of probability distributions over the channel states, and (2) arbitrary initial probability distributions converged exponentially quickly to the invariant manifold. For systems with prescribed time-varying transition rates, such as for an ion channel system under voltage clamp with a prescribed voltage $V(t)$ as a function of time, the distribution of channel states had an invariant submanifold, again corresponding to the multinomial distributions, and the flow on that manifold induced by the evolution equations was consistent with the flow of the full system.

In the preceding section, we established the dynamical consistency of the 14D and 4D models with enough generality to cover both the voltage-clamp and current-clamp systems; the latter is distinguished by not having a prescribed voltage trace, but rather having the voltage coevolve along with the (randomly fluctuating) ion channel states. Here, we give numerical evidence for exponential convergence under current clamp similar to that established under voltage clamp by Keener and Earnshaw.

Rather than providing a rigorous proof, we give numerical evidence for the standard deterministic HH model that $y\u2192M$ under current clamp (spontaneous firing conditions) in the following sense: if $y(t)$ is a solution of $y\u02d9=G(y)$ with arbitrary initial data $y0\u2208Y$, then $||y(t)-H(R(y(t)))||\u21920$ as $t\u2192\u221e$, exponentially quickly. Moreover, the convergence rate is bounded by $\lambda =max(\lambda v,\lambda Na,\lambda K)$, where $\lambda ion$ is the least negative nontrivial eigenvalue of the channel state transition matrix (over the voltage range $Vmin\u2264v\u2264Vmax$) for a given ion, and $-1/\lambda v$ is the largest value taken by the membrane time constant (for $Vmin\u2264v\u2264Vmax$). In practice, we find that the membrane time constant does not determine the slowest timescale for convergence to $M$. In fact it appears that the second-least-negative eigenvalues (not the least-negative eigenvalues) of the ion channel matrices set the convergence rate.

Note that $y\u2208Y$ can be written as $y=[V;M;N]$. As shown in appendix C, the Jacobian matrix $\u2202H\u2202x$ consists of three block matrices: one for the voltage terms, $\u2202V\u2202v$; one associated with the Na$+$ gates, given by $\u2202M\u2202m$ and $\u2202M\u2202h$; and one corresponding to the K$+$ gates, $\u2202N\u2202n$. Fixing a particular voltage $v$, let $\lambda i,i\u2208{0,1,2,\u2026,7}$ be the eight eigenvalues of $ANa$ and $vi$ be the associated eigenvectors, that is, $ANavi=\lambda ivi$ for the rate matrix in equations 2.8. Similarly, let $\eta i,wi,i\u2208{0,1,2,\u2026,4}$ be the five eigenvalues and the associated eigenvectors of $AK$, that is, $AKwi=\eta iwi$, for the rate matrix in equations 2.9. If we rank the eigenvalues of either matrix in descending order, the leading eigenvalue is always zero (because the sum of each column for $ANa$ and $AK$ is zero for every $V$) and the remainder are real and negative. Let $\lambda 1$ and $\eta 1$ denote the largest (least negative) nontrivial eigenvalues of $ANa$ and $AK$, respectively, and let $v1\u2208R8$ and $w1\u2208R5$ be the corresponding eigenvectors.

## 3 Stochastic 14D Hodgkin-Huxley Models

Finite populations of ion channels generate stochastic fluctuations (“channel noise”) in ionic currents that influence action potential initiation and timing (White et al., 1998; Schneidman et al., 1998). At the molecular level, fluctuations arise because transitions between individual ion channel states are stochastic (Hill & Chen, 1972; Neher & Sakmann, 1976; Skaugen & Walløe, 1979). Each directed edge in the ion channel state transition diagrams (see Figure 1) introduces an independent noise source. It is of interest to be able to attribute variability of the interspike interval timing distribution to specific molecular noise sources, specifically individual directed edges in each channel state graph. In order to explore these contributions, we develop a system of Langevin equations for the Hodgkin-Huxley equations, set in a 14-dimensional phase space.

Working with a higher-dimensional stochastic model may appear inconvenient, but in fact has several advantages. First, any projection of an underlying 14D model onto a lower (e.g., 4D) stochastic model generally entails loss of the Markov property. Second, the higher-dimensional representation allows us to assess the contribution of individual molecular transitions to the macroscopically observable variability of timing in the interspike interval distribution. Third, by using a rectangular noise coefficient matrix constructed directly from the transitions in the ion channel graphs, we avoid a matrix decomposition step. This approach leads to a fast algorithm that is equivalent to the slower algorithm due to (Fox & Lu, 1994; Goldwyn & Shea-Brown, 2011) in a strong sense (pathwise equivalence) that we detail in section 4.

### 3.1 Exact Stochastic Simulation of HH Kinetics: The Random–Time-Change Representation

^{3}Thus, they are constrained, respectively, to a 7D simplex embedded in $R8$ and a 4D simplex embedded in $R5$. In the random-time-change representation (Anderson & Kurtz, 2015) the exact evolution equations are written in terms of sums over the directed edges $E$ for each ion channel, $ENa={1,\u2026,20}$ and $EK={1,\u2026,8}$, (see Figure 1),

^{4}Each $Ykion(\tau )$ is an independent unit-rate Poisson process, evaluated at “internal time” (or integrated intensity) $\tau $, representing the independent channel noise arising from transitions along the $k$th edge. The voltage-dependent per capita transition rate along the $k$th edge is $\alpha kion(v)$, and $Mi(k)(s)$ (resp. $Ni(k)(s)$) is the fractional occupancy of the source node for the $k$th transition at time $s$. Thus, for example, the quantity $Mtot\alpha kNa(V(s))Mi(k)(s)$ gives the net intensity along the $k$th directed edge in the Na$+$ channel graph at time $s$.

The random-time-change representation, equations 3.1 to 3.3, leads to an exact stochastic simulation algorithm, given in Anderson and Kurtz (2015); equivalent simulation algorithms have been used previously (Clay & DeFelice, 1983; Newby et al., 2013). Many authors substitute a simplified Gillespie algorithm that imposes a piecewise-constant propensity approximation, ignoring the voltage dependence of the transition rates $\alpha kion$ between channel transition events (Goldwyn et al., 2011; Goldwyn & Shea-Brown, 2011; Orio & Soudry, 2012; Pezo et al., 2014). The two methods give similar moment statistics, provided $Ntot,Mtot\u227340$ (Anderson & Kurtz, 2015); their similarity regarding path-dependent properties (including interspike interval distributions) has not been studied in detail. Moreover, both Markov chain algorithms are prohibitively slow for modest numbers (e.g., thousands) of channels; the exact algorithm may be even slower than the approximate Gillespie algorithm. For consistency with previous studies, in this article, we use the piecewise-constant propensity Gillespie algorithm with $Mtot=6000$ Na$+$ and $Ntot=1800$ K$+$ channels as our gold standard Markov chain (MC) model, as in Goldwyn and Shea-Brown (2011).

In section 3.2 we develop a 14D conductance-based Langevin model with 28 independent noise sources – one for each directed edge – derived from the random-time-change representation equations 3.1 to 3.3. In previous work (Schmidt & Thomas, 2014), we established a quantitative measure of edge importance, namely, the contribution of individual transitions (directed edges) to the variance of channel state occupancy under steady-state voltage-clamp conditions. Under voltage clamp, the edge importance was identical for each reciprocal pair of directed edges in the graph, a consequence of detailed balance. Some Langevin models lump the noise contributions of each pair of edges (Dangerfield et al., 2010, 2012; Orio & Soudry, 2012; Pezo et al., 2014). Under conditions of detailed balance, this simplification is well justified. However, as we will show (see Figure 5), under current-clamp conditions (e.g., for an actively spiking neuron) detailed balance is violated, the reciprocal edge symmetry is broken, and each pair of directed edges makes a distinct contribution to ISI variability.

### 3.2 Langevin Equations of the 14D HH Model

^{5}

Although the ion channel state trajectories generated by equation 3.6 are not strictly bounded to remain within the nonnegative simplex, empirically, the voltage nevertheless remains within specified limits with overwhelming probability.

Fox and Lu's original approach (Fox & Lu, 1994) requires solving a matrix square root equation $SS\u22ba=D$ to obtain a square ($8\xd78$ for Na$+$ or $5\xd75$ for K$+$) noise coefficient matrix consistent with the state-dependent diffusion matrix $D$. As an advantage, the ion channel representation equations 3.7 to 3.9, uses sparse, nonsquare noise coefficient matrices ($8\xd720$ for the Na$+$ channel and $5\xd78$ for the K$+$ channel), which exposes the independent sources of noise for the system.

The new Langevin model in equations 3.7 to 3.9 does not require detailed balance, which gives more insight into the underlying kinetics. Review papers (e.g., Goldwyn & Shea-Brown, 2011; Pezo et al., 2014; Huang et al., 2015) did systematic comparison of various stochastic versions of the HH model. In sections 4 and 5, we quantitatively analyze the connection between the new model and other existing models (Fox & Lu, 1994; Goldwyn et al., 2011; Goldwyn & Shea-Brown, 2011; Dangerfield et al., 2010, 2012; Orio & Soudry, 2012; Huang et al., 2013, 2015; Pezo et al., 2014; Fox, 2018). Problems such as the boundary constraints are beyond the scope of this article; however, we would like to connect the new model to another type of approximation to the MC model; the stochastic shielding approximation.

### 3.3 Stochastic Shielding for the 14D HH Model

*edge importance*; assuming detailed balance, it is given by

Figure 4 shows the edge importance for each pair of edges in the HH Na$+$ and K$+$ ion channel graph, as a function of voltage in the range $[-100,100]$ mV. Note that reciprocal edges have identical $Rk$ due to detailed balance. Under voltage clamp, the largest value of $Rk$ for the HH channels always corresponds to directly observable transitions, that is, edges $k$ such that $|c\u22ba\zeta k|>0$, although this condition need not hold in general (Schmidt, Galán, & Thomas, 2018).

Schmandt and Galán (2012) assumed that the edges with the largest contribution to current fluctuations under voltage clamp would also make the largest contributions to variability in voltage and timing under current clamp and included edges 7 and 8 of the K$+$channel ($EK'={7,8}$) and edges 11 and 12 and 19 and 20 of the Na$+$ channel ($ENa'={11,12,19,20}$), yielding an $8\xd74$ matrix $SNa'$ and a $5\xd72$ matrix $SK'$. They demonstrated numerically that restricting stochastic forcing to these edges gave a significantly faster simulation with little appreciable change in statistical behavior: under voltage clamp, the mean current remained the same, with a small (but noticeable) decrease in the current variance; meanwhile, similar interspike interval (ISI) statistics were observed.

Figure 5A plots the logarithm of the ISI variance for each edge in $EK$. Vertical bars (cyan) show the ensemble mean of the ISI variance, with a 95% confidence interval superimposed (magenta). Several observations are in order. First, the ISI variance driven by the noise in each edge decreases rapidly the farther the edge is from the observable transitions (edges 7,8), reflecting the underlying “stochastic shielding” phenomenon. Second, the symmetry of the edge importance for reciprocal edge pairs—(1,2), (3,4), (5,6), and (7,8)—that is observed under voltage clamp is broken under current clamp. The contribution of individual directed edges to timing variability under current clamp has another important difference compared with the edge importance (current fluctuations) under voltage clamp. A similar breaking of symmetry for reciprocal edges is seen for the Na$+$ channel, again reflecting the lack of detailed balance during active spiking.

Figure 5B shows the ISI variance when channel noise is included on individual edges of $ENa.$ Here, the difference between voltage and current clamp is striking. Under voltage clamp, the four most important edges are always those representing observable transitions, in the sense that the transition's stoichiometry vector $\zeta $ is not orthogonal to the conductance vector $c$. That is, the four most important pairs are always 11 and 12 and 19 and 20, regardless of voltage (see Figure 4). Under current clamp, the most important edges are 17, 18, 19, and 20. Although edges 11 and 12 are among the four most important sources of channel population fluctuations under voltage clamp, they are not even among the top 10 contributors to ISI variance when taken singly. Even though edges 17 and 18 are hidden, meaning they do not directly change the instantaneous channel conductance, these edges are nevertheless the second most important pair under current clamp. Therefore, when we implement the stochastic-shielding based approximation, we include the pairs 17 and 18 and 19 and 20 in equation 3.11. We refer to the approximate SS model driven by these six most important edges as the $14\xd76$D HH model.

Given the other parameters we use for the HH model (see Table 5 in appendix B), the input current of $Iapp=10$ nA is slightly beyond the region of multistability associated with a subcritical Andronov-Hopf bifurcation. In order to make sure the results are robust against increases in the applied current, we tried current injections ranging from 20 to 100 nA. While injecting larger currents decreased the ISI variance, it did not change the rank order of the contributions from the most important edges.

## 4 Pathwise Equivalence for a Class of Langevin Models

Fox and Lu's method has been widely used since its appearance (see references in Bruce, 2009; Goldwyn & Shea-Brown, 2011; Huang et al., 2015), and the “best” approximation for the underlying Markov chain (MC) model has been a subject of ongoing discussion for decades. Several studies (Mino et al., 2002; Bruce, 2009; Sengupta, Laughlin, & Niven, 2010) attested to discrepancies between Fox's later approach (Fox, 1997) and the discrete-state MC model, raising the question of whether Langevin approximations could ever accurately represent the underlying fluctuations seen in the gold standard MC models. An influential review paper (Goldwyn & Shea-Brown, 2011) found that these discrepancies were due to the way in which noise is added to the stochastic differential equations, 1.1 to 1.3. Recent studies, including Dangerfield et al. (2010, 2012); Linaro et al. (2011); Goldwyn and Shea-Brown (2011); Goldwyn et al. (2011); Orio and Soudry (2012); Güler (2013b); Huang et al. (2013, 2015); Pezo et al. (2014); and Fox (2018), discussed various ways of incorporating channel noise into HH kinetics based on the original work by Fox and Lu (Fox & Lu, 1994; Fox, 1997), some of which have the same SDEs but with different boundary conditions. Different boundary conditions (BCs) are not expected to have much impact on computational efficiency. Indeed, if BCs are neglected, the main difference between channel-based (or conductance-based) models is the diffusion matrix $S$ in the Langevin euqations 1.2 and 1.3. As the discussion about where and how to incorporate noise into the HH model framework goes on, Fox (2018) recently asked whether there is a way of relating different models with different $S$ matrices. We give a positive answer to this question below.

In section 4.1 we demonstrate the equivalence (neglecting the boundary conditions) of a broad class of previously proposed channel-based Langevin models (Fox & Lu, 1994; Dangerfield et al., 2010, 2012; Goldwyn & Shea-Brown, 2011; Orio & Soudry, 2012; Huang et al., 2013; Pezo et al., 2014; Fox, 2018) and the 14D Langevin HH model with 28 independent noise sources (one for each directed edge in the channel state transition graph), that is, our $14\xd728$D Langevin model.

### 4.1 When Are Two Langevin Equations Equivalent?

### 4.2 Map Channel-based Langevin Models to Fox and Lu's Model

First, we prove that given a Wiener trajectory, $W(t),t\u2208[0,T]$ and the solution to equation 4.16$X(t)$, there exists a Wiener trajectory $W*(t)$ such that the solution to equation 4.17, $X*$, is also a solution to equation 4.16. In other words, for a Wiener process $W(t)$, we can construct a $W*(t)$, such that $X*(t)=X(t)$, for $0\u2264t\u2264T$.

To illustrate pathwise equivalence, Figure 6 plots trajectories of the $14\xd728$D stochastic HH model and Fox and Lu's model using noise traces dictated by the preceding construction. In panel A, we generated a sample path for equation 4.16 and plot three variables in $X$: the voltage $V$, Na$+$ channel open probability $M31$, and K$+$ channel open probability $N4$. The corresponding trajectory, $X*$, for Fox and Lu's model was generated from equation 4.17 and the corresponding Wiener trajectory was calculated using equation 4.18. The top three subplots in panel A superposed the voltage $V*$, Na$+$ channel open probability $M31*$ and K$+$ channel open probability $N4*$ in $X*$ against those in $X$. The bottom three subplots in panel A plot the point-wise differences of each variable. Equations 4.16 and 4.17 are numerically solved in Matlab using the Euler-Maruyama method with a time step $dt=0.001$ ms. The slight differences observed arise in part due to numerical errors in calculating the singular value decomposition of $S$ (in equation 4.16); another source of error is the finite accuracy of the Euler-Maruyama method.^{6} As shown in Figure 6, most differences occur near the spiking region, where the system is numerically very stiff and the numerical accuracy of the SDE solver accounts for most of the discrepancies (analysis of which is beyond the scope of this article). To illustrate pathwise equivalence, Figure 6 shows superposed voltage trajectories for each simulation, as well as the point-wise voltage differences of each. We can conclude from the comparison in Figure 6 that the $14\xd728$D Langevin model is pathwise equivalent with Fox and Lu's model. Similarly, the same analogy applies for other channel-based Langevin models such that with the same diffusion matrix $D(X)$.

We have shown that our $14\xd728$D model, with a 14-dimensional state space and 28 independent noise sources (one for each directed edge), is pathwise equivalent to Fox and Lu's original 1994 model, as well as other channel-based models (under corresponding boundary conditions) including Goldwyn et al. (2011); Goldwyn and Shea-Brown (2011); Orio and Soudry (2012); Pezo et al. (2014); Fox (2018). As we shall see in section 5, the pathwise equivalent models give statistically indistinguishable interspike interval distributions under the same BCs. We emphasize the importance of boundary conditions for pathwise equivalence. Two simulation algorithms with the same $Ai$ and $Si$ matrices will generally have nonequivalent trajectories if different boundary conditions are imposed. For example, Dangerfield et al. (2012) employ the same dynamics as Orio and Soudry (2012) away from the boundary, where ion channel state occupancy approaches zero or one. But where the latter allow trajectories to move freely across this boundary (which leads only to small, short-lived excursions into “nonphysical” values), Dangerfield imposes reflecting boundary conditions through a projection step at the boundary. As we will see ln section 5, this difference in boundary conditions leads to a statistically significant difference in the ISI distribution, as well as a loss of accuracy when compared with the gold standard Markov chain simulation.

## 5 Model Comparison

In section 3, we studied the contribution of every directed edge to the ISI variability and proposed how stochastic shielding could be applied under current clamp. Moreover, in section 4, we proved that a family of Langevin models is pathwise equivalent.

Here we compare the accuracy and computational efficiency of several models, including the “subunit model” (Fox, 1997; Goldwyn & Shea-Brown, 2011), Langevin models with different $S$ matrices or boundary conditions (Fox & Lu, 1994; Goldwyn & Shea-Brown, 2011; Dangerfield et al., 2012; Orio & Soudry, 2012; Pezo et al., 2014), the 14D HH model (proposed in section 3.2), the 14D stochastic shielding model with six independent noise sources (proposed in section 3.3), and the gold standard Markov chain model (discussed in section 3.1). Where other studies have compared moment statistics such as the mean firing frequency (under current clamp) and stationary channel open probababilies (under voltage clamp), we base our comparison on the entire interspike interval (ISI) distributions, under current clamp with a common fixed driving current. We use two different comparisons of ISI distributions, the first based on the $L1$ norm of the difference between two distributions (the Wasserstein distance; Wasserstein, 1969), in section 5.1 and the second based on the $L\u221e$ norm (the Kolmogorov-Smirnov test; Kolmogorov, 1933; Smirnov, 1948), in section 5.2. We find similar results using both measures: as expected, the models that produce pathwise equivalent trajectories (Fox & Lu, 1994; Orio & Soudry; and our $14\xd728$D model) have indistinguishable ISI statistics, while the nonequivalent models (Fox, 1997; Dangerfield, 2010, 2012; Goldwyn & Shea-Brown, 2011; our $14\xd76$D stochastic-shielding model) have significantly different ISI distributions. Of these, the $14\xd76$D SS model is the closest to the models in the $14\xd728$D class and as fast as any other model considered.

### 5.1 $L1$ Norm Difference of ISIs

We first evaluate the accuracy of different stochastic simulation algorithms by comparing their ISI distributions under current clamp to that produced by a reference algorithm, namely, the discrete-state Markov chain (MC) algorithm.

Model . | Variables V+M+N . | $S$ Matrix . | Noise Dimensions Na+K . | $L1$ Norm (msec.) (Wasserstein Dist.) . | Run Time (sec.) . |
---|---|---|---|---|---|

MC | 1+8+5 | n/a | 20+8 | $2.27e-4\xb17.15e-5$ | 3790 |

Fox94 | 1+7+4 | $S=D$ | 7+4 | $4.74e-2\xb11.93e-4$ | 2436 |

Fox97 | 1+2+1 | n/a | 3 | $8.01e-1\xb19.48e-4$ | 67 |

Dangerfield | 1+8+5 | $SEF$ | 10+4 | $2.18e-1\xb12.14e-4$ | 655 |

Goldwyn | 1+8+5 | $S=D$ | 8+5 | $1.83e-1\xb11.93e-4$ | 2363 |

Orio | 1+8+5 | $Spaired$ | 10+4 | $4.52e-2\xb12.08e-4$ | 577 |

$14\xd728$D | 1+8+5 | $Ssingle$ | 20+8 | $4.93e-2\xb11.94e-4$ | 605 |

SS | 1+8+5 | $Sss$ | 4+2 | $7.62e-2\xb17.57e-5$ | 73 |

Model . | Variables V+M+N . | $S$ Matrix . | Noise Dimensions Na+K . | $L1$ Norm (msec.) (Wasserstein Dist.) . | Run Time (sec.) . |
---|---|---|---|---|---|

MC | 1+8+5 | n/a | 20+8 | $2.27e-4\xb17.15e-5$ | 3790 |

Fox94 | 1+7+4 | $S=D$ | 7+4 | $4.74e-2\xb11.93e-4$ | 2436 |

Fox97 | 1+2+1 | n/a | 3 | $8.01e-1\xb19.48e-4$ | 67 |

Dangerfield | 1+8+5 | $SEF$ | 10+4 | $2.18e-1\xb12.14e-4$ | 655 |

Goldwyn | 1+8+5 | $S=D$ | 8+5 | $1.83e-1\xb11.93e-4$ | 2363 |

Orio | 1+8+5 | $Spaired$ | 10+4 | $4.52e-2\xb12.08e-4$ | 577 |

$14\xd728$D | 1+8+5 | $Ssingle$ | 20+8 | $4.93e-2\xb11.94e-4$ | 605 |

SS | 1+8+5 | $Sss$ | 4+2 | $7.62e-2\xb17.57e-5$ | 73 |

Notes: Model (see text for details): MC: Markov chain. Fox94; From Fox and Lu (1994). Fox97: Fox (1997). Goldwyn: Goldwyn and Shea-Brown (2011). Dangerfield: Dangerfield et al. (2010, 2012). $14\xd728$D: model proposed in section 3.2. SS: stochastic-shielding model (see section 3.3). Variables: number of degrees of freedom in Langevin equation representing voltage, sodium gates, and potassium gates, respectively. $S$ Matrix: Form of the noise coefficient matrix in equations 1.1 to 1.3. Noise dimensions: number of independent gaussian white noise sources represented for sodium and potassium, respectively. $L1$ Norm: Empirically estimated $L1$-Wasserstein distance between the model's ISI distribution and the MC model's ISI distribution. For MC versus MC, independent trials were compared. $a\xb1b$: mean$\xb1$standard deviation. Run time (in seconds): see text for details.

We numerically calculate $\rho 1(Fn,FnM)$ to compare several Langevin models against the MC model. We consider the following models: “Fox94” denotes the original model proposed by Fox and Lu (1994), which requires a square root decomposition ($S=D$) for each step in the simulation (see equations 1.1 to 1.3). “Fox97” is the widely used “subunit model” of Fox (1997); see equations 1.4 and 1.5). “Goldwyn” denotes the method taken from Goldwyn and Shea-Brown (2011), where they restrict the 14D system ($V$, 5 K$+$ gates and 8 Na$+$ gates) to the 4D multinomial submanifold ($V,m,n,andh$; see section 4.2), with gating variables truncated to [0,1]. We write “Orio” for the model proposed by Orio and Soudry (2012), where they constructed a rectangular matrix $S$ such that $SS\u22ba=D$ (referred to as $Spaired$ in Table 3) combining fluctuations driven by pairs of reciprocal edges, thereby avoiding taking matrix square roots at each time step. The model “Dangerfield” represents Dangerfield et al. (2012), which used the same $S$ matrix as in Orio and Soudry (2012) but added a reflecting (no-flux) boundary condition via orthogonal projection (referred to as $SEF$ in Table 3). Finally, we include the $14\xd728$D model we proposed in section 3.2, or “14D” (referred to as $Ssingle$ in Table 3); “SS” is the stochastic shielding model specified in section 3.3.

For each model, we ran 10,000 independent samples of the simulation, holding channel number, injected current ($Iapp=10$ nA), and initial conditions fixed. Throughout the article, we presume a fixed channel density of 60 $channels/\mu m2$ for sodium and 18 $channels/\mu m2$ for potassium in a membrane patch of area $100\mu m2$, consistent with prior work such as Goldwyn and Shea-Brown (2011) and Orio and Soudry (2012). The initial condition is taken to be the point on the deterministic limit cycle at which the voltage crosses upward through $-60$ mV. An initial transient corresponding to 10 to 15 ISIs is discarded, to remove the effects of the initial condition. (See Table 5 in appendix B for a complete specification of simulation parameters.) We compared the efficiency and accuracy of each model through the following steps:

For each model, a single run simulates a total time of 84,000 milliseconds (ms) with time step 0.008 ms, recording at least 5000 ISIs.

For each model, repeat 10,000 runs in step 1.

Create a reference ISI distribution by aggregating all 10,000 runs of the MC model, that is, based on roughly $5\xd7107$ ISIs.

For each of $104$ individual runs, align all ISI data into a single vector and calculate the ECDF using equation 5.1.

Compare the ISI distribution of each model with the reference MC distribution by calculating the $L1$-difference of the ECDFs using equation 5.3.

To compare the computational efficiency, we take the average execution time of the MC model as the reference. The relative computational efficiency is the ratio of the average execution time of a model with that of the MC model (about 3790 sec).

Table 3 gives the empiricially measured $L1$ difference in ISI distribution between several pairs of models.^{7} The first row (“MC”) gives the average $L1$ distance between individual MC simulations and the reference distribution generated by aggregating all MC simulations, in order to give an estimate of the intrinsic variability of the measure. Figure 8 plots the $L1$-Wasserstein differences versus the relative computational efficiency of several models against the MC model. These results suggest that the Fox94, Orio, and $14\xd728$D models are statistically indistinguishable when compared with the MC model using the $L1$-Wasserstein distance. This result is expected in light of our results (see section 4) showing that these three models are pathwise-equivalent. (We will make pairwise statistical comparisons between the ISI distributions of each model in see section 5.2.) Among these equivalent models, however, the $14\xd728$D and Orio models are significantly faster than the original Fox94 model (and the Goldwyn model) because they avoid the matrix square root computation. The Dangerfield model has speed similar to the $14\xd728$D model, but the use of reflecting boundary conditions introduces significant inaccuracy in the ISI distribution. The imposition of truncating boundary conditions in the Goldwyn model also appears to affect the ISI distribution. Of the models considered, the Fox97 subunit model is the fastest; however, it makes a particularly poor approximation to the ISI distribution of the MC model. Note that the maximum $L1$-Wasserstein distance between two distributions is 2. The ISI distribution of the Fox97 subunit model to that of the MC model is more than 0.8, which is 10 times larger than the $L1$-Wasserstein distance of the SS model, and almost half of the maximum distance. As shown in Figure 7,^{8} the Fox97 subunit model fails to achieve the spike firing threshold and produces longer ISIs. Because of its inaccuracy, we do not include the subunit model in our remaining comparisons. The stochastic shielding model, on the other hand, has nearly the same speed as the Fox97 model, but it is over 100 times more accurate (in the $L1$ sense). The SS model is an order of magnitude faster than the $14\xd728$D model and has less than twice the $L1$ discrepancy versus the MC model ($L1$ norm 76.2 versus 49.3 microseconds). While this difference in accuracy is statistically significant, it may not be practically significant, depending on the application (see section 6 for further discussion of this point).

### 5.2 Two-Sample Kolmogorov-Smirnov Test

In addition to using the $L1$-Wasserstein distances to test the differences between two CDFs, we can also make a pairwise comparison between each model by applying the Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky, Kiefer, & Wolfowitz, 1956) and the two-sample Kolmogorov-Smirnov (KS) test (Kolmogorov, 1933; Smirnov, 1948). While the Wasserstein distance is based on the $L1$ norm, the KS statistic is based on the $L\u221e$ (or supremum) norm.

Figure 9 plots the logarithm of ratio of the two-sample KS statistics, $Dn,mRn,m(0.01)$, for Fox94 (Fox & Lu, 1994), Goldwyn (Goldwyn & Shea-Brown, 2011), Dangerfiled (Dangerfield et al., 2012), Orio (Orio & Soudry, 2012), 14D (the $14\xd728$D model we proposed in section 3.2). Data of “self-comparison” (e.g. Fox94 versus Fox 94) was obtained by comparing two ISI ECDFs from independent simulations. As shown in Figure 9, models that we previously proved were pathwise equivalent in section 4, namely, the Fox94, Orio, and 14D models, are not distinguishable at any confidence level justified by our data. Note that those three models use the same boundary conditions (free boundary condition as in Orio & Soudry, 2012) and the ratio $Dn,m/Rn,m(\alpha )$ of pairwise comparison is on the same magnitude of that for the self-comparisons. The models of Fox and Lu and Orio and Soudry, and our 14D model generate indistinguishable ISI distributions but are distinguishable from Dangerfield's model and Fox's 97 model. Thus, as the bottom panel of Figure 9 shows, Fox94, Orio, and $14\xd728$D form a block of statistically indistinguishable samples. However, as pointed out above, these statistically equivalent simulation algorithms have different computational efficiencies (see Figure 8). Among these methods, Orio and Soundry's algorithm (14-dimensional state space with 14 undirected edges as noise sources) and our method (14-dimensional state space with 28 directed edges as noise sources) have similar efficiencies, with Orio's method being about 5% faster than our method. Our $14\xd728$D method provides the additional advantage that it facilitates further acceleration under the stochastic shielding approximation (see section 6).

In contrast to the statistically equivalent Orio, $14\xd728$D, and Fox94 models, algorithms using different boundary conditions are not pathwise equivalent, which is again verified in Figure 9. Algorithms with subunit approximation and truncated boundary condition (i.e., “Goldwyn”) and the reflecting boundary condition (i.e., “Dangerfield”) are significantly different in accuracy (and, in particular, they are less accurate) than models in the $14\xd728$D class.

## 6 Discussion and Conclusions

### 6.1 Summary

The exact method for Markov chain (MC) simulation for an electrotonically compact (single compartment) conductance-based stochastic model under current clamp is a hybrid discrete (channel state)/continuous (voltage) model of the sort used by Clay and DeFelice (1983), Newby et al. (2013), and Anderson et al. (2015). While MC methods are computationally expensive, simulations based on gaussian/Langevin approximation can capture the effects of stochastic ion channel fluctuations with reasonable accuracy and excellent computational efficiency. Since Goldwyn and Shea Brown's work focusing the attention of the computational neuroscience community on Fox and Lu's Langevin algorithm for the Hodgkin-Huxley system (Fox & Lu, 1994; Goldwyn & Shea-Brown, 2011), several variants of this approach have appeared in the literature.

In this article, we advocate for a class of models combining the best features of conductance-based Langevin models with the recently developed stochastic shielding approximation (Schmandt & Galán, 2012; Schmidt & Thomas, 2014; Schmidt et al., 2018). We propose a Langevin model with a 14-dimensional state space, representing the voltage, five states of the K$+$ channel, and eight states of the Na$+$-channel; and a 28-dimensional representation of the driving noise: one independent gaussian noise term for each directed edge in the channel-state transition graph. We showed in section 2 that the corresponding mean-field 14D ordinary differential equation model is consistent with the classical HH equations in the sense that the latter correspond to an invariant submanifold of the higher-dimensional model, to which all trajectories converge exponentially quickly. Figure 2 illustrated the relation between the deterministic 4D and 14D Hodgkin-Huxley systems. Building on this framework, we introduced the $14\xd728$D model, with independent noise sources corresponding to each ion channel transition (see section 3). We proved in section 4 that given identical boundary conditions, our $14\xd728$D model is pathwise equivalent to Fox and Lu's original Langevin model and to a 14-state model with 14 independent noise sources due to Orio and Soudry (2012).

The original 4D HH model, the 14D deterministic HH model, and the family of equivalent 14D Langevin models we consider here form a nested family, each contained within the next. Specifically, the 14D ODE model is the mean-field version of the 14D Langevin model, and the 4D ODE model forms an attracting invariant submanifold within the 14D ODE model, as we establish in our lemma 2. So in a very specific sense, the original HH equations “live inside” the 14D Langevin equations. Thus, these three models enjoy a special relationship. In contrast, the 4D Langevin equations studied in Fox (1997) do not bear an especially close relationship to the other three.

In addition to rigorous mathematical analysis, we also performed numerical comparisons (see section 5) showing that, as expected, the pathwise equivalent models produced statistically indistinguishable interspike interval (ISI) distributions. Moreover, the ISI distributions for our model (and its equivalents) were closer to the ISI distribution of the gold standard MC model under two different metric space measures. Our method (along with Orio and Soudry's) proved computationally more efficient than Fox and Lu's original method and Dangerfield's model (Dangerfield et al., 2012). In addition, our method lends itself naturally to model reduction (via the stochastic shielding approximation) to a significantly faster $14\xd76$D simulation that preserves a surprisingly high level of accuracy.

### 6.2 Discrete Gillespie Markov Chain Algorithms

The discrete-state MC algorithm due to Gillespie is often taken to be the gold standard simulation for a single-compartment stochastic conductance-based model. Most former literature on Langevin HH models (Goldwyn & Shea-Brown, 2011; Linaro et al., 2011; Orio & Soudry, 2012; Huang et al., 2013), when establishing a reference MC model, consider a version of the discrete Gillespie algorithm that assumes a piecewise-constant propensity approximation, that is, it does not take into account that the voltage changes between transitions, which changes the transition rates. This approximation can lead to biophysically unrealistic voltage traces for very small system sizes (see Figure 2 of Kispersky & White, 2008; top trace with $N=1$ ion channel) although the differences appear to be mitigated for $N\u227340$ channels (Anderson et al., 2015). In this article our MC simulations are based on 6000 Na$+$ and 1800 K$+$ channels (as in Goldwyn & Shea-Brown, 2011), and we too use the ISI distribution generated by a piecewise-constant propensity MC algorithm as our reference distribution. As shown in Table 3 and Figure 8, the computation time for MC is one order of magnitude larger than efficient methods such as Orio and Soudry (2012) and Dangerfield et al. (2012) and the $14\xd728$D model. The computational cost of the MC model increases dramatically as the number of ion channels grows; therefore, even the approximate MC algorithm is inapplicable for a large number of channels.

### 6.3 Langevin Models

It is worth pointing out that the accuracy of Fox and Lu's original Langevin equations has not been fully appreciated. In fact, Fox and Lu's model (Fox & Lu, 1994) gives an approximation to the MC model that is just as accurate as that of Orio and Soudry (2012) both in the gating variable statistics (Goldwyn & Shea-Brown, 2011) and the ISI distribution sense (see section 5), because, as we have established, these models are pathwise equivalent! However, the original implementation requires taking a matrix square root in every time step in the numerical simulation, which significantly reduces its computational efficiency.

Models based on modifications of Fox and Lu's (1994) work can be divided into three classes: the subunit model (Fox, 1997), effective noise models (Linaro et al., 2011; Güler, 2013b), and channel-based Langevin models (e.g., Goldwyn & Shea-Brown, 2011; Dangerfield et al., 2012; Orio & Soudry, 2012; Pezo et al., 2014; Huang et al., 2013).

#### 6.3.1 Subunit Model

The first modification of Fox and Lu's model is the subunit model (Fox, 1997), which keeps the original form of the HH model, and adds noise to the gating variables ($m,h,andn$) (Fox, 1997; Goldwyn & Shea-Brown, 2011). The subunit approximation model was widely used because of its fast computational speed. However, as Bruce (2009) and others pointed out, the inaccuracy of this model remains significant even for large number of channels. Moreover, Goldwyn and Shea-Brown (2011) and Huang et al. (2015) found that the subunit model fails to capture the statistics of the HH Na$+$ and K$+$ gates. In this article, we also observed that the subunit model is more efficient than channel-based Langevin models but tends to delay spike generation. As shown in Figure 7, the subunit model generates significantly longer ISIs than the MC model.

#### 6.3.2 Effective Noise Models

Another modification to Fox and Lu's algorithm is to add colored noise to the channel open fractions. Though colored noise models such as those of Linaro et al. (2011) and Güler (2013b) are not included in our model comparison, Huang et al. (2015) found that both of these effective noise models generate shorter ISIs than the MC model with the same parameters. Though the comparison we provided in section 5 only include the Fox and Lu 94, Fox97, Goldwyn, Dangerfield, Orio, SS, and the $14\xd728$D models, combining the results from Goldwyn and Shea-Brown (2011) and Huang et al. (2015), the $14\xd728$D model could be compared to a variety of models (e.g., Fox & Lu, 1994; Fox, 1997; Goldwyn & Shea-Brown, 2011; Linaro et al., 2011; Orio & Soudry, 2012; Dangerfield et al., 2012; Huang et al., 2013; Güler, 2013b).

#### 6.3.3 Channel-based Langevin Models

The main focus of this article is the modification based on the original Fox and Lu's matrix decomposition method, namely, the channel-based (or conductance-based) Langevin models. We proved in section 4 that under the same boundary conditions, Fox and Lu's original model, Orio's model, and our $14\xd728$D model are pathwise equivalent, which was also verified from our numerical simulations in sections 4 and 5. In section 4 we discussed channel-based Langevin models (Fox & Lu, 1994; Goldwyn & Shea-Brown, 2011; Dangerfield et al., 2012; Orio & Soudry, 2012; Fox, 2018). We excluded Fox's more recent implementation (Fox, 2018) in section 5 for two reasons. First, the algorithm is pathwise equivalent to others considered there. Moreover, the method is vulnerable to numerical instability when the Cholesky decomposition is performed. Specifically, some of the elements in the $S$ matrix from the Cholesky decomposition in Fox (2018) involve square roots of differences of several quantities, with no guarantee that the differences will result in nonnegative terms—even with strictly positive value of the gating variables. Nevertheless, this model would be in the equivalence class and in any case would not be more efficient than the Orio model because of the noise dimension and complicated operations (involving taking multiple square roots) in each step.

### 6.4 Model Comparisons

If two random variables have similar distributions, then they will have similar moments, but not vice versa. Therefore, comparison of the full interspike-interval distributions produced by different simulation algorithms gives a more rigorous test than comparison of first and second moments of the ISI distribution. Most previous evaluations of competing Langevin approximations were based on the accuracy of low-order moments, for example, the mean and variance of channel state occupancy under voltage clamp, or the mean and variance of the interspike interval distribution under current clamp (Goldwyn et al., 2011; Goldwyn & Shea-Brown, 2011; Schmandt & Galán, 2012; Linaro et al., 2011; Dangerfield et al., 2012; Orio & Soudry, 2012; Huang et al., 2013, 2015). Here, we compare the accuracy of the different algorithms using the full ISI distribution but using the $L1$ norm of the difference (Wasserstein metric) and the $L\u221e$ norm (Kolmogorov-Smirnov test). Greenwood, McDonnell, and Ward (2015) previously compared the ISI distributions generated by the Markov chain (Gillespie algorithm) to the distribution generated by different types of Langevin approximations (LA), including the original Langevin models (Fox & Lu, 1994; Goldwyn & Shea-Brown, 2011), the channel-based LA with colored noise (Linaro et al., 2011; Güler, 2013b), and LA with a $14\xd714$ variant of the diffusion coefficient matrix $S$ (Orio & Soudry, 2012). They concluded that Orio and Soudry's method provided the best match to the Markov chain model, specifically, “Fox-Goldwyn, and Orio-Kurtz^{8} methods both generate ISI histograms very close to those of “Micro.”^{9} (Greenwood et al., 2015). We note that the comparison reported in this article simply superimposed plots of the ISI distributions, allowing a qualitative comparison, while our metric-space analysis is fully quantitative. In any case, their conclusions are consistent with our findings; we showed in section 4 that the Fox-Goldwyn and the Orio-Kurtz model are pathwise equivalent (when implemented with the same boundary conditions), which accounts for the similarity in the ISI histograms they generate. In fact, because of pathwise equivalence, we can conclude that the true distributions for these models are identical, and any differences observed just reflect finite sampling.

### 6.5 Stochastic Shielding Method

The stochastic shielding (SS) approximation (Schmandt & Galán, 2012) provides an efficient and accurate method for approximating the Markov process with only a subset of observable states. For conductance-based models, rather than aggregating ion channel states, SS affects dimension reduction by selectively eliminating those independent noise sources that have the least impact on current fluctuations. Recent work in (Pezo, Soudry, & Orio, 2014) compared previous methods (Gillespie, 1977; Orio & Soudry, 2012; Dangerfield et al., 2012; Huang et al., 2013; Schmandt & Galán, 2012) in accuracy, applicability, and simplicity, as well as computational efficiency. They concluded that for mesoscopic numbers of channels, stochastic shielding methods combined with diffusion approximation methods can be an optimal choice. Like Orio and Soudry (2012), the stochastic shielding method proposed by Pezo et al. (2014) also assumed a detailed balance of transitions between adjacent states and used edges that are directly connected to the open gates of HH Na$+$ and K$+$. We calculated the edge importance in section 3.3 and found that the 4 (out of 20) most important directed edges for the Na$+$ gates are not the 4 edges directly connected to the conducting state, as assumed in previous application of the SS method (Schmandt & Galán, 2012).

### 6.6 Which Model to Use?

Among all modifications of Fox and Lu's method considered here, Orio and Soudry's approach and our $14\xd728$D model provide the best approximation to the gold standard MC model, with the greatest computational efficiency. Several earlier models were studied in the review paper by Goldwyn and Shea-Brown (2011), where they rediscovered that the original Langevin model proposed by Fox and Lu is the best approximation to the MC model among those considered. Later work (Huang et al., 2015) further surveyed a wide range of Langevin approximations for the HH system (Fox & Lu, 1994; Fox, 1997; Goldwyn & Shea-Brown, 2011; Linaro et al., 2011; Güler, 2013b; Orio & Soudry, 2012; Huang et al., 2013) and explored models with different boundary conditions. Huang et al. (2015) concluded that the bounded and truncated-restored Langevin model (Huang et al., 2013) and the unbounded model (Orio & Soudry, 2012) provide the best approximation to the MC model.

As shown in sections 4 and 5, the $14\xd728$D Langevin model naturally derived from the channel structure is pathwise equivalent to the Fox94, Fox18, and Orio-Soudry models under the same boundary conditions. The $14\xd728$D model is more accurate than the reflecting boundary condition method of Dangerfield et al. (2012) and also better than the approximation method proposed by Goldwyn and Shea-Brown (2011) when the entire ISI distribution is taken into account. We note that Huang et al. (2015) treated Goldwyn's method (Goldwyn & Shea-Brown, 2011) as the original Fox and Lu model in their comparison; however, the simulation in Goldwyn and Shea-Brown (2011) uses the 4D multinomial submanifold to update gating variables. Our analysis and numerical simulations suggest that the original Fox and Lu model is indeed as accurate as the Orio-Soudry model, although the computational cost remains a major concern.

Though the $14\xd728$D model has similar efficiency and accuracy as that of Orio and Soudry (2012), it has several advantages. First, the rectangular $S$ matrix (in equations 1.2 and 1.3) in Orio's model merges the noise contributions of reciprocal pairs of edges. However, this dimension reduction assumes, in effect, that detailed balance holds along reciprocal edges, which our results show is not the case, under current clamp (see Figure 5). Moreover, the $14\xd728$D model arises naturally from the individual transitions of the exact evolution equations (see equations 3.1 and 3.2) for the underlying Markov chain model, which makes it conceptually easier to understand. In addition, our method for defining the $14\xd728$D Langevin model and finding the best SS model extends to channel-based models with arbitrary channel gating schemes beyond the standard HH model. Given any channel state transition graph, the Langevin equations may be read off from the transitions, and the edge importance under current clamp can be evaluated by applying the stochastic shielding method to investigate the contributions of noise from each individual directed edge. Finally, in exchange for a small reduction in accuracy, the stochastic shielding method affords a significant gain in efficiency. The $14\xd728$D model thus offers a natural way to quantify the contributions of the microscopic transitions to the macroscopic voltage fluctuations in the membrane through the use of stochastic shielding. For general ion channel models, extending a biophysically based Langevin model analogous to our $14\xd728$D HH model, together with the stochastic shielding method, may provide the best available tool for investigating how unobservable microscopic behaviors (such as ion channel fluctuations) affect the macroscopic variability in many biological systems.

### 6.7 Limitations

All Langevin models, including our proposed $14\xd728$D model, proceed from the assumption that the ion channel population is large enough (and the ion channel state transitions frequent enough) that the gaussian approximations by which the white noise forcing terms are derived are justified. Thus, when the system size is too small, no Langevin system will be appropriate. Fortunately the Langevin approximation appears quite accurate for realistic population sizes.

The $14\xd728$D model uses more noise sources than other approaches. However, stochastic shielding allows us to jettison noise sources that do not significantly affect the system dynamics (the voltage fluctuations and ISI distribution). Moreover, in order to compare the ISI distribution in detail among several variants of the Fox94’ model versus the Markov chain standard, we have considered a single value of the driving current, while other studies have compared parameterized responses such as the firing rate, ISI variance, or other moments, as a function of applied current. Accurate comparisons require large ensembles of independent trajectories, forcing a trade-off between precision and breadth; we opted here for precise comparisons at a representative level of the driving current.

From a conceptual point of view, a shortcoming of most Langevin models is the tendency for some channel state variables $x$ to collide with the domain boundaries $x\u2208[0,1]$ and to cross them during numerical simulations with finite time steps. We adopted the approach advocated by Orio and Soudry (2012) of using “free boundaries” in which gating variables can make excursions into the (unphysical) range $x<0$ or $x>1$. Practically, these excursions are always short if the time step is reasonably small, as they tend to be self-correcting.^{10} Another approach is to construct reflecting boundary conditions; different implementations of this idea were used in Dangerfield et al. (2010), Fox and Lu (1994), and Schmid, Goychuk, and Hänggi (2001). Dangerfield's method proved both slower and less accurate than the free boundary method. As an alternative method, one uses a biased rejection sampling approach, testing each gating variable of the 14D model on each time step and repeating the noise sample for any time step violating the domain conditions (Fox & Lu, 1994; Schmid et al., 2001). We found that this method had accuracy similar to that of Dangerfield's method ($L1$-Wasserstein difference $\u22484.4e-1$ msec; see Table 3) and run time similar to that of the Fox94 implementation, about four times slower than our 14D Langevin model.

Yet another approach that in principle can guarantee that the stochastic process remains within proscribed bounds, rederives a “best diffusional approximation” Fokker-Planck equation based on matching a master-equation birth-and -death description of a (binomial) population of two-state ion channels, leading to modified drift and diffusion terms (Goychuk, 2014). This method does not appear to extend readily to the 14D setting, with underlying multinomial structure of the ion channel gates, so we do not dwell on it further.

Table 3 gives the accuracies with which each model reproduces the ISI distribution, compared to a standard reference distribution generated through a large number of samples of the MC method. The mean $L1$ difference between a single sample and the reference sample is about 0.227 microseconds. For a nonnegative random variable $T\u22650$, the difference in the mean under two probability distributions is bounded above by the $L1$ difference in their cumulative distribution functions.^{11} Thus, the $L1$ norm gives an idea of the temporal accuracy with which one can approximate a given distribution by another. The mean difference between the ISI distribution generated by a single run of the full $14\xd728$D model is about 49 $\mu $sec, and the discrepancy produced by the (significantly faster) SS model is about 76 $\mu $sec. When would this level of accuracy matter for the function of a neuron within a network? The barn owl *Tyto alba* uses interaural time difference to localize its prey to within 1 to 2 degrees, a feat that requires encoding information in the precise timing of auditory system action potentials at the scale of 5 to 20 microseconds (Moiseff & Konishi, 1981; Gerstner, Kempter, Van Hemmen, & Wagner, 1996). For detailed studies of the effects of channel noise in this system, the superior accuracy of the MC model might be preferred. On the other hand, the timescale of information encoding in the human auditory nerve is thought to be in the millisecond range (Goldwyn, Shea-Brown, & Rubinstein, 2010), with precision in the feline auditory system reported as low as 100 $\mu $s (Imennov & Rubinstein, 2009; see also Woo, Miller, & Abbas, 2009). For these and other mammalian systems, the stochastic shielding approximation should provide sufficient accuracy.

### 6.8 Future Work

In section 3.3 we compared the ISI variance when noise was included one edge at a time and found that the edges making the greatest contribution to population fluctuations under voltage clamp were not identical to the edges having the largest effect on ISI variance when taken one at a time. However, the ISI, considered as a random variable determined through a first-passage time process, depends on the entire trajectory, not just on the occupancy of the conducting states. The HH dynamics are strongly nonlinear, producing a limit cycle in the deterministic case, and it is not immediately clear whether the effects of channel noise on ISI variability should be additive. In future work, we plan to address the question of the additive contribution of individual/molecular noise sources on ISI variability.

A principal motivation for using the stochastic shielding algorithm is to develop fast and accurate algorithms for ensemble simulations of forward models for parameter estimation in a data assimilation framework. We expect that our method may prove useful for such studies based on current-clamp electrophysiological data in the future.

## Appendix A: Table of Common Symbols and Notations

Symbol . | Meaning . |
---|---|

$C$ | Membrane capacitance ($\mu F/cm2$) |

$v$ | Membrane potential ($mV$) |

$VNa,VK,VL$ | Ionic reversal potential for Na$+$, K$+$and leak ($mV$) |

$Iapp$ | Applied current to the membrane ($nA/cm2$) |

$m,h,n$ | Dimensionless gating variables for Na$+$ and K$+$ channels |

$\alpha x,\beta xx\u2208{m,n,h}$ | Voltage-dependent rate constant ($1/msec$) |

$x$ | Vector of state variables |

$M=[M1,M2,\cdots ,M8]$ | Eight-component state vector for the Na$+$ gates |

$[m00,m10,m20,m30,$ | Components for the Na$+$gates |

$m01,m11,m21,m31]\u22ba$ | |

$N=[N1,N2,\cdots ,N5]$ | Five-component state vector for the K$+$ gates |

$[n0,n1,n2,n3,n4]\u22ba$ | Components for the K$+$ gates |

$Mtot,Ntot$ | Total number of Na$+$ and K$+$ channels |

$X$ | 4-dimensional manifold domain for 4D HH model |

$Y$ | 14-dimensional manifold domain for 14D HH model |

$\Delta k$ | $k$-dimensional simplex in $Rk+1$ given by $y1+\u2026+yk+1=1,yi\u22650$ |

$M$ | Multinomial submanifold within the 14D space |

$ANa$, $AK$ | State-dependent rate matrix |

$D$ | State diffusion matrix |

$S$, $S1$, $S2$, $SNa$, $SK$ | Noise coefficient matrices |

$\xi $ | Vector of independent $\delta $-correlated gaussian white noise with zero mean and unit variance |

$X=[X1,X2,\u2026,Xd]$ | A $d$-dimensional random variable for sample path |

$W=[W1,W2,\u2026,Wn]$ | A Wiener trajectory with $n$ components |

$\delta (\xb7)$ | The Dirac delta function |

$\delta ij$ | The Kronecker delta |

$Fn$ | Empirical cumulative distribution function with $n$ observations (in section 5, we use $m,n$ as sample sizes) |

Symbol . | Meaning . |
---|---|

$C$ | Membrane capacitance ($\mu F/cm2$) |

$v$ | Membrane potential ($mV$) |

$VNa,VK,VL$ | Ionic reversal potential for Na$+$, K$+$and leak ($mV$) |

$Iapp$ | Applied current to the membrane ($nA/cm2$) |

$m,h,n$ | Dimensionless gating variables for Na$+$ and K$+$ channels |

$\alpha x,\beta xx\u2208{m,n,h}$ | Voltage-dependent rate constant ($1/msec$) |

$x$ | Vector of state variables |

$M=[M1,M2,\cdots ,M8]$ | Eight-component state vector for the Na$+$ gates |

$[m00,m10,m20,m30,$ | Components for the Na$+$gates |

$m01,m11,m21,m31]\u22ba$ | |

$N=[N1,N2,\cdots ,N5]$ | Five-component state vector for the K$+$ gates |

$[n0,n1,n2,n3,n4]\u22ba$ | Components for the K$+$ gates |

$Mtot,Ntot$ | Total number of Na$+$ and K$+$ channels |

$X$ | 4-dimensional manifold domain for 4D HH model |

$Y$ | 14-dimensional manifold domain for 14D HH model |

$\Delta k$ | $k$-dimensional simplex in $Rk+1$ given by $y1+\u2026+yk+1=1,yi\u22650$ |

$M$ | Multinomial submanifold within the 14D space |

$ANa$, $AK$ | State-dependent rate matrix |

$D$ | State diffusion matrix |

$S$, $S1$, $S2$, $SNa$, $SK$ | Noise coefficient matrices |

$\xi $ | Vector of independent $\delta $-correlated gaussian white noise with zero mean and unit variance |

$X=[X1,X2,\u2026,Xd]$ | A $d$-dimensional random variable for sample path |

$W=[W1,W2,\u2026,Wn]$ | A Wiener trajectory with $n$ components |

$\delta (\xb7)$ | The Dirac delta function |

$\delta ij$ | The Kronecker delta |

$Fn$ | Empirical cumulative distribution function with $n$ observations (in section 5, we use $m,n$ as sample sizes) |

Symbol . | Meaning . | Value . |
---|---|---|

$C$ | Membrane capacitance | 1 $\mu F/cm2$ |

$g\xafNa$ | Maximal sodium conductance | 120 $\mu S/cm2$ |

$g\xafK$ | Maximal potassium conductance | 36 $\mu S/cm2$ |

$gleak$ | Leak conductance | 0.3 $\mu S/cm2$ |

$VNa$ | Sodium reversal potential for Na$+$ | 50 $mV$ |

$VK$ | Potassium reversal potential for K$+$ | $-$77 $mV$ |

$Vleak$ | Leak reversal potential | $-$54.4 $mV$ |

$Iapp$ | Applied current to the membrane | 10 $nA/cm2$ |

$A$ | Membrane area | $100\mu m2$ |

$Mtot$ | Total number of Na$+$ channels | 6,000 |

$Ntot$ | Total number of K$+$ channels | 18,00 |

Symbol . | Meaning . | Value . |
---|---|---|

$C$ | Membrane capacitance | 1 $\mu F/cm2$ |

$g\xafNa$ | Maximal sodium conductance | 120 $\mu S/cm2$ |

$g\xafK$ | Maximal potassium conductance | 36 $\mu S/cm2$ |

$gleak$ | Leak conductance | 0.3 $\mu S/cm2$ |

$VNa$ | Sodium reversal potential for Na$+$ | 50 $mV$ |

$VK$ | Potassium reversal potential for K$+$ | $-$77 $mV$ |

$Vleak$ | Leak reversal potential | $-$54.4 $mV$ |

$Iapp$ | Applied current to the membrane | 10 $nA/cm2$ |

$A$ | Membrane area | $100\mu m2$ |

$Mtot$ | Total number of Na$+$ channels | 6,000 |

$Ntot$ | Total number of K$+$ channels | 18,00 |

## Appendix B: Parameters and Transition Matrices

## Appendix C: Proof of Lemma 2

For the reader's convenience, we restate this lemma.

Let $X$ and $Y$ be the lower-dimensional and higher-dimensional Hodgkin-Huxley manifolds given by equation 2.12, and let $F$ and $G$ be the vector fields on $X$ and $Y$ defined by equations 2.1 to 2.4 and 2.7 to 2.9, respectively. Let $H:X\u2192M\u2282Y$ and $R:Y\u2192X$ be the mappings given in Tables 2 and 1, respectively, and define the multinomial submanifold $M=H(X)$. Then $M$ is forward-time-invariant under the flow generated by $G$. Moreover, the vector field $G$, when restricted to $M$, coincides with the vector field induced by $F$ and the map $H$. That is, for all $y\u2208M$, $G(y)=DxH(R(y))\xb7F(R(y)).$

The main idea of the proof is to show that for every $y\u2208Y$, $G(y)$ is contained in the span of the four vectors $\u2202H\u2202xi(R(y))i=14.$

## Appendix D: Diffusion Matrix of the 14D Model

## Notes

^{1}

That is, distributions indexed by a single open probability $n$, with the five states having probabilities $4ini(1-n)4-ifor0\u2264i\u22644$.

^{2}

That is, distributions indexed by two open probabilities $m$ and $h$, with the eight states having probabilities $3imi(1-m)3-ihj(1-h)1-j,for0\u2264i\u22643,and0\u2264j\u22641$.

^{3}

We annotate the stochastic population vector $M$ as either $[M00,M10,\u2026,M31]$ or as $[M1,\u2026,M8]$, whichever is more convenient. In either notation $M31\u2261M8$ is the conducting state of the Na$+$channel. For the K$+$channel, $N4$ denotes the conducting state.

^{4}

We write $eiNa$ and $eiK$ for the $i$th standard unit vector in $R8$ or $R5$, respectively.

^{5}

The convergence of the discrete channel system to a Langevin system under voltage clamp is a special case of Kurtz's theorem (Kurtz, 1981).

^{6}

The forward Euler method is first-order accurate for ordinary differential equations, but the forward Euler-Maruyama method is only $O(dt)$ accurate for stochastic differential equations (Kloeden & Platen, 1999).

^{7}

Run times in Table 3, rounded to the nearest integer number of seconds, were obtained by averaging the run times on a distribution of heterogeneous compute nodes from Case Western Reserve University's high-performance computing cluster.

^{8}

We refer to this model as “Orio.”

^{9}

This is the model we refer to as the Markov chain model.

^{10}

To avoid complex entries, we use $|x|$ when calculating entries in the noise coefficient matrix.

^{11}

For a nonnegative random variable $T$ with cumulative distribution function $F(t)=P[T\u2264t]$, the mean satisfies $E[T]=\u222b0\u221e(1-F(t))dt$ (Grimmett & Stirzaker, 2005). Therefore, the difference in mean under two distributions $F1$ and $F2$ satisfies $|E1[T]-E2[T]|=\u222b0\u221eF1(t)-F2(t)dt\u2264\rho 1(F1,F2).$

## Acknowledgments

We thank David Friel (Case Western Reserve University) for illuminating discussions of the impact of channel noise on neural dynamics and the anonymous referees for many detailed and constructive comments.

This work was made possible in part by grants from the National Science Foundation (DMS-1413770 and DEB-1654989) and the Simons Foundation. P.T. thanks the Oberlin College Department of Mathematics for research support. This research has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under grant DMS-1440386. Large-scale simulations made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University.