Abstract

Control in the natural environment is difficult in part because of uncertainty in the effect of actions. Uncertainty can be due to added motor or sensory noise, unmodeled dynamics, or quantization of sensory feedback. Biological systems are faced with further difficulties, since control must be performed by networks of cooperating neurons and neural subsystems. Here, we propose a new mathematical framework for modeling and simulation of distributed control systems operating in an uncertain environment. Stochastic differential operators can be derived from the stochastic differential equation describing a system, and they map the current state density into the differential of the state density. Unlike discrete-time Markov update operators, stochastic differential operators combine linearly for a large class of linear and nonlinear systems, and therefore the combined effects of multiple controllable and uncontrollable subsystems can be predicted. Design using these operators yields systems whose statistical behavior can be specified throughout state-space. The relationship to Bayesian estimation and discrete-time Markov processes is described.

1.  Introduction

Biological control systems operate in an uncertain world. When a motor command is executed, the perceived result of the action depends on the plant dynamics, the effect of unobserved controllers and noise acting in the environment, and the quality of the sensory data. In most cases, the controller has only an approximate knowledge of these effects, but it must nevertheless choose an action that accomplishes the desired task. Because of the effect of unmeasured state and random noise, observed movement appears variable, and when the variability can be estimated, control inputs can be chosen to minimize the expected cost (Braun, Aertsen, Wolpert, & Mehring, 2009; Todorov, 2009). However, this optimization problem is mathematically difficult and can usually be solved only for specific types of systems, such as linear systems with gaussian added noise.

The impact of unobserved controllers is significantly heightened in neural control systems, because from the point of view of any single part of the brain, the control exerted by other parts of the brain may be only incompletely known or observable only after a delay. Nevertheless, each component of the full system must choose actions that optimize the overall behavior, even when presented with only partial knowledge of the actions chosen by other components. In the most extreme case, a single neuron must adapt its connectivity so that its effect on global behavior, however small, nevertheless contributes to desired performance. Current methods in nonlinear and stochastic control theory have not been applied to this problem, perhaps because of high complexity. Here, we present a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behavior of the global system. The framework is based on stochastic differential operators, and we show that these operators exhibit linear superposition, so that the combined effect of multiple actions can be determined easily using linear operations. This has important consequences for modeling local and global system behavior, interpreting the effects of neurons within the system, and computing optimal actions.

In many control applications, uncertainty is modeled by the addition of noise. However, there are other types of uncertainty, including state uncertainty due to quantization and output uncertainty due to unmodeled effects of control variables. The uncertainty in the effects of controls may depend on the current state or the choice of control, so even an approximate model using additive noise would require the noise to be both state and control dependent. It is possible to write such models, but their explicit solution is usually impractical. Here, we will assume a model that includes environment dynamics and the dynamics of multiple control inputs that has the form of an Ito stochastic equation,
dx = f(x) dt + \sum_i g_i(x) u_i dt + \sum_i h_i(x, u_i) dB_i    (1.1)
where f(x) describes the environment dynamics, gi(x) is the effect of control ui on the state x, hi(x, ui) is the effect of the state and control on the noise, and dBi indicates the differential of a set of independent noise processes such as Brownian motion. Since this equation is stochastic, we might wish to know not just the expected or most likely sample path E[x(t)|U(t)], but rather the full probability density of the state at any time p(x(t)|U(t)) where U(t) indicates the full control trajectory up to time t.
Examination of equation 1.1 shows that the terms gi(x)ui dt contribute tunable dynamics in the form of a superposition of controllers. For example, if there are no environmental dynamics and no added noise, so that f(x) = hi(x, ui) = 0, then we have the ordinary differential equation,
\frac{dx}{dt} = \sum_i g_i(x) u_i    (1.2)
which is a linear superposition of a set of dynamics of the form dx/dt = gi(x). In this simplified case, the goal of control will be to find a set of functions of time ui(t) such that a time-varying superposition of the form dx/dt = ∑i gi(x)ui(t) approximates a desired closed-loop dynamics dx/dt = f*(x). In the following, we will extend this idea to the more general stochastic case in equation 1.1. These results extend previous work on superposition of controllers (Sanger, 2010a, 2010b) by showing the relation to Bayes’ rule and Markov processes, examining basic properties including commutativity of operators, using an inner-product space for the set of differential operators to derive preliminary results on feedback control and optimization, and showing the application of operator superposition to systems with second-order dynamics.

2.  Background

Much recent work has examined the effect of sensory uncertainty on motor control. Bayes’ rule has been used to model the combination of prior knowledge of state with measurement, and the effect on control is determined by the effect of the residual uncertainty in state on the variability of control (Toussaint & Storkey, 2006). It is tempting to consider whether Bayes’ rule could be used to model the effects of other forms of uncertainty in control. Here we will show that Bayes’ rule is not adequate for this purpose, and discrete-time or continuous-time Markov processes must be used. Our goal in this section is to illustrate the links among Bayes’ rule, discrete-time Markov processes, and continuous-time Markov processes described by stochastic differential equations. The underlying mathematics are well known, but the explicit description of the links will be helpful to motivate the use of stochastic differential operators in the next section.

2.1.  Bayes’ Rule and Markov Processes

To show the differences between Bayes’ rule and Markov processes, it is helpful to write both as matrix equations. We use p(x) to indicate a vector of probabilities p(x = i) or a probability density function (d/dx)Prob(X ⩽ x). Therefore, p(x) ⩾ 0 and ∑i p(x = i) = 1, or ∫p(x) dx = 1. Then Bayes’ rule is written as
p(x | y) = \frac{p(y | x) p(x)}{p(y)}    (2.1)
and it is straightforward to interpret this equation as a mapping from the prior density p(x) to the posterior density p(x|y). This mapping describes the instantaneous effect of a measurement of a particular value of y on the density of x. If the variable x takes on a finite number of states, then we can write equation 2.1 as
p(x = i | y) = \frac{p(y | x = i)}{p(y)} p(x = i)    (2.2)
or, in matrix notation with ai(y) = p(y|x = i)/p(y),
p_{x|y} = A(y) p_x    (2.3)
where the matrix A has diagonal elements Aii = ai(y), px = (p(x = 1), …, p(x = m)) is the vector of prior probabilities, and p_{x|y} = (p(x = 1|y), …, p(x = m|y)) is the vector of posterior probabilities.
Compare equation 2.3 to the update equation for a discrete-time, finite-state Markov process:
p(t + 1) = M p(t)    (2.4)
In this case, the linear operator does not map the prior into the posterior distribution in the sense of Bayes, but rather it maps the previous probability density of the state into the density at the next point in time. In order for this to preserve the properties of densities, the elements of M must be nonnegative and the column sums must all be 1. In contrast, while the Bayes update matrix must have nonnegative elements, the column sums are almost never 1 (each column of the Bayes matrix in equation 2.3 has only a single nonzero element, on the diagonal). The Markov update matrix can have nonzero elements off the diagonal. Such elements indicate the probability of transitions between states, and they allow the probability of being in a state at time t + 1 to depend on the probability of being in other states at time t. Bayes’ rule allows no such dependence, since the posterior density for any particular value of x depends on only the prior density for that value of x. Furthermore, in Bayes’ rule, if the prior probability of state i is zero, then so is the posterior probability, whereas for a Markov update, the probability of being in state i at time t + 1 can be nonzero even if state i was impossible at time t.

Note that operators of the form 2.2 are diagonal and thus commute, whereas operators of the form 2.4 may or may not commute. Therefore, the order of estimates or state updates is important in the Markov formulation. When estimates commute, then they can be made in any order and produce the same result. When estimates do not commute, the interpretation of the second estimate depends on the outcome of the first. This commonly occurs (even in nonquantum systems) when estimation requires time or occurs at different points in time. For example, the change in position predicted by measuring an accelerometer will be highly dependent on whether a prior gyroscope measurement indicated a change in heading.
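
A minimal numerical sketch (Python; the three-state example and all numbers are illustrative, not taken from the text) contrasting the diagonal Bayes update of equation 2.3 with the column-stochastic Markov update of equation 2.4:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])                # prior p(x = i)

# Bayes: A is diagonal with a_i(y) = p(y | x = i) / p(y), where p(y) = sum_i p(y|x=i) p(x=i)
lik = np.array([0.9, 0.2, 0.1])              # assumed likelihoods p(y | x = i) for the observed y
A = np.diag(lik / lik.dot(p))
post = A @ p                                  # posterior p(x | y)
print(post, post.sum())                       # posterior sums to 1, but the column sums of A do not

# Markov: a column-stochastic M moves probability between states over one time step
M = np.array([[0.8, 0.3, 0.0],
              [0.2, 0.6, 0.5],
              [0.0, 0.1, 0.5]])
print(M.sum(axis=0))                          # each column of M sums to 1
print(M @ p)                                  # density at the next time step
```

Note that the off-diagonal entries of M allow probability to flow into a state that had zero prior probability, which the diagonal Bayes matrix cannot do.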

2.2.  Stochastic Differential Equations

Consider the Ito stochastic differential equation,
dx = f(x) dt + g(x) dB    (2.5)
where dB is the differential of unit variance Brownian motion. Brownian motion is random, so x is a random variable as well, with a probability density p(x) that evolves through time as described by the Fokker-Planck partial differential equation (Kac, 1949):
\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x}\left[f(x) p(x, t)\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g(x) g(x)^T p(x, t)\right]    (2.6)
We can write equation 2.6 as
\frac{\partial p}{\partial t} = \mathcal{L} p    (2.7)
where \mathcal{L} is the linear operator,
\mathcal{L} p = -\frac{\partial}{\partial x}\left[f(x) p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g(x) g(x)^T p\right]    (2.8)
We will refer to this type of operator as a stochastic differential operator, and it is important to realize that while such operators describe the behavior of nonlinear stochastic systems, equation 2.7 is a linear and deterministic partial differential equation (Van Kampen, 2007). It is known that the Fokker-Planck equation (the second-order approximation to the Kramers-Moyal expansion derived from the master equation) maintains the properties of probability densities, including nonnegativity and integration to 1 (Risken & Vollmer, 1987).
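
The sketch below (Python; the grid size, the drift f(x) = −x, and the noise level are assumptions made for illustration) discretizes the operator of equation 2.8 on a one-dimensional grid using simple upwind and second-difference approximations, and checks that a forward-Euler iteration of equation 2.7 preserves a valid probability density:

```python
import numpy as np

n, h = 101, 0.1                               # grid points and spacing (assumed)
xs = (np.arange(n) - n // 2) * h
f = -xs                                        # assumed drift: relaxation toward 0
g2 = 0.2 ** 2                                  # g g^T (scalar diffusion, assumed)

L = np.zeros((n, n))
for j in range(1, n - 1):
    # -d/dx [f p]: first-order upwind transport out of cell j
    if f[j] > 0: L[j + 1, j] += f[j] / h
    if f[j] < 0: L[j - 1, j] += -f[j] / h
    # (1/2) d^2/dx^2 [g g^T p]: second difference
    L[j - 1, j] += 0.5 * g2 / h ** 2
    L[j + 1, j] += 0.5 * g2 / h ** 2
    L[j, j] = -L[:, j].sum()                   # zero column sum (conservation)

print(np.abs(L.sum(axis=0)).max())             # column sums are ~0

# forward Euler on dp/dt = L p keeps the density valid for a small enough step
p = np.exp(-xs ** 2); p /= p.sum()
dt = 0.01
for _ in range(200):
    p = p + dt * (L @ p)
print(p.sum(), p.min())                        # still sums to 1 and stays nonnegative
```
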
We wish to compare the stochastic differential operator equation 2.7 to the Markov update rule 2.4. Brockett (2008) and others have pointed out that the sample paths of a continuous-time discrete state-space Markov process can be described by the stochastic equation
dx = \sum_{i \neq j} (i - j) \delta_{x,j} dN_{ij}    (2.9)
where x takes on integer values and Nij is a set of Poisson-distributed counting processes with rates λij that trigger jumps from state j to state i. It is assumed that two processes never jump at exactly the same instant. From equation 2.9, we can construct a master equation for the change in probability over a short interval,
p(x = i, t + \Delta t) = p(x = i, t) + \sum_{j \neq i} p(N_{ij}) p(x = j, t) - \sum_{j \neq i} p(N_{ji}) p(x = i, t)    (2.10)
where p(Nij) indicates the probability that the ijth process has a jump during interval Δt. Dividing by Δt and simplifying the notation slightly,
\frac{p(x = i, t + \Delta t) - p(x = i, t)}{\Delta t} = \sum_{j \neq i} \lambda_{ij} p(x = j, t) - \sum_{j \neq i} \lambda_{ji} p(x = i, t)    (2.11)
As Δt → 0, we obtain
\frac{d}{dt} p(x = i, t) = \sum_j \lambda_{ij} p(x = j, t)    (2.12)
where we have set \lambda_{ii} ≐ −∑_{j≠i} \lambda_{ji}. Writing this in matrix notation,
\frac{dp}{dt} = L p    (2.13)
where L is a square matrix and p is a vector of elements Prob(x = i). (A more rigorous derivation can be found in Skorokhod, 1989.) Two important properties of L are that all off-diagonal elements are nonnegative, and the column sums are zero (therefore, the diagonal elements are nonpositive). These two properties are necessary and sufficient to ensure that if p(x, t = 0) is any valid initial probability density, then p(x,t) will remain a valid probability density (with nonnegative elements that sum to 1). We will refer to matrices with these properties as stochastic differential matrices. Note that the class of stochastic differential matrices is closed under addition and multiplication by nonnegative scalars.
If we start at an initial state p(0), then after time Δt, we have
p(\Delta t) = e^{L \Delta t} p(0)    (2.14)
so if we identify M = eLΔt and Δt = 1, then we recover the Markov update rule p(1) = Mp(0). The Markov matrix M must be a stochastic matrix with all nonnegative elements and column sums of 1 in order to preserve probability densities.
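
A small numerical check of equations 2.13 and 2.14 (Python; the three-state rates are made up for illustration) shows that a matrix with nonnegative off-diagonal elements and zero column sums exponentiates to a column-stochastic Markov update matrix:

```python
import numpy as np
from scipy.linalg import expm

L = np.array([[-0.3,  0.1,  0.0],
              [ 0.3, -0.5,  0.2],
              [ 0.0,  0.4, -0.2]])
assert np.allclose(L.sum(axis=0), 0.0)        # column sums of L are zero

dt = 1.0
M = expm(L * dt)                               # Markov update matrix for one step
print(M.sum(axis=0))                           # columns of M sum to 1
p0 = np.array([1.0, 0.0, 0.0])                 # sharp initial density
print(M @ p0)                                  # density after one step
```
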

3.  Operator Superposition

In this section, we consider the combined effect of two stochastic control systems acting on the same plant. A fundamental assumption is that the noise in the two systems is uncorrelated. We discuss this assumption further below.

3.1.  Summation of Operators

Suppose we have two stochastic processes, and the effect of each on the state (when the other is clamped at zero) is given by
dx = f_1(x) dt + g_1(x) dB_1, \qquad dx = f_2(x) dt + g_2(x) dB_2    (3.1)
Assume further that the combined effect of both processes operating simultaneously is given by the sum
dx = \left[f_1(x) + f_2(x)\right] dt + g_1(x) dB_1 + g_2(x) dB_2    (3.2)
which is associated with the Fokker-Planck equation,
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left[(f_1(x) + f_2(x)) p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[(g_1(x) g_1(x)^T + g_2(x) g_2(x)^T) p\right]    (3.3)

= -\frac{\partial}{\partial x}\left[f_1(x) p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g_1(x) g_1(x)^T p\right] - \frac{\partial}{\partial x}\left[f_2(x) p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g_2(x) g_2(x)^T p\right]    (3.4)
if dB1 and dB2 are uncorrelated. Then equation 3.4 can be written as
\frac{\partial p}{\partial t} = \left(\mathcal{L}_1 + \mathcal{L}_2\right) p    (3.5)
where
\mathcal{L}_i p = -\frac{\partial}{\partial x}\left[f_i(x) p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[g_i(x) g_i(x)^T p\right]    (3.6)
which shows that linear combination of stochastic differential equations yields a linear combination of the partial differential equation for the time evolution of the probability density.

Linear superposition of the stochastic differential operators for physical processes with additive effects will hold in very general circumstances, requiring only differentiability of f p, twice differentiability of g g^T p, and uncorrelated noise sources dBi. Note that the Fokker-Planck equation is linear in p, which means that for any single dynamics, linear combinations of initial state density have linear effects on the future state density. The importance of summation of operators is that the system is also linear in the dynamics, so that linear combinations of dynamic systems have linear effects on the state density. This fact is essential in the development below, because it allows linear methods to be applied to the design and optimization of dynamic systems.

The discrete-time Markov update rule p(t + 1) = Mp(t) does not have this linear superposition property, since the sum of the two probability densities p1(x(t + 1)) + p2(x(t + 1)) is not meaningful and could exceed 1. Therefore the discrete-time state density for a superposed process cannot be easily calculated from the individual discrete-time processes. Linear superposition is a property of differential equations of the form dp/dt = Lp, but not difference equations of the form p(t + 1) = Mp(t).

In the discrete-space case, an additive superposition in the differential equation given by the matrix L1 + L2 yields a Markov update rule p(t + Δt) = e^{(L1 + L2)Δt} p(t). If L1 and L2 commute, then
e^{(L_1 + L_2)\Delta t} = e^{L_1 \Delta t} e^{L_2 \Delta t}    (3.7)
showing that for commutative operators, addition in the differential domain yields composition of operators in the discrete-time domain. When operators do not commute, then e^{(L1 + L2)Δt} ≠ e^{L2Δt} e^{L1Δt}, and this reflects the physical reality that simultaneous operation of L1 and L2 is not equivalent to L1 performed for Δt followed by L2 performed for Δt. Therefore, the semigroup of operators generated by sequential application of discrete-time Markov operators is not equivalent to the space of operators generated by instantaneous superposition of continuous-time stochastic operators, unless the continuous-time operators commute.

If stochastic operators are not superimposed, then all possible processes are described by sequential compositions of Markov operators of the form e^{L1Δt} e^{L2Δt} ⋯, and this generates the usual operator semigroup. But if simultaneous superposition of operators is allowed in the differential domain, then this expands the operator semigroup by including additional operators of the form exp(∑Li). Interpreting this the other way, if operators cannot be sequenced (or selected in a time-dependent manner), then all possible processes are described by the sums ∑Li. But if sequencing is allowed in the discrete domain, then this expands the operator space by including operators in the (noncommutative) exponentiated space. By simultaneously considering the description of operators in the differential and discrete domain, it is much easier to model systems that have superimposed controllers but also discrete sequences of changes in control parameters.
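
The following sketch (Python; the two rate matrices are arbitrary illustrative examples) makes the distinction concrete: superposition in the differential domain, exp((L1 + L2)Δt), generally differs from the sequential composition exp(L2Δt) exp(L1Δt) unless the operators commute:

```python
import numpy as np
from scipy.linalg import expm

L1 = np.array([[-1.0, 0.0,  0.0],
               [ 1.0, 0.0,  0.5],
               [ 0.0, 0.0, -0.5]])
L2 = np.array([[-0.2,  0.3, 0.0],
               [ 0.0, -0.3, 0.0],
               [ 0.2,  0.0, 0.0]])
dt = 0.5

M_sum = expm((L1 + L2) * dt)                   # simultaneous superposition
M_seq = expm(L2 * dt) @ expm(L1 * dt)          # L1 for dt, then L2 for dt
print(np.abs(M_sum - M_seq).max())             # nonzero when L1 and L2 do not commute
print(np.abs(L1 @ L2 - L2 @ L1).max())         # size of the commutator itself
```
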

3.2.  Space of Differential Operators

In order to use stochastic differential operators for control, we must be able to create flexible superpositions that achieve the desired dynamics. Implemented in a neural system, this would mean that subsystems individually implement specific dynamic behaviors, each described by an operator Li, and global behavior is created by choosing a weighted combination of the individual dynamic behaviors to yield a particular desired combined behavior.

Superpositions of operators yield sums of the form ∑i Li, where the Li are operators defined on the space of square-integrable probability densities (nonnegative square-integrable functions on a compact set). To allow flexibility of control, we would like to extend this to a linear space with elements ∑i αi Li, where the αi are nonnegative real weighting coefficients. The Fokker-Planck equation,
\frac{\partial p}{\partial t} = \mathcal{L}_i p    (3.8)
corresponds to a stochastic differential equation of the form
dx = f_i(x) dt + g_i(x) dB_i    (3.9)
which means that linear scaling,
\frac{\partial p}{\partial t} = \alpha_i \mathcal{L}_i p    (3.10)
yields a system in which the magnitude of the noise term dBi increases proportional to the square root of the control signal αi. (Equivalently, the variance of the noise increases proportional to αi.) We can use a weighted superposition of a large group of systems,
\frac{\partial p}{\partial t} = \sum_i \alpha_i \mathcal{L}_i p    (3.11)
to generate a flexible set of dynamics. If we do this, the total variance of the noise will be the sum of the control signals αi. For this type of controller, signal dependence of the noise arises directly from the use of control signals αi that weight the superposition of stochastic dynamic operators.

To maintain consistency, we will restrict αi ⩾ 0. This yields a linear half-space generated by the set of Li. Note that all elements of the space continue to satisfy the consistency requirement that the column sums of Li are zero. (In the continuous-state case, we require the integral of the density to remain 1 or, equivalently, the time derivative of the integral of the density to be zero, so ∫ Li p dx = 0).

We now define a suitable inner product on the space of linear operators. We choose a family of inner products parameterized by the density p0, so that
\langle L_1, L_2 \rangle_{p_0} = \sum_x (L_1 p_0)(x) (L_2 p_0)(x)    (3.12)
in the discrete-space case and
\langle \mathcal{L}_1, \mathcal{L}_2 \rangle_{p_0} = \int (\mathcal{L}_1 p_0)(x) (\mathcal{L}_2 p_0)(x) dx    (3.13)
in the continuous-space case. p0 will usually be the initial condition or a known intermediate state of the dynamics. This inner product reflects the extent to which two differential operators modify this particular initial condition in the same direction.
The induced norm is the 2-norm,
\| L \|_{p_0} = \sqrt{\langle L, L \rangle_{p_0}}    (3.14)
in both discrete and continuous space.
With these definitions, it is now possible to use Gram-Schmidt orthogonalization to create an orthonormal basis Λi (with respect to a particular inner product) for the set of differential stochastic operators. For any desired operator L*, we can project onto the basis set to calculate
L^* \approx \sum_i \alpha_i \Lambda_i    (3.15)
where αi = <L*, Λi>. Using the p0 inner product will find a (nonunique) approximation to the desired dynamics starting from this particular initial condition.
Typically this will be used to determine the initial direction of movement from a sharp initial density. For example, if the initial state x0 is known, then in continuous state-space, p0(x) = δ(x − x0). If the linear operators can be written in terms of linear kernels Ki(x, x′) so that
(\mathcal{L}_i p)(x) = \int K_i(x, x') p(x') dx'    (3.16)
then
\langle \mathcal{L}_1, \mathcal{L}_2 \rangle_{p_0} = \int K_1(x, x_0) K_2(x, x_0) dx    (3.17)
which in the discrete-space (matrix) case is just the inner product of the corresponding columns of the stochastic differential matrix.
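
The sketch below (Python; the discrete state space, the random operators, and the uniform p0 are illustrative assumptions) implements the p0-weighted inner product of equation 3.12, Gram-Schmidt orthonormalization of a few operators, and the projection of a desired operator onto the resulting basis as in equation 3.15. It illustrates only the projection arithmetic; it does not enforce the nonnegativity constraint on the weights discussed above.

```python
import numpy as np

def ip(L1, L2, p0):
    """<L1, L2>_{p0} = (L1 p0) . (L2 p0)"""
    return (L1 @ p0) @ (L2 @ p0)

def rate_matrix(rng, n):
    """Random matrix with nonnegative off-diagonals and zero column sums."""
    L = rng.random((n, n))
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=0))
    return L

rng = np.random.default_rng(0)
n = 5
p0 = np.full(n, 1.0 / n)                       # assumed initial density
Ls = [rate_matrix(rng, n) for _ in range(3)]

# Gram-Schmidt with respect to <.,.>_{p0}
basis = []
for L in Ls:
    for B in basis:
        L = L - ip(L, B, p0) * B
    basis.append(L / np.sqrt(ip(L, L, p0)))

L_star = rate_matrix(rng, n)                   # desired dynamics
alphas = [ip(L_star, B, p0) for B in basis]
L_hat = sum(a * B for a, B in zip(alphas, basis))
print(np.linalg.norm((L_star - L_hat) @ p0))   # norm of the residual (L_star - L_hat) p0
```
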

A different matrix inner product <L1, L2>f can be defined for matrices with elements mij and nij as the element-wise dot product ∑_{ij} mij nij, and the corresponding induced norm is the Frobenius norm ‖L‖f. We will refer to <L1, L2>f as the Frobenius inner product.

3.3.  Closed-Loop Control

Suppose that we have a desired discrete-space behavior specified by a matrix operator L* with elements λ*ij. For each sharp initial condition p0(x) = ej (where ej is a vector with a 1 in the jth position and zeros elsewhere), the jth column of L* specifies the desired vector dp/dt = L* ej. Using standard methods from linear algebra, we can find a set of coefficients αij that minimize ‖L* ej − ∑i αij Λi ej‖, where
\hat{L} e_j = \sum_i \alpha_{ij} \Lambda_i e_j    (3.18)
This set of coefficients forms a useful approximation only when starting from the initial condition ej. For a different initial condition, ek, for example, we would need a different set of coefficients αik. The natural solution is to make the mixing coefficients dependent on the current state, so that
(\hat{L} p)(x) = \sum_i \alpha_i(x) (\Lambda_i p)(x)    (3.19)
This equation indicates that the mixture of the operators Λi changes with the state x. Since the jth column of L̂ is the response to the sharp initial condition x = j, equation 3.18 creates L̂ by forming a matrix each of whose columns is a different linear combination of the corresponding columns of the Λi. The jth column of L̂ is a linear combination of the jth columns of the matrices Λi with the columns weighted by the factors αij. This new matrix has the approximate desired behavior for all sharp initial conditions x = j. The behavior of the closed-loop system is described by the density evolution dp/dt = L̂p.
Note that L̂ is not a linear combination of the basis operators Λi since each column is weighted differently. We refer to this as closed-loop control because the controller depends on measurement of the current state. In contrast, for open-loop control, the mixing factors will not be dependent on x, and therefore
\frac{dp}{dt} = \sum_i \beta_i \Lambda_i p    (3.20)
describes open-loop behavior, where βi are constant coefficients. Open-loop dynamics (see equation 3.20) are thus confined to the linear span of the Λi’s, whereas closed-loop dynamics (see equation 3.19) can be selected from the larger set of operators formed by different linear combinations of each column of the Λi’s. The coefficients βi of the best open-loop approximation to a desired dynamics L* are obtained by projecting L* onto the space of basis operators Λi using the Frobenius inner product.
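
A minimal discrete-space sketch of this construction (Python; the basis operators and the desired operator are random illustrative examples) fits the coefficients αij column by column as in equation 3.18 and compares the result with the single open-loop projection of equation 3.20:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3                                    # states and basis operators (assumed)

def rate_matrix():
    L = rng.random((n, n))
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=0))
    return L

Lambdas = [rate_matrix() for _ in range(m)]
L_star = rate_matrix()                         # desired dynamics

# closed loop: the j-th column of L_hat is its own least-squares combination
L_hat = np.zeros((n, n))
alpha = np.zeros((m, n))                       # alpha[i, j] = weight of Lambda_i at state j
for j in range(n):
    A = np.column_stack([Lam[:, j] for Lam in Lambdas])
    alpha[:, j], *_ = np.linalg.lstsq(A, L_star[:, j], rcond=None)
    L_hat[:, j] = A @ alpha[:, j]
print(np.linalg.norm(L_star - L_hat))          # column-wise (closed-loop) fit error

# open loop: one global weight per operator (Frobenius projection)
beta, *_ = np.linalg.lstsq(
    np.column_stack([Lam.ravel() for Lam in Lambdas]), L_star.ravel(), rcond=None)
L_open = sum(b * Lam for b, Lam in zip(beta, Lambdas))
print(np.linalg.norm(L_star - L_open))         # typically a worse fit
```
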

In the continuous state-space case, closed-loop dynamics are given by equations of the form of 3.19 and 3.20. However, in this case, determination of the feedback functions α(x) will usually be based on the desired response to only a finite set of sharp initial conditions δ(xxk). Therefore, αi(x) is not unique. This means that the functions αi(x) can be chosen to lie within the span of a set of desired basis functions, which permits us to choose these functions to have smoothness or other computationally desirable properties.

3.4.  Feedback Control

As an example of the benefit of linear superposition of stochastic differential operators, consider an open-loop control problem with a single scalar control variable u:
\frac{dp}{dt} = \left(L_0 + u L_1\right) p    (3.21)
L0 describes the unforced dynamics, and L1 describes the controllable portion of the dynamics. For example, L1 might be the dynamics determined by a neural subsystem that can be modulated by a control input u, while L0 is the dynamics of all other components of the system.
Suppose that there is a known cost function V(x) on states x, so that the expected value of the cost is given by
E[V] = \int V(x) p(x, t) dx    (3.22)
If we wish to reduce the cost over time, we want to choose dynamics such that
\frac{d}{dt} E[V] = \int V(x) \frac{\partial p(x, t)}{\partial t} dx = \int V(x) \left(L_0 + u L_1\right) p(x, t) dx    (3.23)
is less than zero. Since all operators are linear, at any point in time with fixed and known p(x,t), the change in d/dtE[V] with changes in u is given by
\frac{\partial}{\partial u}\left(\frac{d}{dt} E[V]\right) = \int V(x) L_1 p(x, t) dx    (3.24)
so if we know the sign of the right-hand side of equation 3.24, then we can increase or decrease u to reduce d/dt E[V]. Note that this equation applies only to the instantaneous change in expected cost for a measured value of p(x,t); over time, p(x,t) will depend on the choice of u, so that ∂p/∂u ≠ 0 at future times.
Now suppose that we have a superposition of a large number of controllers, described by
\frac{dp}{dt} = \left(L_0 + \sum_i u_i L_i\right) p    (3.25)
which means that the control inputs ui act by weighting the superposition of multiple controllers. In this case,
\frac{\partial}{\partial u_i}\left(\frac{d}{dt} E[V]\right) = \int V(x) L_i p(x, t) dx    (3.26)
so if we set
u_i = -k_i \int V(x) L_i p(x, t) dx    (3.27)
for some positive gains ki, then the cost is guaranteed to decrease. There are many other choices of control ui that will decrease the cost, although the choice in equation 3.27 (with all ki equal) is proportional to the negative gradient of the instantaneous change in cost d/dt E[V] with respect to u and thus will maximize the rate of decrease of cost over all choices of control for which ∑i ui² is constrained.

In order to use this for a controller, the probability of state p(x) needs to be estimated at each point in time, and p(x,t) is then used in equation 3.27 to calculate the controller weights. In general, the cost function can depend on time as well, so that we have V(x,t). For a neural control system, negative values of ui will generally not be permitted, and in many cases, negative values might lead to unstable dynamics of L0 + ∑i ui Li. Therefore, in practice in such cases, negative values of ui will usually be set to zero. E[V] will still decrease so long as there is at least one controller with a positive value of ui. If there is no such controller, then it is not possible to decrease the expected cost with only positive control weights ui for this set of controllers.

In the discrete-space case, note that V(x,t) is a vector v(t), p(x,t) is a vector p(t), and the expected cost is given by the vector inner product E[V] = v^T p. Therefore, equation 3.26 becomes
\frac{\partial}{\partial u_i}\left(\frac{d}{dt} E[V]\right) = v^T L_i p    (3.28)
and we can guarantee reduction in cost by choosing the feedback control
u_i = -k_i v^T L_i p    (3.29)
where Li is the stochastic differential matrix that describes the effect of the ith controller.
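
The sketch below (Python; the rate matrices, cost vector, gains, and time step are illustrative assumptions) applies the discrete-space rule of equation 3.29, clips negative weights to zero as suggested above, and propagates the density under the resulting superposition:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, m = 8, 4                                     # states and controllers (assumed)

def rate_matrix():
    L = rng.random((n, n))
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=0))
    return L

L0 = rate_matrix()                              # unforced dynamics
Ls = [rate_matrix() for _ in range(m)]          # controllable subsystems
v = np.linspace(1.0, 0.0, n)                    # cost vector: lowest cost at the last state
p = np.full(n, 1.0 / n)                         # initial density
k, dt = 5.0, 0.01

for step in range(500):
    u = np.maximum(0.0, -k * np.array([v @ (Li @ p) for Li in Ls]))   # eq. 3.29, clipped
    L = L0 + sum(ui * Li for ui, Li in zip(u, Ls))                    # superposed dynamics
    p = expm(L * dt) @ p                        # propagate the density for one step
print(v @ p)                                    # expected cost after control
```
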

Comparing this formalism to standard formulations of feedback control, we see that the cost function V(x,t) takes the role of the reference trajectory, and p(x,t) takes the role of the state (see Figure 1). For instance, in linear control systems, the control variable would be proportional to the difference between the reference trajectory and the actual trajectory x*(t) − x(t), whereas here the control variable is given by equation 3.27. The role of the subtraction x*(t) − x(t) is replaced by a kernel or matrix operation (see equations 3.27 or 3.29). The feedback gain ki has the same function.

To show the direct relationship between standard formulations of feedback control using a scalar or vector reference trajectory and feedback control using stochastic differential operators, consider a control system with scalar reference trajectory x*(t) and quadratic cost function:
V(x, t) = \left(x - x^*(t)\right)^2    (3.30)
Suppose we have an exact measurement of the true state x0(t) with no uncertainty. Then p(x, t) = δ(x − x0(t)). If the effect of control input u is to increase x linearly, so that the system is dx = u dt, then (from equation 2.6) the equivalent stochastic differential operator is given by the shift operator L1 p = −∂p/∂x. Inserting into equation 3.27 gives
u = -k \int V(x, t) L_1 p(x, t) dx    (3.31)

= -k \int \left(x - x^*(t)\right)^2 \left(-\frac{\partial}{\partial x}\delta(x - x_0(t))\right) dx    (3.32)

= -2k\left(x_0(t) - x^*(t)\right) = 2k\left(x^*(t) - x_0(t)\right)    (3.33)
which is the standard form for linear feedback control. So the same control u can be derived from either the scalar or the operator versions. The difference is that the operator version allows much more flexibility, since the cost function V(x,t) can be arbitrary, the state x may not be known with certainty, and the operator L can represent either linear or nonlinear dynamics.

3.5.  Optimization

In general, linear superposition allows us to consider differentials of the form
\frac{\partial}{\partial u}\left(\frac{dp}{dt}\right) = L_1 p    (3.34)
and, more important, differentials of linear functionals of p, such as the expectation of cost. In principle, this allows us to find minima of functionals such as E[V]. In equation 3.23, we used
\frac{d}{dt} E[V] = \int V(x) \left(L_0 + u L_1\right) p(x, t) dx    (3.35)
and derived the direction of change of u that is required to reduce E[V]. However, the optimal value of u can in principle be found by solving
\frac{\partial}{\partial u}\left(\frac{d}{dt} E[V]\right) = 0    (3.36)
This optimum will hold only at time t, since it depends on the current value of p(x,t). A time-varying control U(t) can be calculated at each point in time in order to maximize the rate of decrease of E[V]. Such a controller descends the gradient of cost and can be used to minimize the cost of the final state. Note that if cost depends on state transitions, then it will be a function of both x and dx/dt, both of which will need to be included in the probability density and the stochastic differential operator.
There is a direct analogy to estimation of the value of a hidden parameter α. Suppose the system can be described as L0 + u(α)L1. Then observation of the change in the system dynamics for a change in α yields
\frac{\partial}{\partial \alpha}\left(\frac{dp}{dt}\right) = \frac{\partial u}{\partial \alpha} L_1 p    (3.37)
and with knowledge of L1 and u, it may be possible to make inferences about the value of α. Note that we infer a change in α from a change in the system dynamics. More general formulations are possible, in which the dynamics depend nonlinearly or stochastically on α. In all cases, inferences are made based on the change in system dynamics either at one point in time or by comparison to known baseline dynamics for a known value of α (often α = 0 at baseline). This type of reasoning could be used to estimate parameters in a biological control system based on observation of the dynamics. Alternatively, an experimentally induced change in a parameter will lead to an observable change in dynamics, and this could be used to determine the function of a region of the brain that is subjected to an experimentally induced increase or decrease in excitability. Further exploration of the use of stochastic differential operators for electrophysiology research is an important topic that is beyond the scope of this article.

3.6.  Quantization

If the original variable x is encoded by the density p(x), then a quantization of x is a new density q(z) where z = r(x) is in general not invertible. If x evolves according to dp(x)/dt = Lx p(x), then there is an induced evolution of z that can be written dq(z)/dt = Lz q(z). If r(x) is invertible (so that z and x are one-to-one, but perhaps in different spaces), then Lz is determined directly by Lx and the change of variables. In general, this will not be the case, and Lz will depend in complicated ways on the sample paths of x and the statistics of the unobserved components of x.

For example, in the discrete-state case, assume that we can write p(z) = Rp(x), where R is a matrix. Since R is deterministic, each column has a single nonzero element equal to 1, so the column sums are all 1 and total probability is preserved. Now we can write dp(z)/dt = R L p(x), which gives the time evolution of p(z). But this equation cannot be expressed in closed form in z alone, because the additional state information in x is necessary to describe the evolution of z. Without that state information, an equation of the form dp(z)/dt = Lz p(z) will have additional uncertainty due to unobservability of the unquantized state x.
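
A small sketch of the discrete-state quantization (Python; the six-state chain, the quantizer z = ⌊x/2⌋, and the drift operator are illustrative assumptions) shows that R preserves total probability but that the induced change in p(z) still depends on the fine-grained density p(x):

```python
import numpy as np

n_x, n_z = 6, 3
r = lambda x: x // 2                            # assumed quantizer z = r(x)
R = np.zeros((n_z, n_x))
for x in range(n_x):
    R[r(x), x] = 1.0                            # one entry of 1 per column

p_x = np.array([0.05, 0.15, 0.1, 0.3, 0.2, 0.2])
p_z = R @ p_x
print(p_z, p_z.sum())                           # coarse density still sums to 1

# illustrative drift chain as L: each state feeds the next one
L = -np.eye(n_x) + np.diag(np.ones(n_x - 1), -1)
L[-1, -1] = 0.0                                 # last state absorbs; column sums are zero
print(R @ (L @ p_x))                            # d p(z)/dt depends on the full p(x), not just p(z)
```
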

4.  Examples

Consider a system that can adjust the damping of a pendulum whose initial angle from vertical is unknown but lies somewhere between −π/4 and π/4. One goal might be to predict the possible positions of the pendulum in the future for any value of the applied damping. This would allow us to choose the least amount of damping that prevents the pendulum from having a particular angle at a particular time (perhaps to avoid a collision), no matter what the initial condition.

The equations of motion for an unforced damped pendulum with random disturbances in both position and velocity are
dx = v dt + dB_x    (4.1)

dv = -\frac{g}{l}\sin(x) dt - \frac{b}{m} v dt + dB_v    (4.2)
where for the simulation, we select damping b = 0.4, mass m = 1, length l = 5, and gravitational constant g = 10, and the standard deviations of the position and velocity noise terms dBx and dBv are both 0.1 radians. (Note that the velocity v is not equal to the time derivative of x due to the added noise term dBx.) This system can be considered to be the superposition of a noise-free pendulum, a constant damping, and a random walk in x, v-space. The system is initialized at rest, with the probability of the initial position uniformly distributed between −π/4 and π/4. The evolution of the joint probability density p(x, v, t) is shown in Figure 2. The marginal probability densities p(x,t) and p(v,t) are shown in Figure 3. Note that independent of initial condition, the pendulum tends to cross the midline at approximately the same time, but there is some dephasing due to the added noise.
Figure 1: Comparison between linear feedback control and control using stochastic differential operators.

Figure 2: Simulation of damped pendulum. Each image shows the joint probability density p(x, v, t) at one point in time. Horizontal axis is position, vertical axis is velocity, and intensity indicates probability, with white being the highest probability.

Figure 3: Simulation of damped pendulum. (Left) The marginal distribution p(x,t) with position (x) on the vertical axis and time (t) on the horizontal axis. (Right) The marginal distribution for velocity p(v,t).

The simulation uses 100 discrete values to represent x and v. Each update of the joint probability density p(x, v, t) → p(x, v, t + Δt) is accomplished by first calculating p(v, t + Δt|x) for each possible value of x and then calculating p(x, t + Δt|v) for each possible value of v. The conditional updates p(v, t|x) → p(v, t + Δt|x) and p(x, t|v) → p(x, t + Δt|v) are each performed using linear operators
p(v, t + \Delta t | x) = e^{L_v(x) \Delta t} p(v, t | x)    (4.3)

p(x, t + \Delta t | v) = e^{L_x(v) \Delta t} p(x, t | v)    (4.4)
Lx(v) and Lv(x) are matrix operators that are zero except near the diagonal. They can be implemented efficiently using first-difference operators. For example, the Ito-type stochastic differential equation for p(v|x) is given by
dv = -\frac{g}{l}\sin(x) dt - \frac{b}{m} v dt + dB_v    (4.5)
which can be written as
dv = f_v(x, v) dt + dB_v, \qquad f_v(x, v) = -\frac{g}{l}\sin(x) - \frac{b}{m} v    (4.6)
and the corresponding Fokker-Planck equation is
\frac{\partial p(v | x)}{\partial t} = -\frac{\partial}{\partial v}\left[f_v(x, v) p(v | x)\right] + \frac{\sigma_v^2}{2}\frac{\partial^2}{\partial v^2} p(v | x)    (4.7)
and this can be approximated using first and second difference operations so long as the basic properties of stochastic differential matrices are respected. (Matlab code for the simulation shown in Figures 2 and 3 is available at www.sangerlab.net/StochasticPendulum.zip.) Note that this is an open-loop simulation, and a single time-invariant stochastic differential operator describes the complete behavior of the system.
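
A compressed Python analogue of this simulation is sketched below (it is not the published Matlab code; the grid resolution, time step, and the simple upwind finite-difference discretization are assumptions of the sketch). It builds a single time-invariant stochastic differential matrix for the joint density p(x, v) of equations 4.1 and 4.2 and propagates it open loop:

```python
import numpy as np

nx = nv = 40
xs = np.linspace(-np.pi, np.pi, nx)               # position grid (assumed)
vs = np.linspace(-4.0, 4.0, nv)                   # velocity grid (assumed)
hx, hv = xs[1] - xs[0], vs[1] - vs[0]
g, l, b, m, sig = 10.0, 5.0, 0.4, 1.0, 0.1

idx = lambda i, j: i * nv + j                     # flatten (x index i, v index j)
L = np.zeros((nx * nv, nx * nv))

def add_rate(src, dst, rate):
    # move probability from cell src to cell dst at the given rate
    L[dst, src] += rate
    L[src, src] -= rate

for i in range(nx):
    for j in range(nv):
        s = idx(i, j)
        drift_x = vs[j]                            # dx = v dt + dB_x
        drift_v = -(g / l) * np.sin(xs[i]) - (b / m) * vs[j]
        if drift_x > 0 and i + 1 < nx: add_rate(s, idx(i + 1, j), drift_x / hx)
        if drift_x < 0 and i > 0:      add_rate(s, idx(i - 1, j), -drift_x / hx)
        if drift_v > 0 and j + 1 < nv: add_rate(s, idx(i, j + 1), drift_v / hv)
        if drift_v < 0 and j > 0:      add_rate(s, idx(i, j - 1), -drift_v / hv)
        dxx, dvv = 0.5 * sig**2 / hx**2, 0.5 * sig**2 / hv**2
        if i + 1 < nx: add_rate(s, idx(i + 1, j), dxx)
        if i > 0:      add_rate(s, idx(i - 1, j), dxx)
        if j + 1 < nv: add_rate(s, idx(i, j + 1), dvv)
        if j > 0:      add_rate(s, idx(i, j - 1), dvv)

# initial condition: position uniform in [-pi/4, pi/4], velocity sharply zero
p = np.zeros(nx * nv)
j0 = np.argmin(np.abs(vs))
for i in range(nx):
    if abs(xs[i]) <= np.pi / 4:
        p[idx(i, j0)] = 1.0
p /= p.sum()

dt = 0.005
for _ in range(int(2.0 / dt)):                     # simulate 2 seconds
    p = p + dt * (L @ p)                           # forward Euler on dp/dt = L p
print(p.reshape(nx, nv).sum(axis=1))               # marginal density p(x) at t = 2
```
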
Next, consider a system that must use a spring-like actuator to control the movement of a small mass. The equations of motion for a damped one-dimensional diffusion process for a particle of mass m = 1 at position x with velocity v are
dx = v dt + dB_{x1}    (4.8)

dv = -\frac{b}{m} v dt + dB_{v1}    (4.9)
where b = 1, and the standard deviations of dBx1 and dBv1 are both 10 units. For initial conditions with a gaussian distribution for position (standard deviation = 20) and velocity equal to zero (sharp), Figure 4 shows the evolution over time of the marginal probability density for position. Suppose we now add a noisy controller that applies force through a spring-like actuator and assume that the goal of the controller is to stabilize around an equilibrium position of 25 units:
dx = dB_{x2}    (4.10)

dv = -\frac{k}{m}\left(x - x^*\right) dt + dB_{v2}    (4.11)
The rest position of the spring is x* = 25, and the spring constant is k = 1.5, so that the combined system is underdamped (damping ratio = 0.4). The standard deviations of the additional noise terms are also 10 units. This results in a combined process (the sum of the right-hand sides of equations 4.8 and 4.10 and equations 4.9 and 4.11) whose time evolution of the joint density p(x, v, t) is given in Figure 5 and the marginals for position and velocity are shown in Figure 6. For comparison with Figure 4, Figure 7 shows a three-dimensional plot of the time evolution of the density for position. Note that the time evolution of the density for position overshoots and eventually converges toward x*, as expected for an underdamped system, but that because of the continual added noise terms dBv and dBx, there is residual variance in both position and velocity. (Matlab code for the simulation shown in Figures 5 and 6 is available at www.sangerlab.net/StochasticSpring.zip.)
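
The spring example can also be written explicitly as an operator superposition. In the sketch below (Python; the grid, the discretization, and an initial mean position of zero are assumptions), the damped-diffusion operator for equations 4.8 and 4.9 and the spring-controller operator for equations 4.10 and 4.11 are built separately and simply added before propagating the joint density:

```python
import numpy as np

nx = nv = 40
xs = np.linspace(-100.0, 100.0, nx)                # position grid (assumed units)
vs = np.linspace(-60.0, 60.0, nv)                  # velocity grid (assumed)
hx, hv = xs[1] - xs[0], vs[1] - vs[0]
idx = lambda i, j: i * nv + j

def build_operator(drift_x, drift_v, sig_x, sig_v):
    """Upwind + diffusion discretization of one subsystem on the (x, v) grid."""
    L = np.zeros((nx * nv, nx * nv))
    def add(src, dst, rate):
        L[dst, src] += rate
        L[src, src] -= rate
    for i in range(nx):
        for j in range(nv):
            s = idx(i, j)
            fx, fv = drift_x(xs[i], vs[j]), drift_v(xs[i], vs[j])
            if fx > 0 and i + 1 < nx: add(s, idx(i + 1, j), fx / hx)
            if fx < 0 and i > 0:      add(s, idx(i - 1, j), -fx / hx)
            if fv > 0 and j + 1 < nv: add(s, idx(i, j + 1), fv / hv)
            if fv < 0 and j > 0:      add(s, idx(i, j - 1), -fv / hv)
            dxx, dvv = 0.5 * sig_x**2 / hx**2, 0.5 * sig_v**2 / hv**2
            if i + 1 < nx: add(s, idx(i + 1, j), dxx)
            if i > 0:      add(s, idx(i - 1, j), dxx)
            if j + 1 < nv: add(s, idx(i, j + 1), dvv)
            if j > 0:      add(s, idx(i, j - 1), dvv)
    return L

b, m, k_spring, x_star = 1.0, 1.0, 1.5, 25.0
L_plant = build_operator(lambda x, v: v, lambda x, v: -(b / m) * v, 10.0, 10.0)                 # eqs. 4.8-4.9
L_ctrl = build_operator(lambda x, v: 0.0, lambda x, v: -(k_spring / m) * (x - x_star), 10.0, 10.0)  # eqs. 4.10-4.11
L = L_plant + L_ctrl                               # superposition of the two operators

p = np.zeros(nx * nv)
j0 = np.argmin(np.abs(vs))
p[[idx(i, j0) for i in range(nx)]] = np.exp(-xs**2 / (2 * 20.0**2))   # gaussian in x, sharp v = 0
p /= p.sum()

dt = 0.004
for _ in range(int(4.0 / dt)):
    p = p + dt * (L @ p)                           # forward Euler on dp/dt = L p
px = p.reshape(nx, nv).sum(axis=1)
print(px @ xs)                                     # mean position settles near x* = 25
```
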
Figure 4: Simulation of linear diffusion process. Time evolves down and to the right, and the height of the surface is the probability for each value of x. Broadening of the distribution of x over time is the expected result of the diffusion process.

Figure 5: Simulation of linear diffusion process with spring. Initial condition is gaussian in position. Final condition shows convergence around x = 25 with the velocity near zero (vertical axis is velocity, with zero in the middle of the axis).

Figure 6: Simulation of linear diffusion process with spring. Marginals for position (left) and velocity (right) as a function of time.

Figure 7: Simulation of linear diffusion process with spring. Marginal for position as a function of time, showing the stable final probability density for position.

As in the previous example, it is important to realize that this is an open-loop system (by our method described above) since the full behavior is specified by a single time-invariant operator. This would be true even if the target position x* were time varying. This would become a feedback system (in our terminology) only if the spring-like controller itself were activated or deactivated based on the value of the state x. This example provides a very simple illustration of the application of these ideas to a second-order system. A more realistic application to a biological system would involve the superposition of multiple controllers, such as a combination of agonist and antagonist muscles, or a combination of multiple motor units within a muscle, in order to achieve more complex dynamics.

5.  Conclusion

In this article, we have introduced the use of linear stochastic differential operators and shown that these operators form a framework that links several different approaches to time-varying probability densities, including Bayesian estimation and discrete-time Markov processes. While it is well known that discrete-time operators converge to differential operators in the limit of small time intervals, the relation between the additive properties of differential operators and the multiplicative properties of discrete-time operators has not been widely noted. Stochastic differential operators arise naturally from stochastic calculus and the partial differential equations that are known to describe the evolution of probability densities. Therefore, the underlying mathematics is familiar. The novelty in this approach arises from the observation that linear properties of these operators permit linear operations on stochastic systems, including weighted combinations. This allows us to bring the full force of linear mathematics to bear on problems of the design and simulation of distributed stochastic control systems.

In many cases, the form of the stochastic differential operator can be directly inferred from the problem description, even when the operator is not the result of a second-order Fokker-Planck equation. The effects of quantization or change of variables can be directly modeled, and time-varying expected costs associated with state-dependent cost functions can be calculated. An important topic for further research will be to derive methods for optimal control of systems described by stochastic differential operators. The consequence for control is that a time-varying cost function takes the place of the reference trajectory, and the probability density of state takes the place of the state itself.

The use of probability densities as the fundamental objects can be considered to be a projection of state-space into a high-dimensional space (infinite-dimensional for continuous state space). Stochastic differential operators are linear operators on this projection space, and linear operators in the high-dimensional space can implement nonlinear operators in the original low-dimensional space. Therefore, the use of these operators is quite general and can implement a large class of nonlinear differential operators, including differential equations of the form dx = [f(x) + g(x)u] dt for differentiable nonlinear functions f and g. This allows the use of techniques from linear algebra (including additive superposition and linear projections).

The numerical effort required to simulate such systems is considerably greater than for deterministic control systems, because the entire joint probability density must be propagated forward in time, rather than just the most likely values of variables, or the mean and variance of gaussian-distributed variables (as is typically done in the Kalman filter or linear quadratic regulator approach). Although current computing power is adequate for low-dimensional systems, for high-dimensional systems, simulation may rapidly become intractable. An important topic for future research will be to find low-dimensional projections or other approximate or heuristic algorithms for integrating high-dimensional stochastic systems. Galerkin methods provide one possible approach using a finite-dimensional approximation to the probability density at each point in time (e.g., Chakravorty, 2006; Knezevic & Süli, 2008). It will be particularly important to seek parallel algorithms that can implement prediction or optimization in order to determine whether this type of computation could be performed explicitly by biological systems. Until such algorithms are available, this framework should be considered a description of the behavior of complex systems, but not necessarily a description of how that behavior is implemented.

One of the more interesting consequences of this approach is a change in the interpretation of control and observation. In this formalism, control of a system involves changing its dynamics from the unforced dynamics L0 to a new dynamics L0 + L1, where L1 is implemented by a controller acting in parallel with the plant (Todorov, 2009). Similarly, observation involves estimation of the change in a hidden parameter α by comparison of the system L0 + αL1 to the base system L0. Therefore, the observed or controlled quantities are not the output of the system, but rather the system dynamics themselves. The effect of a controller (or of a change in a hidden parameter) is the difference between the dynamics of the full system with and without the presence of the controller. This is a new conceptualization that may have significant power for the analysis of behavioral effects of individual neurons or neural circuits. The method proposed here provides a consistent framework for understanding the time-varying stochastic behavior of differential and discrete-time distributed systems, and it may provide insight that will permit further study of how cooperating groups of individual neurons or neural subsystems give rise to global dynamic behavior.

Acknowledgments

I thank Nick Bambos and Evangelos Theodorou for helpful conversations and comments on earlier manuscripts. This work was supported by the Don and Linda Carter Foundation, the Crowley Carter Foundation, and the James S. McDonnell Foundation.

References

Braun, D. A., Aertsen, A., Wolpert, D. M., & Mehring, C. (2009). Learning optimal adaptation strategies in unpredictable motor tasks. J. Neurosci., 29, 6472–6478.

Brockett, R. (2008). Optimal control of observable continuous time Markov chains. In Proceedings of the 47th IEEE Conference on Decision and Control (pp. 4269–4274). Piscataway, NJ: IEEE.

Chakravorty, S. (2006). A homotopic Galerkin approach to the solution of the Fokker-Planck-Kolmogorov equation. In Proc. American Control Conference. Piscataway, NJ: IEEE.

Kac, M. (1949). On distributions of certain Wiener functionals. Trans. Amer. Math. Soc., 65, 1–13.

Knezevic, D. J., & Süli, E. (2008). Spectral Galerkin approximation of Fokker-Planck equations with unbounded drift. ESAIM: Mathematical Modelling and Numerical Analysis, 43(3), 445–485.

Risken, H., & Vollmer, H. D. (1987). On solutions of truncated Kramers-Moyal expansions: Continuum approximations to the Poisson process. Z. Phys. B Condensed Matter, 66, 257–262.

Sanger, T. D. (2010a). Controlling variability. Journal of Motor Behavior, 42, 401–407.

Sanger, T. D. (2010b). Neuro-mechanical control using differential stochastic operators. Paper presented at the Engineering in Medicine and Biology Society, 2010 Annual International Conference of the IEEE.

Skorokhod, A. V. (1989). Asymptotic methods in the theory of stochastic differential equations. Providence, RI: American Mathematical Society.

Todorov, E. (2009). Efficient computation of optimal actions. PNAS, 106(28), 11478–11483.

Toussaint, M., & Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov decision processes. In Proc. 23rd Intl. Conf. Machine Learning. New York: ACM.

Van Kampen, N. G. (2007). Stochastic processes in physics and chemistry (3rd ed.). New York: Elsevier.