## Abstract

Control in the natural environment is difficult in part because of uncertainty in the effect of actions. Uncertainty can be due to added motor or sensory noise, unmodeled dynamics, or quantization of sensory feedback. Biological systems are faced with further difficulties, since control must be performed by networks of cooperating neurons and neural subsystems. Here, we propose a new mathematical framework for modeling and simulation of distributed control systems operating in an uncertain environment. Stochastic differential operators can be derived from the stochastic differential equation describing a system, and they map the current state density into the differential of the state density. Unlike discrete-time Markov update operators, stochastic differential operators combine linearly for a large class of linear and nonlinear systems, and therefore the combined effects of multiple controllable and uncontrollable subsystems can be predicted. Design using these operators yields systems whose statistical behavior can be specified throughout state-space. The relationship to Bayesian estimation and discrete-time Markov processes is described.

## 1. Introduction

Biological control systems operate in an uncertain world. When a motor command is executed, the perceived result of the action depends on the plant dynamics, the effect of unobserved controllers and noise acting in the environment, and the quality of the sensory data. In most cases, the controller has only an approximate knowledge of these effects, but it must nevertheless choose an action that accomplishes the desired task. Because of the effect of unmeasured state and random noise, observed movement appears variable, and when the variability can be estimated, control inputs can be chosen to minimize the expected cost (Braun, Aertsen, Wolpert, & Mehring, 2009; Todorov, 2009). However, this optimization problem is mathematically difficult and can usually be solved only for specific types of systems, such as linear systems with gaussian added noise.

The impact of unobserved controllers is significantly heightened in neural control systems, because from the point of view of any single part of the brain, the control exerted by other parts of the brain may be only incompletely known or observable only after a delay. Nevertheless, each component of the full system must choose actions that optimize the overall behavior, even when presented with only partial knowledge of the actions chosen by other components. In the most extreme case, a single neuron must adapt its connectivity so that its effect on global behavior, however small, nevertheless contributes to desired performance. Current methods in nonlinear and stochastic control theory have not been applied to this problem, perhaps because of high complexity. Here, we present a mathematical framework that allows modeling of the effect of actions deep within a system on the stochastic behavior of the global system. The framework is based on stochastic differential operators, and we show that these operators exhibit linear superposition, so that the combined effect of multiple actions can be determined easily using linear operations. This has important consequences for modeling local and global system behavior, interpreting the effects of neurons within the system, and computing optimal actions.

Consider a system whose state $x$ evolves according to a stochastic differential equation of the form

$$dx = f(x)\,dt + \sum_i g_i(x)\,u_i\,dt + \sum_i h_i(x, u)\,dB_i \qquad (1.1)$$

where $f(x)$ describes the environment dynamics, $g_i(x)$ is the effect of control $u_i$ on the state $x$, $h_i(x, u)$ is the effect of the state and control on the noise, and $dB_i$ indicates the differential of a set of independent noise processes such as Brownian motion. Since this equation is stochastic, we might wish to know not just the expected or most likely sample path $E[x(t) \mid U(t)]$, but rather the full probability density of the state at any time, $p(x(t) \mid U(t))$, where $U(t)$ indicates the full control trajectory up to time $t$.

The terms $g_i(x)\,u_i$ contribute tunable dynamics in the form of a superposition of controllers. For example, if there are no environmental dynamics and no added noise, so that $f(x) = h_i(x) = 0$, then we have the ordinary differential equation

$$\frac{dx}{dt} = \sum_i g_i(x)\,u_i$$

which is a linear superposition of a set of dynamics of the form $dx/dt = g_i(x)$. In this simplified case, the goal of control will be to find a set of functions of time $u_i(t)$ such that a time-varying superposition of the form $\sum_i g_i(x)\,u_i(t)$ approximates a desired closed-loop dynamics $dx/dt = f^*(x)$. In the following, we will extend this idea to the more general stochastic case in equation 1.1. These results extend previous work on superposition of controllers (Sanger, 2010a, 2010b) by showing the relation to Bayes' rule and Markov processes, examining basic properties including commutativity of operators, using an inner-product space for the set of differential operators to derive preliminary results on feedback control and optimization, and showing the application of operator superposition to systems with second-order dynamics.

## 2. Background

Much recent work has examined the effect of sensory uncertainty on motor control. Bayes’ rule has been used to model the combination of prior knowledge of state with measurement, and the effect on control is determined by the effect of the residual uncertainty in state on the variability of control (Toussaint & Storkey, 2006). It is tempting to consider whether Bayes’ rule could be used to model the effects of other forms of uncertainty in control. Here we will show that Bayes’ rule is not adequate for this purpose, and discrete-time or continuous-time Markov processes must be used. Our goal in this section is to illustrate the links among Bayes’ rule, discrete-time Markov processes, and continuous-time Markov processes described by stochastic differential equations. The underlying mathematics are well known, but the explicit description of the links will be helpful to motivate the use of stochastic differential operators in the next section.

### 2.1. Bayes’ Rule and Markov Processes

We use $p(x)$ to indicate a vector of probabilities $p(x = i)$ or a probability density function $(d/dx)\,\mathrm{Prob}(X \le x)$. Therefore, $p(x) \ge 0$ and $\sum_i p(x = i) = 1$, or $\int p(x)\,dx = 1$. Then Bayes' rule is written as

$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)} \qquad (2.1)$$

and it is straightforward to interpret this equation as a mapping from the prior density $p(x)$ to the posterior density $p(x \mid y)$. This mapping describes the instantaneous effect of a measurement of a particular value of $y$ on the density of $x$. If the variable $x$ takes on a finite number of states, then we can write equation 2.1 as

$$p(x = i \mid y) = \frac{p(y \mid x = i)}{p(y)}\,p(x = i) \qquad (2.2)$$

or, in matrix notation,

$$p'_x = A(y)\,p_x \qquad (2.3)$$

with $a_i(y) = p(y \mid x = i)/p(y)$, where the matrix $A$ has diagonal elements $A_{ii} = a_i(y)$, $p_x = (p(x = 1), \ldots, p(x = m))$ is the vector of prior probabilities, and $p'_x = (p(x = 1 \mid y), \ldots, p(x = m \mid y))$ is the vector of posterior probabilities.

Compare this with a discrete-time Markov process, in which the density is updated by a Markov matrix $M$:

$$p(t+1) = M\,p(t) \qquad (2.4)$$

The elements of $M$ must be nonnegative and the column sums must all be 1. In contrast, while the Bayes update matrix must have nonnegative elements, the column sums are almost never 1 (each column of the Bayes matrix 2.3 has only a single nonzero element). The Markov update matrix can have nonzero elements off the diagonal. Such elements indicate the probability of transitions between states, and they allow the probability of being in a state at time $t+1$ to depend on the probability of being in other states at time $t$. Bayes' rule allows no such dependence, since the posterior density for any particular value of $x$ depends only on the prior density for that value of $x$. Furthermore, in Bayes' rule, if the prior probability of state $i$ is zero, then so is the posterior probability, whereas for a Markov update, the probability of being in state $i$ at time $t+1$ can be nonzero even if state $i$ was impossible at time $t$.

Note that operators of the form 2.2 are diagonal and thus commute, whereas operators of the form 2.4 may or may not commute. Therefore, the order of estimates or state updates is important in the Markov formulation. When estimates commute, then they can be made in any order and produce the same result. When estimates do not commute, the interpretation of the second estimate depends on the outcome of the first. This commonly occurs (even in nonquantum systems) when estimation requires time or occurs at different points in time. For example, the change in position predicted by measuring an accelerometer will be highly dependent on whether a prior gyroscope measurement indicated a change in heading.
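The contrast between the two update types can be checked with a small numerical sketch (all numbers below are made up): the Bayes matrix is diagonal and leaves a zero-prior state at zero, while a column-stochastic Markov matrix can move probability into a previously impossible state.

```python
# Sketch: Bayes update as a diagonal matrix vs. Markov update as a
# column-stochastic matrix, on a three-state example with made-up numbers.
def matvec(M, p):
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

prior = [0.5, 0.5, 0.0]                 # state 3 has zero prior probability

# Bayes: a_i(y) = p(y | x=i) / p(y); the update matrix is diagonal,
# so a zero-prior state stays at zero
lik = [0.9, 0.3, 0.8]                   # hypothetical likelihoods p(y | x=i)
py = sum(l * q for l, q in zip(lik, prior))
bayes = [[(lik[i] / py) if i == j else 0.0 for j in range(3)]
         for i in range(3)]
posterior = matvec(bayes, prior)

# Markov: columns sum to 1; off-diagonal entries move mass into state 3
M = [[0.8, 0.1, 0.0],
     [0.1, 0.8, 0.0],
     [0.1, 0.1, 1.0]]
after = matvec(M, prior)
print(posterior, after)
```

Note that the columns of the Bayes matrix do not sum to 1, yet the posterior still sums to 1 because the likelihoods are normalized by $p(y)$.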

### 2.2. Stochastic Differential Equations

Consider a stochastic differential equation of the form $dx = f(x)\,dt + g(x)\,dB$, where $dB$ is the differential of unit-variance Brownian motion. Brownian motion is random, so $x$ is a random variable as well, with a probability density $p(x)$ that evolves through time as described by the Fokker-Planck partial differential equation (Kac, 1949):

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x}\big[f(x)\,p(x, t)\big] + \frac12\,\frac{\partial^2}{\partial x^2}\big[g(x)^2\,p(x, t)\big] \qquad (2.6)$$

We can write equation 2.6 as

$$\dot p = L\,p \qquad (2.7)$$

where $L$ is the linear operator

$$L\,p = -\frac{\partial}{\partial x}\big(f\,p\big) + \frac12\,\frac{\partial^2}{\partial x^2}\big(g^2\,p\big) \qquad (2.8)$$

We will refer to this type of operator as a stochastic differential operator, and it is important to realize that while such operators describe the behavior of nonlinear stochastic systems, equation 2.7 is a linear and deterministic partial differential equation (Van Kampen, 2007). It is known that the Fokker-Planck equation (the second-order approximation to the Kramers-Moyal expansion derived from the master equation) maintains the properties of probability densities, including nonnegativity and integration to 1 (Risken & Vollmer, 1987).

A second type of stochastic differential equation occurs when $x$ takes on integer values and $N_{ij}$ is a set of Poisson-distributed counting processes with rates $\lambda_{ij}$ that trigger jumps from state $j$ to state $i$. It is assumed that two processes never jump at exactly the same instant. From equation 2.9, we can construct a master equation for the change in probability over a short interval,

$$p_i(t + \Delta t) = p_i(t) + \sum_{j \ne i} p(N_{ij})\,p_j(t) - \sum_{j \ne i} p(N_{ji})\,p_i(t)$$

where $p(N_{ij})$ indicates the probability that the $ij$th process has a jump during interval $\Delta t$. Dividing by $\Delta t$ and simplifying the notation slightly,

$$\frac{p_i(t + \Delta t) - p_i(t)}{\Delta t} = \sum_{j \ne i} \lambda_{ij}\,p_j(t) - \sum_{j \ne i} \lambda_{ji}\,p_i(t)$$

As $\Delta t \to 0$, we obtain

$$\frac{dp_i}{dt} = \sum_j \lambda_{ij}\,p_j$$

where we have set $\lambda_{ii} \doteq -\sum_{j \ne i} \lambda_{ji}$. Writing this in matrix notation, $\dot p = L\,p$, where $L$ is a square matrix and $p$ is a vector of elements $\mathrm{Prob}(x = i)$. (A more rigorous derivation can be found in Skorokhod, 1989.) Two important properties of $L$ are that all off-diagonal elements are nonnegative and the column sums are zero (therefore, the diagonal elements are nonpositive). These two properties are necessary and sufficient to ensure that if $p(x, t=0)$ is any valid initial probability density, then $p(x, t)$ will remain a valid probability density (with nonnegative elements that sum to 1). We will refer to matrices with these properties as stochastic differential matrices. Note that the class of stochastic differential matrices is closed under addition and multiplication by nonnegative scalars.
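A minimal numerical sketch of these two properties (the rates below are made up): the matrix has nonnegative off-diagonal elements and zero column sums, and forward-Euler integration of $\dot p = Lp$ then preserves nonnegativity and total probability.

```python
# Sketch: verify the defining properties of a stochastic differential
# matrix and evolve dp/dt = L p by forward Euler (illustrative rates).
L = [[-0.5,  0.2,  0.0],
     [ 0.3, -0.4,  0.1],
     [ 0.2,  0.2, -0.1]]       # rates lambda_ij; each column sums to zero

n = len(L)
assert all(L[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
assert all(abs(sum(L[i][j] for i in range(n))) < 1e-12 for j in range(n))

p = [1.0, 0.0, 0.0]            # all probability mass in state 1
dt = 0.01
for _ in range(1000):          # evolve for 10 time units
    Lp = [sum(L[i][j] * p[j] for j in range(n)) for i in range(n)]
    p = [p[i] + dt * Lp[i] for i in range(n)]
print(p)
```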

If the initial density is $p(0)$, then after time $\Delta t$ we have

$$p(\Delta t) = e^{L\,\Delta t}\,p(0)$$

so if we identify $M = e^{L\,\Delta t}$ and $\Delta t = 1$, then we recover the Markov update rule $p(1) = M\,p(0)$. The Markov matrix $M$ must be a stochastic matrix with all nonnegative elements and column sums of 1 in order to preserve probability densities.
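This identification can be checked numerically. The sketch below (illustrative rates) exponentiates a stochastic differential matrix with a truncated Taylor series and verifies that the result is a valid Markov matrix.

```python
# Sketch: M = exp(L * dt) computed by a truncated Taylor series
# I + L + L^2/2! + ... ; the result should be column-stochastic.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(L, terms=30):
    n = len(L)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # I
    term = [row[:] for row in M]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, L)]
        M = [[M[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return M

L = [[-1.0, 0.5],
     [ 1.0, -0.5]]                 # zero column sums, Delta t = 1
M = expm(L)
col_sums = [sum(M[i][j] for i in range(2)) for j in range(2)]
print(M, col_sums)
```

A truncated series is adequate here only because the matrix is small and its norm is modest; a production implementation would use a scaling-and-squaring matrix exponential instead.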

## 3. Operator Superposition

In this section, we consider the combined effect of two stochastic control systems acting on the same plant. A fundamental assumption is that the noise in the two systems is uncorrelated. We discuss this assumption further below.

### 3.1. Summation of Operators

Suppose that two subsystems act simultaneously on the same state, so that the combined dynamics are

$$dx = f_1(x)\,dt + f_2(x)\,dt + g_1(x)\,dB_1 + g_2(x)\,dB_2 \qquad (3.4)$$

and assume that $dB_1$ and $dB_2$ are uncorrelated. Then equation 3.4 can be written as

$$\dot p = (L_1 + L_2)\,p$$

where

$$L_i\,p = -\frac{\partial}{\partial x}\big(f_i\,p\big) + \frac12\,\frac{\partial^2}{\partial x^2}\big(g_i^2\,p\big)$$

which shows that linear combination of stochastic differential equations yields a linear combination of the partial differential equations for the time evolution of the probability density.

Linear superposition of the stochastic differential operators for physical processes with additive effects will hold in very general circumstances, requiring only differentiability of $fp$, twice differentiability of $g g^T p$, and uncorrelated noise sources $dB_i$. Note that the Fokker-Planck equation is linear in $p$, which means that for any single dynamics, linear combinations of the initial state density have linear effects on the future state density. The importance of summation of operators is that the system is also linear in the dynamics, so that linear combinations of dynamic systems have linear effects on the state density. This fact is essential in the development below, because it allows linear methods to be applied to the design and optimization of dynamic systems.

The discrete-time Markov update rule $p(t+1) = M\,p(t)$ does not have this linear superposition property, since the sum of two probability densities $p_1(x(t+1)) + p_2(x(t+1))$ is not meaningful and could exceed 1. Therefore, the discrete-time state density for a superposed process cannot be easily calculated from the individual discrete-time processes. Linear superposition is a property of differential equations of the form $\dot p = L\,p$, but not of difference equations of the form $p(t+1) = M\,p(t)$.

Exponentiation of the sum $L_1 + L_2$ yields a Markov update rule $p(t + \Delta t) = e^{(L_1 + L_2)\Delta t}\,p(t)$. If $L_1$ and $L_2$ commute, then

$$e^{(L_1 + L_2)\Delta t} = e^{L_1 \Delta t}\,e^{L_2 \Delta t}$$

showing that for commutative operators, addition in the differential domain yields composition of operators in the discrete-time domain. When operators do not commute, then $e^{(L_1 + L_2)\Delta t} \ne e^{L_1 \Delta t}\,e^{L_2 \Delta t}$, and this reflects the physical reality that simultaneous operation of $L_1$ and $L_2$ is not equivalent to $L_1$ performed for $\Delta t$ followed by $L_2$ performed for $\Delta t$. Therefore, the semigroup of operators generated by sequential application of discrete-time Markov operators is not equivalent to the space of operators generated by instantaneous superposition of continuous-time stochastic operators, unless the continuous-time operators commute.

If stochastic operators are not superimposed, then all possible processes are described by sequential compositions of Markov operators $e^{L_{i_1}\Delta t}\,e^{L_{i_2}\Delta t}\cdots$, and this generates the usual operator semigroup. But if simultaneous superposition of operators is allowed in the differential domain, then this expands the operator semigroup by including additional operators of the form $\exp(\sum_i L_i)$. Interpreting this the other way, if operators cannot be sequenced (or selected in a time-dependent manner), then all possible processes are described by the sums $\sum_i L_i$. But if sequencing is allowed in the discrete domain, then this expands the operator space by including operators in the (noncommutative) exponentiated space. By simultaneously considering the description of operators in the differential and discrete domains, it is much easier to model systems that have superimposed controllers as well as discrete sequences of changes in control parameters.

### 3.2. Space of Differential Operators

In order to use stochastic differential operators for control, we must be able to create flexible superpositions that achieve the desired dynamics. Implemented in a neural system, this would mean that subsystems individually implement specific dynamic behaviors, each described by an operator $L_i$, and global behavior is created by choosing a weighted combination of the individual dynamic behaviors to yield a particular desired combined behavior.

Suppose that $\dot p = \sum_i \alpha_i\,L_i\,p$, where $\alpha_i$ is a set of nonnegative real weighting coefficients. The Fokker-Planck equation

$$\alpha_i\,L_i\,p = -\frac{\partial}{\partial x}\big(\alpha_i\,f_i\,p\big) + \frac12\,\frac{\partial^2}{\partial x^2}\big(\alpha_i\,g_i^2\,p\big)$$

corresponds to a stochastic differential equation of the form

$$dx = \alpha_i\,f_i(x)\,dt + \sqrt{\alpha_i}\,g_i(x)\,dB_i$$

which means that linear scaling yields a system in which the magnitude of the noise term $dB_i$ increases proportional to the square root of the control signal $\alpha_i$. (Equivalently, the variance of the noise increases proportional to $\alpha_i$.) We can use a weighted superposition of a large group of systems,

$$dx = \sum_i \alpha_i\,f_i(x)\,dt + \sum_i \sqrt{\alpha_i}\,g_i(x)\,dB_i$$

to generate a flexible set of dynamics. If we do this, the total variance of the noise will be the sum of the control signals $\alpha_i$. For this type of controller, signal dependence of the noise arises directly from the use of control signals $\alpha_i$ that weight the superposition of stochastic dynamic operators.

To maintain consistency, we will restrict $\alpha_i \ge 0$. This yields a linear half-space generated by the set of $L_i$. Note that all elements of the space continue to satisfy the consistency requirement that the column sums of the $L_i$ are zero. (In the continuous-state case, we require the integral of the density to remain 1 or, equivalently, the time derivative of the integral of the density to be zero, so $\int L_i\,p\,dx = 0$.)

We can define an inner product between operators with respect to a particular density $p_0$, so that $\langle L_1, L_2 \rangle_{p_0} = (L_1\,p_0)^T (L_2\,p_0)$ in the discrete-space case and $\langle L_1, L_2 \rangle_{p_0} = \int (L_1\,p_0)(x)\,(L_2\,p_0)(x)\,dx$ in the continuous-space case. $p_0$ will usually be the initial condition or a known intermediate state of the dynamics. This inner product reflects the extent to which two differential operators modify this particular initial condition in the same direction.

Suppose that we have an orthonormal basis $\Lambda_i$ (with respect to a particular inner product) for the set of differential stochastic operators. For any desired operator $L^*$, we can project onto the basis set to calculate

$$\hat L = \sum_i \alpha_i\,\Lambda_i$$

where $\alpha_i = \langle L^*, \Lambda_i \rangle$. Using the $p_0$ inner product will find a (nonunique) approximation to the desired dynamics starting from this particular initial condition.

If the state $x_0$ is known, then in continuous state-space, $p_0(x) = \delta(x - x_0)$. If the linear operators can be written in terms of linear kernels $K_i(x, x')$ so that

$$(L_i\,p)(x) = \int K_i(x, x')\,p(x')\,dx'$$

then

$$\langle L_1, L_2 \rangle_{p_0} = \int K_1(x, x_0)\,K_2(x, x_0)\,dx$$

which in the discrete-space (matrix) case is just the inner product of the corresponding columns of the stochastic differential matrices.

A different matrix inner product $\langle L_1, L_2 \rangle_f$ can be defined for matrices with elements $m_{ij}$ and $n_{ij}$ as the element-wise dot product $\sum_{ij} m_{ij}\,n_{ij}$, and the corresponding induced norm is the Frobenius norm $\|L\|_f$. We will refer to $\langle L_1, L_2 \rangle_f$ as the Frobenius inner product.
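As a small numerical sketch of projection under this inner product: the two basis operators below are hypothetical and merely Frobenius-orthogonal rather than orthonormal, so each coefficient is normalized by $\langle \Lambda_i, \Lambda_i \rangle_f$.

```python
# Sketch: recover the mixing weights of L* = 2*Lam1 + 0.5*Lam2 by
# projection with the Frobenius inner product (hypothetical operators).
def frob(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

Lam1 = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # jump 1 -> 2
Lam2 = [[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 1.0, 0.0]]  # jump 2 -> 3
assert frob(Lam1, Lam2) == 0.0        # disjoint support => orthogonal

L_star = [[2 * a + 0.5 * b for a, b in zip(ra, rb)]
          for ra, rb in zip(Lam1, Lam2)]
alpha = [frob(L_star, Lam) / frob(Lam, Lam) for Lam in (Lam1, Lam2)]
print(alpha)   # recovers the weights [2.0, 0.5]
```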

### 3.3. Closed-Loop Control

Suppose we are given a desired stochastic differential matrix $L^*$ with elements $\lambda^*_{ij}$. For each sharp initial condition $p_0(x) = e_j$ (where $e_j$ is a vector with a 1 in the $j$th position and zeros elsewhere), the $j$th column of $L^*$ specifies the desired vector $\dot p$. Using standard methods from linear algebra, we can find a set of coefficients $\alpha_{ij}$ that minimize $\|L^*\,e_j - \hat L\,e_j\|$, where $\hat L\,e_j = \sum_i \alpha_{ij}\,\Lambda_i\,e_j$. This set of coefficients forms a useful approximation only when starting from the initial condition $e_j$. For a different initial condition, $e_k$ for example, we would need a different set of coefficients $\alpha_{ik}$. The natural solution is to make the mixing coefficients dependent on the current state, so that

$$\dot p = \sum_i \alpha_i(x)\,\Lambda_i\,p \qquad (3.18)$$

This equation indicates that the mixture of the operators $\Lambda_i$ changes with the state $x$. Since the $j$th column of $\hat L$ is the response to the sharp initial condition $x = j$, equation 3.18 creates $\hat L$ by forming a matrix each of whose columns is a different linear combination of the corresponding columns of the $\Lambda_i$. The $j$th column of $\hat L$ is a linear combination of the $j$th columns of the matrices $\Lambda_i$, with the columns weighted by the factors $\alpha_{ij}$. This new matrix has the approximate desired behavior for all sharp initial conditions $x = j$. The behavior of the closed-loop system is described by the density evolution

$$\dot p = \hat L\,p \qquad (3.19)$$

Note that $\hat L$ is no longer a simple weighted sum of the $\Lambda_i$, since each column is weighted differently. We refer to this as closed-loop control because the controller depends on measurement of the current state. In contrast, for open-loop control, the mixing factors will not be dependent on $x$, and therefore

$$\dot p = \sum_i \beta_i\,\Lambda_i\,p \qquad (3.20)$$

describes open-loop behavior, where $\beta_i$ are constant coefficients. Open-loop dynamics (see equation 3.20) are thus confined to the linear span of the $\Lambda_i$'s, whereas closed-loop dynamics (see equation 3.19) can be selected from the larger set of operators formed by different linear combinations of each column of the $\Lambda_i$'s. The coefficients $\beta_i$ of the best open-loop approximation to a desired dynamics $L^*$ are obtained by projecting $L^*$ onto the space of basis operators $\Lambda_i$ using the Frobenius inner product.
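The column-wise construction can be sketched numerically (all matrices below are hypothetical): the desired generator is outside the linear span of the two basis operators, but solving a separate least-squares problem for each column reproduces it exactly.

```python
# Sketch: closed-loop (state-dependent) mixing via per-column least
# squares.  Two cyclic-jump generators form the basis; the desired
# birth-death generator is not in their span, but column-wise weights
# alpha_ij recover it.
Lam1 = [[-1, 0, 1], [1, -1, 0], [0, 1, -1]]    # jumps 1 -> 2 -> 3 -> 1
Lam2 = [[-1, 1, 0], [0, -1, 1], [1, 0, -1]]    # jumps 1 -> 3 -> 2 -> 1
L_star = [[-1, 1, 0], [1, -2, 1], [0, 1, -1]]  # desired dynamics

n = 3
L_hat = [[0.0] * n for _ in range(n)]
alphas = []
for j in range(n):                     # one 2-variable least squares per column
    A = [Lam1[i][j] for i in range(n)]
    B = [Lam2[i][j] for i in range(n)]
    c = [L_star[i][j] for i in range(n)]
    aa = sum(x * x for x in A); bb = sum(x * x for x in B)
    ab = sum(x * y for x, y in zip(A, B))
    ca = sum(x * y for x, y in zip(c, A)); cb = sum(x * y for x, y in zip(c, B))
    det = aa * bb - ab * ab
    a = (bb * ca - ab * cb) / det
    b = (aa * cb - ab * ca) / det
    alphas.append((a, b))
    for i in range(n):
        L_hat[i][j] = a * A[i] + b * B[i]
print(alphas)
print(L_hat)
```

Because every basis column sums to zero, each mixed column also sums to zero, so the closed-loop matrix remains a valid stochastic differential matrix.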

In the continuous state-space case, closed-loop dynamics are given by equations of the form of 3.19 and 3.20. However, in this case, determination of the feedback functions $\alpha_i(x)$ will usually be based on the desired response to only a finite set of sharp initial conditions $\delta(x - x_k)$. Therefore, $\alpha_i(x)$ is not unique. This means that the functions $\alpha_i(x)$ can be chosen to lie within the span of a set of desired basis functions, which permits us to choose these functions to have smoothness or other computationally desirable properties.

### 3.4. Feedback Control

Consider a system whose dynamics are partitioned into a fixed component and a component modulated by a scalar control input $u$:

$$\dot p = L_0\,p + u\,L_1\,p$$

where $L_0\,p$ describes the unforced dynamics and $u\,L_1\,p$ describes the controllable portion of the dynamics. For example, $L_1$ might be the dynamics determined by a neural subsystem that can be modulated by a control input $u$, while $L_0$ is the dynamics of all other components of the system.

Suppose that there is a cost function $V(x)$ on states $x$, so that the expected value of the cost is given by

$$E[V] = \int V(x)\,p(x, t)\,dx$$

If we wish to reduce the cost over time, we want to choose dynamics such that $\frac{d}{dt}E[V] = \int V(x)\,\dot p(x, t)\,dx$ is less than zero. Since all operators are linear, at any point in time with fixed and known $p(x, t)$, the change in $\frac{d}{dt}E[V]$ with changes in $u$ is given by

$$\frac{\partial}{\partial u}\,\frac{d}{dt}E[V] = \int V(x)\,(L_1\,p)(x, t)\,dx \qquad (3.24)$$

so if we know the sign of the right-hand side of equation 3.24, then we can increase or decrease $u$ to reduce $\frac{d}{dt}E[V]$. Note that this equation applies only to the instantaneous change in expected cost for a measured value of $p(x, t)$; over time, $p(x, t)$ will depend on the choice of $u$, so that equation 3.24 will take different values at future times.

Now suppose that multiple control inputs $u_i$ act by weighting the superposition of multiple controllers. In this case,

$$\frac{d}{dt}E[V] = \int V(x)\Big(L_0\,p + \sum_i u_i\,L_i\,p\Big)(x, t)\,dx \qquad (3.26)$$

so if we set

$$u_i = -k_i \int V(x)\,(L_i\,p)(x, t)\,dx \qquad (3.27)$$

for some positive gains $k_i$, then the cost is guaranteed to decrease. There are many other choices of control $u_i$ that will decrease the cost, although the choice in equation 3.27 (with all $k_i$'s equal) is the gradient of the instantaneous change in cost $\frac{d}{dt}E[V]$ with respect to $u$ and thus will maximize the rate of decrease of cost over all choices of control for which $\sum_i u_i^2$ is constrained.

In order to use this for a controller, the probability of state $p(x)$ needs to be estimated at each point in time, and $p(x, t)$ is then used in equation 3.27 to calculate the controller weights. In general, the cost function can depend on time as well, so that we have $V(x, t)$. For a neural control system, negative values of $u_i$ will generally not be permitted, and in many cases negative values might lead to unstable dynamics. Therefore, in practice in such cases, negative values of $u_i$ will usually be set to zero. If this is done, $E[V]$ will still decrease so long as there is at least one controller with a positive value of $u_i$. If there is no such controller, then it is not possible to decrease the expected cost with only positive control weights $u_i$ for this set of controllers.

In the discrete-state case, $V(x, t)$ is a vector $v(t)$, $p(x, t)$ is a vector $p(t)$, and the expected cost is given by the vector inner product $E[V] = v^T p$. Therefore, equation 3.26 becomes

$$\frac{d}{dt}E[V] = v^T\Big(L_0 + \sum_i u_i\,L_i\Big)\,p \qquad (3.28)$$

and we can guarantee reduction in cost by choosing the feedback control

$$u_i = -k_i\,v^T L_i\,p \qquad (3.29)$$

where $L_i$ is the stochastic differential matrix that describes the effect of the $i$th controller.

Comparing this formalism to standard formulations of feedback control, we see that the cost function $V(x, t)$ takes the role of the reference trajectory, and $p(x, t)$ takes the role of the state (see Figure 1). For instance, in linear control systems, the control variable would be proportional to the difference between the reference trajectory and the actual trajectory, $x^*(t) - x(t)$, whereas here the control variable is given by equation 3.27. The role of the subtraction $x^*(t) - x(t)$ is replaced by a kernel or matrix operation (see equations 3.27 or 3.29). The feedback gain $k_i$ has the same function.
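A simulation sketch of the discrete-state feedback rule $u_i = -k_i\,v^T L_i\,p$ described above, with hypothetical rates, costs, and gain: two controllers move mass down a chain of three states, negative weights are clipped at zero as discussed earlier, and the expected cost $v^T p$ decreases.

```python
# Sketch: feedback weighting of two controller generators (made-up
# numbers), integrated by forward Euler; expected cost v^T p falls.
def matvec(M, p):
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

L1 = [[-1, 0, 0], [1, 0, 0], [0, 0, 0]]   # controller 1: jump 1 -> 2
L2 = [[0, 0, 0], [0, -1, 0], [0, 1, 0]]   # controller 2: jump 2 -> 3
v = [1.0, 0.5, 0.0]                       # cost per state (state 3 is cheap)
p = [1.0, 0.0, 0.0]
k, dt = 5.0, 0.01
cost0 = sum(a * b for a, b in zip(v, p))
for _ in range(2000):                     # 20 time units
    us = []
    for L in (L1, L2):
        Lp = matvec(L, p)
        u = -k * sum(a * b for a, b in zip(v, Lp))
        us.append(max(u, 0.0))            # negative weights set to zero
    dp = [sum(u * matvec(L, p)[i] for u, L in zip(us, (L1, L2)))
          for i in range(3)]
    p = [p[i] + dt * dp[i] for i in range(3)]
cost1 = sum(a * b for a, b in zip(v, p))
print(cost0, cost1, p)
```

Note that each weight activates only when its controller can currently reduce the cost: the second controller switches on only after mass has reached state 2.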

As a simple example, consider a scalar state with desired trajectory $x^*(t)$ and quadratic cost function $V(x, t) = \frac12\,(x - x^*(t))^2$. Suppose we have an exact measurement of the true state $x_0(t)$ with no uncertainty. Then $p(x, t) = \delta(x - x_0(t))$. If the effect of control input $u$ is to increase $x$ linearly, so that the system is $dx = u\,dt$, then (from equation 2.6) the equivalent stochastic differential operator is given by the shift operator $L\,p = -\partial p/\partial x$. Inserting into equation 3.27 gives

$$u = -k \int V(x, t)\left(-\frac{\partial}{\partial x}\,\delta(x - x_0(t))\right) dx = -k\,\frac{\partial V}{\partial x}\big(x_0(t)\big) = k\,\big(x^*(t) - x_0(t)\big)$$

which is the standard form for linear feedback control. So the same control $u$ can be derived from either the scalar or the operator version. The difference is that the operator version allows much more flexibility, since the cost function $V(x, t)$ can be arbitrary, the state $x$ may not be known with certainty, and the operator $L$ can represent either linear or nonlinear dynamics.

### 3.5. Optimization

It may be difficult to compute the magnitude of the control $u$ that is required to reduce the expected cost $E[V]$. However, the optimal value of $u$ can in principle be found by solving

$$u = \arg\min_u \frac{d}{dt}E[V]$$

subject to any constraints on $u$. This optimum will be true only at time $t$, since it depends on the current value of $p(x, t)$. A time-varying control $U(t)$ can be calculated at each point in time in order to maximize the rate of decrease of $E[V]$. Such a controller descends the gradient of cost and can be used to minimize the cost of the final state. Note that if cost depends on state transitions, then it will be a function of both $x$ and $\dot x$, both of which will need to be included in the probability density and the stochastic differential operator.

Stochastic differential operators also permit inference about parameters. Suppose the dynamics depend on a parameter $\alpha$ through the control, $\dot p = L_0\,p + u(\alpha)\,L_1\,p$. Then observation of the change in the system dynamics for a change in $\alpha$ yields

$$\frac{\partial \dot p}{\partial \alpha} = \frac{\partial u}{\partial \alpha}\,L_1\,p$$

and with knowledge of $L_1$ and $u$, it may be possible to make inferences about the value of $\alpha$. Note that we infer a change in $\alpha$ from a change in the system dynamics. More general formulations are possible, in which the dynamics depend nonlinearly or stochastically on $\alpha$. In all cases, inferences are made based on the change in system dynamics, either at one point in time or by comparison to known baseline dynamics for a known value of $\alpha$ (often $\alpha = 0$ at baseline). This type of reasoning could be used to estimate parameters in a biological control system based on observation of the dynamics. Alternatively, an experimentally induced change in a parameter will lead to an observable change in dynamics, and this could be used to determine the function of a region of the brain that is subjected to an experimentally induced increase or decrease in excitability. Further exploration of the use of stochastic differential operators for electrophysiology research is an important topic that is beyond the scope of this article.

### 3.6. Quantization

If the original variable $x$ is encoded by the density $p(x)$, then a quantization of $x$ is a new density $q(z)$ where $z = r(x)$ is in general not invertible. If $x$ evolves according to $\dot p = L_x\,p$, then there is an induced evolution of $z$ that can be written $\dot q = L_z\,q$. If $r(x)$ is invertible (so that $z$ and $x$ are one-to-one, but perhaps in different spaces), then $L_z$ is determined directly from $L_x$ by the change of variables. In general, this will not be the case, and $L_z$ will depend in complicated ways on the sample paths of $x$ and the statistics of the unobserved components of $x$.

For example, in the discrete-state case, assume that we can write $p(z) = R\,p(x)$, where $R$ is a matrix. Since $R$ is deterministic, each column will have only a single nonzero element, and all the column sums will be 1. Now we can write $\dot q = R\,\dot p = R\,L_x\,p(x)$, which gives the time evolution of $p(z)$. But this equation cannot be expressed as a closed form in $z$ alone, because the additional state information in $x$ is necessary to describe the evolution of $z$. Without that state information, an equation of the form $\dot q = L_z\,q$ will have additional uncertainty due to unobservability of the unquantized state $x$.
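A toy discrete-state illustration of this point (the rates are made up): two densities with the same quantization evolve differently, so no fixed operator acting on the quantized density alone can reproduce the induced evolution.

```python
# Sketch: R merges two x-states into one z-state.  The quantized
# derivative R L p(x) is well defined, but it is not a function of
# p(z) alone, because x1 and x2 leave at different rates.
def matvec(M, p):
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

R = [[1, 1, 0],      # z1 <- {x1, x2}; each column has a single 1
     [0, 0, 1]]      # z2 <- {x3}
L = [[-1.0,  0.0, 0.0],
     [ 0.0, -3.0, 0.0],
     [ 1.0,  3.0, 0.0]]    # x1 -> x3 at rate 1, x2 -> x3 at rate 3

pa = [1.0, 0.0, 0.0]       # all mass in x1
pb = [0.0, 1.0, 0.0]       # all mass in x2
qa = matvec(R, pa); qb = matvec(R, pb)
dqa = matvec(R, matvec(L, pa))
dqb = matvec(R, matvec(L, pb))
print(qa, qb)              # identical quantized densities
print(dqa, dqb)            # but different quantized derivatives
```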

## 4. Examples

Consider a system that can adjust the damping of a pendulum whose initial angle from vertical is unknown but lies somewhere between −π/4 and π/4. One goal might be to predict the possible positions of the pendulum in the future for any value of the applied damping. This would allow us to choose the least amount of damping that prevents the pendulum from having a particular angle at a particular time (perhaps to avoid a collision), no matter what the initial condition.

The pendulum dynamics are

$$dx = v\,dt + dB_x$$

$$dv = \left(-\frac{g}{l}\sin x - \frac{b}{m}\,v\right) dt + dB_v$$

with damping $b = 0.4$, mass $m = 1$, length $l = 5$, and gravitational constant $g = 10$; the standard deviations of the position and velocity noise terms $dB_x$ and $dB_v$ are both 0.1 radians. (Note that the velocity $v$ is not equal to the time derivative of $x$ due to the added noise term $dB_x$.) This system can be considered to be the superposition of a noise-free pendulum, a constant damping, and a random walk in $(x, v)$-space. The system is initialized at rest, with the probability of the initial position uniformly distributed between $-\pi/4$ and $\pi/4$. The evolution of the joint probability density $p(x, v, t)$ is shown in Figure 2. The marginal probability densities $p(x, t)$ and $p(v, t)$ are shown in Figure 3. Note that independent of initial condition, the pendulum tends to cross the midline at approximately the same time, but there is some dephasing due to the added noise.

The simulation is performed on a discretized grid of values of $x$ and $v$. Each update of the joint probability density $p(x, v, t) \to p(x, v, t + \Delta t)$ is accomplished by first calculating $p(v, t + \Delta t \mid x)$ for each possible value of $x$ and then calculating $p(x, t + \Delta t \mid v)$ for each possible value of $v$. The conditional updates $p(v, t \mid x) \to p(v, t + \Delta t \mid x)$ and $p(x, t \mid v) \to p(x, t + \Delta t \mid v)$ are each performed using linear operators $L_x(v)$ and $L_v(x)$, which are matrix operators that are zero except near the diagonal. They can be implemented efficiently using first-difference operators. For example, for each fixed value of $x$, the Ito-type stochastic differential equation for the velocity is

$$dv = \left(-\frac{g}{l}\sin x - \frac{b}{m}\,v\right) dt + dB_v$$

and the corresponding Fokker-Planck equation for $p(v \mid x)$ is

$$\frac{\partial p(v \mid x)}{\partial t} = -\frac{\partial}{\partial v}\left[\left(-\frac{g}{l}\sin x - \frac{b}{m}\,v\right) p(v \mid x)\right] + \frac{\sigma_v^2}{2}\,\frac{\partial^2 p(v \mid x)}{\partial v^2}$$

where $\sigma_v$ is the standard deviation of $dB_v$; this can be approximated using first and second difference operations so long as the basic properties of stochastic differential matrices are respected. (Matlab code for the simulation shown in Figures 2 and 3 is available at www.sangerlab.net/StochasticPendulum.zip.) Note that this is an open-loop simulation, and a single time-invariant stochastic differential operator describes the complete behavior of the system.
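The first-difference construction can be sketched in a simplified setting (a 1D grid with constant drift; all numbers are illustrative): an upwind difference operator is a near-diagonal stochastic differential matrix, so Euler integration conserves probability while the density drifts across the grid.

```python
# Sketch: build a near-diagonal stochastic differential matrix from an
# upwind first-difference operator for constant drift a > 0, then evolve
# a sharp initial density (illustrative parameters).
n, dx, a = 20, 0.1, 0.5
rate = a / dx                    # jump rate between neighboring bins
L = [[0.0] * n for _ in range(n)]
for j in range(n - 1):           # upwind differencing: bin j -> bin j+1
    L[j][j] -= rate
    L[j + 1][j] += rate          # last column left zero (absorbing end bin)

p = [0.0] * n
p[2] = 1.0                       # sharp initial condition near the left edge
dt = 0.01
for _ in range(200):             # simulate 2 time units
    Lp = [sum(L[i][j] * p[j] for j in range(n)) for i in range(n)]
    p = [p[i] + dt * Lp[i] for i in range(n)]
mean = sum(i * dx * p[i] for i in range(n))
print(sum(p), mean)
```

Because every column of the operator sums to zero and off-diagonal entries are nonnegative, the density remains valid throughout, which is the property emphasized in the text.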

The second example describes a point mass $m = 1$ at position $x$ with velocity $v$, whose dynamics are

$$dx = v\,dt + dB_{x1} \qquad (4.8)$$

$$dv = -\frac{b}{m}\,v\,dt + dB_{v1} \qquad (4.9)$$

where $b = 1$, and the standard deviations of $dB_{x1}$ and $dB_{v1}$ are both 10 units. For initial conditions with a gaussian distribution for position (standard deviation = 20) and velocity equal to zero (sharp), Figure 4 shows the evolution over time of the marginal probability density for position. Suppose we now add a noisy controller that applies force through a spring-like actuator, and assume that the goal of the controller is to stabilize around an equilibrium position of 25 units:

$$dx = dB_{x2} \qquad (4.10)$$

$$dv = \frac{k}{m}\,(x^* - x)\,dt + dB_{v2} \qquad (4.11)$$

The rest position of the spring is $x^* = 25$, and the spring constant is $k = 1.5$, so that the combined system is underdamped (damping ratio = 0.4). The standard deviations of the additional noise terms are also 10 units. This results in a combined process (the sum of the right-hand sides of equations 4.8 and 4.10 and of equations 4.9 and 4.11) whose time evolution of the joint density $p(x, v, t)$ is given in Figure 5; the marginals for position and velocity are shown in Figure 6. For comparison with Figure 4, Figure 7 shows a three-dimensional plot of the time evolution of the density for position. Note that the time evolution of the density for position overshoots and eventually converges toward $x^*$, as expected for an underdamped system, but that because of the continual added noise terms $dB_v$ and $dB_x$, there is residual variance in both position and velocity. (Matlab code for the simulation shown in Figures 5 and 6 is available at www.sangerlab.net/StochasticSpring.zip.)

As in the previous example, it is important to realize that this is an open-loop system (by our method described above) since the full behavior is specified by a single time-invariant operator. This would be true even if the target position $x^*$ were time varying. This would become a feedback system (in our terminology) only if the spring-like controller itself were activated or deactivated based on the value of the state $x$. This example provides a very simple illustration of the application of these ideas to a second-order system. A more realistic application to a biological system would involve the superposition of multiple controllers, such as a combination of agonist and antagonist muscles, or a combination of multiple motor units within a muscle, in order to achieve more complex dynamics.

## 5. Conclusion

In this article, we have introduced the use of linear stochastic differential operators and shown that these operators form a framework that links several different approaches to time-varying probability densities, including Bayesian estimation and discrete-time Markov processes. While it is well known that discrete-time operators converge to differential operators in the limit of small time intervals, the relation between the additive properties of differential operators and the multiplicative properties of discrete-time operators has not been widely noted. Stochastic differential operators arise naturally from stochastic calculus and the partial differential equations that are known to describe the evolution of probability densities. Therefore, the underlying mathematics is familiar. The novelty in this approach arises from the observation that linear properties of these operators permit linear operations on stochastic systems, including weighted combinations. This allows us to bring the full force of linear mathematics to bear on problems of the design and simulation of distributed stochastic control systems.
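The additive property can be made concrete with a finite-difference discretization of the Fokker-Planck generator on a 1D grid. In the sketch below (Python/NumPy; the grid, drift functions, and coefficients are illustrative assumptions, not taken from the paper), the operator matrix depends linearly on the drift and diffusion coefficients, so the operator of a plant plus a controller is exactly the sum of their separate operators:

```python
import numpy as np

n, h = 201, 0.5
xs = np.linspace(-50, 50, n)

def operator(f, D):
    """Matrix L with dp/dt = L @ p for dx = f(x) dt + sqrt(2 D) dB.

    Central differences for the drift term -d/dx [f(x) p] and the
    diffusion term D d^2 p / dx^2; boundary rows are left zero for brevity.
    """
    L = np.zeros((n, n))
    for i in range(1, n - 1):
        L[i, i - 1] = f(xs[i - 1]) / (2 * h) + D / h**2
        L[i, i]     = -2 * D / h**2
        L[i, i + 1] = -f(xs[i + 1]) / (2 * h) + D / h**2
    return L

plant = operator(lambda x: 0.0, D=5.0)                     # uncontrolled diffusion
controller = operator(lambda x: -1.5 * (x - 25.0), D=0.0)  # spring-like drift
combined = operator(lambda x: -1.5 * (x - 25.0), D=5.0)    # both effects at once

# Additive superposition: the operator of the combined system is the
# sum of the operators of its parts.
print(np.allclose(plant + controller, combined))
```

The same construction extends to weighted combinations *L*_{0} + α*L*_{1}, since the discretized operator is linear in its coefficients.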

In many cases, the form of the stochastic differential operator can be directly inferred from the problem description, even when the operator is not the result of a second-order Fokker-Planck equation. The effects of quantization or change of variables can be directly modeled, and time-varying expected costs associated with state-dependent cost functions can be calculated. An important topic for further research will be to derive methods for optimal control of systems described by stochastic differential operators. The consequence for control is that a time-varying cost function takes the place of the reference trajectory, and the probability density of state takes the place of the state itself.
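The expected-cost claim can be stated directly (writing E for expectation and *L* for the stochastic differential operator, consistent with the notation above): the cost expectation is a linear functional of the density, so it evolves under the same linear dynamics,

```latex
\mathbb{E}[c](t) = \int c(x,t)\, p(x,t)\, dx,
\qquad
\frac{d}{dt}\,\mathbb{E}[c](t)
  = \int \frac{\partial c}{\partial t}(x,t)\, p(x,t)\, dx
  + \int c(x,t)\,\bigl(Lp\bigr)(x,t)\, dx,
```

so a time-varying expected cost can be evaluated alongside the forward propagation of *p* without any additional machinery.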

The use of probability densities as the fundamental objects can be considered a projection of state-space into a high-dimensional space (infinite-dimensional for continuous state space). Stochastic differential operators are linear operators on this projection space, and linear operators in the high-dimensional space can implement nonlinear operators in the original low-dimensional space. These operators are therefore quite general: they can implement a large class of nonlinear differential operators, including differential equations of the form *dx* = [*f*(*x*) + *g*(*x*)*u*] *dt* for differentiable nonlinear functions *f* and *g*. This allows the use of techniques from linear algebra (including additive superposition and linear projections).

The numerical effort required to simulate such systems is considerably greater than for deterministic control systems, because the entire joint probability density must be propagated forward in time, rather than just the most likely values of variables, or the mean and variance of gaussian-distributed variables (as is typically done in the Kalman filter or linear quadratic regulator approach). Although current computing power is adequate for low-dimensional systems, for high-dimensional systems, simulation may rapidly become intractable. An important topic for future research will be to find low-dimensional projections or other approximate or heuristic algorithms for integrating high-dimensional stochastic systems. Galerkin methods provide one possible approach using a finite-dimensional approximation to the probability density at each point in time (e.g., Chakravorty, 2006; Knezevic & Süli, 2008). It will be particularly important to seek parallel algorithms that can implement prediction or optimization in order to determine whether this type of computation could be performed explicitly by biological systems. Until such algorithms are available, this framework should be considered a description of the behavior of complex systems, but not necessarily a description of how that behavior is implemented.

One of the more interesting consequences of this approach is a change in the interpretation of control and observation. In this formalism, control of a system involves changing its dynamics from the unforced dynamics *L*_{0} to a new dynamics *L*_{0} + *L*_{1}, where *L*_{1} is implemented by a controller acting in parallel with the plant (Todorov, 2009). Similarly, observation involves estimation of the change in a hidden parameter α by comparison of the system *L*_{0} + α*L*_{1} to the base system *L*_{0}. Therefore, the observed or controlled quantities are not the output of the system, but rather the system dynamics themselves. The effect of a controller (or of a change in a hidden parameter) is the difference between the dynamics of the full system with and without the presence of the controller. This is a new conceptualization that may have significant power for the analysis of behavioral effects of individual neurons or neural circuits. The method proposed here provides a consistent framework for understanding the time-varying stochastic behavior of differential and discrete-time distributed systems, and it may provide insight that will permit further study of how cooperating groups of individual neurons or neural subsystems give rise to global dynamic behavior.
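The idea of observation as identification of a change in dynamics admits a simple numerical sketch: given the density *p* and its observed derivative, the hidden parameter α in *dp*/*dt* = (*L*_{0} + α*L*_{1})*p* can be recovered by projecting the residual onto the direction *L*_{1}*p*. In the Python/NumPy sketch below, `L0` and `L1` are stand-ins for discretized stochastic differential operators (random matrices, purely for brevity), and the noise-free derivative is assumed observable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for discretized operators and a current state density.
n = 50
L0, L1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
p = rng.random(n)
p /= p.sum()

alpha_true = 0.7
p_dot = (L0 + alpha_true * L1) @ p      # observed density derivative

# Least squares estimate: project the residual p_dot - L0 p onto L1 p.
r = p_dot - L0 @ p
alpha_hat = (L1 @ p) @ r / ((L1 @ p) @ (L1 @ p))
print(alpha_hat)                        # ~0.7
```

With noisy or sampled observations of *p*, the same projection becomes a regression problem, but the estimated quantity is still the dynamics themselves rather than the system output.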

## Acknowledgments

I thank Nick Bambos and Evangelos Theodorou for helpful conversations and comments on earlier manuscripts. This work was supported by the Don and Linda Carter Foundation, the Crowley Carter Foundation, and the James S. McDonnell Foundation.