In this section, we formulate the same computation in terms of variational Bayesian inference and neural networks to demonstrate their correspondence. We first derive the form of a variational free energy cost function under a specific generative model: a Markov decision process. We present the derivations carefully, with a focus on the form of the ensuing Bayesian belief updating. The functional form of this update will reemerge later, when reverse engineering the cost functions implicit in neural networks. These correspondences are depicted in Figure 1 and Table 1. This section starts with a description of Markov decision processes as a general kind of generative model and then considers the minimization of variational free energy under these models.
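As a point of reference for the derivations that follow, it may help to recall the generic form of the variational free energy for a generative model $p(o, s)$ and an approximate posterior $q(s)$; the symbols here are generic placeholders rather than the MDP-specific quantities defined below:

$$
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o).
$$

Because the KL divergence is nonnegative, $F$ upper-bounds the negative log evidence $-\ln p(o)$, so minimizing $F$ with respect to $q(s)$ both tightens this bound and drives the approximate posterior toward the exact posterior $p(s \mid o)$.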
Figure 1:

Comparison between an MDP scheme and a neural network. (A) MDP scheme expressed as a Forney factor graph (Forney, 2001; Dauwels, 2007) based on the formulation in Friston, Parr et al. (2017). In this BSS setup, the prior $\mathbf{D}$ determines hidden states $s_t$, while $s_t$ determines observations $o_t$ through the likelihood mapping $\mathbf{A}$. Inference corresponds to the inversion of this generative process. Here, $\mathbf{D}^{*}$ indicates the true prior, while $\mathbf{D}$ indicates the prior under which the network operates. If $\mathbf{D} = \mathbf{D}^{*}$, the inference is optimal; otherwise, it is biased. (B) Neural network comprising a single-layer feedforward network with a sigmoid activation function. The network receives sensory inputs $o_t = (o_t^{(1)}, \ldots, o_t^{(N_o)})^{\top}$ that are generated from hidden states $s_t = (s_t^{(1)}, \ldots, s_t^{(N_s)})^{\top}$ and outputs neural activities $x_t = (x_t^{1}, \ldots, x_t^{N_x})^{\top}$. Here, $x_t^{j}$ should encode the posterior expectation about a binary state $s_t^{(j)}$. In an analogy with the cocktail party effect, $s_t$ and $o_t$ correspond to individual speakers and auditory inputs, respectively.

Table 1:
Correspondence of Variables and Functions.

Neural network formulation                                          Variational Bayes formulation
Neural activity $x_t^{j}$                                        ↔  $\mathbf{s}_{t1}^{(j)}$ (state posterior)
Sensory inputs $o_t$                                             ↔  $o_t$ (observations)
Synaptic strengths $W_{j1}$                                      ↔  $\mathrm{sig}^{-1}\big(\mathbf{A}_{11}(\cdot,j)\big)$
$\hat{W}_{j1} = \mathrm{sig}(W_{j1})$                            ↔  $\mathbf{A}_{11}(\cdot,j)$ (parameter posterior)
Perturbation term $\phi_{j1}$                                    ↔  $\ln \mathbf{D}_{1}^{(j)}$ (state prior)
Threshold $h_{j1}$                                               ↔  $\ln\big(\mathbf{1} - \mathbf{A}_{11}(\cdot,j)\big) \cdot \mathbf{1} + \ln \mathbf{D}_{1}^{(j)}$
Initial synaptic strengths $\lambda_{j1}\hat{W}_{j1}^{\mathrm{init}}$  ↔  $\mathbf{a}_{11}(\cdot,j)$ (parameter prior)
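The flavor of the correspondence in Table 1 can be illustrated with a minimal numerical sketch. The snippet below is not the paper's implementation: it assumes a toy model with a single binary hidden state, a hypothetical Bernoulli prior `D1`, and conditionally independent binary observations with hypothetical likelihood vectors `A1` and `A0` (standing in for columns of the mapping $\mathbf{A}$). Under these assumptions, exact Bayesian inversion of the generative process reduces to a single sigmoid unit whose weights are logit-transformed likelihoods (cf. the $W_{j1} \leftrightarrow \mathrm{sig}^{-1}(\mathbf{A}_{11}(\cdot,j))$ row) and whose bias collects the $\ln(\mathbf{1} - \mathbf{A})$ terms and the log prior (cf. the threshold row):

```python
import math
import random

def sig(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse sigmoid, sig^{-1}(p) = ln(p / (1 - p))."""
    return math.log(p) - math.log(1.0 - p)

rng = random.Random(0)

# Hypothetical toy parameters (not from the paper): prior P(s = 1), and the
# likelihoods P(o_i = 1 | s = 1), P(o_i = 1 | s = 0) for N_o sensory channels.
N_o = 8
D1 = 0.3
A1 = [rng.uniform(0.1, 0.9) for _ in range(N_o)]
A0 = [rng.uniform(0.1, 0.9) for _ in range(N_o)]

# Forward (generative) pass, as in figure 1A: prior -> hidden state -> inputs.
s = 1 if rng.random() < D1 else 0
o = [1.0 if rng.random() < (A1[i] if s else A0[i]) else 0.0 for i in range(N_o)]

# Exact posterior P(s = 1 | o) by evaluating the two joint log probabilities.
def log_joint(o, s_val):
    A = A1 if s_val else A0
    lp = math.log(D1 if s_val else 1.0 - D1)
    for oi, ai in zip(o, A):
        lp += oi * math.log(ai) + (1.0 - oi) * math.log(1.0 - ai)
    return lp

l1, l0 = log_joint(o, 1), log_joint(o, 0)
posterior = 1.0 / (1.0 + math.exp(l0 - l1))

# The same inversion as one sigmoid unit (figure 1B): the weights are
# differences of logit-transformed likelihoods, and the bias gathers the
# ln(1 - A) terms plus the log prior, mirroring the threshold row of table 1.
W = [logit(a1) - logit(a0) for a1, a0 in zip(A1, A0)]
b = sum(math.log(1.0 - a1) - math.log(1.0 - a0)
        for a1, a0 in zip(A1, A0)) + logit(D1)
x = sig(sum(w * oi for w, oi in zip(W, o)) + b)

print(abs(posterior - x))  # agrees up to floating-point error
```

The agreement holds because the log posterior odds are linear in the binary observations, so the normalized posterior is exactly a sigmoid of a weighted sum of inputs; the full scheme in the paper generalizes this to many states and learned parameters.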