In this section, we formulate the same computation in terms of variational Bayesian inference and neural networks to demonstrate their correspondence. We first derive the form of a variational free energy cost function under a specific generative model, a Markov decision process.1 We present the derivations carefully, with a focus on the form of the ensuing Bayesian belief updating. The functional form of this update will reemerge later, when reverse engineering the cost functions implicit in neural networks. These correspondences are depicted in Figure 1 and Table 1. This section starts with a description of Markov decision processes as a general kind of generative model and then considers the minimization of variational free energy under these models.
Figure 1:

Comparison between an MDP scheme and a neural network. (A) MDP scheme expressed as a Forney factor graph (Forney, 2001; Dauwels, 2007) based on the formulation in Friston, Parr, et al. (2017). In this BSS setup, the prior $D$ determines hidden states $s_t$, while $s_t$ determines observation $o_t$ through the likelihood mapping $A$. Inference corresponds to the inversion of this generative process. Here, $D^*$ indicates the true prior, while $D$ indicates the prior under which the network operates. If $D = D^*$, the inference is optimal; otherwise, it is biased. (B) Neural network comprising a single-layer feedforward network with a sigmoid activation function. The network receives sensory inputs $o_t = (o_t^{(1)}, \ldots, o_t^{(N_o)})^\top$ that are generated from hidden states $s_t = (s_t^{(1)}, \ldots, s_t^{(N_s)})^\top$ and outputs neural activities $x_t = (x_t^1, \ldots, x_t^{N_x})^\top$. Here, $x_t^j$ should encode the posterior expectation about a binary state $s_t^{(j)}$. In analogy with the cocktail party effect, $s_t$ and $o_t$ correspond to individual speakers and auditory inputs, respectively.
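The generative process in panel A and the network in panel B can be made concrete with a minimal numerical sketch. Everything below is illustrative, not the paper's implementation: the state and observation dimensions, the mixing used to sample observations, and the random initialization of the synaptic strengths `W` are all assumptions chosen for simplicity. Binary hidden states $s_t$ are drawn from the true prior $D^*$, observations $o_t$ are sampled through a likelihood mapping $A$, and a single-layer sigmoid network maps $o_t$ to activities $x_t$ intended to encode posterior expectations about $s_t$.

```python
# Minimal sketch of the BSS setup in Figure 1 (illustrative assumptions:
# 2 binary sources, 32 observations, a simple mixture likelihood).
import numpy as np

rng = np.random.default_rng(0)
Ns, No = 2, 32                          # numbers of hidden states and observations (assumed)

D_true = np.full(Ns, 0.5)               # true prior D*: each source active with p = 0.5
A = rng.uniform(0.25, 0.75, (No, Ns))   # likelihood mapping P(o^(i)=1 | s^(j)=1), illustrative

def generate(T):
    """Sample hidden states s_t ~ D* and observations o_t ~ P(o_t | s_t, A)."""
    s = (rng.random((T, Ns)) < D_true).astype(float)
    # each observation channel is driven by the average likelihood of the active sources
    p_o = s @ A.T / np.maximum(s.sum(axis=1, keepdims=True), 1.0)
    o = (rng.random((T, No)) < p_o).astype(float)
    return s, o

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Panel B: single-layer feedforward network, x_t = sig(W o_t - h).
W = rng.normal(0.0, 0.1, (Ns, No))      # synaptic strengths (to be learned)
h = np.zeros(Ns)                        # firing thresholds
s, o = generate(100)
x = sigmoid(o @ W.T - h)                # x_t^j encodes the posterior expectation of s_t^(j)
```

With random initial weights the activities $x_t$ are of course not yet valid posteriors; the point of the section is that minimizing the appropriate cost function drives `W` and `h` toward the Bayes-optimal values listed in Table 1.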

Table 1:
Correspondence of Variables and Functions.
| Neural Network Formation | | Variational Bayes Formation |
| --- | --- | --- |
| Neural activity $x_t^j$ | $\Longleftrightarrow$ | $s_{t1}^{(j)}$ (state posterior) |
| Sensory inputs $o_t$ | $\Longleftrightarrow$ | $o_t$ (observations) |
| Synaptic strengths $W_{j1}$ | $\Longleftrightarrow$ | $\mathrm{sig}^{-1}\big(A_{11}^{(\cdot,j)}\big)$ |
| $\hat{W}_{j1} \equiv \mathrm{sig}(W_{j1})$ | $\Longleftrightarrow$ | $A_{11}^{(\cdot,j)}$ (parameter posterior) |
| Perturbation term $\phi_{j1}$ | $\Longleftrightarrow$ | $\ln D_1^{(j)}$ (state prior) |
| Threshold $h_{j1}$ | $\Longleftrightarrow$ | $\ln\big(\vec{1} - A_{11}^{(\cdot,j)}\big) \cdot \vec{1} + \ln D_1^{(j)}$ |
| Initial synaptic strengths $\lambda_{j1} \odot \hat{W}_{j1}^{\mathrm{init}}$ | $\Longleftrightarrow$ | $a_{11}^{(\cdot,j)}$ (parameter prior) |
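The synaptic-strength and threshold rows of the table are simple deterministic transforms, so they can be checked numerically. The sketch below is a hypothetical worked example, not code from the paper: the column of posterior likelihood expectations `A_col` and the state prior `D_j` are made-up values, used only to show that $W_{j1} = \mathrm{sig}^{-1}(A_{11}^{(\cdot,j)})$ and $\hat{W}_{j1} = \mathrm{sig}(W_{j1})$ are mutually inverse, and how the threshold row combines the likelihood and prior terms.

```python
# Numeric check of the Table 1 correspondences (illustrative values).
import numpy as np

def sig(W):
    """Elementwise sigmoid."""
    return 1.0 / (1.0 + np.exp(-W))

def sig_inv(A):
    """Inverse sigmoid (logit), valid for A in (0, 1)."""
    return np.log(A) - np.log(1.0 - A)

rng = np.random.default_rng(1)
No = 8
A_col = rng.uniform(0.2, 0.8, No)   # parameter posterior A_11^(.,j), illustrative
D_j = 0.5                           # state prior D_1^(j), illustrative

W_j = sig_inv(A_col)                # synaptic strengths: W_j1 = sig^{-1}(A_11^(.,j))
A_back = sig(W_j)                   # W_hat_j1 = sig(W_j1) recovers the parameter posterior
h_j = np.log(1.0 - A_col).sum() + np.log(D_j)   # threshold row of Table 1

assert np.allclose(A_back, A_col)   # the two synaptic-strength rows are consistent
```

Because `sig` and `sig_inv` are exact inverses on $(0,1)$, encoding the parameter posterior in synaptic strengths loses no information; the threshold $h_{j1}$ then collects the observation-independent terms of the belief update.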