In this section, we analyze the impact of the encoder/decoder architecture on the generation quality of the considered models. The generation quality experiment of section 5 is repeated on the MNIST and fMNIST data sets, with the architecture and hyperparameters adapted from Dupont (2018). From Table 5 and Figure 9, we see that the overall FID scores and generation quality have improved; however, the relative ranking of the models did not change significantly.
Table 5:

FID Scores Computed on 8000 Randomly Generated Images When Trained with the Architecture and Hyperparameters of Dupont (2018).

         St-RKM        VAE           $β$-VAE       FactorVAE     InfoGAN
MNIST    24.63 (0.22)  36.11 (1.01)  42.81 (2.01)  35.48 (0.07)  45.74 (2.93)
fMNIST   61.44 (1.02)  73.47 (0.73)  75.21 (1.11)  69.73 (1.54)  84.11 (2.58)

Notes: Lower is better; standard deviations are given in parentheses. Architecture and hyperparameters adapted from Dupont (2018).
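For reference, the FID compares the Gaussian statistics (mean and covariance) of feature embeddings of real and generated images. The following is a minimal NumPy sketch of the closed-form distance between two Gaussians, not the evaluation code used for the table; the function name and the assumption that the feature statistics are already computed are ours:

```python
import numpy as np

def fid_gaussian(mu1, sigma1, mu2, sigma2):
    """FID between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    # Tr((sigma1 sigma2)^{1/2}) equals the sum of square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and nonnegative for
    # PSD covariances (clip small negative numerical noise).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals, 0.0, None)))
    return float(mean_term + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)
```

Identical statistics give a distance of zero, and mismatched means or covariances increase the score, which is why lower FID indicates better generation quality.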

Table 6:

Diagonalization Scores (see Figure 3).

Model                               dSprites     3DShapes     3D cars
St-RKM-sl ($σ=10^{-3}$, $U^{\star}$)   0.17 (0.05)  0.23 (0.03)  0.21 (0.04)
St-RKM ($σ=10^{-3}$, $U^{\star}$)      0.26 (0.05)  0.30 (0.10)  0.31 (0.09)
St-RKM ($σ=10^{-3}$, random $U$)       0.61 (0.02)  0.72 (0.01)  0.69 (0.03)

Notes: Denote $M = \frac{1}{|C|}\sum_{i \in C} U^{\star\top} \nabla\psi(y_i)\,\nabla\psi(y_i)^{\top} U^{\star}$, with $y_i = P_U \phi_\theta(x_i)$ (cf. equation 3.6). Then we compute the score as $\|M - \operatorname{diag}(M)\|_F / \|M\|_F$, where $\operatorname{diag}: \mathbb{R}^{m \times m} \mapsto \mathbb{R}^{m \times m}$ sets the off-diagonal elements of a matrix to zero. The scores are computed for each model over 10 random seeds and reported as mean (standard deviation). Lower scores indicate better diagonalization.
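Under these definitions, the score takes a few lines of NumPy. The sketch below is ours (the function names are illustrative), and it assumes the gradients $\nabla\psi(y_i)$ are available as vectors:

```python
import numpy as np

def diagonalization_score(M):
    """||M - diag(M)||_F / ||M||_F; 0 means M is perfectly diagonal."""
    off_diagonal = M - np.diag(np.diag(M))
    return np.linalg.norm(off_diagonal, "fro") / np.linalg.norm(M, "fro")

def score_from_grads(grads, U):
    """Assemble M = (1/|C|) sum_i U^T g_i g_i^T U from gradients g_i = grad psi(y_i)."""
    m = U.shape[1]
    M = np.zeros((m, m))
    for g in grads:
        v = U.T @ g  # project each gradient onto the latent directions
        M += np.outer(v, v)
    M /= len(grads)
    return diagonalization_score(M)
```

Since the diagonal and off-diagonal parts of $M$ are orthogonal in the Frobenius inner product, the score always lies in $[0, 1]$.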

Figure 8:

Samples of a randomly generated batch of images used to compute the FID and SWD scores (see Figure 4).

Figure 9:

Samples of randomly generated images used to compute the FID scores. See Table 5.

Figure 10:

(a) Loss evolution (log plot) during the training of equation A.2 over 1000 epochs with $ɛ=10^{-5}$, once with the Cayley ADAM optimizer (green curve) and once without (blue curve). (b) Traversals along the principal components when the model was trained with a fixed $U$, that is, with the objective given by equation A.2 and $ɛ=10^{-5}$. There is no clear isolation of a feature along any of the principal components, indicating further that optimizing over $U$ is key to better disentanglement.
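Panel (a) compares training with and without the Cayley ADAM optimizer, which keeps $U$ on the Stiefel manifold throughout training. As background, here is a minimal NumPy sketch of the Cayley-transform retraction such Stiefel optimizers are built on (the Wen-Yin style update, not the authors' implementation):

```python
import numpy as np

def cayley_retraction(U, G, tau):
    """One Cayley-transform step on the Stiefel manifold St(d, m).

    U: d x m matrix with orthonormal columns; G: Euclidean gradient of the
    loss at U; tau: step size. Returns a new point on the manifold.
    """
    # Skew-symmetric generator built from the gradient
    W = G @ U.T - U @ G.T
    I = np.eye(U.shape[0])
    # Cayley transform (I + tau/2 W)^{-1} (I - tau/2 W) is orthogonal for
    # skew-symmetric W, so the updated U keeps orthonormal columns.
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ U)
```

Because the update is an exact rotation of $U$, the orthonormality constraint $U^\top U = I$ is preserved at every step rather than enforced by a penalty.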
