FID Scores Computed on 8000 Randomly Generated Images; All Models Trained with the Same Architecture and Hyperparameters.
| Dataset | St-RKM | VAE | β-VAE | FactorVAE | InfoGAN |
| --- | --- | --- | --- | --- | --- |
| MNIST | 24.63 (0.22) | 36.11 (1.01) | 42.81 (2.01) | 35.48 (0.07) | 45.74 (2.93) |
| fMNIST | 61.44 (1.02) | 73.47 (0.73) | 75.21 (1.11) | 69.73 (1.54) | 84.11 (2.58) |
Notes: Lower is better; standard deviations are given in parentheses. Architecture and hyperparameters adapted from Dupont (2018).
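For reference, a minimal sketch of the FID computation using the torchmetrics implementation; the helper `to_rgb_uint8` and the random placeholder batches are illustrative and not part of the original setup:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def to_rgb_uint8(x: torch.Tensor) -> torch.Tensor:
    """Map float grayscale images in [0, 1], shape (N, 1, H, W), to uint8 RGB."""
    return (x.clamp(0, 1).repeat(1, 3, 1, 1) * 255).to(torch.uint8)

# Placeholder batches standing in for test images and generator samples;
# the scores in the table above use 8000 generated images per model.
real_images = torch.rand(64, 1, 28, 28)
fake_images = torch.rand(64, 1, 28, 28)

# Inception-v3 pool features (2048-d) are the standard choice for FID.
fid = FrechetInceptionDistance(feature=2048)
fid.update(to_rgb_uint8(real_images), real=True)
fid.update(to_rgb_uint8(fake_images), real=False)
print(float(fid.compute()))  # lower is better
```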
Diagonalization Scores (see Figure 3).
| Models | dSprites | 3DShapes | 3D cars |
| --- | --- | --- | --- |
| St-RKM-sl | 0.17 (0.05) | 0.23 (0.03) | 0.21 (0.04) |
| St-RKM | 0.26 (0.05) | 0.30 (0.10) | 0.31 (0.09) |
| St-RKM (random $U$) | 0.61 (0.02) | 0.72 (0.01) | 0.69 (0.03) |
Notes: Denote by $C$ the matrix defined in equation 3.6. Then we compute the score as $\|C - \mathrm{Ddiag}(C)\|_F / \|C\|_F$, where $\mathrm{Ddiag}(\cdot)$ sets the off-diagonal elements of its matrix argument to zero. The scores are computed for each model over 10 random seeds; we report the mean (standard deviation). Lower scores indicate better diagonalization.
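A minimal sketch of this score under the reading above (a normalized Frobenius ratio, here computed from the empirical covariance of a batch of latent codes, which is an assumption about the exact form of equation 3.6):

```python
import numpy as np

def diagonalization_score(H: np.ndarray) -> float:
    """Off-diagonal mass of the latent covariance, normalized to [0, 1].

    H: (n_samples, latent_dim) array of latent codes.
    Returns ||C - Ddiag(C)||_F / ||C||_F, where Ddiag zeroes the
    off-diagonal entries; 0 means perfectly diagonal.
    """
    Hc = H - H.mean(axis=0, keepdims=True)
    C = Hc.T @ Hc / len(H)              # empirical covariance
    off = C - np.diag(np.diag(C))       # C minus Ddiag(C)
    return float(np.linalg.norm(off) / np.linalg.norm(C))

# Example: nearly independent latents give a low score.
rng = np.random.default_rng(0)
print(diagonalization_score(rng.standard_normal((10_000, 8))))
```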
Samples from a randomly generated batch of images used to compute the FID and SWD scores (see Figure 4).
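The sliced Wasserstein distance (SWD) compares two sample sets by projecting them onto random one-dimensional directions and averaging 1D Wasserstein distances. The sketch below is a generic implementation on flattened feature vectors, not necessarily the exact protocol used here (SWD for images is often computed on multi-scale patch descriptors):

```python
import numpy as np

def sliced_wasserstein(X: np.ndarray, Y: np.ndarray,
                       n_proj: int = 512, seed: int = 0) -> float:
    """Average 1D Wasserstein-2 distance over random unit projections.

    X, Y: (n_samples, dim) arrays with the same number of rows
    (e.g., flattened images or image patches).
    """
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((X.shape[1], n_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit directions
    # Project, then sort: sorted 1D samples give the optimal 1D coupling.
    px = np.sort(X @ dirs, axis=0)
    py = np.sort(Y @ dirs, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))

# Example with two Gaussian point clouds; a mean shift raises the score.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 64))
B = rng.standard_normal((2000, 64)) + 0.5
print(sliced_wasserstein(A, B))
```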
Samples of randomly generated images used to compute the FID scores (see Table 5).
(a) Loss evolution (log plot) during the training of equation A.2 over 1000 epochs, once with the Cayley ADAM optimizer (green curve) and once without (blue curve). (b) Traversals along the principal components when the model was trained with a fixed $U$, that is, with the objective given by equation A.2 without optimizing over $U$. There is no clear isolation of a feature along any of the principal components, indicating further that optimizing over $U$ is key to better disentanglement.
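For concreteness, a principal-component traversal of the kind shown in panel (b) can be sketched as follows; this is a generic recipe, with `decoder` a placeholder for a trained decoder network and `H` a matrix of latent codes:

```python
import numpy as np

def pc_traversals(H: np.ndarray, decoder, component: int,
                  steps: int = 8, scale: float = 3.0):
    """Decode a sweep along one principal component of the latent codes.

    H: (n_samples, latent_dim) latent codes; decoder maps a latent
    vector back to image space. Returns `steps` decoded images.
    """
    mu = H.mean(axis=0)
    _, S, Vt = np.linalg.svd(H - mu, full_matrices=False)
    direction = Vt[component]                 # principal axis in latent space
    sigma = S[component] / np.sqrt(len(H))    # approx. std along that axis
    grid = np.linspace(-scale, scale, steps)  # sweep +/- scale standard devs
    return [decoder(mu + t * sigma * direction) for t in grid]

# Example with an identity "decoder" on synthetic codes.
rng = np.random.default_rng(0)
H = rng.standard_normal((1000, 10))
imgs = pc_traversals(H, decoder=lambda z: z, component=0)
```

If a feature is well disentangled, decoding such a sweep changes one generative factor at a time; the flat traversals in panel (b) show this fails when $U$ is not optimized.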