Model | Hyperparameters |
---|---|
doclink | We set β to be a symmetric vector where each cell βi = 0.01 for all word types of all the languages, and use the MALLET implementation for training (McCallum 2002). To enable consistent comparison, we disable the hyperparameter optimization provided in the package. |
c-bilda | Following the experimental results of Heyman, Vulic, and Moens (2016), we set χ = 2 to make the results more competitive with doclink. The rest of the settings are the same as for doclink. |
softlink | We use the document-wise thresholding approach for calculating the transfer distributions. The focus threshold is set to 0.8. The rest of the settings are the same as for doclink. |
voclink | We set the scalar β′ = 0.01 for hyperparameter β(r,ℓ) from the root to either internal nodes or leaves. For those from internal nodes to leaves, we set β′′ = 100, following the settings in Hu et al. (2014b). |
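
For readers who want to reproduce these settings, the table can be sketched as a plain configuration mapping. This is only an illustrative sketch: the key names (`beta`, `optimize_hyperparameters`, `chi`, `focus_threshold`, and so on) and the helper `symmetric_beta` are hypothetical and do not correspond to MALLET options or to the authors' code; only the numeric values come from the table.

```python
# Hypothetical configuration sketch; key names are illustrative assumptions,
# values are the ones reported in the table.

# doclink: symmetric beta of 0.01, hyperparameter optimization disabled.
DOCLINK = {"beta": 0.01, "optimize_hyperparameters": False}

SETTINGS = {
    "doclink": DOCLINK,
    # c-bilda: chi = 2, everything else as for doclink.
    "c-bilda": {**DOCLINK, "chi": 2},
    # softlink: document-wise thresholding with focus threshold 0.8,
    # everything else as for doclink.
    "softlink": {
        **DOCLINK,
        "transfer_thresholding": "document-wise",
        "focus_threshold": 0.8,
    },
    # voclink: the table states only the two tree-prior scalars.
    "voclink": {"beta_from_root": 0.01, "beta_internal_to_leaf": 100},
}


def symmetric_beta(vocab_size, value=0.01):
    """Build a symmetric prior vector: every word type gets the same value."""
    return [value] * vocab_size
```

Under these assumptions, `symmetric_beta(len(vocabulary))` would give the doclink prior for a vocabulary pooled over all languages, and `SETTINGS["c-bilda"]` inherits the doclink values while adding χ.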