List of the embedding models used for the study, together with their hyperparameter settings.
Model | Hyperparameters |
---|---|
PPMI.w2 | 345K window-selected context words; window of width 2; weighted with Positive Pointwise Mutual Information (PPMI); reduced with Singular Value Decomposition (SVD); subsampling method from Mikolov et al. (2013). |
PPMI.synf | 345K syntactically filtered context words; weighted with Positive Pointwise Mutual Information (PPMI); reduced with Singular Value Decomposition (SVD); subsampling method from Mikolov et al. (2013). |
PPMI.synt | 345K syntactically typed context words; weighted with Positive Pointwise Mutual Information (PPMI); reduced with Singular Value Decomposition (SVD); subsampling method from Mikolov et al. (2013). |
GloVe | Window of width 2; subsampling method from Mikolov et al. (2013). |
SGNS.w2 | Skip-gram with negative sampling; window of width 2; 15 negative examples; trained with the word2vec library (Mikolov et al. 2013). |
SGNS.synf | Skip-gram with negative sampling; syntactically filtered context words; 15 negative examples; trained with the word2vecf library (Levy and Goldberg 2014). |
SGNS.synt | Skip-gram with negative sampling; syntactically typed context words; 15 negative examples; trained with the word2vecf library (Levy and Goldberg 2014). |
FastText | Skip-gram with negative sampling and subword information; window of width 2; 15 negative examples; trained with the fastText library (Bojanowski et al. 2017). |
ELMo | Pretrained ELMo embeddings (Peters et al. 2018), available at https://allennlp.org/elmo, original model trained on the 1 Billion Word Benchmark (Chelba et al. 2013). |
BERT | Pretrained BERT-Large embeddings (Devlin et al. 2019), available at https://github.com/google-research/bert; model trained on the concatenation of the BooksCorpus (Zhu et al. 2015) and the English Wikipedia. |
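The PPMI rows in the table combine three steps: re-weight a word-by-context co-occurrence matrix with Positive Pointwise Mutual Information, reduce it with a truncated Singular Value Decomposition, and subsample frequent words following Mikolov et al. (2013). The sketch below illustrates only the PPMI weighting and SVD reduction on an already-built count matrix; the input matrix, the 300-dimensional output, and the helper name are assumptions for illustration, not the study's exact implementation.

```python
import numpy as np

def ppmi_svd_embeddings(counts, dim=300):
    """Hypothetical helper: PPMI weighting followed by truncated SVD.

    `counts` is a dense word-by-context co-occurrence count matrix
    (rows = target words, columns = context words).
    """
    counts = np.asarray(counts, dtype=np.float64)
    total = counts.sum()
    p_wc = counts / total                            # joint P(w, c)
    p_w = counts.sum(axis=1, keepdims=True) / total  # marginal P(w)
    p_c = counts.sum(axis=0, keepdims=True) / total  # marginal P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    # Positive PMI: clip negative values (and undefined cells) to zero.
    ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)
    # Truncated SVD: keep the top `dim` singular components as word vectors.
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * s[:dim]
```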
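For the prediction-based rows, the plain window-based skip-gram configuration (SGNS.w2) can be reproduced in spirit with gensim's Word2Vec rather than the original word2vec command-line tool used in the study; the syntactically filtered and typed variants additionally require the word2vecf library to feed dependency-based contexts and are not covered here. In the minimal sketch below, the toy corpus, vector dimensionality, and subsampling threshold are assumptions, while the window width and number of negative examples come from the table.

```python
from gensim.models import Word2Vec

# Toy corpus for illustration only; the study trains on a much larger corpus.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "chased", "the", "cat"]]

model = Word2Vec(
    sentences=corpus,
    sg=1,             # skip-gram (rather than CBOW)
    window=2,         # window of width 2, as in the table
    negative=15,      # 15 negative examples per positive pair
    vector_size=300,  # assumed dimensionality, not stated in the table
    sample=1e-5,      # frequent-word subsampling (Mikolov et al. 2013); threshold assumed
    min_count=1,      # keep all tokens in this toy example
)
vectors = model.wv    # e.g. vectors["cat"] or vectors.most_similar("cat")
```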