| model | training objective | corpus (#words) | output dimension | basic unit |
|---|---|---|---|---|
| **word embeddings** | | | | |
| word2vec | Predicting surrounding words | Google News (100B) | 300 | word |
| GloVe | Predicting co-occurrence probability | Wikipedia + Gigaword 5 (6B) | 300 | word |
| fastText | Predicting surrounding words | Wikipedia + UMBC + statmt.org (16B) | 300 | subword |
| **contextualized word embeddings** | | | | |
| ELMo | Language model | 1B Word Benchmark (1B) | 1024 | character |
| OpenAI GPT | Language model | BooksCorpus (800M) | 768 | subword |
| BERT | Masked language model (Cloze) | BooksCorpus + Wikipedia (3.3B) | 768 | subword |
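The "basic unit" column matters for out-of-vocabulary words: word-level models (word2vec, GloVe) have no vector for an unseen word, while fastText composes one from character n-grams. A minimal sketch of that n-gram extraction, using the default n-gram range of 3 to 6 characters from the fastText paper (the function name and structure here are illustrative, not the library's API):

```python
def fasttext_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Extract character n-grams fastText-style.

    The word is wrapped in boundary markers '<' and '>' so that
    prefixes and suffixes are distinguishable from inner substrings,
    then all n-grams with n_min <= n <= n_max are collected.
    The full (wrapped) word is also included as its own feature.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    grams.append(wrapped)  # whole-word feature, as in fastText
    return grams


# e.g. "where" yields boundary-aware trigrams such as "<wh" and "re>",
# so the suffix "re>" is distinct from the inner trigram in "there".
print(fasttext_ngrams("where", n_min=3, n_max=3))
```

A word's embedding is then the sum of the vectors of its n-grams, which is why fastText can still produce a (rough) vector for words absent from its 16B-word training corpus.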