Table 2: Computational efficiency of the main competing models and their performance on five NLP benchmarks. Time is the overall training time in Days x Cards format. Batch is the maximal batch size per card. Params is the number of trainable parameters in millions. Because the test sets for NER and SST-5 are small, we report the mean and standard deviation across three runs. Our approach (ELMo-C) is more computationally efficient than ELMo, ELMo-A, and ELMo-Sub while performing comparably.
          ELMo_Org  Base   FastText_cc   ELMo          ELMo-A        ELMo-Sub      ELMo-C
Time      −         −      −             14 x 3        5.7 x 4       3.9 x 4       2.5 x 4
Batch     −         −      −             128           256           320           768
Params    −         −      −             499M          196M          92M           76M

SNLI      88.7      88.0   87.7          88.5          88.9          87.1          88.8
Coref     NA        NA     68.90         72.9          72.9          72.4          72.9
SST-5     54.7      51.4   51.30 ± 0.77  52.96 ± 2.26  53.58 ± 0.77  53.02 ± 2.08  53.80 ± 0.73
NER       92.22     90.15  90.97 ± 0.43  92.51 ± 0.28  92.28 ± 0.20  92.17 ± 0.56  92.24 ± 0.10
SRL       84.6      81.4   80.2          83.4          82.7          82.4          82.4
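
The Days x Cards entries in the Time row can be collapsed into a single number for a rough comparison. The sketch below multiplies the two factors to obtain "card-days"; this summary metric and the names `training_time` and `card_days` are illustrative assumptions, not quantities defined in the paper, and the product ignores any differences between the cards used for each model.

```python
# Training time as reported in the Time row of Table 2: (days, cards).
training_time = {
    "ELMo": (14.0, 3),
    "ELMo-A": (5.7, 4),
    "ELMo-Sub": (3.9, 4),
    "ELMo-C": (2.5, 4),
}


def card_days(days: float, cards: int) -> float:
    """Total compute as days multiplied by cards (assumes comparable cards)."""
    return days * cards


# Card-days per model, and the cost of each model relative to ELMo.
costs = {name: card_days(d, c) for name, (d, c) in training_time.items()}
baseline = costs["ELMo"]

for name, cost in costs.items():
    ratio = baseline / cost
    print(f"{name:9s} {cost:5.1f} card-days  (ELMo / {name} = {ratio:.1f})")
```

Under this assumption, ELMo-C comes to 10 card-days against 42 for ELMo, which makes the efficiency gap in the Time row easier to read at a glance.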