Table 4:
Extrinsic evaluation results for baselines and LexSub. Setup 1 refers to experiments without extrinsic model confounds such as character embeddings and further fine-tuning of the input embeddings. Setup 2 refers to experiments in the original AllenNLP setting, where the models for QQP, SQuAD, and NER contain additional trainable character embeddings in the input layer, and the original NER model further fine-tunes the input embeddings. In both setups, LexSub outperforms the baselines on most of the extrinsic tasks. We hypothesize that the relatively poor performance of LexSub compared to Vanilla on NER may be due to the task-specific fine-tuning of the embeddings.
Models            NER (F1)   SST-2 (Acc)   SNLI (Acc)   SQuAD (EM)   QQP (Acc)

Experiments with Setup 1
Vanilla           87.88      87.31         85.00        64.23        87.08
Retrofitting      86.16      88.58         84.68        64.01        87.01
Counterfitting    80.09      86.77         84.99        62.86        87.10
LEAR              83.20      88.08         83.74        63.10        86.06
LexSub            88.06      88.91         85.00        64.65        87.31

Experiments with Setup 2
Vanilla           89.83      87.31         85.00        66.62        88.45
Retrofitting      85.56      88.58         84.68        66.21        88.54
Counterfitting    84.44      86.77         84.99        66.51        88.44
LEAR              85.47      88.08         83.74        65.71        87.67
LexSub            89.76      88.91         85.00        66.94        88.69

State of the Art  93.50      95.60         91.60        88.95        90.10