Table 4: Model performance on the generative tasks in terms of automatic metrics. R-L denotes Rouge-L and B-S denotes BERT-Score.

Method              Model        Idiom           Simile
                                 R-L     B-S     R-L     B-S
Zero-shot           GPT2-XL       6.2    40.2    17.0    47.7
                    GPT3          8.2    33.6    13.9    40.2
Few-shot            GPT3         12.8    51.2    23.1    56.1
Supervised          GPT2-XL      15.9    54.2    26.2    59.0
                    T5-large     12.9    51.0    22.9    54.9
                    BART-large   12.4    48.8    26.7    58.4
Knowledge Enhanced  Context      15.4    52.6    20.5    55.1
                    Literal      13.6    51.4    28.9    59.1