Model performance on the generative tasks, measured by automatic metrics. R-L denotes ROUGE-L and B-S denotes BERTScore.

| Method | Model | Idiom R-L | Idiom B-S | Simile R-L | Simile B-S |
|---|---|---|---|---|---|
| Zero-shot | GPT2-XL | 6.2 | 40.2 | 17.0 | 47.7 |
| Zero-shot | GPT3 | 8.2 | 33.6 | 13.9 | 40.2 |
| Few-shot | GPT3 | 12.8 | 51.2 | 23.1 | 56.1 |
| Supervised | GPT2-XL | 15.9 | 54.2 | 26.2 | 59.0 |
| Supervised | T5-large | 12.9 | 51.0 | 22.9 | 54.9 |
| Supervised | BART-large | 12.4 | 48.8 | 26.7 | 58.4 |
| Knowledge Enhanced | Context | 15.4 | 52.6 | 20.5 | 55.1 |
| Knowledge Enhanced | Literal | 13.6 | 51.4 | 28.9 | 59.1 |