Table 5: Percentage of times that the generations from each model, and the human-written references, were chosen as plausible (absolute) or preferred (comparative) by the majority of workers.

Model      Absolute          Comparative
           Idiom   Simile    Idiom   Simile
GPT2-XL    56      60        15      18.6
+Context   68      68        45      16
+Literal   48      76        13      46.7
Human      80      88        −       −
All        −       −         12
Neither    −       −         17      6.7
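
The caption's "chosen by the majority of workers" statistic can be illustrated with a minimal sketch. It assumes each item receives a list of worker votes and that "majority" means strictly more than half of them; the function names and the toy votes below are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def majority_label(votes):
    """Return the label picked by a strict majority of workers, or None."""
    label, count = Counter(votes).most_common(1)[0]
    # Assumption: a majority requires strictly more than half of the votes.
    return label if count > len(votes) / 2 else None


def percent_chosen(items, target):
    """Percentage of items whose majority label equals `target`."""
    hits = sum(1 for votes in items if majority_label(votes) == target)
    return 100.0 * hits / len(items)


# Hypothetical votes for four generations, three workers each (made-up data).
toy_items = [
    ["plausible", "plausible", "implausible"],
    ["implausible", "implausible", "plausible"],
    ["plausible", "plausible", "plausible"],
    ["plausible", "implausible", "plausible"],
]

print(percent_chosen(toy_items, "plausible"))  # 3 of 4 items -> 75.0
```

The same helper would cover the comparative setting if each vote names the preferred system (or "All" / "Neither") instead of a plausibility judgment.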