Table 7: Average F1 scores of our automatic evaluation metrics, comparing generated prompts against annotated prompts across all development sets of the rumour detection task.

          BERTScore  ROUGE-1  ROUGE-2  ROUGE-L
Dev F1    0.94       0.64     0.30     0.61
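The ROUGE columns above are F1 scores: the harmonic mean of precision (overlap divided by the generated prompt's length) and recall (overlap divided by the annotated prompt's length). ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams, and ROUGE-L uses the longest common subsequence. The sketch below is illustrative only and is not the authors' evaluation code. The function names are hypothetical. Lowercased whitespace tokenization without stemming and an unweighted mean over prompt pairs are assumptions. Reference ROUGE implementations tokenize differently and can produce somewhat different numbers. BERTScore is omitted because it requires a pretrained language model.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split, no stemming; reference
    # ROUGE implementations tokenize (and may stem) differently.
    return text.lower().split()


def f1(overlap: int, pred_total: int, ref_total: int) -> float:
    # Harmonic mean of precision (overlap / pred_total) and recall (overlap / ref_total).
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(generated: str, annotated: str, n: int) -> float:
    gen = ngrams(tokenize(generated), n)
    ref = ngrams(tokenize(annotated), n)
    overlap = sum((gen & ref).values())  # clipped n-gram matches (min counts)
    return f1(overlap, sum(gen.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(generated: str, annotated: str) -> float:
    gen, ref = tokenize(generated), tokenize(annotated)
    return f1(lcs_length(gen, ref), len(gen), len(ref))


def average_f1(pairs, metric) -> float:
    # Hypothetical aggregation: unweighted mean over (generated, annotated) pairs.
    scores = [metric(g, a) for g, a in pairs]
    return sum(scores) / len(scores)
```

Consider the single pair ("is this claim true", "is this claim false"). Three of the four unigrams match, so ROUGE-1 F1 is 0.75. Two of the three bigrams match, so ROUGE-2 F1 is 2/3. The longest common subsequence has length 3, so ROUGE-L F1 is 0.75. The pattern in the table, where ROUGE-2 falls well below ROUGE-1 and ROUGE-L, arises in the same way: exact bigram matches are harder to obtain than unigram or subsequence matches.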