Average F1 scores for our automatic evaluation metrics, computed by comparing generated prompts against annotated prompts across all development sets in the rumour detection task.