Search results for Daniel Khashabi (1–2 of 2)
Journal Articles
Transactions of the Association for Computational Linguistics (2021) 9: 1147–1162.
Published: 27 October 2021
Abstract
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress has been concentrated on resource-rich languages like English. This work focuses on Persian, one of the most widely spoken languages in the world, yet one for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark for the Persian language that includes a range of language understanding tasks, such as reading comprehension and textual entailment. These datasets were collected in a multitude of ways, often involving manual annotation by native speakers, resulting in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
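To make the shape of such a per-task benchmark evaluation concrete, here is a minimal Python sketch. The task list, JSON-lines file layout, and field names (`input`, `label`) are assumptions for illustration only, not ParsiNLU's actual release format.

```python
import json
from pathlib import Path

# Hypothetical per-task JSON-lines files; ParsiNLU's actual release layout
# and field names may differ. Two of the six tasks shown for brevity.
TASKS = ["reading_comprehension", "textual_entailment"]

def accuracy(gold, pred):
    """Fraction of examples where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def evaluate(data_dir: Path, predict):
    """Score a predict(task, text) callable on each task's test file."""
    scores = {}
    for task in TASKS:
        with (data_dir / f"{task}.jsonl").open(encoding="utf-8") as f:
            examples = [json.loads(line) for line in f]
        gold = [ex["label"] for ex in examples]                  # assumed field name
        pred = [predict(task, ex["input"]) for ex in examples]  # assumed field name
        scores[task] = accuracy(gold, pred)
    return scores
```

Per-task accuracies like these are what allow the abstract's comparison between pre-trained models and human performance.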
Journal Articles
Transactions of the Association for Computational Linguistics (2021) 9: 346–361.
Published: 26 April 2021
Abstract
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in StrategyQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of ∼66%.
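To ground the data format the abstract describes (question, decomposition, evidence paragraphs), here is a minimal Python sketch of one example. The class and field names are assumptions for illustration; the instance paraphrases the paper's well-known "Did Aristotle use a laptop?" question, with placeholder evidence text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StrategyQAExample:
    """One StrategyQA example; field names here are assumed, not official."""
    question: str              # implicit multi-hop strategy question (yes/no)
    answer: bool               # StrategyQA answers are boolean
    decomposition: List[str]   # (1) reasoning steps for answering it
    evidence: List[List[str]]  # (2) Wikipedia paragraphs supporting each step

ex = StrategyQAExample(
    question="Did Aristotle use a laptop?",
    answer=False,
    decomposition=[
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is the second date before the first period ended?",
    ],
    evidence=[
        ["<paragraph on Aristotle's lifetime>"],   # placeholder text
        ["<paragraph on laptop history>"],
        [],  # the final comparison step needs no retrieved paragraph
    ],
)
```

Note how no single reasoning step is stated in the question itself; the decomposition must be inferred, which is exactly what distinguishes StrategyQA from earlier multi-hop datasets.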