Table 2: Statistics on various subsets of the dataset.

| Task | Attribute | Statistic |
|---|---|---|
| Reading Comprehension | # of instances | 1,300 |
| | avg. question length (tokens) | 6.3 |
| | avg. paragraph length (tokens) | 94.6 |
| | avg. answer length (tokens) | 7.6 |
| Multiple-Choice QA | # of instances | 2,460 |
| | # of ‘literature’ questions | 834 |
| | # of ‘common-knowledge’ questions | 949 |
| | # of ‘math & logic’ questions | 677 |
| | avg. # of candidates | 4.0 |
| Sentiment Analysis | # of instances | 2,423 |
| | # of ‘food & beverages’ reviews | 1,917 |
| | # of ‘movie’ reviews | 506 |
| | avg. length of reviews (words) | 22.01 |
| | # of annotated pairs of (aspect, sentiment) | 2,539 |
| Textual Entailment | # of instances | 2,700 |
| | # of ‘natural’ instances | 1,370 |
| | # of ‘mnli’ instances | 1,330 |
| | avg. length of premises (tokens) | 23.4 |
| | avg. length of hypotheses (tokens) | 11.8 |
| Question Paraphrasing | # of instances | 4,644 |
| | # of ‘natural’ instances | 2,521 |
| | # of ‘qqp’ instances | 2,123 |
| | avg. length of Q1 (tokens) | 10.7 |
| | avg. length of Q2 (tokens) | 11.0 |
| Machine Translation | # of instances | 47,745 |
| | # of instances in ‘QP’ subset | 489 |
| | # of instances in ‘Quran’ subset | 6,236 |
| | # of instances in ‘Bible’ subset | 31,020 |
| | # of instances in ‘Mizan’ subset (eval. only) | 10,000 |
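
The per-subset rows in Table 2 hold raw counts that sum to each task's instance total. A minimal sketch for converting them into percentage shares follows; the dictionary layout and the helper name `subset_shares` are illustrative assumptions, not part of the dataset's tooling.

```python
# Counts copied from Table 2. The data structure itself is an assumption
# made for this example, not a format shipped with the dataset.
SUBSETS = {
    "Multiple-Choice QA": (2460, {"literature": 834, "common-knowledge": 949, "math & logic": 677}),
    "Sentiment Analysis": (2423, {"food & beverages": 1917, "movie": 506}),
    "Textual Entailment": (2700, {"natural": 1370, "mnli": 1330}),
    "Question Paraphrasing": (4644, {"natural": 2521, "qqp": 2123}),
    "Machine Translation": (47745, {"QP": 489, "Quran": 6236, "Bible": 31020, "Mizan": 10000}),
}


def subset_shares(total, counts):
    """Return each subset's share of `total` as a percentage (one decimal place)."""
    # Guard against transcription errors: the subsets must partition the task.
    if sum(counts.values()) != total:
        raise ValueError("subset counts do not sum to the instance total")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for task, (total, counts) in SUBSETS.items():
    print(task, subset_shares(total, counts))
```

The sum check doubles as a consistency test on the table: every task listed with subsets passes it.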