Skip to Main Content
Table 2: 
Dataset statistics for the machine translation, code generation, and image captioning tasks. Length represents the average number of tokens for target sentences of the training set.
DatasetTrainDevTestLength
WMT16 Ro-En 620k 2000 2000 26.48 
WMT18 En-Tr 207k 3007 3000 25.81 
KFTT En-Ja 405k 1166 1160 27.51 
Django 16k 1000 1801 8.87 
MS-COCO 567k 5000 5000 12.52 
DatasetTrainDevTestLength
WMT16 Ro-En 620k 2000 2000 26.48 
WMT18 En-Tr 207k 3007 3000 25.81 
KFTT En-Ja 405k 1166 1160 27.51 
Django 16k 1000 1801 8.87 
MS-COCO 567k 5000 5000 12.52 
Close Modal

or Create an Account

Close Modal
Close Modal