Table 1: Details for the three datasets in the S2-VLUE benchmark.

Dataset                     GROTOAP2            DocBank               S2-VL
Train / Dev / Test Pages    83k / 18k / 18k     398k / 50k / 50k      1.3k¹
Annotation Method           Automatic           Automatic             Human Annotation
Scientific Discipline       Life Science        Math / Physics / CS   19 Disciplines
Visual Layout Group         PDF parsing         Vision model          Gold Label / Detection methods
Number of Categories        22                  12                    15

Average Token Count²        1203 (591)          838 (503)             790 (453)
Average Text Line Count     90 (51)             60 (34)               64 (54)
Average Text Block Count    12 (16)             15 (8)                22 (36)
¹ This is the total number of pages in the S2-VL dataset; we use 5-fold cross-validation for training and testing.

² We report the average token, text line, and text block count per page, with standard deviations in parentheses.
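
As a rough illustration of the 5-fold cross-validation mentioned in the first footnote, the sketch below splits a set of item indices into five train/test partitions. It is a minimal sketch under stated assumptions, not the benchmark's actual procedure: the helper name `k_fold_indices`, the fixed seed, the round number 1300 standing in for "1.3k", and the choice to split at page level are all hypothetical, since the footnote does not say which unit (page or paper) the folds are built on.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles the indices once, deals them into k
    folds, and uses each fold in turn as the held-out test set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1300 is a placeholder for the approximate "1.3k" page count in the table.
splits = list(k_fold_indices(1300, k=5))
```

Each item lands in exactly one test fold, so every page is used for testing once and for training in the remaining four rounds.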
