Table 2: Performance of baseline and I-VILA models on the scientific document extraction task. I-VILA provides consistent accuracy improvements over the baseline LayoutLM model on all three benchmark datasets.

                                      GROTOAP2          DocBank           S2-VL¹
Model                                 Macro F1  H(G)    Macro F1  H(G)    Macro F1       H(G)
LayoutLM_BASE (Xu et al., 2020)       92.34     0.78    91.06     2.64    82.69 (6.04)   4.19 (0.25)
LayoutLM_BASE + Sentence Breaks       91.83     0.78    91.44     2.62    82.81 (5.21)   4.21 (0.55)

LayoutLM_BASE + I-VILA (Text Line)    92.37     0.73    92.79     2.17    83.77 (5.75)²  3.28 (0.35)
LayoutLM_BASE + I-VILA (Text Block)   93.38     0.53    92.00     2.10    83.44 (6.48)   2.83 (0.34)
¹ For S2-VL, we report the average score across the five cross-validation folds, with the standard deviation in parentheses.

² In this table, S2-VL results use VILA structures detected by visual layout models. When ground-truth VILA structures are available, both I-VILA and H-VILA models achieve higher accuracy, as shown in Table 6.
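
The S2-VL aggregation described in the footnote (a mean over five cross-validation folds, with the standard deviation in parentheses) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The function names are hypothetical, and the fold scores in the usage line are made-up placeholders rather than values from the paper. We also assume the sample standard deviation (n - 1 denominator), because the footnote does not say which estimator was used.

```python
import statistics


def summarize_folds(fold_scores):
    """Return (mean, std) over per-fold scores, each rounded to 2 decimals.

    Assumption: sample standard deviation (n - 1 denominator); the table
    footnote does not state which estimator was used.
    """
    if len(fold_scores) < 2:
        raise ValueError("need at least two folds to estimate a standard deviation")
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores)
    return round(mean, 2), round(std, 2)


def format_cell(fold_scores):
    """Render a cell in the 'mean(std)' style used for S2-VL in Table 2."""
    mean, std = summarize_folds(fold_scores)
    return f"{mean:.2f}({std:.2f})"


# Hypothetical per-fold Macro F1 scores, for illustration only.
print(format_cell([80.0, 82.0, 84.0, 86.0, 88.0]))  # prints 84.00(3.16)
```

If you swap `statistics.stdev` for `statistics.pstdev`, you get the population estimate. With five folds this is noticeably smaller: 2.83 instead of 3.16 for the same placeholder scores.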
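
The caption's claim of consistent improvements can be checked directly against the Macro F1 columns. The short script below is a sketch with hypothetical names. It copies the Macro F1 values from Table 2 and subtracts the LayoutLM_BASE baseline for each dataset. It only does arithmetic on the reported numbers and ignores the H(G) columns.

```python
# Macro F1 values copied from Table 2 (S2-VL entries are 5-fold means).
DATASETS = ("GROTOAP2", "DocBank", "S2-VL")

MACRO_F1 = {
    "LayoutLM_BASE": (92.34, 91.06, 82.69),
    "+ Sentence Breaks": (91.83, 91.44, 82.81),
    "+ I-VILA (Text Line)": (92.37, 92.79, 83.77),
    "+ I-VILA (Text Block)": (93.38, 92.00, 83.44),
}

BASELINE = "LayoutLM_BASE"


def gains_over_baseline(scores, baseline=BASELINE):
    """Per-dataset Macro F1 difference (variant minus baseline), rounded to 2 decimals."""
    base = scores[baseline]
    return {
        name: tuple(round(v - b, 2) for v, b in zip(vals, base))
        for name, vals in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, deltas in gains_over_baseline(MACRO_F1).items():
        cells = ", ".join(f"{d}: {x:+.2f}" for d, x in zip(DATASETS, deltas))
        consistent = all(x > 0 for x in deltas)
        print(f"{name:<24} {cells}  gain on all datasets: {consistent}")
```

Both I-VILA variants show positive differences on all three datasets:

- Text Line: +0.03, +1.73, +1.08
- Text Block: +1.04, +0.94, +0.75

Adding sentence breaks lowers GROTOAP2 Macro F1 by 0.51. The S2-VL differences are smaller than the fold-to-fold standard deviations reported for these rows (5.75 to 6.48).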
