Table 6 
Human evaluation on 50 samples from the WikiTablePara data set. Fluency, adequacy, and coherence scores are averaged across evaluators and instances. Evaluator correlation is the Pearson correlation between evaluators' scores and indicates how closely the evaluators agree.
System                 Fluency  Adequacy  Coherence
WikiBioModel           1.44     1.24      1.08
WebNLGModel            2.04     2.05      1.66
Proposed               3.29     4.20      3.72

GOLD-standard          4.53     4.78      4.59

Evaluator Correlation  0.74     0.80      0.76
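
The two summary statistics named in the caption, the averaged score and the Pearson correlation between evaluators, can be illustrated with a minimal sketch. The ratings below are hypothetical and are not taken from the study; the sketch also assumes two evaluators and a 1 to 5 rating scale, neither of which is stated in the caption.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(x), mean(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical fluency ratings for five samples (assumed 1-5 scale),
# one list per evaluator; these are NOT the study's data.
evaluator_a = [1, 2, 3, 4, 5]
evaluator_b = [2, 1, 4, 3, 5]

# Score averaged across evaluators and instances, as in the table rows.
mean_score = mean(evaluator_a + evaluator_b)

# Agreement between the two evaluators, as in the last table row.
agreement = pearson(evaluator_a, evaluator_b)

print(f"mean score: {mean_score:.2f}")
print(f"evaluator correlation: {agreement:.2f}")
```

With more than two evaluators, one common choice is to average the pairwise correlations; the caption does not say which variant was used.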