Human evaluation scores for fluency, relevance, and faithfulness of abstractiveness-controlled models on Newsroom-b. Krippendorff’s α inter-rater agreement for these metrics is 0.51, 0.37, and 0.40, respectively.
Bin | Method | Flu. | Rel. | Faithful. |
---|---|---|---|---|
1 | PG+CMDP | 4.79 | 3.43 | 98% |
1 | D.GPT2+CMDP | 4.75 | 3.34 | 96% |
2 | PG+CMDP | 4.52 | 2.34 | 58% |
2 | D.GPT2+CMDP | 4.57 | 3.14 | 66% |
3 | PG+CMDP | 4.47 | 2.00 | 52% |
3 | D.GPT2+CMDP | 4.60 | 2.99 | 66% |