Models | Grammaticality Win (%) | Grammaticality Lose (%) | Grammaticality Tie (%) | Grammaticality κ | Logicality Win (%) | Logicality Lose (%) | Logicality Tie (%) | Logicality κ |
---|---|---|---|---|---|---|---|---|
Ours vs. Fusion | 50.0** | 27.0 | 23.0 | 0.421 | 57.0** | 28.0 | 15.0 | 0.455 |
Ours vs. DSRL | 58.0** | 24.0 | 18.0 | 0.441 | 58.0** | 29.0 | 12.0 | 0.475 |
Ours vs. GPT-2 (Scratch) | 54.0** | 24.5 | 21.5 | 0.385 | 54.0** | 26.0 | 20.0 | 0.304 |
Ours vs. GPT-2 (Pretrain) | 52.0** | 31.5 | 16.5 | 0.483 | 56.5** | 32.5 | 11.0 | 0.493 |
Ours vs. GPT-2 (Fine-tune) | 42.0** | 28.0 | 30.0 | 0.344 | 51.0** | 27.5 | 21.5 | 0.371 |
Ours vs. Ours w/o Pretrain | 51.0** | 31.0 | 18.0 | 0.378 | 56.0** | 28.0 | 16.0 | 0.375 |
Ours vs. Ours w/o Knowledge | 46.0** | 23.0 | 21.0 | 0.289 | 48.0** | 29.0 | 23.0 | 0.314 |
Ours vs. Ours w/o Multi-task | 37.5 | 31.0 | 31.5 | 0.313 | 48.5** | 25.5 | 26.0 | 0.297 |