Knowledge and consistency results for the majority baseline, BERT-base, and our model (BERT-ft). The results are averaged over the 25 test relations. Underlined: best performance overall, including ablations. Bold: best performance among BERT-ft and the two baselines (BERT-base, majority).
Model | Accuracy | Consistency | Consistent-Acc |
---|---|---|---|
majority | 24.4±22.5 | 100.0±0.0 | 24.4±22.5 |
BERT-base | 45.6±27.6 | 58.2±23.9 | 27.3±24.8 |
BERT-ft | 47.4±27.3 | 64.0±22.9 | 33.2±27.0 |
-consistency | 46.9±27.6 | 60.9±22.6 | 30.9±26.3 |
-typed | 46.5±27.1 | 62.0±21.2 | 31.1±25.2 |
-MLM | 16.9±21.1 | 80.8±27.1 | 9.1±11.5 |