Table 4: Results of the deterministic baselines and neural models on the test set. We report three metrics: the precision, recall, and F1 of the overall relation predictions. The first row is the estimated human agreement, computed on 10% of the data rather than on the entire test set, and is therefore marked with an asterisk. Note that the first and second parts of the table are not directly comparable: in the Deterministic results the preposition labels are given by an oracle, whereas in the Pretrained results they are predicted by the models.

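| Group | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| | Human* | 94.8 | 94.0 | 94.4 |
| Deterministic | Title-First | 25.6 | 4.1 | 7.1 |
| Deterministic | Title-Last | 29.1 | 4.7 | 8.0 |
| Deterministic | Title-Random | 27.1 | 4.3 | 7.4 |
| Deterministic | Adj-Forward | 21.2 | 3.4 | 5.8 |
| Deterministic | Adj-Backward | 31.6 | 5.1 | 8.7 |
| Deterministic | Surface | 43.5 | 3.3 | 6.2 |
| Deterministic | Surface-Expand | 14.4 | 37.8 | 20.8 |
| Deterministic | Combined | 15.4 | 44.1 | 22.8 |
| Deterministic | Combined-Coref | 16.4 | 54.7 | 25.2 |
| Pretrained | Decoupled-static | 10.1 | 58.8 | 17.2 |
| Pretrained | Decoupled-frozen-base | 9.6 | 55.5 | 16.3 |
| Pretrained | Decoupled-frozen-large | 9.7 | 56.2 | 16.5 |
| Pretrained | Decoupled-base | 11.8 | 68.5 | 20.1 |
| Pretrained | Decoupled-large | 12.0 | 69.9 | 20.5 |
| Pretrained | Coupled-static | 59.6 | 14.4 | 23.2 |
| Pretrained | Coupled-frozen-base | 60.1 | 8.6 | 15.1 |
| Pretrained | Coupled-frozen-large | 58.4 | 11.5 | 19.2 |
| Pretrained | Coupled-base | 60.4 | 41.5 | 49.2 |
| Pretrained | Coupled-large | 65.8 | 43.5 | 52.4 |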
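
The relationship between the three reported metrics can be illustrated with a small sketch. It assumes that F1 here is the standard harmonic mean of precision and recall; the caption does not state the formula, so this is an assumption, and the function name `f1_score` is purely illustrative. Because the reported precision and recall are themselves rounded, recomputing F1 from them may differ from the table in the last digit for some rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1).

    Inputs and output share the same scale (percent in Table 4).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from three rows of Table 4.
rows = {
    "Human*": (94.8, 94.0),
    "Combined-Coref": (16.4, 54.7),
    "Coupled-large": (65.8, 43.5),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}")
```

For these three rows the recomputed values round to the F1 figures reported in the table (94.4, 25.2, and 52.4). The sketch also makes visible why a model with high recall but low precision, such as the Decoupled variants, ends up with a low F1: the harmonic mean is pulled toward the smaller of the two values.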