Progress on classification results over approximately one year, evaluated on a fixed holdout set of 9,708 examples. In parallel with these iterations on the classification algorithms, the training data was increased from 30,665 examples (initial evaluation with BidGRU) to 38,925 examples (last evaluation with SciBERT) via an active learning approach.
| Approach | Contrasting (F-score) | Supporting (F-score) | Mentioning (F-score) |
|---|---|---|---|
| BidGRU | .206 | .554 | .964 |
| BidGRU + metaclassifier | .260 | .590 | .964 |
| BidGRU + ELMo | .405 | .590 | .969 |
| BidGRU + ELMo + ensemble (10 classifiers) | .460 | .605 | .972 |
| SciBERT | .590 | .648 | .973 |
| Observed class distribution | 0.8% | 6.5% | 92.6% |
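Because the observed class distribution is heavily skewed toward "mentioning", overall accuracy says little about the minority classes, and per-class F-scores are more informative. The sketch below shows one way to compute per-class scores one-vs-rest. It assumes that "F-score" means the balanced F1 measure, and it uses toy, hypothetical labels rather than the paper's data. The function name `per_class_f1` and the lowercase label strings are illustrative only. A classifier that always predicts the majority class reaches 90% accuracy on this toy split, yet scores 0 on both minority classes.

```python
from collections import Counter

# Hypothetical label names for the three citation-statement classes.
LABELS = ("contrasting", "supporting", "mentioning")


def per_class_f1(gold, pred, labels=LABELS):
    """Return {label: F1}, scoring each label one-vs-rest (assumes F-score = F1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but the true class was different
            fn[g] += 1  # true class g was missed
    scores = {}
    for label in labels:
        pred_pos = tp[label] + fp[label]
        gold_pos = tp[label] + fn[label]
        precision = tp[label] / pred_pos if pred_pos else 0.0
        recall = tp[label] / gold_pos if gold_pos else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy, hypothetical data (not the paper's): a 90/8/2 class split and a
# degenerate classifier that always predicts the majority class.
gold = ["mentioning"] * 90 + ["supporting"] * 8 + ["contrasting"] * 2
pred = ["mentioning"] * 100
scores = per_class_f1(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 0.9
print({k: round(v, 3) for k, v in scores.items()})
# {'contrasting': 0.0, 'supporting': 0.0, 'mentioning': 0.947}
```

Counting errors once per example (as a false positive for the predicted class and a false negative for the true class) keeps the three one-vs-rest scores consistent with a single confusion matrix.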