Table 9.

Approach to training data by corpus: count (proportion). Totals may not equal 100% due to rounding.

| Approach to training data | Life Sciences & Biomedical | Physical & Environmental Sciences | Social Sciences & Humanities | All corpora |
|---|---|---|---|---|
| Original human-labeled data | 12 (26.7%) | 13 (25.0%) | 13 (29.5%) | 38 (26.95%) |
| External human-labeled data | 20 (44.4%) | 20 (38.5%) | 18 (40.9%) | 58 (41.1%) |
| Machine-labeled data | 12 (26.7%) | 15 (28.8%) | 11 (25.0%) | 38 (26.95%) |
| Unsure | 1 (2.2%) | 4 (7.7%) | 2 (4.5%) | 7 (5.0%) |
| Subtotal: ML classifier papers | 45 (100%) | 52 (100%) | 44 (100%) | 141 (100%) |
| (No ML classifier/NA) | 15 | 18 | 26 | 59 |
| Grand total | 60 | 70 | 70 | 200 |
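One way to check the table's arithmetic is to recompute each column from its four category counts. The minimal Python sketch below does this. The counts are transcribed from the table; the names `COUNTS`, `NO_CLASSIFIER`, `column_summary`, `pooled_counts`, and `grand_totals` are hypothetical helpers introduced only for illustration. When each share is rounded to one decimal place, the Social Sciences & Humanities column sums to 99.9% and the pooled column sums to 100.1%. This is the kind of drift the caption's rounding note describes. The recomputed Social Sciences & Humanities subtotal is 44 and its grand total is 70, which agrees with the overall figures of 141 and 200.

```python
"""Recompute the proportions in Table 9 from its raw counts (illustrative sketch)."""

# Category order: original human-labeled, external human-labeled,
# machine-labeled, unsure. Counts transcribed from Table 9.
COUNTS = {
    "Life Sciences & Biomedical": (12, 20, 12, 1),
    "Physical & Environmental Sciences": (13, 20, 15, 4),
    "Social Sciences & Humanities": (13, 18, 11, 2),
}

# Papers with no ML classifier (the NA row), per corpus.
NO_CLASSIFIER = {
    "Life Sciences & Biomedical": 15,
    "Physical & Environmental Sciences": 18,
    "Social Sciences & Humanities": 26,
}


def column_summary(counts, decimals=1):
    """Return (subtotal, rounded percentages, sum of the rounded percentages)."""
    subtotal = sum(counts)
    pcts = [round(100 * c / subtotal, decimals) for c in counts]
    return subtotal, pcts, round(sum(pcts), decimals)


def pooled_counts():
    """Add the corpora category by category to form the 'All corpora' column."""
    return tuple(sum(column) for column in zip(*COUNTS.values()))


def grand_totals():
    """Subtotal of ML classifier papers plus the NA papers, per corpus."""
    return {name: sum(c) + NO_CLASSIFIER[name] for name, c in COUNTS.items()}


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(name, column_summary(counts))
    print("All corpora", column_summary(pooled_counts()))
    totals = grand_totals()
    print(totals, "overall:", sum(totals.values()))
```

Each printed line shows a column's subtotal, its rounded shares, and the sum of those rounded shares. This makes it easy to see where rounding alone moves a column total away from 100%.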