Table 3

Distribution of entity types across the TAB corpus, along with their corresponding identifier type (direct identifier, quasi-identifier, or no need to mask) and confidential status. The parentheses in the mentions column give the proportion of entities of that type, in percent. The parentheses in the three other columns give the percentage of entities within that entity type that are labeled, respectively, as a direct identifier, as a quasi-identifier, or as having a confidential status.

Entity type   # mentions        # direct        # quasi          # confidential
DATETIME      53,668 (34.6)     23 (0.04)       48,086 (89.6)    530 (0.99)
ORG           40,695 (26.3)     20 (0.05)       12,880 (31.6)    866 (2.1)
PERSON        24,322 (15.7)     4,182 (17.2)    15,839 (65.1)    413 (1.7)
LOC           9,982 (6.4)       1 (0.01)        6,908 (69.2)     19 (0.2)
DEM           8,683 (5.6)       1 (0.01)        4,166 (48.0)     2,278 (26.2)
MISC          7,044 (4.5)       28 (0.4)        3,437 (48.8)     1,125 (16.0)
CODE          6,471 (4.2)       2,484 (38.4)    3,558 (55.0)     18 (0.3)
QUANTITY      4,141 (2.7)       0 (0.0)         3,370 (81.4)     87 (2.1)

Total         155,006 (100.0)   6,739 (4.4)     98,244 (63.4)    5,336 (3.4)
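
As a reading aid, the parenthesized values can be recomputed from the raw counts. The sketch below is illustrative only and is not part of the original work: the `COUNTS` dictionary and the `percentages` helper are hypothetical names, and the counts are copied from the table above. It assumes the mentions percentage is taken over all mentions in the corpus, while the other three are taken over the mentions of the given entity type, as the caption describes.

```python
# Counts copied from Table 3: (mentions, direct, quasi, confidential).
# COUNTS and percentages() are hypothetical names used for illustration.
COUNTS = {
    "DATETIME": (53668, 23, 48086, 530),
    "ORG": (40695, 20, 12880, 866),
    "PERSON": (24322, 4182, 15839, 413),
    "LOC": (9982, 1, 6908, 19),
    "DEM": (8683, 1, 4166, 2278),
    "MISC": (7044, 28, 3437, 1125),
    "CODE": (6471, 2484, 3558, 18),
    "QUANTITY": (4141, 0, 3370, 87),
}


def percentages(entity_type):
    """Return the four parenthesized percentages for one entity type."""
    mentions, direct, quasi, confidential = COUNTS[entity_type]
    total_mentions = sum(row[0] for row in COUNTS.values())
    return {
        # Share of this entity type among all mentions in the corpus.
        "mentions": round(100 * mentions / total_mentions, 1),
        # Shares within this entity type.
        "direct": round(100 * direct / mentions, 1),
        "quasi": round(100 * quasi / mentions, 1),
        "confidential": round(100 * confidential / mentions, 1),
    }


if __name__ == "__main__":
    for name in COUNTS:
        print(name, percentages(name))
```

For PERSON, for instance, 4,182 of 24,322 mentions are direct identifiers, which gives the 17.2 shown in the table.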